It is often difficult to decide whether the data that have been collected constitute one data set or many; this is a question of ‘granularity’. It is important to get the level of granularity right. If it is too fine there will be too many records, and if it is too coarse there will be too few; either makes it difficult for users to find what they want.
- The correct level for a data set is a cruise, survey or a set of repeat observations with a common purpose.
- A data set usually constitutes a specifically-funded piece of work.
- The data set should be easily extractable from a database for a third party.
If you search for a data set using a portal and get the same result every time, whatever combination of time, location and parameter you search by, then the data set is probably too coarse.
To guide the metadata creator, some draft examples of what should be considered a dataset are given below:
- A monitoring programme that produces data for the same parameters at the same locations each year
- A multidisciplinary cruise that has been specifically funded to answer a specific research question and is not anticipated to be carried out repeatedly
- A number of different types of data collected over the course of one year in a specific location that forms an Environmental Impact Assessment for a specific activity.
- A survey carried out over one month in a Special Area of Conservation that has been funded as one piece of work.
It is difficult to be very prescriptive, as the decision of whether a collection of information forms one or more datasets is often case specific. However, in all cases the metadata creator should ask him/herself what would most effectively help a user to quickly find the information they want via the portal.
The differences between raw data, processed data and data products are defined in the following paper:
D. Lear, P. Herman, G. Van Hoey, L. Schepers, N. Tonné, M. Lipizer, F.E. Muller-Karger, W. Appeltans, W.D. Kissling, N. Holdsworth, M. Edwards, E. Pecceu, H. Nygård, G. Canonico, S. Birchenough, G. Graham, K. Deneudt, S. Claus, P. Oset, (2020) Supporting the essential - Recommendations for the development of accessible and interoperable marine biological data products, Marine Policy, Volume 117, 103958
A series is a collection of data sets such as:
- A collection of cruises that are linked by a common research question and so form part of a larger project (e.g. North Sea Project, RAPID)
- A project that has collected data from a range of sources to produce a large number of GIS layers across many topics (e.g. Defra contract MB102 - Biophysical data layers).
- A project that collects the same theme of data on a regular basis but at distinctly separate geographical locations each time (e.g. the Maritime and Coastguard Agency (MCA) Civil Hydrography Programme)
MEDIN considers a ‘service’ to be a publicly available web service (specifically an implementation of a geo-portal software architecture) that provides views of, access to or processing of geographic and thematic information. In order to achieve compliance with INSPIRE the following definitions of service types are used:
A discovery service – Allows data resources to be found through searches of metadata which describe the service(s), series or data set(s).
A view service – Provides client applications with a visual representation in image form of geo-referenced data via a standard protocol using portrayal rules. A typical client application will present data set(s) or parts of data set(s) to the user via a map-based graphical user interface.
A download service – Gives access to data set(s) or parts of data set(s). The download service provides access to spatial objects whether representing discrete or continuous phenomena.
A registry service – May be considered a type of ‘other service’ which provides access to resources describing the data thus allowing correct processing and interpretation. Registry services need to be maintained properly and must have a clear and well-defined governance model. It is important that all registers keep track of all changes so that data created with reference to an outdated register can still be interpreted and that completely superseded or retired register items remain in the register. Every item in the register must be associated with a unique, unambiguous and permanent identifier.
A transformation service – a service for carrying out data content or data structure transformations. This is an auxiliary service type normally connected with a download service. It is designed as a mechanism to enable spatial data sets to be transformed with a view to achieving interoperability. An example would be transforming between coordinate reference systems. A transformation service will usually not be made directly accessible for the general public but a metadata record of it should exist.
An invoke spatial data service – A service invoking the use of spatial data service(s) that allows the definition of data inputs and data outputs expected by the spatial service and defines a workflow or service chain combining multiple services. It also allows the definition of a web service interface managing and accessing (executing) workflows or service chains. The service chains should be expressed in a standard (e.g. XML-based) notation that can be consumed by commercial and open-source orchestration engines from multiple sources. Invoke Spatial Data Services will enable a user or client application to run them without requiring the availability of a GIS.
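The registry service requirements described above (permanent identifiers, a record of all changes, and superseded or retired items staying in the register) can be illustrated with a minimal Python sketch. The class names, status values and identifiers here are hypothetical and chosen only for illustration; this is not how any MEDIN or INSPIRE registry is actually implemented.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class RegisterItem:
    identifier: str                      # unique and permanent, never reused
    label: str
    status: str = "valid"                # "valid", "superseded" or "retired"
    superseded_by: Optional[str] = None  # identifier of the replacement, if any


class Register:
    """Toy register: items are never deleted, only re-labelled by status."""

    def __init__(self) -> None:
        self._items: Dict[str, RegisterItem] = {}
        self.history: List[Tuple[str, str]] = []  # (change, identifier) log

    def add(self, identifier: str, label: str) -> None:
        if identifier in self._items:
            raise ValueError(f"identifier {identifier!r} is already in use")
        self._items[identifier] = RegisterItem(identifier, label)
        self.history.append(("added", identifier))

    def supersede(self, old_id: str, new_id: str, new_label: str) -> None:
        old = self._items[old_id]        # KeyError if old_id is unknown
        self.add(new_id, new_label)
        old.status = "superseded"
        old.superseded_by = new_id
        self.history.append(("superseded", old_id))

    def retire(self, identifier: str) -> None:
        self._items[identifier].status = "retired"
        self.history.append(("retired", identifier))

    def lookup(self, identifier: str) -> RegisterItem:
        # Superseded and retired items are still returned, so that data
        # created against an older state of the register can be interpreted.
        return self._items[identifier]
```

For example, after `reg.add("X1", "Old term")` and `reg.supersede("X1", "X2", "New term")`, a call to `reg.lookup("X1")` still returns the old item, now marked as superseded and pointing to `X2`.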
Metadata can be loosely defined as “data about data”. Discovery metadata should provide information that allows a user to discover the existence of a particular data set, along with key information about its content, location, ownership, how to obtain it and any associated costs.
Mandatory – The field must be completed. If a mandatory element is not completed then the metadata will not be valid.
Conditional – The field must be completed if the element exists for the resource, e.g. if there is a web page where you can access and download a data set you must provide that information in the metadata. However, if there is no web link you don't have to provide one!
Optional – The field may be completed, but it is not required to create a valid metadata record. The core MEDIN metadata elements were all selected because they are considered very useful in discovering the data resource they describe, so we recommend filling in all elements where possible.
Metadata is valid when it contains all the necessary metadata elements filled out in the correct format. There are tests in the tools provided by MEDIN to check if all the right elements are filled in, but you should also carefully check the content of free text fields to make sure the information is relevant and understandable.
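The three obligation levels can be pictured as a simple completeness check. The following Python sketch is illustrative only: the element names, the `RULES` table and the `validate` function are assumptions made for this example, not the MEDIN schema or the checks run by the MEDIN tools, and it does not check the format of the values.

```python
MANDATORY, CONDITIONAL, OPTIONAL = "mandatory", "conditional", "optional"

# Hypothetical element names and obligations, for illustration only.
RULES = {
    "resource_title": MANDATORY,
    "web_link": CONDITIONAL,
    "additional_information": OPTIONAL,
}


def is_filled(value) -> bool:
    """An element counts as filled if it holds non-blank content."""
    return value is not None and str(value).strip() != ""


def validate(record: dict, applicable: set) -> list:
    """Return a list of problems; an empty list means nothing is missing.

    `applicable` names the conditional elements that exist for this
    resource (e.g. "web_link" when the data can be downloaded online).
    """
    problems = []
    for element, obligation in RULES.items():
        filled = is_filled(record.get(element))
        if obligation == MANDATORY and not filled:
            problems.append(f"{element}: mandatory element is missing")
        elif obligation == CONDITIONAL and element in applicable and not filled:
            problems.append(f"{element}: applies to this resource but is missing")
        # Optional elements never make a record invalid.
    return problems
```

With a record that has only a title, `validate(record, applicable=set())` reports no problems, whereas `validate(record, applicable={"web_link"})` reports the missing web link. A check like this cannot judge whether free text is relevant or understandable, which is why a manual read-through is still needed.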
Once the bounding box has been defined, a spatial query is executed on the web server to look at the extent of the bounding box and automatically select the Charting Progress Sea Areas. If the bounding box is large then the query can be slow (several seconds) to execute.
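As a rough picture of what such a spatial query does, the Python sketch below selects every area whose extent overlaps a bounding box. It is a simplification under stated assumptions: the area names and coordinates are placeholders rather than the real Charting Progress Sea Areas, the areas are treated as rectangles rather than polygons, and boxes crossing the 180° meridian are not handled. The actual query on the MEDIN web server may work quite differently.

```python
from typing import List, NamedTuple


class BBox(NamedTuple):
    west: float   # decimal degrees, negative = west of Greenwich
    south: float
    east: float
    north: float


def intersects(a: BBox, b: BBox) -> bool:
    """True if the two boxes overlap or touch."""
    return (a.west <= b.east and b.west <= a.east
            and a.south <= b.north and b.south <= a.north)


# Placeholder areas with made-up extents, for illustration only.
SEA_AREAS = {
    "Area A": BBox(-6.0, 49.0, -2.0, 51.0),
    "Area B": BBox(-2.0, 54.0, 3.0, 58.0),
    "Area C": BBox(-10.0, 55.0, -5.0, 60.0),
}


def select_sea_areas(query: BBox) -> List[str]:
    """Return the names of all areas that the query box overlaps."""
    return sorted(name for name, box in SEA_AREAS.items()
                  if intersects(query, box))


print(select_sea_areas(BBox(-3.0, 50.0, 0.0, 55.0)))  # ['Area A', 'Area B']
```

A rectangle test like this is cheap; testing against real polygon boundaries is more expensive, which is one plausible reason a large bounding box can take several seconds.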
All metadata created using MEDIN approved tools are compliant with the INSPIRE metadata guidelines. MEDIN DACs will provide INSPIRE compliant view and download services for the data they hold, whilst the MEDIN metadata portal will provide the INSPIRE discovery service. The UK Location Programme (UKLP) includes the implementation of the INSPIRE Directive for the UK, and MEDIN acts as a Metadata and Data Publisher for the marine community within UKLP.
If you are having difficulty selecting a keyword from the list we recommend using the SeaDataNet expandable thesaurus to select the relevant keywords. You must drill down at least two levels until you get to the P021 vocabulary.
If you cannot find any suitable keywords, contact MEDIN metadata support at firstname.lastname@example.org for assistance.
MEDIN data guidelines are data archive standards and provide guidance on what information should be stored alongside your data to ensure they can be reused in the future. It is likely that you are already recording most of the information requested in a MEDIN data guideline so it should not take you long to make your data MEDIN compliant.
An independent pilot study on using the MEDIN data guidelines estimated that an additional 1-3.5 hours would be required per dataset to ensure that all relevant information is recorded in a MEDIN data guideline. Additional instruments and stations will of course increase the time required.
If the data are available online then the metadata element “Web link” will provide a link. If the link is not available, or does not link directly to the resource, then contact the data custodian or distributor listed in the contacts.
If there are any costs related to accessing the data they will be held in the 'Conditions for access and use constraints' element.
The data owner is the organisation or person who owns the Intellectual Property Rights for the data (in simpler terms, the organisation or person who paid for the data or service to be created). For example, Company X pays a consultant to carry out an ecological assessment, and the consultant contracts a specialist to collect part of the data. Company X is the data owner for all parts of the data.
The custodian is the organisation or person who looks after the resource. This is especially relevant for historic datasets where the data owner no longer exists, e.g. the records of a deceased naturalist donated to a museum or archive. Whilst the naturalist remains the data owner, the custodian handles any data enquiries and data permissions issues.
The distributor is the organisation or person who distributes a copy of the resource. The data owner(s) or custodian may be a distributor of the resource, but they may also allow a copy of the resource to be distributed by an archive or other data manager. e.g. Records may be lodged with a Data Archive Centre and available from the originator.
As with any new process, you will need to spend time training and familiarising yourself with the MEDIN metadata standard and the tools that MEDIN supply to create MEDIN discovery metadata. MEDIN offer free 1-day workshops to help with this familiarisation process. You should allow between 3 and 10 days as an initial outlay to become familiar with MEDIN metadata standards and tools.
If you already create some form of metadata (which could take 30 minutes), you should allow 10-15 minutes extra to make a MEDIN compliant metadata record. Remember you only need one metadata record per dataset.
If you find it is taking you longer than this, please contact the MEDIN metadata helpdesk on Tel: +44 (0)1752 426237 or email the helpline.
The error messages generated by the MEDIN online tool can be difficult to understand. If you are having difficulty identifying problems with a metadata record you can try pasting part of the error message into the search box of the Error Message Database.
This should help identify which information is missing from your record. If you continue to have difficulty please contact the MEDIN helpdesk on
Tel: +44 (0)1752 426237 or email the helpline.