Interoperability across repositories requires standardized approaches to metadata.
“The real value of repositories lies in the potential to interconnect them to create a network of repositories, a network that can provide unified access to research outputs and be (re-) used by machines and researchers. In order to achieve this potential, we need interoperability.
In addition to interoperability across repositories, we also want to ensure that repositories and other related systems are interoperable with each other, such as repositories with CRIS (Current Research Information Systems).” https://coartraining.gitbook.io/coar-repository-toolkit/interoperability
“Ideally, repositories will expose their metadata using common schema and vocabularies so that the records can be standardized, and aggregated by repository networks. In turn, these networks can develop more useful services with the metadata, such as tracking open access, discovery of content, and analytics.
Currently most repositories expose their metadata through the Open Archives Initiative - Protocol for Metadata Harvesting (OAI-PMH). This protocol allows the repository to use a variety of metadata profiles, in addition to the simple OAI-DC metadata format based on Dublin Core (DC). For generic data repositories, the DataCite metadata schema is the most widely used. Domain-based metadata schemas may also be used by repositories that specialize in collecting content from a specific discipline.
In addition, there are regional guidelines for repositories defined by certain repository networks, such as LA Referencia (Latin America) and OpenAIRE (Europe), [which] require the adoption of certain specific metadata elements and vocabularies in order to provide services based on the metadata they aggregate.” https://coartraining.gitbook.io/coar-repository-toolkit/interoperability/metadata-and-vocabularies
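As a rough illustration of what harvesting over OAI-PMH looks like, the sketch below builds a ListRecords request for the simple oai_dc format and pulls the Dublin Core fields out of a response using only the Python standard library. The base URL and the sample record are hypothetical, and the response is heavily trimmed; a real harvester would also follow resumptionToken paging and handle OAI-PMH error responses.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Namespaces of the OAI-PMH 2.0 envelope, the oai_dc container and Dublin Core elements.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

def list_records_url(base_url: str, metadata_prefix: str = "oai_dc") -> str:
    """Build a ListRecords request URL (first page only, no resumptionToken)."""
    return base_url + "?" + urlencode({"verb": "ListRecords", "metadataPrefix": metadata_prefix})

def dc_fields(response_xml: str) -> list[dict[str, list[str]]]:
    """Return one {element name: [values]} dict per record in a ListRecords response."""
    root = ET.fromstring(response_xml)
    records = []
    for dc in root.iterfind(".//oai:record/oai:metadata/oai_dc:dc", NS):
        fields: dict[str, list[str]] = {}
        for child in dc:
            # child.tag looks like "{http://purl.org/dc/elements/1.1/}title"
            name = child.tag.split("}", 1)[1]
            fields.setdefault(name, []).append((child.text or "").strip())
        records.append(fields)
    return records

# Hypothetical, trimmed response: record headers and other envelope elements omitted.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords><record><metadata>
    <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
               xmlns:dc="http://purl.org/dc/elements/1.1/">
      <dc:title>Example title</dc:title>
      <dc:creator>Smit, J.H. (Jan) de</dc:creator>
      <dc:language>eng</dc:language>
    </oai_dc:dc>
  </metadata></record></ListRecords>
</OAI-PMH>"""

print(list_records_url("https://repository.example.org/oai/request"))
print(dc_fields(SAMPLE))
```

Because every record arrives in the same oai_dc shape, an aggregator can apply one parser to many repositories; richer profiles (e.g. DataCite) would need their own namespaces and parsing rules.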
This document outlines suggested metadata guidelines for African repositories. It also recommends the use of controlled vocabularies: “lists of standardized terminology, words, or phrases, used for indexing or content analysis and information retrieval, usually in a defined information domain” (CASRAI).
Comprehensive metadata: aim for metadata that is as comprehensive as possible. Include all descriptive information provided in the resource that you upload to your repository.
Title (dc.title) - use the original wording, order and spelling of the resource title. Capitalize proper nouns only. Punctuation need not reflect the usage of the original. Subtitles should be separated from the title by a colon; as worded, this instruction results in Title:Subtitle (i.e. no space). See https://guidelines.openaire.eu/en/latest/literature/field_title.html
Title in English, if different, in a separate field.
Author(s) (dc.contributor.author) - each author in a separate field. Use the inverted name, with the following syntax: “surname”, “initials” (“first name”) “prefix”. For example, Jan Hubert de Smit becomes <dc:creator>Smit, J.H. (Jan) de</dc:creator>. Use a standardized writing style for names, e.g. the style used by the publisher when this is available; if not, use the APA bibliographic style, as in a reference list, when applicable. Generational suffixes (Jr., Sr., etc.) should follow the surname. When in doubt, give the name as it appears, and do not invert. Omit titles (like “Dr”). For example, “Dr. John H. de Smit Jr.” becomes <dc:creator>Smit Jr., J.H. (John) de</dc:creator>.
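The inversion rule can be sketched as a small formatting function. This is a hypothetical helper, not part of any repository platform: it assumes the surname, given names, prefix and suffix have already been separated (splitting a free-text name reliably is much harder and is not attempted), that the first given name is written in full, and it builds initials from the first letter of each given name.

```python
def format_creator(surname: str, given_names: list[str], prefix: str = "", suffix: str = "") -> str:
    """Format a name as: surname [suffix], initials (first name) [prefix].

    Hypothetical helper; titles such as "Dr." are expected to be dropped by the caller.
    """
    head = f"{surname} {suffix}" if suffix else surname
    if not given_names:
        # No given names known: fall back to the surname (plus any prefix) alone.
        return f"{head} {prefix}".strip()
    # One initial per given name, e.g. ["Jan", "Hubert"] -> "J.H."
    initials = "".join(name[0].upper() + "." for name in given_names)
    result = f"{head}, {initials} ({given_names[0]})"
    return f"{result} {prefix}" if prefix else result

print(format_creator("Smit", ["Jan", "Hubert"], prefix="de"))               # Smit, J.H. (Jan) de
print(format_creator("Smit", ["John", "H."], prefix="de", suffix="Jr."))   # Smit Jr., J.H. (John) de
```

Keeping the name parts as separate inputs makes the output order (suffix after surname, prefix at the end) explicit rather than relying on string heuristics.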
Advisor(s) (dc.contributor.advisor) - e.g. the thesis supervisor; add this when uploading a bachelor's, master's or doctoral thesis.
Abstract in English, if different, in a separate field.
Date (dc.date.issued) - the recommended best practice for encoding the date value is defined in a profile of ISO 8601 [W3CDTF] and follows the YYYY-MM-DD format. In DSpace, you may give the year only (YYYY) for journal articles.
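A minimal validation sketch, assuming only the date-only W3CDTF granularities (YYYY, YYYY-MM and YYYY-MM-DD) are accepted and time-of-day forms are out of scope. Besides checking the pattern, it rejects dates that do not exist on the calendar.

```python
import re
from datetime import date

# W3CDTF date granularities: YYYY, YYYY-MM or YYYY-MM-DD (time-of-day forms not handled).
_W3CDTF_DATE = re.compile(r"([0-9]{4})(?:-([0-9]{2})(?:-([0-9]{2}))?)?")

def is_w3cdtf_date(value: str) -> bool:
    """True if value is YYYY, YYYY-MM or YYYY-MM-DD and denotes a real calendar date."""
    m = _W3CDTF_DATE.fullmatch(value.strip())
    if m is None:
        return False
    year, month, day = m.groups()
    try:
        # Missing month/day default to 1 so only the parts actually given are checked.
        date(int(year), int(month or 1), int(day or 1))
    except ValueError:
        return False
    return True

for value in ["2017-06-30", "2017", "2017-02-30", "30/06/2017"]:
    print(value, is_w3cdtf_date(value))
```

The year-only form used for journal articles in DSpace passes, while day-first local formats such as 30/06/2017 are rejected and would need converting before deposit.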
Digital Object Identifier (dc.identifier or dc.identifier.doi or dc.identifier.other), e.g. 10.1186/s13027-017-0170-5 or http://doi.org/10.1007/s12374-017-0088-x
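Since DOIs appear both as bare identifiers and as resolver URLs, a repository may want to store one normalized form. The sketch below strips common resolver and doi: prefixes and checks the result against a simplified DOI pattern (an assumption: it covers common "10.NNNN/suffix" DOIs but not every valid DOI); it can also rebuild the https://doi.org/ link form.

```python
import re

# Resolver prefixes and the "doi:" scheme that commonly wrap a bare DOI.
_DOI_PREFIX = re.compile(r"^(?:https?://(?:dx\.)?doi\.org/|doi:)", re.IGNORECASE)
# Simplified pattern: "10." + registrant code, a slash, then a non-empty suffix.
_DOI_BODY = re.compile(r"10\.[0-9]{4,9}/\S+")

def normalize_doi(value: str) -> str:
    """Return the bare DOI, e.g. 10.1186/s13027-017-0170-5."""
    doi = _DOI_PREFIX.sub("", value.strip())
    if not _DOI_BODY.fullmatch(doi):
        raise ValueError(f"not a DOI: {value!r}")
    return doi

def doi_url(value: str) -> str:
    """Return the HTTPS resolver form of a DOI."""
    return "https://doi.org/" + normalize_doi(value)

print(normalize_doi("http://doi.org/10.1007/s12374-017-0088-x"))
print(doi_url("10.1186/s13027-017-0170-5"))
```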
Keywords (dc.subject) - each keyword in a separate field.
Language (dc.language.iso) in ISO 639 standard (2 or 3 letter code, e.g. en or eng for English).
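Because both two- and three-letter codes are allowed, records from one repository can end up mixing "en" and "eng". A sketch that normalizes to three letters, assuming a small illustrative subset of ISO 639-1 to ISO 639-2/T mappings; a real repository would load the full ISO 639 code tables instead of this hypothetical dictionary.

```python
# Illustrative subset of ISO 639-1 -> ISO 639-2/T codes (not a complete table).
ISO639_1_TO_3 = {
    "en": "eng", "fr": "fra", "pt": "por", "ar": "ara", "sw": "swa",
    "af": "afr", "am": "amh", "ha": "hau", "yo": "yor", "zu": "zul",
}

def to_three_letter(code: str) -> str:
    """Normalize a language code to three letters so that all records use one form."""
    code = code.strip().lower()
    if len(code) == 3 and code.isalpha():
        return code  # assumed to already be an ISO 639-2/639-3 code
    if code in ISO639_1_TO_3:
        return ISO639_1_TO_3[code]
    raise ValueError(f"unknown or unsupported language code: {code!r}")

print([to_three_letter(c) for c in ["en", "ENG", "sw", "fr"]])
```

Choosing three letters is only one option; normalizing to two letters works equally well as long as the repository is consistent.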
Journal title / Conference title (dc.publisher) for journal articles / conference proceedings.
Journal volume and number (dc.relation.ispartofseries, or dc.citation.issue, dc.citation.spage and dc.citation.epage, the last two giving the start and end pages).
Journal ISSN (dc.identifier.issn) / Book ISBN (dc.identifier.isbn)
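ISSNs and ISBNs carry a check digit, so simple typing errors can be caught at deposit time. A minimal sketch: the ISSN check uses weights 8 down to 2 with modulus 11 (a check value of 10 is written X), and the ISBN check covers ISBN-13 only (weights alternating 1 and 3, modulus 10); ISBN-10 is not handled.

```python
DIGITS = "0123456789"

def issn_is_valid(issn: str) -> bool:
    """Check an ISSN such as 0378-5955 using its mod-11 check digit."""
    chars = issn.strip().replace("-", "").upper()
    if len(chars) != 8 or not all(c in DIGITS for c in chars[:7]):
        return False
    # Weights 8, 7, ..., 2 apply to the first seven digits.
    total = sum(int(d) * w for d, w in zip(chars[:7], range(8, 1, -1)))
    check = (11 - total % 11) % 11
    return chars[7] == ("X" if check == 10 else str(check))

def isbn13_is_valid(isbn: str) -> bool:
    """Check an ISBN-13: the weighted sum of all 13 digits must be divisible by 10."""
    digits = isbn.replace("-", "").replace(" ", "")
    if len(digits) != 13 or not all(c in DIGITS for c in digits):
        return False
    total = sum(int(d) * (3 if i % 2 else 1) for i, d in enumerate(digits))
    return total % 10 == 0

print(issn_is_valid("0378-5955"), issn_is_valid("0378-5954"))
print(isbn13_is_valid("978-0-306-40615-7"))
```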
Type (dc.type) - publication type. Indicate the type of your resource using a local repository vocabulary, the info:eu-repo publication type vocabulary or the COAR Resource Type Vocabulary (Appendix 1).
Access (dc.rights) - provide access information (e.g. Open Access). Use the COAR Access Rights Vocabulary to indicate the access rights of your resource, e.g. metadata only access or restricted access, as recommended in the OpenAIRE Guidelines for Literature Repositories v3.
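Where a repository still stores the info:eu-repo access terms of the OpenAIRE v3 guidelines, a lookup table can translate them into COAR Access Rights concepts. The pairing below (in particular closedAccess to metadata only access) is an assumed alignment based on later OpenAIRE guidelines; confirm the labels and URIs against the published COAR vocabulary before relying on it.

```python
# info:eu-repo access terms (OpenAIRE v3) -> (COAR label, COAR concept URI).
# Assumed alignment; verify against the published COAR Access Rights vocabulary.
EU_REPO_TO_COAR = {
    "info:eu-repo/semantics/openAccess": ("open access", "http://purl.org/coar/access_right/c_abf2"),
    "info:eu-repo/semantics/embargoedAccess": ("embargoed access", "http://purl.org/coar/access_right/c_f1cf"),
    "info:eu-repo/semantics/restrictedAccess": ("restricted access", "http://purl.org/coar/access_right/c_16ec"),
    "info:eu-repo/semantics/closedAccess": ("metadata only access", "http://purl.org/coar/access_right/c_14cb"),
}

def coar_access_right(eu_repo_term: str) -> tuple[str, str]:
    """Return (label, URI) of the COAR access right for an info:eu-repo access term."""
    try:
        return EU_REPO_TO_COAR[eu_repo_term.strip()]
    except KeyError:
        raise ValueError(f"unrecognised access term: {eu_repo_term!r}") from None

print(coar_access_right("info:eu-repo/semantics/closedAccess"))
```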
Information about re-use - for materials published under a Creative Commons licence, give the licence name in the dc.rights or dc.rights.license field (for example, Creative Commons Attribution 4.0 International) and the licence URL in dc.rights.uri (e.g. http://creativecommons.org/licenses/by/4.0/). For more information about how to describe licence conditions, see https://guidelines.openaire.eu/en/latest/literature/field_licensecondition.html
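A sketch that derives both values from a licence code, so the name and the URL cannot drift apart. It assumes the six Creative Commons 4.0 licences only, writes the name into dc.rights.license (dc.rights would work the same way), and keeps the http:// URL pattern of the guideline's example.

```python
# Licence elements of the six Creative Commons 4.0 licences, keyed by their URL code.
CC_40_NAMES = {
    "by": "Attribution",
    "by-sa": "Attribution-ShareAlike",
    "by-nd": "Attribution-NoDerivatives",
    "by-nc": "Attribution-NonCommercial",
    "by-nc-sa": "Attribution-NonCommercial-ShareAlike",
    "by-nc-nd": "Attribution-NonCommercial-NoDerivatives",
}

def cc_license_fields(code: str) -> dict[str, str]:
    """Return dc.rights.license and dc.rights.uri values for a CC 4.0 licence code."""
    code = code.strip().lower()
    if code not in CC_40_NAMES:
        raise ValueError(f"not a CC 4.0 licence code: {code!r}")
    return {
        "dc.rights.license": f"Creative Commons {CC_40_NAMES[code]} 4.0 International",
        "dc.rights.uri": f"http://creativecommons.org/licenses/{code}/4.0/",
    }

print(cc_license_fields("by"))
```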
Citation (dc.identifier.citation) - suggested citation of an item (e.g. the journal's name, volume and issue for a journal article); these details allow better retrieval of your documents.
Good practice examples: additional information and metadata
ORCID - add an ORCID iD to author names. Promote the adoption of ORCID iDs to uniquely identify authors (even in case of name ambiguity). Encourage authors to register with ORCID in order to obtain an ORCID iD. In Dublin Core, ORCID iDs should be provided directly as part of the author's name (e.g. <dc:creator>Summan, Friedrich (ORCID-ID 0000-0002-6297-3348)</dc:creator>).
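The last character of an ORCID iD is a check character computed with ISO 7064 MOD 11-2, so a mistyped iD can be detected before it is attached to a name. A minimal sketch; `creator_with_orcid` is a hypothetical helper that produces the author-name form shown in the guideline's example.

```python
import re

_ORCID_FORM = re.compile(r"[0-9]{4}-[0-9]{4}-[0-9]{4}-[0-9]{3}[0-9X]")

def bare_orcid(orcid: str) -> str:
    """Accept either the bare iD or its https://orcid.org/ URL form."""
    return orcid.strip().removeprefix("https://orcid.org/")

def orcid_check_char(base: str) -> str:
    """ISO 7064 MOD 11-2 check character computed over the first 15 digits."""
    total = 0
    for digit in base:
        total = (total + int(digit)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)

def orcid_is_valid(orcid: str) -> bool:
    """Validate the 0000-0000-0000-000X layout and the check character."""
    orcid = bare_orcid(orcid)
    if not _ORCID_FORM.fullmatch(orcid):
        return False
    digits = orcid.replace("-", "")
    return orcid_check_char(digits[:15]) == digits[15]

def creator_with_orcid(name: str, orcid: str) -> str:
    """Append a validated iD to an inverted name (hypothetical helper)."""
    if not orcid_is_valid(orcid):
        raise ValueError(f"invalid ORCID iD: {orcid!r}")
    return f"{name} (ORCID-ID {bare_orcid(orcid)})"

print(creator_with_orcid("Summan, Friedrich", "0000-0002-6297-3348"))
```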
Description - add additional description, if needed, in dc.description. For example, provide more details about a thesis/dissertation: “A Research dissertation submitted to the School of Public Administration and Management for the requirement to undertake the field study (in Semester 3) for the fulfillment of the Master Degree in Public Administration (MPA) of Mzumbe University” (from http://scholar.mzumbe.ac.tz/handle/11192.1/2408).
Project information - when applicable, add grant/project information in dc.relation if a resource was supported by a project or grant. This makes it possible, for example, to list all publications resulting from the projects funded by the Ministry of Education, Science and Technological Development of the Republic of Serbia, such as: Projects: Spatial, environmental, energy and social aspects of developing settlements and climate change - mutual impacts (RS-36035) (info:eu-repo/grantAgreement/MESTD/Technological Development (TD or TR)/36035/RS//).
An authoritative list of projects is exposed by OpenAIRE through OAI-PMH and is available to all repository managers. Values include the project name and project ID. The projectID equals the Grant Agreement identifier and is defined by the info:eu-repo namespace term grantAgreement. The three-part namespace (info:eu-repo/grantAgreement/Funder/FundingProgram/ProjectID) is mandatory when applicable, while the six-part namespace (info:eu-repo/grantAgreement/Funder/FundingProgram/ProjectID/[Jurisdiction]/[ProjectName]/[ProjectAcronym]) is recommended. https://guidelines.openaire.eu/en/latest/literature/field_projectid.html
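A sketch for building and parsing grantAgreement terms. It emits the three-part form when no optional parts are given and the six-part form otherwise, leaving empty slots as in the Serbian example. Components containing "/" are simply rejected, since this sketch does not assume any escaping rule for them.

```python
PREFIX = "info:eu-repo/grantAgreement/"

def grant_agreement(funder: str, program: str, project_id: str,
                    jurisdiction: str = "", name: str = "", acronym: str = "") -> str:
    """Build a three-part or six-part info:eu-repo grantAgreement term."""
    required = [funder, program, project_id]
    optional = [jurisdiction, name, acronym]
    if not all(required):
        raise ValueError("funder, funding programme and project ID are required")
    if any("/" in part for part in required + optional):
        raise ValueError("components must not contain '/'")
    parts = required + optional if any(optional) else required
    return PREFIX + "/".join(parts)

def parse_grant_agreement(term: str) -> list[str]:
    """Split a term into its six slots (missing optional slots become empty strings)."""
    if not term.startswith(PREFIX):
        raise ValueError(f"not a grantAgreement term: {term!r}")
    parts = term[len(PREFIX):].split("/")
    if len(parts) not in (3, 6) or not all(parts[:3]):
        raise ValueError(f"expected 3 or 6 parts: {term!r}")
    return parts + [""] * (6 - len(parts))

print(grant_agreement("MESTD", "Technological Development (TD or TR)", "36035", jurisdiction="RS"))
```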
Publication Version - when applicable, indicate the status of the resource in the publication process, i.e. the version of the article, in dc.type.version, for example publishedVersion. Use the controlled vocabulary of DRIVER version terms in the info:eu-repo namespace: draft, submittedVersion, acceptedVersion, publishedVersion, updatedVersion.
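A small sketch that accepts either the short version term or its full info:eu-repo/semantics/ form and rejects anything outside the vocabulary. Returning the full form is an assumption about how the value is exposed; whether the local dc.type.version field stores the short or full form is a repository choice.

```python
SEMANTICS = "info:eu-repo/semantics/"
VERSION_TERMS = {"draft", "submittedVersion", "acceptedVersion", "publishedVersion", "updatedVersion"}

def version_term(value: str) -> str:
    """Validate a DRIVER version term and return its full info:eu-repo form."""
    short = value.strip().removeprefix(SEMANTICS)
    if short not in VERSION_TERMS:
        raise ValueError(f"not a DRIVER version term: {value!r}")
    return SEMANTICS + short

print(version_term("publishedVersion"))
```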
Format (dc.format) - the physical or digital manifestation of the resource. Typically, format may include the media type or dimensions of the resource, and may be used to determine the software, hardware or other equipment needed to display or operate the resource. Examples of dimensions include size and duration. Recommended best practice is to select a value from a controlled vocabulary, namely the IANA-registered list of Internet Media Types (MIME types), which defines computer media formats. For the full list see http://www.iana.org/assignments/media-types.
If a resource has more than one physical format (e.g. PostScript and PDF) stored as different object files, repeat the DC format element for each format, for example:
<dc:format>application/postscript</dc:format>
<dc:format>application/pdf</dc:format>
Do not confuse format with publication type or resource identifier.
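A sketch that derives one dc:format element per distinct MIME type from the file names of a resource's bitstreams. It relies on Python's built-in extension table through `mimetypes.MimeTypes()` (created without extra files, so host configuration does not change the result); this is an approximation of the IANA list, and unknown extensions raise an error instead of being guessed.

```python
import mimetypes

# Built-in extension table only; the IANA registry remains the authoritative list.
_MIME = mimetypes.MimeTypes()

def dc_format_elements(file_names: list[str]) -> list[str]:
    """One <dc:format> element per distinct MIME type, in first-seen order."""
    seen: list[str] = []
    for name in file_names:
        mime, _encoding = _MIME.guess_type(name)
        if mime is None:
            raise ValueError(f"cannot determine MIME type of {name!r}")
        if mime not in seen:
            seen.append(mime)
    return [f"<dc:format>{m}</dc:format>" for m in seen]

for line in dc_format_elements(["article.ps", "article.pdf"]):
    print(line)
```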
Embargo end date (dc.date) - when access is set to embargoedAccess, the end date of the embargo period must be provided. The corresponding term is info:eu-repo/date/embargoEnd/<YYYY-MM-DD>, with the date encoded as YYYY-MM-DD in conformance with ISO 8601.
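A sketch for writing the embargo term and checking whether an embargo has ended. It treats the end date itself as still embargoed and lifts access from the following day; that boundary is an assumption, so align it with your repository's own embargo policy.

```python
import re
from datetime import date

EMBARGO_PREFIX = "info:eu-repo/date/embargoEnd/"

def embargo_end_term(end: date) -> str:
    """Build the term; date.isoformat() always yields YYYY-MM-DD."""
    return EMBARGO_PREFIX + end.isoformat()

def embargo_has_ended(term: str, today: date) -> bool:
    """True once `today` is after the embargo end date (assumed boundary)."""
    if not term.startswith(EMBARGO_PREFIX):
        raise ValueError(f"not an embargoEnd term: {term!r}")
    value = term[len(EMBARGO_PREFIX):]
    if not re.fullmatch(r"[0-9]{4}-[0-9]{2}-[0-9]{2}", value):
        raise ValueError(f"date must be YYYY-MM-DD: {value!r}")
    return today > date.fromisoformat(value)

term = embargo_end_term(date(2025, 6, 30))
print(term, embargo_has_ended(term, date(2025, 7, 1)))
```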
(Based on BASE Golden rule for repository managers: https://www.base-search.net/about/en/faq_oai.php and OpenAIRE Guidelines for Literature Repositories v3)
OpenAIRE recommends using the info:eu-repo publication type vocabulary (values of the form info:eu-repo/semantics/<type>, e.g. info:eu-repo/semantics/article).
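A sketch that maps local repository type labels onto info:eu-repo publication types. The set of types is an assumed reproduction of the vocabulary in the OpenAIRE Guidelines for Literature Repositories v3 (check it against the guidelines), and the local labels are hypothetical examples of what a repository might use.

```python
SEMANTICS = "info:eu-repo/semantics/"

# Assumed info:eu-repo publication types; verify against the OpenAIRE v3 guidelines.
EU_REPO_TYPES = frozenset({
    "article", "bachelorThesis", "masterThesis", "doctoralThesis", "book", "bookPart",
    "review", "conferenceObject", "lecture", "workingPaper", "preprint", "report",
    "annotation", "contributionToPeriodical", "patent", "other",
})

# Hypothetical local labels (lower-case) mapped onto info:eu-repo types.
LOCAL_TYPES = {
    "journal article": "article",
    "master thesis": "masterThesis",
    "doctoral thesis": "doctoralThesis",
    "conference paper": "conferenceObject",
    "technical report": "report",
}

def eu_repo_type(value: str) -> str:
    """Return the full info:eu-repo type for a local label or an info:eu-repo term."""
    key = value.strip().removeprefix(SEMANTICS)
    key = LOCAL_TYPES.get(key.lower(), key)
    if key not in EU_REPO_TYPES:
        raise ValueError(f"no info:eu-repo type for {value!r}")
    return SEMANTICS + key

print(eu_repo_type("Master thesis"))
```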
The more comprehensive COAR Resource Type Vocabulary is also available and can be used (https://coartraining.gitbook.io/coar-repository-toolkit/interoperability/controlled-vocabularies). Its concepts include, for example:
- text
- conference object
- periodical
- journal
- report
For your comments and suggestions: Metadata guidelines on Google Docs.