• 2016-12-02
  • Article

Metadata is a very broad concept: “All data is metadata” and “All metadata is data” are two well-known assertions that indicate how unlimited the scope of this topic is if it is used without any restriction. For this cluster page we start from the notion of the digital object as defined by the Data Foundation & Terminology WG. Metadata is stated to contain descriptive, contextual and provenance assertions about the properties of a digital object. Many people have dealt with the question of which types of metadata can be identified and which functions they have when the digital object is taken as the pivot point.

A typical classification comes from NISO. Descriptive metadata is typically used for discovery and identification, as information to search and locate an object, such as title, author, subjects, keywords and publisher. Structural metadata describes how the components of an object are organized; an example would be how pages are ordered to form the chapters of a book. Finally, administrative metadata gives information to help manage a resource: it refers to technical information, including file type, or when and how the file was created. Experts also consider metadata describing rights, provenance, preservation, context and many other aspects.
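As a rough illustration of the NISO classification, the following Python sketch groups the properties of one digital object into descriptive, structural and administrative metadata. The class name, field names and sample values are hypothetical and are not taken from NISO or from any RDA output.

```python
from dataclasses import dataclass, field


@dataclass
class MetadataRecord:
    """Hypothetical container that splits metadata by NISO category."""
    descriptive: dict = field(default_factory=dict)     # discovery and identification
    structural: dict = field(default_factory=dict)      # how components are organised
    administrative: dict = field(default_factory=dict)  # management and technical details

    def category_of(self, key: str) -> str:
        """Return the name of the category that holds the given key."""
        for name in ("descriptive", "structural", "administrative"):
            if key in getattr(self, name):
                return name
        raise KeyError(key)


# Sample values are invented for illustration only.
record = MetadataRecord(
    descriptive={"title": "Example Book", "author": "A. Author", "keywords": ["metadata"]},
    structural={"chapters": {"Chapter 1": [1, 2, 3], "Chapter 2": [4, 5]}},
    administrative={"file_type": "application/pdf", "created": "2016-12-02"},
)

print(record.category_of("title"))
print(record.category_of("chapters"))
print(record.category_of("file_type"))
```

Keeping the three categories as separate dictionaries is only one possible design; rights, provenance or preservation metadata could be added as further fields in the same way.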

In this topic description we will not enter this discussion and classification but indicate that all these types of metadata are important to manage, search, access, interpret and re-use digital objects in the broader sense. PIDs and metadata form a global virtualisation layer which we can work with as long as we do not want to work on the bit sequences referred to by a digital object. Metadata in this view carries the rich characterisation of the content of a digital object. And of course many scientific operations can be carried out only at the level of metadata.

Almost all groups in RDA therefore touch on some aspect of metadata, which is not very helpful for delimiting this topic page. We therefore distinguish between those groups for which metadata is the focus (core), those for which metadata is one aspect amongst many others (others) and those that are organised around domain interests (domain). However, all groups working on repository issues are excluded.

Topic Graph

Relevant RDA groups

BioSharing Registry WG The aim of this working group is to produce a searchable registry of linked and reliable resources (funder policies, databases, content standards, journals) for a variety of stakeholders working in the life sciences. These stakeholders – such as researchers, funders, and journals – will be able to select and recommend community endorsed standards, while repository developers will be able to confirm the requirements of their products for discoverability and endorsement.

Brokering Framework WG The Brokering IG/WGs work on specifications for a middleware layer and services that can mediate in circumstances where heterogeneous resources have to be brought together. Mapping between heterogeneous metadata sets is one of the frequently occurring challenges.

Brokering Governance WG The Brokering IG/WGs work on specifications for a middleware layer and services that can mediate in circumstances where heterogeneous resources have to be brought together. Mapping between heterogeneous metadata sets is one of the frequently occurring challenges.
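Mapping between heterogeneous metadata sets, as mentioned for both Brokering groups, can be pictured as a crosswalk that renames fields from one schema to another. The Python sketch below is a minimal illustration under assumed field names; the crosswalk table and the function are hypothetical and do not represent any brokering framework specified by the groups.

```python
from typing import Dict, Tuple

# Hypothetical crosswalk from a source schema to a target schema.
CROSSWALK: Dict[str, str] = {
    "creator": "author",
    "dc_title": "title",
    "issued": "publication_date",
}


def map_record(source: dict, crosswalk: Dict[str, str]) -> Tuple[dict, dict]:
    """Rename known fields; return (mapped, unmapped) so nothing is silently lost."""
    mapped, unmapped = {}, {}
    for key, value in source.items():
        if key in crosswalk:
            mapped[crosswalk[key]] = value
        else:
            unmapped[key] = value
    return mapped, unmapped


# Invented sample record in the source schema.
source_record = {"creator": "A. Author", "dc_title": "Example Dataset", "instrument": "sensor-7"}
mapped, unmapped = map_record(source_record, CROSSWALK)
print(mapped)
print(unmapped)
```

Returning the unmapped fields separately makes the gaps between two schemas visible, which is typically where mediation effort is needed.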

Data Discovery Paradigms IG Given the increasing number of research data repositories, and the need for cross-disciplinary data discovery, the Data Discovery Paradigms IG wants to identify common elements and shared issues that support users in discovering research data regardless of its location or the manner in which it is stored, described and exposed. These could be a registry of data search engines, common test datasets, usage metrics, and a collection of data search use cases and competency questions. Metadata and its optimal use are therefore the focus of this group.

Data Fabric IG The Data Fabric IG is focusing on the data creation and consumption cycle as it happens daily in scientific and industrial labs, and on the identification of ways to make this work more efficient and thus more cost-effective. The group’s goal is to identify so-called Common Components and define their characteristics and services that can be used across boundaries in such a way that they can be combined to solve a variety of data scenarios.

Data Foundation & Terminology WG The task of the Data Foundation and Terminology (DFT) WG is to describe a basic, abstract data organization model from which a reference data terminology can be derived. This terminology can be used across communities and stakeholders to better synchronize conceptualization, to enable better understanding within and between communities and, finally, to stimulate the building of tools, such as data services, that support the use of the basic model. The abstract data organization model will focus on common building blocks and their characteristics, along with relevant protocols.

Data in Context IG Data objects (DOs), including large collections, are created in a certain context (persons, projects, institutions, etc.). The Data in Context IG wants to work out principles of how to include context knowledge in the description of DOs, which could help in re-use and also allow links to be drawn between various types of entities.

Empirical Humanities Metadata WG Researchers in the empirical humanities often need to collaborate to understand phenomena that operate across geographic regions, scales and communities of people. But established research practices and infrastructures in the empirical humanities do not support this form of broad collaboration. The Empirical Humanities Metadata WG will therefore conduct research, develop a statement of best practices and release an adoptable product centered on what needs to be in place (standards, protocols, policies, cultural expectations) to make ethnographic and historical data archivable, discoverable and shareable. Defining useful metadata categories that support these intentions in this broad field is at the core of this group's work.

International Materials Resource Registries WG Many labs worldwide carry out experiments and/or run simulations to determine the properties of materials, and in particular of compound materials. There is a vast number of results and possible material combinations, which causes problems for researchers seeking the properties of particular materials. The International Materials Resource Registries WG is working on a joint metadata schema which would help researchers find results of interest.

Metadata IG The Metadata IG is discussing a new package-based approach to modelling metadata. The intentions are compliant with the DFT model. The Metadata IG kicked off the Metadata Standards Directory WG, which created the Metadata Standards Directory as its output; everyone should register newly created metadata schemas there so that interested experts can make use of what has already been done. The Metadata IG aims to facilitate and coordinate the efforts of all the WGs dealing with metadata. Its activity mostly focuses on data management policies and standards.

Metadata Standards Catalog WG One of the basic steps in this wide domain of metadata is to inform each other about existing schemas so that they could either be re-used or taken as good examples. The Metadata Standards Catalogue/Directory WGs provided first a catalogue of existing metadata schemas which is now being extended to a more flexible solution.

PID Information Types WG The PID Information Types WG recognises that in complex data domains, unique and persistent identifiers (PIDs) associated with specific information are the core of proper data management and access. They can be used to give every data object (including collection objects) an identity that enables referring to the data resources and metadata and, additionally, to prove integrity, authenticity and other attributes. But this requires a PID to be uniquely associated with specific types of information, and those types and their association with PIDs must be well managed. Therefore it is useful to specify a framework for information types, to start agreeing on some essential types, and to define a process by which other types can be integrated.
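The idea of associating a PID with typed information can be sketched as a small registry of information types against which a PID record is checked. All names in this Python sketch (the registry contents, the type names, the functions and the example PID) are hypothetical; it does not reproduce the WG's actual type model or API.

```python
# Hypothetical registry: information type name -> expected Python type of its value.
TYPE_REGISTRY = {
    "checksum": str,
    "size_bytes": int,
    "metadata_url": str,
}


def register_type(name: str, value_type: type) -> None:
    """Add a new information type, mimicking a process for integrating further types."""
    if name in TYPE_REGISTRY:
        raise ValueError(f"type already registered: {name}")
    TYPE_REGISTRY[name] = value_type


def validate_pid_record(record: dict) -> list:
    """Return a list of problems; an empty list means every attribute uses a registered type."""
    problems = []
    for name, value in record["attributes"].items():
        expected = TYPE_REGISTRY.get(name)
        if expected is None:
            problems.append(f"unregistered type: {name}")
        elif not isinstance(value, expected):
            problems.append(f"wrong value type for {name}")
    return problems


pid_record = {
    "pid": "prefix/example-0001",  # invented identifier, not a real PID
    "attributes": {"checksum": "abc123", "size_bytes": 2048, "licence": "CC-BY"},
}

print(validate_pid_record(pid_record))  # "licence" is not registered at this point
register_type("licence", str)
print(validate_pid_record(pid_record))  # same record checked again after registration
```

The split between a fixed set of essential types and a `register_type` step mirrors the text's distinction between agreeing on some essential types and defining a process by which other types can be integrated.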

Practical Policies WG The Practical Policies WG is largely agnostic to concrete data models, since it collects a wide variety of typical data management and analytics workflows that are executed on collections. Its results can be used in a way that supports the DFT model.

RDA / TDWG Metadata Standards for attribution of physical and digital collections stewardship WG The Metadata Standards for attribution of physical and digital collections stewardship WG recognises that many institutes host large collections of physical objects of different types which they are now partly digitizing or of which they have digital fingerprints. Yet these institutions lack proper standards for giving attribution for the maintenance, curation, and digitization of collections. Based on many use cases from such institutions, an attribution metadata schema is going to be created that will make getting credit for the curation, maintenance, and digitization of a collection as easy as getting credit for a publication.

Reproducibility IG The Reproducibility IG seeks to advance and enable reproducibility in research based on or producing datasets. Recommendations need to be made to overcome the current situation where too often research results based on data cannot be reproduced. One important pillar amongst others is to develop and/or adopt suitable metadata standards describing data and code that are involved in creating research results and encourage their usage.

Research Data Collections WG The Research Data Collections WG is working on specificities of data collections and their description. This group did not produce results.

Research Data Provenance IG The Research Data Provenance IG is analysing the requirements for provenance metadata relevant for later data re-use.

Rice Data Interoperability WG The objective of the Rice Research Data Interoperability Working Group is to provide a framework of community-accepted standards to aid data integration and analysis, and to bridge the gap in the free sharing of rice research data. The framework will help identify, describe, and link rice data using open standards. The group will also address issues such as the development of a minimal metadata set and the selection of appropriate vocabularies. The group will encourage adoption of the resulting framework even within private (for-profit) institutions.

Wheat Data Interoperability WG The Wheat Data Interoperability Working Group seeks to devise a common framework to promote and sustain wheat data sharing, reusability and operability. The framework will use open standards for the identification, description, mapping and publication of wheat data. It will also examine the requirements for a minimal metadata set to describe wheat data, and seek to develop recommendations on topical vocabularies and ontologies. The group aims to produce a 'cookbook' on how to produce easily shareable, reusable and interoperable wheat data.

Outputs of the PID Information Types WG

  • Development of a conceptual model for structuring types, which are attributes describing properties of the digital objects associated with PIDs and which are registered in the Data Type Registry.
  • Specification of the idea of type profiles associated with PID service providers.
  • Development of a demonstrator including an Application Programming Interface.
  • See: http://dx.doi.org/10.15497/FDAA09D5-5ED0-403D-B97A-2675E1EBE786

Outputs of the BioSharing Registry WG

  • A web-based, searchable portal of three interlinked registries, containing descriptions of standards, databases and data policies.
  • Specifications of how to interlink these different knowledge sources.

Outputs of the Wheat Data Interoperability WG

  • Guidelines have been produced (http://ist.blogs.inra.fr/wdi/) for wheat data descriptions and representations to promote common practices and avoid duplicated effort. Best practices, tools, recommendations and examples are listed for several aspects of wheat data: sequence variations, genome annotations, phenotypes, germplasm, gene expression and physical maps.
  • In addition, a portal for vocabularies and ontologies has been produced (http://wheat.agroportal.lirmm.fr). Vocabularies can be shared, aligned and reused. The aim is to make this a collaborative effort.

Outputs of the Metadata Standards Catalogue/Directory WG

  • The output of the WG is an open online directory that contains the available metadata standards, extensions, tools and use cases organised in topic categories and that can be used to add additional standards and tools based on YAML template markdown files.
  • The online directory is available here: http://rd-alliance.github.io/metadata-directory/
  • It aims to facilitate the identification and re-use of the existing standards among different groups, domains and researchers.
  • A machine readable catalogue of metadata standards will be produced by the Metadata Catalog WG.

Outputs of the Data Fabric IG

Outputs of the Metadata Standards for attribution of physical and digital collections stewardship WG

 No outputs yet

Outputs of the Research Data Collections WG

No outputs yet

Outputs of the Data Foundation and Terminology WG

  • Development of a core data organisation model based on a large number of different use cases with Digital Objects as the central concept to unify the data landscape.
  • Definition of a set of terms that underpin this core data organisation model.
  • Development of an open-to-use Term Tool to register term definitions.
  • Extensive awareness raising and terminology harmonisation in many communities about core data issues.
  • See: http://dx.doi.org/10.15497/06825049-8CA4-40BD-BCAF-DE9F0EA2FADF
  • Metadata in general is essential for describing digital objects, and the metadata descriptions themselves are also digital objects. The model provides a coherent framework for understanding the role of metadata and its relation to other fundamental concepts.

Outputs of the Practical Policies WG

  • Identification of eleven generic policy areas for operating on data collections stored in repositories, and a template-based collection of policy specifications in these areas, gathered in a cookbook.
  • Development of code snippets that support the policy specifications and make it easy for people to turn them into executable procedures.
  • One of the eleven areas is “contextual metadata extraction”, though some of the others are linked to metadata.

Outputs of the Brokering Framework WG

No outputs yet

Outputs of the Brokering Governance WG

  • Based on the concept that brokering middleware (mapping, bridging) should be provided by external service providers, adequate business and governance models are required.
  • The Brokering Governance WG has concentrated mainly on business models for sustaining brokering middleware frameworks. It is recognised that, although brokering middleware (providing a capability for access to and use of data bridging scientific disciplines) is becoming available, little consideration has been given to how to sustain it, that is, to the question of who pays.
  • A report has been produced setting out five classes of business models that might be applicable, and proposing a hybrid model as the most promising way forward. This will be of interest to funders of research, data centres and repository managers.

Outputs of the Empirical Humanities Metadata WG

No outputs yet

Outputs of the International Materials Resource Registries WG

No outputs yet

Outputs of the Rice Data Interoperability WG

No outputs yet

Outputs of the Metadata IG

A document on “Metadata Principles” has been made available and endorsed by all the related metadata groups.

Outputs of the Data Discovery Paradigms IG

No outputs are yet available from this IG.

Outputs of the Data in Context IG

No outputs are yet available from this IG

Outputs of the Research Data Provenance IG

No outputs yet