At the core of the ability to use and reuse digital objects, including datasets, is the ability to identify and address them. Addressing data comes with its own set of issues, including what constitutes a discrete dataset and how to deal with data that is moved (physically or virtually). The currently accepted state-of-the-art solution is to use persistent identifiers to point to data.

Although the concept of a persistent identifier seems straightforward, it requires a reliable supporting infrastructure in order to be of any real value. For example, DataCite is an established non-profit organisation that provides persistent identifiers in the form of DOIs for research data. The benefits of being able to point to datasets in this way include better repeatability of science, transparency, and persistence of results. Furthermore, persistent identifiers enable the reliable and direct citation of data in scientific publications.

Many challenges remain in the development and application of persistent identifiers, and RDA is active in these areas.

Relevant RDA groups

Data Citation WG The RDA Working Group on Data Citation (WG-DC) aims to bring together a group of experts to discuss the issues, requirements, advantages and shortcomings of existing approaches for efficiently citing subsets of data. The WG-DC focuses on a narrow field where we can contribute significantly and provide prototypes and reference implementations.

Data Fabric IG The Data Fabric IG focuses on the data creation and consumption cycle as it happens daily in scientific and industrial labs, and on identifying ways to make this work more efficient and thus more cost-effective. The group’s goal is to identify so-called Common Components and to define their characteristics and services so that they can be used across boundaries and combined to solve a variety of data scenarios.

Data Foundation & Terminology WG The Data Foundation and Terminology WG's task is to describe a basic, abstract data organization model from which a reference data terminology can be derived. This terminology is meant to be usable across communities and stakeholders in order to better synchronize conceptualization, to enable better understanding within and between communities, and finally to stimulate the building of tools, such as data services, that support the use of the basic model. The abstract data organization model will focus on common building blocks and their characteristics, along with relevant protocols.

Data Type Registries WG The Data Type Registries WG's concept is compliant with the Data Foundation and Terminology data model. It allows users to define data types, which can describe either a variable found in a Digital Object or the structure of a Digital Object, and to link these types with functions.

PID Information Types WG The PID Information Types WG recognises that in complex data domains, unique and persistent identifiers (PIDs) associated with specific information are the core of proper data management and access. They can be used to give every data object (including collection objects) an identity that enables referring to the data resources and metadata and, additionally, to prove integrity, authenticity and other attributes. But this requires a PID to be uniquely associated with specific types of information, and those types and their association with PIDs must be well managed. Therefore it is useful to specify a framework for information types, to start agreeing on some essential types, and to define a process by which other types can be integrated.

Research Data Collections WG The Research Data Collections WG is working on the specifics of data collections and their description. This group did not produce results.

Outputs of the PID Information Types WG

  • Development of a conceptual model for structuring types, that is, attributes that describe properties of the digital objects associated with PIDs and that are registered in the Data Type Registry.
  • Specification of the idea of type profiles associated with PID service providers.
  • Development of a demonstrator including an Application Programming Interface.
  • See: http://dx.doi.org/10.15497/FDAA09D5-5ED0-403D-B97A-2675E1EBE786

Outputs of the Data Type Registries WG

  • Offering a mechanism that allows data producers to classify their data sets in standard data types.
  • Specification of a Data Type Registry that relates “types” with useful operations and thus can help in automating processes.
  • Development of an open prototype implementation of the Data Type Registry and tests of it.
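
To illustrate how relating types to operations can help automate processing, the following is a minimal in-memory sketch. It is not the WG's prototype or its API: the class DataTypeRegistry, its methods, and the type identifier "example/csv-table" are hypothetical names chosen for this example.

```python
from typing import Any, Callable, Dict, List


class DataTypeRegistry:
    """Toy registry mapping a type identifier to named operations (hypothetical)."""

    def __init__(self) -> None:
        self._operations: Dict[str, Dict[str, Callable[[Any], Any]]] = {}

    def register(self, type_id: str, op_name: str, func: Callable[[Any], Any]) -> None:
        # Associate an operation with a registered type.
        self._operations.setdefault(type_id, {})[op_name] = func

    def operations_for(self, type_id: str) -> List[str]:
        # List the operations known for a type, in alphabetical order.
        return sorted(self._operations.get(type_id, {}))

    def apply(self, type_id: str, op_name: str, payload: Any) -> Any:
        # Look up the operation by type and name, then run it on the payload.
        try:
            func = self._operations[type_id][op_name]
        except KeyError:
            raise LookupError(
                f"no operation {op_name!r} registered for type {type_id!r}"
            ) from None
        return func(payload)


registry = DataTypeRegistry()
# "example/csv-table" is a made-up type identifier, not a registered type.
registry.register(
    "example/csv-table", "row_count",
    lambda text: len(text.strip().splitlines()) - 1,  # exclude the header line
)
registry.register(
    "example/csv-table", "header",
    lambda text: text.splitlines()[0].split(","),
)

sample = "station,temp\nA,12.5\nB,9.1\n"
print(registry.operations_for("example/csv-table"))
print(registry.apply("example/csv-table", "row_count", sample))
```

A client that knows only the type identifier of a dataset can ask the registry which operations exist and invoke one without hard-coding format knowledge, which is the automation idea described above.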

Outputs of the Data Citation WG

  • A set of rules to guarantee proper data citation even in the case of dynamic data.
  • This set includes rules for Versioning to ensure that earlier states of data sets can be retrieved, for Time Stamping to mark any operations performed on data sets, and for Data Identification by using PIDs even in the case of queries.
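
The interplay of versioning, time stamping and query identification can be sketched roughly as follows. This is an illustrative toy, not the WG's reference implementation: the class VersionedStore, the "example-pid:" prefix, the hash-based identifier, and the use of a logical clock instead of real timestamps are all assumptions made for this example.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Change:
    key: str
    value: Optional[float]  # None marks a deletion
    timestamp: int          # logical clock tick (assumption; stands in for a real timestamp)


class VersionedStore:
    """Append-only store, so every earlier state stays retrievable (hypothetical)."""

    def __init__(self) -> None:
        self._log: List[Change] = []
        self._clock = 0
        self._citations: Dict[str, Tuple[str, int]] = {}

    def put(self, key: str, value: Optional[float]) -> int:
        # Versioning + time stamping: never overwrite, only append a stamped change.
        self._clock += 1
        self._log.append(Change(key, value, self._clock))
        return self._clock

    def delete(self, key: str) -> int:
        return self.put(key, None)

    def state_at(self, timestamp: int) -> Dict[str, float]:
        # Replay the log up to and including the given timestamp.
        state: Dict[str, float] = {}
        for change in self._log:
            if change.timestamp > timestamp:
                break
            if change.value is None:
                state.pop(change.key, None)
            else:
                state[change.key] = change.value
        return state

    def latest(self) -> Dict[str, float]:
        return self.state_at(self._clock)

    def cite(self, key_prefix: str) -> str:
        # Query identification: store the query with its execution time under an identifier.
        timestamp = self._clock
        digest = hashlib.sha256(f"{key_prefix}@{timestamp}".encode()).hexdigest()[:12]
        pid = f"example-pid:{digest}"  # made-up identifier scheme
        self._citations[pid] = (key_prefix, timestamp)
        return pid

    def resolve(self, pid: str) -> Dict[str, float]:
        # Re-run the stored query against the state as of the stored timestamp.
        key_prefix, timestamp = self._citations[pid]
        snapshot = self.state_at(timestamp)
        return {k: v for k, v in snapshot.items() if k.startswith(key_prefix)}


store = VersionedStore()
store.put("station/A", 12.5)
store.put("station/B", 9.1)
pid = store.cite("station/")   # cite the subset as it exists now
store.put("station/A", 13.0)   # the data keeps changing afterwards
store.delete("station/B")

cited = store.resolve(pid)     # the subset as it was when cited
current = store.latest()       # the current state of the data
print(cited, current)
```

Because the identifier points to a query plus a timestamp rather than to a copied extract, the cited subset can be reproduced later even though the underlying data has changed.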

Outputs of the Data Fabric IG

Outputs of the Research Data Collections WG

Outputs of the Data Foundation and Terminology WG

  • Development of a core data organisation model based on a large number of different use cases with Digital Objects as the central concept to unify the data landscape.
  • Definition of a set of terms that underpin this core data organisation model. These terms are: Digital Object (DO), Persistent Identifier (PID), PID Record, PID Resolver (aka Resolution System), Metadata, Aggregation, Digital Collection, Digital Entity, Repository, Bitstream, State Information, Property, Metadata Repository, Checksum.
  • Development of an open-to-use Term Tool to register term definitions.
  • Extensive awareness raising and terminology harmonisation across many communities on core data issues.
  • See: https://dx.doi.org/10.15497/06825049-8CA4-40BD-BCAF-DE9F0EA2FADF
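
To make a few of these terms concrete, the sketch below shows one possible way a PID, a PID Record, a PID Resolver, a Bitstream and a Checksum could relate to each other. It is an assumption-laden illustration rather than the WG's model: the field names, the choice of SHA-256, the identifier "example-pid:0001" and the repository URL are all hypothetical, and the authoritative definitions are those in the output referenced above.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class PIDRecord:
    """State information kept about a Digital Object (hypothetical field layout)."""
    pid: str
    location: str    # where the bitstream is stored, e.g. a repository URL
    checksum: str    # SHA-256 hex digest of the bitstream (assumed algorithm)
    properties: Dict[str, str] = field(default_factory=dict)


class PIDResolver:
    """Toy resolution system: maps a PID to its PID Record."""

    def __init__(self) -> None:
        self._records: Dict[str, PIDRecord] = {}

    def register(self, pid: str, location: str, bitstream: bytes,
                 **properties: str) -> PIDRecord:
        # Compute the checksum at registration time and store it in the record.
        checksum = hashlib.sha256(bitstream).hexdigest()
        record = PIDRecord(pid, location, checksum, dict(properties))
        self._records[pid] = record
        return record

    def resolve(self, pid: str) -> PIDRecord:
        return self._records[pid]


def is_intact(record: PIDRecord, bitstream: bytes) -> bool:
    # Integrity check: does the bitstream still match the registered checksum?
    return hashlib.sha256(bitstream).hexdigest() == record.checksum


resolver = PIDResolver()
data = b"station,temp\nA,12.5\n"
resolver.register(
    "example-pid:0001",                             # made-up identifier
    "https://repository.example.org/objects/0001",  # made-up location
    data,
    creator="Example Lab",
)

record = resolver.resolve("example-pid:0001")
print(is_intact(record, data))          # unchanged bitstream
print(is_intact(record, data + b"x"))   # altered bitstream
```

Keeping the checksum in the PID Record rather than next to the data lets a consumer verify a retrieved bitstream against information obtained independently through the resolver.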