• 2016-12-02
  • Article

Data publication and citation are key aspects of the Open Science movement, and core to the RDA’s vision of researchers and innovators openly sharing data across technologies. The publication of datasets and/or the metadata describing them ensures that research can be shared and reused, and allowing citation or reference to data further enhances the opportunities for reuse while supporting the reproducibility of research.

In 2013, G8 leaders signed the G8 Open Data Charter, which states that data should be Open by Default; Timely and Comprehensive; Accessible and Useable; Comparable and Interoperable; For Improved Governance and Citizen Engagement; and For Inclusive Development and Innovation. The FAIR Data Principles also provide high-level guidance that aims to make data Findable, Accessible, Interoperable, and Re-usable by conforming to a series of criteria, for example using rich metadata descriptions, employing standard vocabularies and meeting domain/community standards.

Tim Berners-Lee suggests a 5 Star deployment scheme for openly published data, awarding five stars to data that is published online with an open licence, structured, in a non-proprietary format, associated with URIs (Uniform Resource Identifiers) and linked to other datasets.
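The five criteria can be read as a cumulative checklist. The following is a minimal sketch of such a rating, assuming each star requires all earlier ones to be met; the `Dataset` class and its field names are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Dataset:
    # Hypothetical flags, one per criterion of the 5 Star scheme.
    online_with_open_licence: bool
    structured: bool
    non_proprietary_format: bool
    uses_uris: bool
    linked_to_other_data: bool


def star_rating(ds: Dataset) -> int:
    """Count stars in order, stopping at the first unmet criterion."""
    criteria = [
        ds.online_with_open_licence,
        ds.structured,
        ds.non_proprietary_format,
        ds.uses_uris,
        ds.linked_to_other_data,
    ]
    stars = 0
    for met in criteria:
        if not met:
            break
        stars += 1
    return stars
```

Under this reading, a dataset that is linked to other datasets but does not use URIs would stop at three stars, because the fourth criterion is not met.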

Guidance around openly published data or metadata typically encourages the use of URIs or persistent identifiers to allow data to be uniquely and persistently found and accessed on the web. Data citation is based on the availability of such identifiers, allowing researchers to acknowledge their sources in the same manner as established bibliographic citation. As the practice evolves, it is hoped that data citation will directly influence researchers’ impact factors, and that standardised citation formats will come into common use. Force 11 has published a Joint Declaration of Data Citation Principles, which includes clauses relating to the specificity and verifiability of data, its interoperability, flexibility and accessibility, and the need to acknowledge the importance of published data.
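As an illustration of how a persistent identifier anchors a citation, the sketch below assembles a citation string from a few metadata fields. The element order (creators, year, title, version, publisher, identifier) is an assumption loosely following common data citation patterns, not a format prescribed by the principles above; the function name and the example values are hypothetical.

```python
def format_data_citation(creators, year, title, publisher, doi, version=None):
    """Build a human-readable data citation ending in a resolvable DOI link."""
    authors = "; ".join(creators)
    parts = [f"{authors} ({year}). {title}."]
    if version:
        # A version helps readers cite the exact state of the data they used.
        parts.append(f"Version {version}.")
    parts.append(f"{publisher}.")
    parts.append(f"https://doi.org/{doi}")
    return " ".join(parts)


# Hypothetical example values; 10.1234/example is a placeholder, not a real DOI.
print(format_data_citation(
    ["Doe, J."], 2016, "Sample Dataset", "Example Repository", "10.1234/example"
))
```

Keeping the identifier as a full URL means the citation can be followed directly, which is the property that makes data citation comparable to bibliographic citation.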

In 2015, the RDA published a paper, ‘Principles for Data Sharing and Re-Use: are they all the same?’, which provides an overview of the most commonly referenced principles, including those from Force 11, the G8 Principles for an Open Data Infrastructure, and the Nairobi Data Sharing Principles. It found that the recommendations in each set of principles were, at their core, compatible with each other.

While high-level principles exist around best practices in data publication and citation, RDA Working Groups and Interest Groups are currently working on practical issues and specific challenges associated with policy, workflows, requirements and implementations.


Relevant RDA groups

Data Citation WG

The RDA Working Group on Data Citation (WG-DC) aims to bring together a group of experts to discuss the issues, requirements, advantages and shortcomings of existing approaches for efficiently citing subsets of data. The WG-DC focuses on a narrow field where it can contribute significantly and provide prototypes and reference implementations.

Data Description Registry Interoperability WG

The Data Description Registry Interoperability (DDRI) WG addresses the problem of cross-platform discovery by connecting datasets together on the basis of co-authorship or other collaboration models such as joint funding and grants. The suggested solution compiles simple enabling infrastructures based on existing open protocols and standards with a flexible and extensible approach that allows registries to opt-in and enables any third-party to create particular global views of research data.

Data Fabric IG

The Data Fabric IG focuses on the data creation and consumption cycle as it happens daily in scientific and industrial labs, and on identifying ways to make this work more efficient and thus more cost-effective. The group’s goal is to identify so-called Common Components and to define their characteristics and services so that they can be used across boundaries and combined to solve a variety of data scenarios.

Data Foundation & Terminology WG

The task of the Data Foundation and Terminology WG is to describe a basic, abstract data organization model from which a reference data terminology can be derived. This terminology is intended for use across communities and stakeholders to better synchronize conceptualization, to enable better understanding within and between communities, and finally to stimulate tool building, such as for data services, supportive of the basic model’s use. The abstract data organization model will focus on common building blocks and their characteristics, along with relevant protocols.

Libraries for Research Data IG

Libraries have expanded on their traditional roles and developed new services in the digital environment, not just facilitating but becoming active participants in the research process. These services include providing access to and preservation of research data, as well as advising and supporting researchers in the management of research data.

Libraries have a successful history in collaboration and interoperable solutions, something that is increasingly vital in an environment of evolving software and data management products, mobile researchers, and volatile repositories. Maintaining continued long-term access to scholarly assets is essential, and RDA offers a venue for librarians to share their skill sets and expertise in this regard with members of other groups such as the Domain Repositories Interest Group, the Metadata Working Group, and the Data Publishing Interest Group. Librarians in turn can receive best practice developed in other fields and bring this back to the library community. RDA also offers the opportunity to share the principles and practices of librarians experienced in the stewardship of data with domain-specific groups seeking to develop local solutions to often universal problems within data management.

The objectives of the Libraries for Research Data Interest Group include development of strategies to embed data management services at academic and research institutions, identification of sustainable organisational business models for libraries in support of research data management (RDM), and the promotion of best practice and interoperability of library infrastructures with domain repositories and other RDM initiatives. Working groups will be formed with reference to specific, short-term activities identified by the Interest Group.

PID Information Types WG

The PID Information Types WG recognises that in complex data domains, unique and persistent identifiers (PIDs) associated with specific information are the core of proper data management and access. They can be used to give every data object (including collection objects) an identity that enables referring to the data resources and metadata and, additionally, to prove integrity, authenticity and other attributes. But this requires a PID to be uniquely associated with specific types of information, and those types and their association with PIDs must be well managed. Therefore, it is useful to specify a framework for information types, to start agreeing on some essential types, and to define a process by which other types can be integrated.
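To make the idea of typed information attached to a PID more concrete, here is a minimal sketch of a PID record that carries a checksum, a location and a size, and uses the checksum to check integrity. The record layout, the type names (`checksum_sha256`, `location`, `size_bytes`) and the example identifier are hypothetical and are not taken from the Working Group's outputs.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class PidRecord:
    pid: str
    # Maps an information type name to its value (type names are hypothetical).
    attributes: dict = field(default_factory=dict)


def make_record(pid: str, data: bytes, location: str) -> PidRecord:
    """Register typed information about a data object under its PID."""
    return PidRecord(pid, {
        "checksum_sha256": hashlib.sha256(data).hexdigest(),
        "location": location,
        "size_bytes": len(data),
    })


def verify_integrity(record: PidRecord, data: bytes) -> bool:
    """Compare the retrieved bytes against the checksum stored in the record."""
    return hashlib.sha256(data).hexdigest() == record.attributes.get("checksum_sha256")
```

If every provider agreed on the same type names, a client could run a check like `verify_integrity` without knowing which service issued the PID, which is the motivation for managing the types themselves.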

RDA / WDS Publishing Data Services WG

The RDA/WDS Publishing Data Services WG builds on pre-existing components and international initiatives, and is focusing on a standardised cross-referencing service for the links between data and publications.

RDA / WDS Publishing Data Workflows WG

The RDA/WDS Publishing Data Workflows WG aims to provide an analysis of a representative range of existing and emerging workflows and standards for data publishing, including deposit and citation, and provide reference models and implementations for application in new workflows.

RDA/CODATA Legal Interoperability IG

The RDA/CODATA Legal Interoperability IG has multiple aims around legal interoperability, including defining the legal interoperability of research data, articulating why it is important for data interoperability and reuse, and developing and publishing core principles and best-practice guidelines through which legal interoperability can be achieved.

RDA/WDS Publishing Data Bibliometrics WG

Bibliometric indicators are essential to obtain quantitative measures for assessing the quality of research and researchers and the impact of research products. Systems and measures such as the ISI’s Science Citation Index, the h-index (or Hirsch number), and the impact factor of scientific journals have been developed to track and record access and citation of scientific publications. These indicators are widely used by investigators, academic departments and administrations, funding agencies, and professional societies across all disciplines to assess the performance of individuals or organizations within the research endeavour. They inform and influence the advancement of academic careers and the investment of research funding, and thus play a powerful role in the overall scientific endeavour. The overall objective of this working group is therefore to conceptualize data metrics and corresponding services that can overcome existing barriers and are thus likely to initiate a cultural change among scientists, encouraging more and better data citations, augmenting the overall availability and quality of research data, increasing data discoverability and reuse, and facilitating the reproducibility of research results.
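For readers unfamiliar with the h-index mentioned above, the sketch below computes it from a list of citation counts: the largest h such that h items each have at least h citations. Applying it to datasets rather than publications is an assumption made here for illustration only; the Working Group has not specified such a metric.

```python
def h_index(citations):
    """Return the largest h such that h items have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank
        else:
            break
    return h
```

For example, citation counts of 10, 8, 5, 4 and 3 give an h-index of 4, because four items have at least four citations each but there are not five items with at least five.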

RDA/WDS Publishing Data Cost Recovery for Data Centres IG

The RDA/WDS Publishing Data Cost Recovery for Data Centres IG is trying to minimise the imbalance between the capacities and functionality of existing data centres and data repositories and the global production of scientific data. It aims to develop cost estimates and elaborate a business model to compensate for the additional costs of publishing data in an open access environment, by understanding the current and possible future cost recovery strategies for data centres.

Outputs of the PID Information Types WG

  • Development of a conceptual model for structuring types, which are attributes describing properties of digital objects associated with the PIDs registered in the Data Type Registry.
  • Specification of the idea of type profiles associated with PID service providers.
  • Development of a demonstrator including an Application Programming Interface.
  • See: http://dx.doi.org/10.15497/FDAA09D5-5ED0-403D-B97A-2675E1EBE786

Outputs of the Data Foundation and Terminology WG

  • Development of a core data organisation model based on a large number of different use cases with Digital Objects as the central concept to unify the data landscape.
  • Definition of a set of terms that underpin this core data organisation model.
  • Development of an open-to-use Term Tool to register term definitions.
  • Extensive awareness raising and terminology harmonisation across many communities on core data issues.
  • See: http://dx.doi.org/10.15497/06825049-8CA4-40BD-BCAF-DE9F0EA2FADF

Outputs of the Data Citation WG

  • A set of rules to guarantee proper data citation even in the case of dynamic data.
  • This set includes rules for versioning, to ensure that earlier states of data sets can be retrieved; for time stamping, to mark any operations performed on data sets; and for identifying data using PIDs, even when the data is selected by queries.
  • See: https://rd-alliance.org/system/files/RDA-DC-Recommendations_151020.pdf
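The three ingredients listed above (versioning, time stamping and PIDs for queries) can be sketched with a small in-memory store. This is an illustration under stated assumptions, not the Working Group's reference implementation: the class and method names are hypothetical, a logical counter stands in for real timestamps, and the query is reduced to a single `min_value` filter.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional


@dataclass
class Row:
    key: str
    value: float
    valid_from: int                 # logical timestamp when this version was written
    valid_to: Optional[int] = None  # None while this version is the current one


class VersionedStore:
    def __init__(self):
        self.rows = []
        self.clock = 0      # logical clock standing in for real timestamps
        self.queries = {}   # query PID -> (min_value, timestamp)

    def upsert(self, key, value):
        """Versioning: close the old version instead of overwriting it."""
        self.clock += 1
        for row in self.rows:
            if row.key == key and row.valid_to is None:
                row.valid_to = self.clock
        self.rows.append(Row(key, value, self.clock))

    def select(self, min_value, at):
        """Return the rows matching the filter as the data stood at time `at`."""
        return sorted(
            (r.key, r.value) for r in self.rows
            if r.valid_from <= at
            and (r.valid_to is None or r.valid_to > at)
            and r.value >= min_value
        )

    def cite(self, min_value):
        """Time stamp the query and assign it a (hypothetical) PID."""
        ts = self.clock
        pid = "query:" + hashlib.sha256(f"{min_value}@{ts}".encode()).hexdigest()[:12]
        self.queries[pid] = (min_value, ts)
        return pid

    def resolve(self, pid):
        """Re-execute the stored query against the data as of its timestamp."""
        min_value, ts = self.queries[pid]
        return self.select(min_value, ts)
```

Because old versions are kept and the query is stored with its timestamp, resolving the query PID later returns the same subset even after the underlying data has changed.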

Outputs of the Publishing Data Services WG

The “Data Literature Interlinking Service” (DLI, see http://dliservice.research-infrastructures.eu) is a prototype operational system. Prospective users can inspect its collection of links between data and literature through a portal, download the corpus through OAI-PMH, or connect to APIs that can be used to query the system automatically. Organizations that are interested in adopting the system are welcome to use these existing interfaces, or reach out to the WG to discuss any specific needs or requirements they might have.
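Harvesting a corpus through OAI-PMH generally means issuing `ListRecords` requests and following resumption tokens. The sketch below only builds the request URLs and makes no network calls. The base URL `https://example.org/oai` is a placeholder: the actual OAI-PMH endpoint of the DLI service is not given in this article, and the `oai_dc` metadata prefix is assumed as the protocol's baseline format.

```python
from urllib.parse import urlencode


def list_records_url(base_url, metadata_prefix="oai_dc", resumption_token=None):
    """Build an OAI-PMH ListRecords request URL.

    When a resumption token is supplied it is sent as the only argument besides
    the verb, as the protocol requires for follow-up pages.
    """
    if resumption_token:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return f"{base_url}?{urlencode(params)}"


# Placeholder endpoint, used for illustration only.
print(list_records_url("https://example.org/oai"))
```

A harvester would fetch the first URL, read the `resumptionToken` element from the response if one is present, and repeat with that token until none is returned.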

Outputs of the Publishing Data Workflows WG

  • A data publishing reference model comprising generic components for data publishing workflows, including basic self-publishing services, institutional data repositories, long-term projects, curated data repositories, and joint data journal and repository arrangements, available at http://dx.doi.org/10.5281/zenodo.34542

Outputs of the Data Fabric IG

Outputs of the RDA/WDS Publishing Data Cost Recovery for Data Centres IG

A report, “Income Streams for Data Repositories”, has been published: https://www.rd-alliance.org/system/files/documents/Income_Streams_for_Data_Repositories-FINAL-160210.pdf

Outputs of the Publishing Data Bibliometrics WG

The Working Group has not yet published any outputs.

Outputs of the Data Description Registry Interoperability WG

The Working Group has not yet published any outputs.

Outputs of the RDA/CODATA Legal Interoperability IG

The Interest Group has not yet published any outputs.