• 2016-12-02
  • Article

Trustworthy repositories and proper data management are at the heart of the emerging research and data infrastructures. Trustworthy repositories are those entities in our infrastructures that reliably and persistently store, manage and give access to digital objects, i.e. data, metadata, collections, databases, etc. Increasingly often, data is offered in the form of services, i.e. repositories can maintain special software that offers specific access opportunities (visualisation, selection, some calculations, etc.). Data management describes all the activities required within repositories to ensure that data is organised properly and can be maintained and curated over long periods, so that users are guaranteed persistent access and re-usability.

This cluster refers to all RDA WGs and IGs that deal in some way with repositories and data management, which will be crucial for the way we work with data in the future. Data management is here meant to include the stewardship of data after the end of projects. Since so many groups deal with these two topics, we split them into two clusters, although there are clear relationships between them.

Topic Graph

Relevant RDA groups

Active Data Management Plans IG

Researchers are being required to provide Data Management Plans (DMPs) for project proposals to indicate that data management and stewardship are taken seriously and that data will therefore be accessible and reusable. However, during the lifetime of a project, the data plan can change for various reasons. DMPs often remain static and do not reflect these changes, and as a result are of limited value. Based on an analysis of current practices, the Active Data Management Plans (ADMP) IG is addressing this gap and is working on three major topics:

  • identifying the requirements for ADMP covering lifecycles of data and changes within projects,
  • specifying practical tools and services to create active data management plans and making them actionable,
  • specifying interfaces and exchange formats for ADMP supporting tools.

Array Database Assessment WG

The Array Database Assessment WG is working with a completely different data model. It expects that all data and metadata belonging to a certain study are entered into one big array, so that one can then work efficiently with all this data and its metadata, define various views, do calculations, etc., assisted by a query language. All data of this type can be made accessible by exposing Open Geospatial Consortium (OGC) services on top of it, such as WCS, WMS and WCPS.

BioSharing Registry WG

The aim of this working group is to produce a searchable registry of linked and reliable resources (funder policies, databases, content standards, journals) for a variety of stakeholders working in the life sciences. These stakeholders, such as researchers, funders, and journals, will be able to select and recommend community-endorsed standards, while repository developers will be able to confirm the requirements of their products for discoverability and endorsement.

Data Citation WG

The RDA Working Group on Data Citation (WG-DC) aims to bring together a group of experts to discuss the issues, requirements, advantages and shortcomings of existing approaches for efficiently citing subsets of data. The WG-DC focuses on a narrow field where it can contribute significantly and provide prototypes and reference implementations.

Data Fabric IG

The Data Fabric IG is focusing on the data creation and consumption cycle as it happens daily in scientific and industrial labs, and on identifying ways to make this work more efficient and thus more cost-effective. The group’s goal is to identify so-called Common Components and define their characteristics and services, so that they can be used across boundaries and combined to solve a variety of data scenarios.

Data Foundation & Terminology WG

The task of the Data Foundation and Terminology (DFT) WG is to describe a basic, abstract data organisation model from which a reference data terminology can be derived. This terminology can be used across communities and stakeholders to better synchronise conceptualisation, to enable better understanding within and between communities and, finally, to stimulate the building of tools, such as data services, that support the use of the basic model. The abstract data organisation model will focus on common building blocks and their characteristics, along with relevant protocols.

Data Rescue IG

The Data Rescue IG deals with the phenomenon that we are continuously losing data for various reasons. With proper data management, repositories will play a big role in rescuing data that is deemed to be lost.

Domain Repositories IG

The Domain Repositories IG addresses the issue that there is an enormous number of smaller repositories in various scientific domains worldwide that have useful data. The question here is how to make these repositories and their data visible and accessible, and how experiences with such repositories can be exchanged to optimise their practices.

Libraries for Research Data IG

Libraries have expanded on their traditional roles and developed new services in the digital environment, not just facilitating but becoming active participants in the research process. These services include providing access and preservation of research data, as well as advising and supporting researchers in the management of research data.

Libraries have a successful history of collaboration and interoperable solutions, something that is increasingly vital in an environment of evolving software and data management products, mobile researchers, and volatile repositories. Maintaining continued long-term access to scholarly assets is essential, and RDA offers a venue for librarians to share their skill sets and expertise in this regard with members of other groups such as the Domain Repositories Interest Group, the Metadata Working Group, and the Data Publishing Interest Group. Librarians in turn can pick up best practice developed in other fields and bring it back to the library community. RDA also offers the opportunity to share the principles and practices of librarians experienced in the stewardship of data with domain-specific groups seeking to develop local solutions to often universal problems within data management.

The objectives of the Libraries for Research Data Interest Group include the development of strategies to embed data management services at academic and research institutions, the identification of sustainable organisational business models for libraries in support of research data management (RDM), and the promotion of best practice and interoperability of library infrastructures with domain repositories and other RDM initiatives. Working groups will be formed with reference to specific, short-term activities identified by the Interest Group.

Metadata IG

The Metadata IG is discussing a new package-based approach to modelling metadata. The intentions are compliant with the DFT (Data Foundation and Terminology) model. The Metadata IG kicked off the Metadata Standards Directory WG, which created the Metadata Standards Directory as its output. Everyone should register newly created metadata schemas there, so that interested experts can make use of what has already been done. The Metadata IG aims to facilitate and coordinate the efforts of all the WGs dealing with metadata. Its activity mostly focuses on data management policies and standards.

Practical Policies WG

The Practical Policies WG is largely agnostic to concrete data models, since it collects a wide variety of typical data management and analytics workflows that are executed on collections. Its results can be used in a way that supports the DFT model.

RDA / WDS Publishing Data Workflows WG

The RDA/WDS Publishing Data Workflows WG aims to provide an analysis of a representative range of existing and emerging workflows and standards for data publishing, including deposit and citation, and provide reference models and implementations for application in new workflows.

Repository Audit and Certification DSA–WDS Partnership WG

Repositories are and will be key pillars for the accessibility and re-usability of digital objects in the emerging global data domain. It is therefore important for all involved stakeholders (data creators, users, funders, etc.) to know which repositories can be trusted for their data management capabilities. Two initiatives, the Data Seal of Approval (DSA) and the World Data System (WDS), worked in parallel on comparable sets of criteria that allow the quality of the policies and procedures followed by a repository to be assessed. Under the umbrella of RDA, both initiatives joined forces with the objective of developing a common framework for the certification of trustworthy repositories. The aim is to harmonise the approaches and to give clear signals to stakeholder communities worldwide about the need to assess the quality of repositories using a joint, globally recognised approach.

Repository Platforms for Research Data IG

The Repository Platforms for Research Data IG is specifying requirements for repository software that properly supports data stewardship.

Research Data Collections WG

The Research Data Collections WG is working on the specificities of data collections and their description. This group did not produce results.

Research Data Repository Interoperability WG

The Research Data Repository Interoperability Working Group will establish standards for interoperability between different research data repository platforms. These standards may include (but are not limited to) a generic API and import/export formats.

Outputs of the Data Foundation and Terminology WG

  • Development of a core data organisation model based on a large number of different use cases with Digital Objects as the central concept to unify the data landscape.
  • Definition of a set of terms that underpin this core data organisation model.
  • Development of an open-to-use Term Tool to register term definitions.
  • Huge awareness raising and terminology harmonisation in many communities about core data issues.
  • See: http://dx.doi.org/10.15497/06825049-8CA4-40BD-BCAF-DE9F0EA2FADF

Outputs of the Data Citation WG

  • A set of rules to guarantee proper data citation even in the case of dynamic data.
  • This set includes rules for Versioning to ensure that earlier states of data sets can be retrieved, for Time Stamping to mark any operations performed on data sets, and for Data Identification by using PIDs, even in the case of queries.
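
The interplay of these three rules can be illustrated with a minimal Python sketch. It is not the WG's reference implementation: the in-memory store, the logical clock, the `min_value` filter and the `hypothetical-pid:` prefix are all assumptions made for illustration. The idea shown is that updates never overwrite earlier states, each change is time-stamped, and a citation stores the query together with its timestamp under an identifier, so that the same subset can be retrieved later even after the data has changed.

```python
import hashlib
import json
from dataclasses import dataclass, field


@dataclass
class VersionedStore:
    """Toy append-only store: updates never overwrite earlier states."""
    rows: list = field(default_factory=list)  # (key, value, valid_from, valid_to)
    clock: int = 0                            # logical timestamp (assumption)

    def put(self, key, value):
        self.clock += 1
        # Close the currently valid version of this key, if there is one.
        for i, (k, v, start, end) in enumerate(self.rows):
            if k == key and end is None:
                self.rows[i] = (k, v, start, self.clock)
        self.rows.append((key, value, self.clock, None))
        return self.clock

    def snapshot(self, ts):
        """Return the data as it was at logical time ts."""
        return {k: v for k, v, start, end in self.rows
                if start <= ts and (end is None or ts < end)}


class QueryRegistry:
    """Stores time-stamped queries under an identifier (a stand-in for a PID)."""

    def __init__(self, store):
        self.store = store
        self.queries = {}

    def cite(self, min_value):
        # Record the query parameters together with the time of execution.
        query = {"min_value": min_value, "timestamp": self.store.clock}
        digest = hashlib.sha256(json.dumps(query, sort_keys=True).encode()).hexdigest()
        pid = "hypothetical-pid:" + digest[:12]
        self.queries[pid] = query
        return pid

    def resolve(self, pid):
        # Re-execute the stored query against the state at citation time.
        query = self.queries[pid]
        snap = self.store.snapshot(query["timestamp"])
        return {k: v for k, v in snap.items() if v >= query["min_value"]}


store = VersionedStore()
registry = QueryRegistry(store)
store.put("a", 5)
store.put("b", 10)
pid = registry.cite(min_value=6)   # cites the subset {"b": 10}
store.put("b", 1)                  # later update to the dynamic data set
store.put("c", 20)
cited = registry.resolve(pid)      # still the subset as it was when cited
current = store.snapshot(store.clock)
```

In a real repository the identifier would be issued by a PID system and the store would be a versioned database; the sketch only shows why versioning and time stamping together make a query-based citation reproducible.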

Outputs of the BioSharing Registry WG

  • A web-based, searchable portal of three interlinked registries, containing descriptions of standards, databases and data policies.
  • Specifications of how to interlink these different knowledge sources.

Outputs of the Publishing Data Workflows WG

Outputs of the Repository Audit and Certification WG

The working group finished its work by producing three documents:

  • a set of harmonised Common Procedures for the certification of repositories to support the implementation of a catalogue of common requirements,
  • a catalogue of common requirements merging the DSA and WDS approaches to have one joint basis for certification,
  • a report on the testbed the group created to evaluate the procedures and requirements.

DSA–WDS Partnership – Procedures for Core Certification V1.2.pdf



Outputs of the Practical Policies WG

  • Identification of eleven generic policy areas for operating on data collections stored in repositories, and a template-based collection of policy specifications in these areas, gathered in a cookbook.
  • Development of code snippets that support the policy specifications and make it easy for people to turn them into executable procedures.
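
The step from a policy specification to an executable procedure can be sketched as follows. This is a minimal Python illustration, not one of the WG's snippets: the integrity-check policy, its field names and the layout of the collection are hypothetical and chosen only to show the idea of a declarative specification being turned into a procedure that runs over a collection.

```python
import hashlib

# Hypothetical policy specification; the field names are illustrative only.
policy = {
    "name": "integrity-check",
    "algorithm": "sha256",
}


def make_procedure(policy):
    """Turn a declarative policy into a callable that checks a collection."""
    def run(collection):
        failures = []
        for name, obj in collection.items():
            # Recompute the checksum and compare it with the recorded one.
            digest = hashlib.new(policy["algorithm"], obj["content"]).hexdigest()
            if digest != obj["checksum"]:
                failures.append(name)
        return failures
    return run


# Toy collection: one intact object and one whose content no longer
# matches the checksum recorded at ingest.
collection = {
    "good.txt": {
        "content": b"hello",
        "checksum": hashlib.sha256(b"hello").hexdigest(),
    },
    "bad.txt": {
        "content": b"corrupted",
        "checksum": hashlib.sha256(b"original").hexdigest(),
    },
}

check = make_procedure(policy)
failures = check(collection)  # names of objects that failed the check
```

Keeping the policy as data and generating the procedure from it means the same specification can be reviewed by people and executed by the repository.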

Outputs of the Data Fabric IG

Outputs of the Research Data Collections WG

No outputs have been produced yet.

Outputs of the Metadata IG

A document on “Metadata Principles” has been made available and endorsed by all the related metadata groups.

Outputs of the Array Database Assessment WG

This WG has not yet produced results.

Outputs of the Research Data Repository Interoperability WG

No outputs have been produced yet.

Outputs of the Domain Repositories IG

No outputs have been produced yet.

Outputs of the Repository Platforms for Research Data IG

A matrix describing the functional requirements for research data repository platforms has been produced.

Outputs of the Active Data Management Plans IG

This IG has not yet produced results.

Outputs of the Data Rescue IG

This IG has not yet produced results.