• 2016-12-02
  • Article

Data models, data standards and data types are at the core of proper data organisation, re-use and interoperability. With the term data standards we refer to all the different file formats used in science, expecting of course that these are well specified and, where possible, adhere to widely accepted standards and best practices to enable interpretation. The term data type has been used in computer science for many years and ranges from simple types (integers, etc.) to composite/complex types, MIME types and abstract data types. In the context of RDA, the term data type has been used to establish a relation between, for example, a file type and a tool that allows its interpretation, or between a concept found in a structured file and its interpretation. The term data model has also been used in computer science for many years, in various flavours. In this context we interpret it as specifying the way data, collections, different types of metadata (descriptive, provenance, rights, etc.) and persistent identifiers are organised and how they refer to each other, so that machines can easily find all required information about a digital object.

Differences between data standards, types and models, and their lack of explicitness, are among the major sources of inefficiency when working with data in science and industry. The FAIR principles also address this topic area.

It is widely agreed that Digital Objects and their characteristics are at the core of this topic area: they have bit sequences stored in repositories, they are assigned a persistent identifier, and they are associated with metadata. Digital Objects can be of many kinds: data, software, system configurations, digital fingerprints of physical objects, etc. Data Objects can originate, for example, from files or from executing queries on databases.
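The three characteristics named above can be sketched as a small data structure. This is only an illustration, not an RDA specification: the class name `DigitalObject`, its field names, the in-memory `repository` dictionary and the identifier string are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class DigitalObject:
    """Illustrative (hypothetical) view of a Digital Object."""
    pid: str                     # persistent identifier assigned to the object
    bits: bytes                  # the bit sequence, normally held by a repository
    metadata: Dict[str, str] = field(default_factory=dict)  # descriptive, provenance, rights, ...

    def size(self) -> int:
        """Length of the bit sequence in bytes."""
        return len(self.bits)


# A toy "repository": a mapping from PID to the stored object.
repository: Dict[str, DigitalObject] = {}

obj = DigitalObject(
    pid="example-prefix/0001",           # hypothetical identifier, not a registered PID
    bits=b"station,temperature\nA,12.5\n",
    metadata={"creator": "example lab", "format": "text/csv"},
)
repository[obj.pid] = obj

# Resolving the PID leads to both the bit sequence and its metadata.
resolved = repository["example-prefix/0001"]
print(resolved.size(), resolved.metadata["format"])
```

The point of the sketch is that the identifier is the single entry point: once the PID resolves, a machine can reach the bits and the metadata without further guessing.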

This cluster page refers to all RDA WGs and IGs that deal in some way with the aspects mentioned above.

Topic Graph

Relevant RDA groups

Array Database Assessment WG The Array Database Assessment WG is working with a completely different data model. It expects all data and metadata belonging to a given study to be entered into one large array, so that one can work efficiently with the data and its metadata, define various views, perform calculations, etc., assisted by a query language. All data of this type are accessible through Open Geospatial Consortium (OGC) services exposed on top of them, such as WCS, WMS and WCPS.

BioSharing Registry WG The aim of this working group is to produce a searchable registry of linked and reliable resources (funder policies, databases, content standards, journals) for a variety of stakeholders working in the life sciences. These stakeholders – such as researchers, funders, and journals – will be able to select and recommend community endorsed standards, while repository developers will be able to confirm the requirements of their products for discoverability and endorsement.

Data Fabric IG The Data Fabric IG is focusing on the data creation and consumption cycle as it happens daily in scientific and industrial labs, and on identifying ways to make this work more efficient and thus more cost-effective. The group’s goal is to identify so-called Common Components and to define their characteristics and services so that they can be used across boundaries and combined to solve a variety of data scenarios.

Data Foundation & Terminology WG The task of the Data Foundation and Terminology WG is to describe a basic, abstract data organization model from which a reference data terminology can be derived. This terminology can be used across communities and stakeholders to better synchronize conceptualization, to enable better understanding within and between communities, and finally to stimulate the building of tools, such as data services, that support the use of the basic model. The abstract data organization model will focus on common building blocks and their characteristics, along with relevant protocols.

Data Type Registries WG The Data Type Registry concept of this WG is compliant with the Data Foundation and Terminology data model. It allows users to define data types, which can describe a variable found in a Digital Object or the structure of a Digital Object, and to link them with functions.
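The idea of linking a registered type with functions can be illustrated with a minimal in-memory sketch. It is not the WG's prototype or its API: the class `DataTypeRegistry`, its method names and the type identifier `example/csv-table` are hypothetical and chosen only for this example.

```python
from typing import Callable, Dict, List


class DataTypeRegistry:
    """Toy registry that relates type identifiers to named operations."""

    def __init__(self) -> None:
        # type identifier -> {operation name -> function working on raw bytes}
        self._operations: Dict[str, Dict[str, Callable[[bytes], object]]] = {}

    def register(self, type_id: str, name: str, func: Callable[[bytes], object]) -> None:
        """Link a function to a data type under the given operation name."""
        self._operations.setdefault(type_id, {})[name] = func

    def operations_for(self, type_id: str) -> List[str]:
        """List the operation names known for a type (empty if the type is unknown)."""
        return sorted(self._operations.get(type_id, {}))

    def apply(self, type_id: str, name: str, payload: bytes) -> object:
        """Run a registered operation on a payload of the given type."""
        return self._operations[type_id][name](payload)


registry = DataTypeRegistry()
CSV_TYPE = "example/csv-table"  # hypothetical type identifier

registry.register(CSV_TYPE, "count_rows", lambda b: len(b.decode("utf-8").splitlines()))
registry.register(CSV_TYPE, "header", lambda b: b.decode("utf-8").splitlines()[0].split(","))

payload = b"a,b\n1,2\n3,4\n"
print(registry.operations_for(CSV_TYPE))
print(registry.apply(CSV_TYPE, "header", payload))
```

A client that encounters an object of an unfamiliar type can ask the registry which operations exist for it and then invoke one, which is the kind of automation such a type-to-function link is meant to support.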

Metadata IG The Metadata IG is discussing a new package-based approach to modelling metadata. The intentions are compliant with the DFT model. The Metadata IG kicked off the Metadata Standards Directory WG, which created the Metadata Standards Directory as its output; everyone should register newly created metadata schemas there so that interested experts can make use of what has already been done. The Metadata IG aims to facilitate and coordinate the efforts of all the WGs dealing with metadata. Its activity mostly focuses on data management policies and standards.

PID Information Types WG The PID Information Types WG recognises that in complex data domains, unique and persistent identifiers (PIDs) associated with specific information are the core of proper data management and access. They can be used to give every data object (including collection objects) an identity that enables referring to the data resources and metadata and, additionally, to prove integrity, authenticity and other attributes. But this requires a PID to be uniquely associated with specific types of information, and those types and their association with PIDs must be well managed. Therefore it is useful to specify a framework for information types, to start agreeing on some essential types, and to define a process by which other types can be integrated.
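How typed information attached to a PID can help to prove integrity may be sketched as follows. The record layout, the function names and the type identifiers are assumptions made for illustration and are not taken from the WG's outputs; SHA-256 is used here simply as a common checksum choice.

```python
import hashlib
from typing import Dict

# Hypothetical identifiers for two information types; in practice such types
# would be agreed upon and registered rather than hard-coded.
CHECKSUM_TYPE = "example-type/sha256-checksum"
LOCATION_TYPE = "example-type/location"


def make_pid_record(pid: str, data: bytes, location: str) -> Dict[str, object]:
    """Build a PID record whose entries are keyed by information type."""
    return {
        "pid": pid,
        "info": {
            CHECKSUM_TYPE: hashlib.sha256(data).hexdigest(),
            LOCATION_TYPE: location,
        },
    }


def verify_integrity(record: Dict[str, object], data: bytes) -> bool:
    """Compare the checksum stored in the record with that of the retrieved bits."""
    stored = record["info"].get(CHECKSUM_TYPE)
    return stored == hashlib.sha256(data).hexdigest()


original = b"observation data"
record = make_pid_record("example-prefix/0002", original, "https://repository.example/obj/2")

print(verify_integrity(record, original))          # unchanged bits
print(verify_integrity(record, b"tampered data"))  # altered bits
```

Because both sides agree on what the checksum entry means, any client can run the same check without knowing anything else about the repository that holds the data.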

Practical Policies WG The Practical Policies WG is largely agnostic to concrete data models, since it collects a wide variety of typical data management and analytics workflows that are executed on collections. Its results can be applied in a way that supports the DFT model.

Research Data Collections WG The Research Data Collections WG is working on the specifics of data collections and their description. This group has not yet produced results.

Rice Data Interoperability WG The objective of the Rice Research Data Interoperability Working Group is to provide a framework of community-accepted standards to aid data integration and analysis, and to bridge the gap in the free sharing of rice research data. The framework will help identify, describe, and link rice data using open standards. The group will also address issues such as the development of a minimal metadata set and the selection of appropriate vocabularies. The group will encourage adoption of the resulting framework even within private (for-profit) institutions.

Wheat Data Interoperability WG The Wheat Data Interoperability Working Group seeks to devise a common framework to promote and sustain wheat data sharing, reusability and operability. The framework will use open standards for the identification, description, mapping and publication of wheat data. It will also examine the requirements for a minimal metadata set to describe wheat data, and seek to develop recommendations on topical vocabularies and ontologies. The group aims to produce a 'cookbook' on how to produce easily shareable, reusable and interoperable wheat data.

Outputs of the PID Information Types WG

  • Development of a conceptual model for structuring types, i.e. attributes that describe properties of the digital objects associated with PIDs and that are registered in the Data Type Registry.
  • Specification of the idea of type profiles associated with PID service providers.
  • Development of a demonstrator including an Application Programming Interface.
  • See: http://dx.doi.org/10.15497/FDAA09D5-5ED0-403D-B97A-2675E1EBE786

Outputs of the Data Foundation and Terminology WG

  • Development of a core data organisation model based on a large number of different use cases with Digital Objects as the central concept to unify the data landscape.
  • Definition of a set of terms that underpin this core data organisation model.
  • Development of an open-to-use Term Tool to register term definitions.
  • Extensive awareness raising and terminology harmonisation across many communities on core data issues.
  • See: http://dx.doi.org/10.15497/06825049-8CA4-40BD-BCAF-DE9F0EA2FADF

Outputs of the BioSharing Registry WG

  • A web-based, searchable portal of three interlinked registries, containing descriptions of standards, databases and data policies.
  • Specification of how to interlink these different knowledge sources.

Outputs of the Data Type Registries WG

  • A mechanism that allows data producers to classify their data sets according to standard data types.
  • Specification of a Data Type Registry that relates “types” with useful operations and thus can help in automating processes.
  • Development and testing of an open prototype implementation of the Data Type Registry.

Outputs of the Practical Policies WG

  • Identification of eleven generic policy areas for operating on data collections stored in repositories, and a template-based collection of policy specifications in these areas, gathered in a cookbook.
  • Development of code snippets that support policy specifications and make it easy to turn them into executable procedures.

Outputs of the Wheat Data Interoperability WG

  • Guidelines have been produced (http://ist.blogs.inra.fr/wdi/) for wheat data descriptions and representations to promote common practices and avoid duplicated effort. Best practices, tools, recommendations and examples are listed for several aspects of wheat data: sequence variations, genome annotations, phenotypes, germplasm, gene expression and physical maps.
  • In addition, a portal for vocabularies and ontologies has been produced (http://wheat.agroportal.lirmm.fr). Vocabularies can be shared, aligned and reused. The aim is to make this a collaborative effort.

Outputs of the Data Fabric IG

Outputs of the Research Data Collections WG

No outputs yet

Outputs of the Rice Data Interoperability WG

No outputs yet

Outputs of the Metadata IG

A document on “Metadata Principles” has been made available and endorsed by all the related metadata groups.

Outputs of the Array Database Assessment WG

This WG has not yet produced results.