Data publication and citation are key aspects of the Open Science movement, and core to the RDA’s vision of researchers and innovators openly sharing data across technologies. The publication of datasets and/or the metadata describing them ensures that research can be shared and reused, and allowing citation or reference to data further enhances the opportunities for reuse while supporting the reproducibility of research.
At the core of the ability to use and reuse digital objects, including datasets, is the ability to identify and address them. Addressing data comes with its own set of issues, including what constitutes a discrete dataset and how to deal with data being moved (physically or virtually). The currently accepted solution and state of the art is to use persistent identifiers to point to data.
The RDA is founded on the ideal of data sharing, yet sharing cannot be completely unrestricted. Many stakeholders have an interest in controlling or restricting access to some digital material, not only for selfish motives but also for perfectly valid ethical reasons. The enforcement of restrictions might be underpinned by legislation, or by the policies and practices of particular organisations acting individually or in concert. In any case, the aim is to prevent unauthorised access to digital material that might cause harm of some kind, whether to national security, the lives of individuals, or commercial or scientific interests.
Trustworthy repositories and proper data management are at the heart of the emerging research and data infrastructures. Trustworthy repositories are those entities in our infrastructures that reliably and persistently store, manage and give access to digital objects, i.e. data, metadata, collections, databases, etc. Increasingly, data is offered in the form of services, i.e. repositories can maintain special software that offers specific access opportunities (visualisation, selection, some calculations, etc.).
Data models, data standards and data types are at the core of proper data organisation, re-use and interoperability. With the term data standards we refer to all the different file formats that are being used in science, expecting of course that these are well specified and, where possible, adhere to widely accepted standards and best practices to enable interpretation.
Metadata is a very broad concept: “All data is metadata” and “All metadata is data” are two well-known assertions about metadata, indicating the unlimited scope of this topic if used without any restriction. For this cluster page we start from the notion of the digital object as defined by the Data Foundation & Terminology WG. Metadata is stated to contain descriptive, contextual and provenance assertions about the properties of a digital object.
The term Data Stewardship refers to policies, procedures and roles in managing data throughout the whole life cycle of data objects. In particular, the term stresses the need for active data management after the end of the project that created the data, metadata, collections, databases, etc. During project lifetimes, most administrative roles such as owners, copyright holders, data managers, primary users, metadata creators etc. have been defined through the project setup or have been implicitly assumed. At the end of a project most, if not all, of these roles (including rights and licensing issues) are no longer well defined.
The term domain science refers to a variety of scientific disciplines and production industries, primarily related to the life sciences (e.g. agriculture, fisheries). Scientists and practitioners in these fields increasingly deal with large volumes of data produced as an output of their working and research practices, such as crop yields and lab data. While many localised networks have sprung up to share and analyse data within specific fields, there are still impediments to sharing and reusing data across networks. The groups listed under this topic tend to share a common goal of producing open, standardised frameworks for the description of data within their field or industry, as well as developing minimal metadata sets and community-agreed vocabularies and ontologies.
Research datasets are seldom used in their raw form or in isolation, even by the researchers who captured them for a particular experiment. At minimum a raw dataset will need some processing, perhaps calibration or data-cleansing, before it can be analysed. In general, processed datasets will be created from raw data, and in many fields data will be combined with other datasets that have originated elsewhere. This all gives rise to a number of issues under the heading of data workflows and provenance.
GESIS – Leibniz-Institute for the Social Sciences is the largest infrastructure institution for the Social Sciences in Germany. With da|ra, GESIS provides access to the data referencing system for social science research data.
All data enthusiasts (data scientists and practitioners, students, researchers, and representatives from both the private and public sectors) who work to improve aquaculture, fisheries, marine and environmental management, or who develop ICT solutions for environmental and socio-economic applications, were invited to join the datathon co-organised by the RDA-Europe and BlueBRIDGE from June 15 to 16, 2017, in Heraklion, Crete, Greece.
The “Innovating the wheat community through the RDA services and outputs” Datathon took place at the Athena Research Center in Athens on the 13th and 14th July 2016. The event focused on getting researchers and other stakeholders acquainted with the data formats, services, protocols and needs of the wheat community. On the first day, participants heard presentations about the various services the wheat group is using, obtained sample data, and were introduced to the services and outputs the RDA community has produced. On the second day, participants had the opportunity to take part in a hands-on session that involved applying RDA services, together with related external services, to the relevant data.
A key role of the RDA is to help scientists and researchers from disparate disciplines develop the skills and tools necessary for carrying out data science and data sharing within their fields. In addition, researchers from disparate domains are often attempting to solve similar problems, but may be unaware of how an issue is already being tackled by practitioners elsewhere. For both these reasons, outreach and education play a highly important role in disseminating knowledge and expertise.