Coordinating Data Standards amongst Scientific Unions

The key issue the proposed Task Group will address is the lack of visibility of standards that have been, or are being, developed or endorsed to assist in the interoperability of scientific data from individual Science Unions. This lack of awareness of what is being developed is leading to a worrying level of duplication and incompatibility among the plethora of data and information standards (including vocabularies and ontologies) within and between the Science Unions that are under the ICSU umbrella. A secondary issue is that there is little connection between the Science Unions and those groups that are developing best-practice data and informatics infrastructures, such as the international Research Data Alliance, as well as national efforts such as the Australian National Data Service, the Earth Science Information Partners (ESIP), EU 2020 projects, etc.

The significance of this Task Group is driven by the digital revolution of the last decade and the accelerating rate at which scientific data is being generated. Combined, these have created enormous opportunities for science, provided that the way science is done adapts to exploit these opportunities. Data sharing and ready access to open data resources are fundamental, permitting data and ideas to be reused, recombined and integrated in creative ways. Moreover, open data is a necessary, but not sufficient, condition for more open science in which laboratory and library doors are opened to reach out to other societal stakeholders to engage in the joint production of knowledge about many of the major issues that confront global society.

Sharing data, information and services in the most efficient and accessible way, and utilising them to best effect in the creation of new knowledge, is dependent on the development and use of common practices for the discovery, access, sharing, interpretation and retention of these data. Many, if not most, of the decisions about what to store, what shared agreements or standards to apply, and what minimum metadata to require lie, or should lie, with the relevant disciplines and the international scientific unions that help define the priorities, principles and needs of those disciplines. It is vital that they systematically concern themselves with raising awareness of, and promoting, such standards.

Coordinating data standards at an international level is essential if we are, at a basic level, to be able to discover and access data from across, and beyond, the various sciences. This in turn enables data from all Science Unions to be used in transdisciplinary projects, particularly those that seek to tackle the most pressing environmental and societal issues facing humanity (e.g., the ICSU Future Earth Initiative).

Without a strategic effort to begin to harmonise the standards and data infrastructures that are being developed, it will be nearly impossible to access the breadth and depth of scientific data, both today's and that of the future, in order to provide transparent and evidence-based advice to governments on best practice for building the global sustainability that will underpin our future earth.

It is also hoped that, by endorsing what their constituency believes is best practice in the sharing and interoperability of scientific data, as well as the authoritative vocabularies and standards that support this, the Science Unions will raise their profile amongst the general body of scientists globally.

Addressing the CODATA Priority areas:

To promote Open Data principles, policies and practices

This Task Group is about enhancing Open Data principles and practices through standardisation. There is little point in having Open Data if none of it follows common standards.

To advance the frontiers of data science and its adaptation to scientific research

This Task Group will require linkages to those who are undertaking best practice in data science. It will help connect those who are practising data science in the various Science Unions and raise their profile.

To build capacity for improving skills and the functioning of science systems (particularly in low and middle income countries - LMICs)

This Task Group will be international. The outputs could be of great benefit to LMICs, as they would have access to international standards and to best practice in maintaining them. It is probable that the standards will have to be enhanced for local conditions, but it is better to adopt and adapt existing systems than to develop completely new standards in isolation from what has already been done.

Task Group outputs

  1. Identify which Science Unions have established a specific Commission on data and information, or have identified a group that is the point of contact for issues related to information standards being developed, governed or endorsed by their Unions;
  2. Use the CODATA web site and social media to take a leadership role in raising awareness of standards endorsed by, or being developed by, the individual unions, to assist in promoting the authoritative standards and minimising duplication of effort;
  3. Provide a web-accessible page that provides links to repositories for data models, information standards, vocabularies, ontologies, etc., for each of the unions;
  4. Determine a broad ‘maturity model’ for scientific standards, adapted from the 5 star Open Data model (http://5stardata.info/en/) and the AGU Data Maturity Framework (http://dataservices.agu.org/dmm/), that gives users a guide to the usability of the standards, gives developers a guide to the overall maturity of their standards within the international scientific community, and assists in ensuring ‘fitness for purpose’;
  5. Provide best practice examples for the development and application of the required standards and guidance on developing governance frameworks for the maintenance and revision of these standards, preferably by assisting linkages to key groups such as the international Research Data Alliance, as well as national efforts such as the Australian National Data Service, the Earth Science Information Partners (ESIP), EU 2020 projects, etc.; and
  6. Provide guidelines for the scientific community on the need to adhere to these standards, and promote the benefits of adherence in increasing the discoverability and accessibility of data.

Note: the goal of this Task Group is to provide links to where the required standards and information are stored; it will NOT itself store those developments and standards.

Task Group Activities

  • Finalise Concept Paper to be distributed to the science unions, Jan-17
  • Based on feedback to the Concept Paper, develop Final Project Plan for 2017, Feb-17
  • Engagement Plan developed that, with the assistance of CODATA, contacts all Science Unions in ICSU to raise awareness of the Task Group and to establish key contacts, Mar-17
  • Task Group meeting at EGU 2017, Apr-17
  • Liaison with key groups that are developing required infrastructures (vocabularies, vocabulary services, repositories), Jun-17
  • Web page established on CODATA site that lists 1) key contacts in the science unions; 2) key standards of each science union; 3) benchmarking of each standard against an agreed ‘5 star rating’; and 4) best practice examples, Oct-17
  • Progress report written to CODATA, including an assessment of the best way to proceed for 2018, Nov-17

Chairs

Xiaogang (Marshall) Ma, Chair

Assistant Professor, Department of Computer Science, University of Idaho

max (at) uidaho.edu

Xiaogang (Marshall) Ma is an assistant professor of computer science at the University of Idaho. He received his Ph.D. degree in Earth Systems Science and GIScience from the University of Twente, Netherlands, in 2011. Before joining UIdaho he was an associate research scientist in Data Science and Semantic eScience at Rensselaer Polytechnic Institute. His research focuses on deploying data science in the Semantic Web to support cross-disciplinary collaboration and scientific discovery. Ma is active in several international societies of data science and geoinformatics, and is experienced in research management and community service. He received the IAMG Vistelius Research Award in 2015 and the inaugural ICSU-WDS Data Stewardship Award in 2014. He won the ESIP Funding Friday Competition Award twice, in 2012 and 2013.


Lesley Wyborn, Deputy Chair

Adjunct Fellow, National Computational Infrastructure Facility and The Research School of Earth Sciences

lesley.wyborn (at) anu.edu.au 

Lesley Wyborn is a geochemist by training. She joined the then Australian Bureau of Mineral Resources in 1972 and for the next 42 years held a variety of geoscience and geoinformatics positions as BMR evolved into Geoscience Australia. In 2014 she joined the Australian National University and currently has a joint adjunct fellowship with National Computational Infrastructure and the Research School of Earth Sciences. She is Deputy Chair of the Australian Academy of Science ‘Data for Science Committee’. Her awards include the Australian Public Service Medal for her contributions to Geoscience and Geoinformatics (2014), the Geological Society of America Geoinformatics Division Career Achievement Award (2015) and Fellow of the Geological Society of America (2016).