Data for the Planet: Making Data Work for Cross-Domain Challenges
A Pressing Challenge for Global Science!
The pressing and major global scientific and human challenges of the 21st century (including climate change, sustainable development, disaster risk reduction) can only be addressed through research that works across disciplines and geographical boundaries to understand complex global systems, and that uses a transdisciplinary approach to support extraction of the required knowledge and understanding.
The digital revolution offers major opportunities and challenges. Some disciplines have grasped the opportunity and made dramatic progress in their ability to generate, analyse and share data. Many other disciplines have made more limited progress. But even in disciplines that have made good progress, systems and methodologies are often designed primarily to support users who are experts in that discipline, not non-specialists or the algorithms and software that allow machines to use the data. Overall, the ability to efficiently link, combine and analyse diverse data from different disciplines, at scale, so as to systematically model and identify patterns in complex systems remains embryonic and scattered.
The result is widespread and expensive duplication of effort as a multitude of researchers and technical experts invest repeatedly in systems and standards to enable the integration and analysis of the data they need to address each unique scientific problem. The manual effort required to prepare and cleanse data before use is an enormous and unacceptable diversion of scientific resources. It is estimated that 80% of research expenditures are used to prepare inconsistent data for use. The ultimate impact is to limit our ability to respond rapidly and efficiently to global problems.
Making Data Work for Cross-Domain Challenges
Solutions to complex and difficult problems require data to be assessable and actionable by machines, using big data in combination with the most advanced hardware and software technologies. Data must be richly described with metadata, well documented, transparent and ultimately comprehensible to humans, so as to facilitate the extraction of meaning from complexity. The fundamental enabler of data-driven science is an ecosystem of resources that enable data to be FAIR (Findable, Accessible, Interoperable and Re-usable) for humans and machines. This ecosystem must include effective, maximally automated stewardship of data, and effective terminologies and metadata specifications. Once a system of Data-Driven Interdisciplinarity is developed and implemented, research will become dramatically more efficient. Duplication of global effort in interdisciplinary data discovery, access, visualization and analysis will be significantly reduced. Data will be transparently shared across technical, political, cultural, geographical and language “borders”. More significantly, decision support related to major societal issues and disasters will be faster and more responsive, reducing negative impacts.
How will we mobilise these solutions?
As part of the International Science Council’s (ISC) Science Action Plan, CODATA will develop and implement – on behalf of ISC – a decadal programme to advance structural data-driven interdisciplinarity, to make data work for cross-domain challenges. A pilot initiative, with support from ISC and the China Association for Science and Technology (CAST), has prepared the ground and allowed CODATA to test and refine the approach and build the necessary collaborations. Over the next two years, CODATA will put in place the core funding, capacity and partnerships needed to launch a decadal programme at the ISC General Assembly in Oman in October 2021.
What will the decadal programme accomplish and what impact will it have?
The initiative will apply the techniques of the data revolution to assist data-intensive discovery – in particular for multi-disciplinary, global challenge research areas. The programme will take a three-pronged approach (though engagement and cross-fertilisation between these areas of activity will be essential):
1) Enabling and coordinating convergence in Data-Intensive Science: working with domain and technology experts, the programme will develop consensus on, and work to facilitate the implementation of, enabling technologies and good practices for data-intensive discovery that are applicable across disciplines. This includes the core model for FAIR resources, interoperability and interchange; the supporting ecosystem of identifiers, mapping resources, registries, vocabularies, metadata specifications and other internationally critical core resources; as well as the data science and machine learning techniques applied to transparent and reproducible discovery.
2) Mobilising Domains and Breaking Down Silos: the programme will proactively engage with international scientific unions and associations in programmes of work designed to promote progress on the interoperability of data and related services across the disciplines of science, which in turn will enable interdisciplinary use of data.
3) Advancing interoperability through cross-domain demonstrator case studies: the programme will work with a number of cross-domain case studies (including, but not limited to, the Sustainable Development Goals, disaster risk reduction and reporting, urban health and resilience, and infectious diseases). The programme will apply, refine and accelerate the methodology developed in the earlier pilot: examining the interoperability challenges and developing solutions through intensive expert workshops. This strand will feed back into the other two in order to build consensus and increase cross-domain interoperability.
The overall impact of the programme will be to accelerate a step change in the ability of the scientific community at large to conduct interdisciplinary, data-intensive science. The programme will reduce the proportion of effort dedicated to data cleaning and wrangling. It will maximise the amount of machine-actionable FAIR data available for analysis and linking. And it will thereby enable more efficient and transparent science to address global challenges.