May 2026: Publications in the Data Science Journal

Title: Correction: Building Responsible and Sustainable Open Data Literacy Skills for Early Career Researchers: A Decade of the SoRDS Programme
Author: Shaily Gandhi, Steve Diggs, Marcela Alfaro Córdoba, Louise Bezuidenhout, Raphael Cobe, Sara El Jadid, Bianca Peterson, Robert Quick, Hugh Shanahan, Shanmugasundaram Venkataraman, Ekpe Okorafor, Veerle Van den Eynden
URL: http://doi.org/10.5334/dsj-2026-019
Title: Enabling the Reuse of Personal Data in Research: A Classification Model for Legal Compliance
Author: Eduard Mata i Noguera, Ruben Ortiz Uroz, Ignasi Labastida i Juan
URL: http://doi.org/10.5334/dsj-2026-018
Title: On the Darwin Core Term dwc:habitat, and the Need to Adopt a European Vocabulary Based on NATURA2000 and EUNIS Classifications, with a Comment on International Applicability
Author: Roberto Pizzolotto, Fabiola Durante, Wouter Dekoninck
URL: http://doi.org/10.5334/dsj-2026-017
Title: Certification Frameworks for Scientific Data Repositories: Strengthening Repository Trustworthiness
Author: David Castle, Daniela Santos Oliveira, Dale Peters, Claudia Bauzer Medeiros, Ioana Popescu, Devika Madalli, Rebecca Koskela, Meredith Goins, Suzie Allard
URL: http://doi.org/10.5334/dsj-2026-016

 

Disaster Risk Reduction and Open Data Newsletter: May 2026 Edition

El Niño likely to return: the case for early action

Climate models point to a likely return of El Niño by mid-2026. Its strength remains uncertain, but waiting for certainty can increase exposure to avoidable losses. History shows the risks: drought, agricultural collapse, and disease outbreaks, which hit poor and food-insecure regions the hardest. Unfortunately, this return arrives with governments and households less resilient than before, and with climate change pushing risks further. The priorities should be turning climate forecasts into actionable ground-level decisions, securing early financing options, and strengthening coordination across sectors. The window for preparation and for building long-term resilience is open.

How controlled burns can help save taxpayers billions

Indigenous nations have been clearing underbrush and trees or employing prescribed burns for centuries. A study published in Science confirms what land managers have long argued: preventing wildfires is cheaper than fighting them. Every dollar spent on clearing underbrush and trees and on prescribed burns avoided $3.73 in damage. Yet US federal policy has recently moved in the opposite direction, with suppression prioritised over prevention and prevention measures adopted on one million fewer forest acres in 2025 than in 2024. Wildfire prevention can also bring benefits for ecology and recreation; however, not everyone supports the tactic.

Tulane researchers say Louisiana could lead global climate adaptation efforts

Louisiana is losing land faster than almost anywhere on Earth. New research published in Nature Sustainability has identified an ancient shoreline roughly 30 miles north of New Orleans, which formed 125,000 years ago when temperatures were just 0.5-1.5°C warmer than preindustrial levels. With global temperatures now approaching that 1.5°C threshold, a similar retreat may already be locked in. The authors are clear that this does not have to be an inevitable disaster. An early start with planned, managed relocation can transform retreat into renewal. Sweden’s city of Kiruna, currently relocating 6,000 residents due to mining activity, proves it can be done. The window to plan is open, but it will not stay open for long.

From forecasts to futures: how Ugandan communities are turning early warnings into everyday action

In Uganda’s flood-prone Kamuli and drought-affected Pakwach districts, the Water at the Heart of Climate Action (WHCA) programme is transforming how communities prepare for climate hazards. Launched in 2024, WHCA unites government agencies, the Uganda Red Cross Society, and humanitarian partners to build community-rooted early warning systems. Recognising that “people know their risks better than any map,” the programme began by listening, engaging over 3,000 participants to shape its roadmap. At the heart of the programme is training the community and trusted groups, because “when people hear advice from their church or local radio, they act faster”. Ultimately, WHCA aims to turn early warning into an everyday culture of preparedness, leading the way in learning to live with climate hazards.

Cities are rehearsing for deadly heat. Will it help when disaster comes?

In a tunnel beneath Paris, kept at a cool 18°C, schoolchildren acted out the chaos of a 50°C heatwave. They faked food poisoning from spoiled refrigerated goods and carbon monoxide leaks from emergency generators. Above ground, firefighters and city officials worked through the cascading failures such heat would trigger across power, transport and health systems. The drill led to 50 recommendations now embedded in Paris’ Climate Action Plan, with a few other cities following suit. The lessons learnt are that a heat action plan on paper is not the same as knowing how to execute it under pressure, and that residents (not just officials) must be prepared.

Revolution’s aftermath: population based cross-sectional study to understand the intergeneration mental health and wellbeing following the 2024 student-led uprising

When Bangladesh was gripped by a nationwide student uprising in July 2024, millions were exposed to prolonged unrest and traumatic events. This open-access, population-based study is among the first to quantify the psychological impact, using the validated PTSD Checklist for DSM-5 to assess probable PTSD among Bangladeshis aged 15 and over within three months of the uprising. The study highlights the value of applying Findable, Accessible, Interoperable, and Reusable (FAIR) principles to post-crisis mental health data and provides a rigorous, reproducible framework for studying the mental health impacts of disasters. Findings reveal high probable PTSD rates following large-scale confrontations, underscoring the need for culturally appropriate interventions and continued monitoring. As political unrest, climate shocks, and displacement increasingly overlap worldwide, interoperable mental health data is becoming essential to effective recovery and disaster risk reduction.

Open Science, Health Data and Epistemic Harms: A Multidisciplinary Reflection

Open Science (OS) promises to democratise knowledge and reduce inequalities, but does it deliver? This interdisciplinary essay from researchers at the University of Warwick argues that, particularly in health data, OS can amplify structural vulnerabilities for already marginalised communities. Drawing on a September 2025 workshop, the authors bring perspectives from law, epidemiology, disability studies and data justice to critically examine OS’s “promise and paradox.” They highlight how OS infrastructures can perpetuate Eurocentric knowledge norms, corporate capture and data colonialism. The authors call for OS to move beyond techno-optimism and instead embrace relational data governance, CARE principles, Indigenous data sovereignty, accessibility as a core principle, and stronger accountability frameworks, ensuring openness genuinely serves collective benefit rather than reinforcing existing power structures.

Preparatory phase of large earthquakes illuminated by unsupervised categorization of earthquake catalog features

Predicting large earthquakes remains a major scientific challenge due to the complexity and variability of fault systems and their preparatory processes. This study applies an unsupervised machine learning framework to seismicity catalogues to identify patterns that may signal the lead‑up to large earthquakes. For earthquakes with a clear preparatory phase, certain clusters of smaller earthquakes became increasingly localized and interconnected in the lead-up to the main event. These “critical” patterns reflected higher levels of strain release and earthquake interaction compared to background seismicity. Importantly, the method did not detect the same signals before earthquakes with no clear preparatory phase, suggesting it may help distinguish when meaningful precursory activity is occurring. The researchers say the approach shows promise for improving operational earthquake forecasting, while also highlighting that not all earthquakes exhibit detectable warning patterns beforehand.

Disaster risk planning in an evolving risk landscape: Barriers and enablers in the integration of land use and preparedness actors

Integrating land use planning with preparedness actors such as fire and rescue services and crisis management is becoming increasingly important in disaster risk management. Evolving risks linked to climate change, urban densification, and complex hazard interactions require both preventive and resilience‑based approaches. Four key themes shaping collaboration were identified: institutional structures and data sharing, alignment of priorities and risk perceptions, resource availability and capacity, and role clarity and mutual understanding. Evidence shows that fragmentation, unclear responsibilities, limited information exchange, and competing priorities constrain effective coordination across these actors. As responsibilities for prevention and response remain organised across separate systems, insufficient integration can weaken feedback between planning and operational practice, limiting the ability to reduce risks and adapt to emerging threats over time.

Reinterpreting disasters and urban resilience in the Anthropocene: Disaster management or transforming with disasters?

Disasters are increasingly shaped by the conditions of the Anthropocene, where human-driven environmental change is producing more frequent, interconnected, and unpredictable events. The analysis argues that disasters can no longer be understood as discrete, natural occurrences, but as chronic, systemic processes arising from complex interactions between social and ecological systems. In this context, traditional disaster management approaches—based on control, prediction, and recovery—are increasingly limited by uncertainty and the breakdown of stable cause‑effect relationships. Instead, the study highlights a shift toward “transformative resilience,” where disasters are understood as potential catalysts for systemic change. This approach emphasises reconfiguring socio‑ecological systems, addressing structural drivers of vulnerability, and enabling cities to reorganise and evolve in response to disruption, rather than returning to pre‑disaster conditions.

Emergency Preparedness, Disaster Displacement and Climate Migration

Climate‑related hazards are reshaping patterns of human mobility, with preparedness emerging as a key factor determining whether displacement is safe, planned, or chaotic. The analysis shows that linking risk assessment, early warning systems, and pre‑arranged financing enables earlier and more effective action, reducing losses and distress migration. Inclusive risk communication and evacuation planning are critical to ensuring that vulnerable populations can move safely or access protection in place. The duration and impacts of displacement are strongly influenced by the pace of recovery, including housing reconstruction and access to social protection. Over the longer term, combining adaptation measures that support habitability with safe, voluntary migration pathways is identified as essential to strengthening resilience under increasing climate pressures.

16th UCL Risk & Disaster Reduction annual conference | UCL Risk & Disaster Reduction

How do cities confront emerging risks and build more resilient futures? The 16th UCL Risk and Disaster Reduction Annual Conference brings together researchers, practitioners, and policymakers across disaster risk, from multi-hazard modelling to AI ethics in crisis management. The programme features a keynote from UNDRR’s Loretta Girardet, sessions ranging from engineering to social science, and an explorative hackathon reimagining future cities.

Date & location: 24 June 2026 | UCL, London, United Kingdom

2nd Bonn Risk Finance Dialogue

Vulnerable communities remain dangerously under-protected as climate risks intensify and financing gaps widen. The 2nd Bonn Risk Finance Dialogue will bring together practitioners, researchers, and policymakers to tackle this challenge head-on. Structured around three themes (scaling up policy and financing frameworks, scaling out financial protection solutions, and scaling deep through stronger delivery systems), the Dialogue aims to advance climate and disaster risk finance and insurance (CDRFI) for those who need it most. The event organisers welcome contributions to the Dialogue.

Date & location: 17-18 June 2026 | Haus der Evangelischen Kirche (EViB), Bonn, Germany

Integrating Risk Reduction and Risk Finance: From Strategy to Implementation

National financing for Disaster Risk Reduction (DRR) remains insufficient, fragmented, and too disconnected from implementation processes. Yet countries are increasingly seeking practical tools to change that. A workshop will be held to explore emerging solutions, including the Climate Insurance and Resilience Programme (CIRP) and the G20 Compendium on Risk Transfer Instruments, and assess their real-world applicability. The workshop will strengthen national DRR financing strategies, advance anticipatory financing, and gather country feedback to shape joint technical guidance. It is a complementary event to the 2nd Bonn Risk Finance Dialogue.

Date & location: 17 June 2026 | Haus der Evangelischen Kirche (EViB), Bonn, Germany

CESSDA “Future-Ready Social Science: Data, Policy, and Impact” (50th Anniversary)

As digital transformation, open science, and AI reshape research, the need for trustworthy, interoperable, and policy-relevant social science data has never been greater. To mark its 50th anniversary, CESSDA, the Consortium of European Social Science Data Archives, convenes researchers, data experts, policy actors, and funding partners from across Europe and beyond to reflect on five decades of collaboration and chart a shared vision for the future. Anchored in CESSDA’s strategic pillars of Data, People, and Landscape, the conference creates space for bold questions and stronger synergies, advancing a European Research Area where data is as open as possible and as closed as necessary.

Date & location: 15–18 June 2026 | Bergen, Norway

CODATA-RDA School of Research Data Science

Contemporary research cannot be done effectively without strong data skills, yet access to quality training remains unequal. The CODATA-RDA School of Research Data Science offers early-career researchers and professionals from Africa and South America a self-paced, online curriculum covering ten themes in research data science, from technical skills to responsible research practices. Delivered through video lectures, exercises, and live question-and-answer sessions, the school equips participants with the foundational skills needed to work with data effectively in 21st-century research. Open to master’s students, postdoctoral researchers, and professionals, this initiative reflects CODATA’s commitment to building an inclusive, globally connected research data community.

Date & location: 1 June – 17 July 2026 | Online (short course)

Disaster Risk Reduction and Open Data Newsletter: April 2026 Edition

Why disaster risk financing must evolve to meet the climate crisis

Climate-related disasters are increasing in frequency and severity, while global adaptation and resilience finance remains largely reactive and far below estimated needs. Evidence shows finance flows often rise only after disaster losses occur, reinforcing a cycle of response rather than prevention. The analysis highlights mismatches between current adaptation funding and projected requirements, alongside findings that policy uncertainty discourages private investment. Financial instruments such as parametric insurance, catastrophe bonds, and blended finance are used in regions including the Caribbean, Pacific, and parts of Asia, but remain underutilised globally. Examples from Mexico, Indonesia, Kenya, and the Philippines show how pre-arranged disaster risk financing enables faster, more predictable responses to climate shocks.

CDIF4EOSC: watch this space!

CODATA has made progress in the Grant Agreement Preparation process with the European Commission for CDIF4EOSC, a three‑year project aimed at strengthening cross‑domain interoperability within the European Open Science Cloud (EOSC). Building on the existing Cross‑Domain Interoperability Framework, the project will extend recommendations through profiles, guidelines, and use‑case examples to produce an actionable playbook supporting FAIR integration across EOSC and related data spaces. CDIF4EOSC will promote a FAIR‑by‑design approach to digital objects, supported by AI‑assisted FAIRification tools and tested through use cases in ocean science, climate adaptation, and safe and sustainable materials. With a total budget of €8 million, the project brings together a large European consortium and targets direct integration with EOSC Federation Nodes and Common European Data Spaces.

Unlocking the Economic Dividend of Resilience Investment

Resilience spending isn’t just “avoiding future damage”—it can be an economic stimulus right now. Resilience investment is often sold as insurance against tomorrow’s disasters. Tonkin + Taylor says that framing is too narrow—and it slows action when budgets are tight. Instead of counting only “avoided losses” from floods, slips, or coastal inundation, it urges decision-makers to capture the “triple dividend”: preventing damage, unlocking economic and development gains (jobs, growth, business confidence), and delivering social and environmental benefits that accrue even if no disaster strikes. It points to New Zealand’s long history of flood protection, noting assets valued at $3.6b delivering $13b in benefits each year. With public funding constrained, it backs beneficiary‑pays and value‑capture tools, and faster property-level upgrades supported by insurance and low-interest finance.

Systemic risk is the hidden tax on growth: insurance can help

Systemic risk is increasingly shaping economic growth as climate shocks, geopolitical disruption, public‑health crises and technology concentration collide. The article argues these risks often begin invisibly, raising capital costs, discouraging innovation and weakening resilience until they cascade into crises, as COVID‑19 demonstrated. Climate disasters are widening insurance “protection gaps” as coverage retreats in higher‑risk areas, affecting property markets and investment. Supply‑chain shocks and threatened shipping choke points add volatility and inflation pressure, while AI’s reliance on concentrated data centres and semiconductor supply chains creates fragile failure points. The proposed shift positions insurance as a growth stabiliser through risk modelling and risk‑sharing, early warning, incentives for adaptation, and public‑private risk pools.

Building the Market for Resilience: A new opportunity for financial institutions

Insured losses from natural catastrophes have exceeded $100 billion for six straight years. Banks in emerging markets are already seeing the consequences through higher loan defaults, weakened collateral after repeated storms, and uninsured small businesses. Adaptation is no longer primarily a government responsibility, as firms are investing in resilience to protect assets, operations, and supply chains. With resilience solutions markets growing, financial institutions can accelerate the shift by integrating physical climate risk into credit and investment decisions, financing resilience through debt and equity, and using tools like contingent finance and resilience bonds. As more countries publish National Adaptation Plans and clearer taxonomies emerge, early-mover banks could help unlock a $130 billion-a-year resilience financing opportunity by 2030.

AI and drones team up to find climate-resilient wheat

AI and drones are helping wheat breeders find varieties that stay productive as weather becomes more erratic. A 2026 study tracked 64 durum wheat varieties in Mediterranean conditions, comparing irrigated plots with rainfed fields. Drones carrying multispectral and thermal sensors captured early signs of plant stress and moisture, and AI models used that data to predict not only yield but “production stability” across good and bad seasons. The key finding challenges a common assumption: staying green late into the season did not reliably boost yields and could reduce stability. Instead, the most resilient performers showed vigorous early growth and earlier maturation, helping them avoid late-season heat and drought.

How AI’s language barrier limits climate disaster responses

AI is increasingly used by governments and organisations to scan social media for early warning signals during floods, heatwaves and other climate emergencies, but a major blind spot is language as it’s actually used online. Posts often rely on code-switching, slang, Pidgin, sarcasm, and locally shared cues of urgency, so an AI trained on Western‑centric, standard English data can misread a genuine call for help as casual commentary. That cultural fingerprint in training data can systematically diminish underrepresented voices in developing countries, with real consequences when misinterpretation delays response and puts lives and property at risk. The fix is practical: train and test models on real regional posts, and build systems that recognize cultural context and urgency signals.

Integrating climate adaptation and peacebuilding: capacity development in climate and conflict-affected communities

Communities affected by armed conflict face heightened vulnerability to climate change due to displacement, infrastructure damage, restricted mobility, and limited access to land and water. Climate change adaptation and peacebuilding interventions both seek to reduce vulnerability and build resilience, yet they have largely developed in separate policy and practice domains. Using a two‑stage case study from a conflict‑affected region of Colombia, based on semi‑structured interviews and document analysis, the analysis identifies areas of convergence and divergence between these approaches. Six domains of potential synergy emerge: access to information, education, social networks, employment, environmental management, and healing. Two notable gaps remain, relating to protection and safety, and socio‑cognitive factors such as social identity and risk perception. An integrated framework is proposed to better align adaptation and peacebuilding efforts and reduce reinforcing cycles between climate vulnerability and violent conflict.

Landscape of climate finance in Ethiopia

Ethiopia has built an ambitious climate policy platform since launching its Climate Resilient and Green Economy strategy in 2011, aiming to combine rapid growth with low‑carbon development and stronger resilience. But the report’s 2019/20 mapping shows climate finance remains far below need: around USD 1.7 billion a year was committed, just 7% of estimated requirements of USD 25.3 billion and under 2% of GDP. Funding skews toward adaptation at 56% compared with 38% for mitigation, and flows are dominated by international public finance, mostly delivered through grants. To close the gap, the report argues for stronger tracking and transparency, more blended finance and PPP approaches to de-risk investments, central-bank reforms to unlock green lending and capital markets, and long-term capacity support so sub-national and non-state actors can build investable pipelines.

Gender in climate and disaster risk finance and insurance in Bangladesh

Bangladesh’s escalating climate hazards are intersecting with entrenched gender and social inequalities, and the paper argues CDRFI will not deliver fair outcomes unless inclusion is built into the financial architecture, not added on through pilots. It finds Bangladesh has extensive policies, but implementation is held back by weak coordination across ministries, limited gender-relevant tracking and data, and market rules that still make access harder for women—especially when customer data are not disaggregated and enrolment processes rely on documentation many women do not hold. The recommendations focus on enforceable levers: linking budget release to gender markers and results, strengthening legal and regulatory requirements for gender-responsive product design, using microfinance networks as a scalable delivery channel, investing in awareness and trust-building, and creating a standing coordination mechanism so finance, data, and delivery systems work together.

Introduction to financial assessments to address climate change

Turning climate plans into action increasingly depends on knowing what measures cost, where the money can come from, and how to shift finance at scale. UNDP’s financial assessments are designed to estimate the incremental, direct funding required to implement climate measures, identify the size of the financing gap, and map potential sources of public and private finance. The aim is to help countries move from targets to delivery by strengthening budget planning, aligning ministries around a shared investment pathway, and building evidence that can support policy reform and stronger climate finance proposals. The assessments can be applied to different national goals, including NDCs and long-term strategies, and are positioned as a repeatable planning tool that supports implementation decisions as well as engagement in international climate finance and negotiations.

AI for Social Risk Forecasting and Explanation: The Power of Machine Learning–Based Social Risk Models

AI is increasingly being used to forecast social risks that can destabilise fragile and climate-affected settings, including conflict, displacement, and crime. The article presents three proof-of-concept machine-learning models that combine satellite imagery, text analysis of news and social media, and economic, climate, and geospatial indicators to detect changing risk patterns and generate usable proxies where official statistics are limited. Reported out-of-sample accuracy ranges from 63–76% for conflict prediction and 70–74% for population change, and the explanatory signals most associated with elevated risk include politically sensitive language, economic pressure and price shifts, climate stress, and changing social perceptions. The takeaway is that social-science-informed AI can complement conventional analysis by improving monitoring, enabling earlier action, and supporting more targeted allocation of limited resources.

 

Dataverse Community Meeting 2026

The Dataverse Community Meeting 2026 has announced it will convene in Barcelona, Spain. This year’s theme, “Advancing Data and Dataverse: AI, Interoperability, and Sensitive Data”, highlights key areas of interest for data professionals and researchers. The three focus areas are building AI solutions for data to enhance repository workflows, data quality, and AI-ready data; improving interoperability to enable richer linkage and reuse across datasets, domains, and platforms; and expanding support for sensitive and restricted data.

Date & location: 12-15 May 2026, World Trade Center, Barcelona, Spain

TWO WEEKS TO GO – Call for participation: UNESCO and CODATA survey on open science for data policy for times of crisis

UNESCO, in collaboration with the International Science Council’s Committee on Data (ISC CODATA), has launched a global survey to assess how organizations are implementing data policies for times of crisis, in alignment with open science principles. The survey builds on the Data Policies for Times of Crisis Facilitated by Open Science (DPTC) resources as part of the UNESCO Open Science Toolkit. By participating in this survey, organizations contribute to shaping global dialogue and advancing coordinated, ethical, and effective data management for future crises.

The questionnaire takes approximately 10–15 minutes to complete. The deadline to submit your response is 11 May 2026.

GEO Symposium & GEO-21 Plenary

The 2026 GEO Symposium and GEO-21 Plenary explore how Earth Intelligence can drive transformative, resilient solutions for people and the planet at a pivotal moment in the implementation of GEO’s Post-2025 Strategy. This year’s theme is “Investing in Earth Intelligence for a Resilient Future”. It will convene governments, space agencies, research organizations, private sector innovators, and development partners.

Date & location: 26-28 May 2026, World Meteorological Organization, Geneva, Switzerland

Global Water Summit 2026

The need for a water transition is easy to endorse. Delivering it is harder. Climate extremes, rising energy demands, and pressures on capital mean the systems we rely on must adapt, and quickly. New technologies and AI will tackle such challenges, even as they place new demands on water. How do we strike a balance? This year’s Global Water Summit is about turning that question into action: adapting faster, smarter, and at scale.

Date & location: 18-20 May 2026, Madrid Marriott Auditorium, Spain

WDS-ECR Co-Chair Opportunity: Apply Now! 

The World Data System Early Career Researcher Network (WDS-ECR) invites applications for a co-chair position. The selected candidate will join two current co-chairs in leading a global network dedicated to promoting best practices in research data management and fostering professional growth among early-career researchers. This is a three-year term, starting in July-August 2026, offering an excellent opportunity for early-career data stewards who aspire to make an impact on the international stage.

Application deadline: 31 May 2026

CALIBRATE 2026: Africa’s Climate Entrepreneurship Summit

Calibrate 2026 invites researchers, innovators, and practitioners to contribute to Africa’s premier climate entrepreneurship summit through research papers and innovation showcases. The thematic areas for each are listed on the website. Selected papers will be considered for publication in the Journal of Nature-Based Solutions and Innovations (JNSI), while outstanding innovations will be featured in Nature-Based Solutions Magazine and eligible for the pitch competition with investment opportunities.

Abstract submission deadline: 30 April 2026; full paper deadline: 11 May 2026

Date & location: 21-23 May 2026, Accra, Ghana

10th International Conference on Flood Management (ICFM10)

The International Conferences on Flood Management (ICFM) stand as a distinguished global platform committed to addressing and advancing the field of flood management. The theme for this year’s conference is “Adapting to Global Change: Innovative Approaches to Flood Management and Resilience”. The ICFM brings together experts, practitioners, policymakers, and researchers from across the globe to deliberate on contemporary challenges and innovations in flood management.

Date & location: 20-22 May 2026, London, Ontario, Canada

April 2026: Publications in the Data Science Journal

Title: Open Science, Health Data and Epistemic Harms: A Multidisciplinary Reflection
Author: Tatenda Chatikobo, Frances Griffiths, Nikita Hayden, Gary Leeming, Ankita Mishra, Eva Morris, Luca Schirru, Nathanael Sheehan, Andrew Williams, Sharifah Sekalala
URL: http://doi.org/10.5334/dsj-2026-015
Title: FAIR Data Workflow Implementation and Assessment for Ion-Exchange Chromatography in Plasma Science
Author: Robert Wagner, Ron Henkel, Kristina Yordanova, Dagmar Waltemath, Markus M. Becker
URL: http://doi.org/10.5334/dsj-2026-014

 

March 2026: Publications in the Data Science Journal

Title: Publishing Fine-grained Standardized Metadata – Lessons Learned from Three Research Data Centers
Author: Knut Wenzig, Andreas Daniel, Dominique Hansen, Tobias Koberg, Mihaela Tudose
URL: http://doi.org/10.5334/dsj-2026-013
Title: Building Responsible and Sustainable Open Data Literacy Skills for Early Career Researchers: A Decade of the SoRDS Programme
Author: Shaily Gandhi, Steve Diggs, Marcela Alfaro Córdoba, Louise Bezuidenhout, Raphael Cobe, Sara El Jadid, Bianca Peterson, Robert Quick, Hugh Shanahan, Shanmugasundaram Venkataraman, Ekpe Okorafor, Veerle Van den Eynden
URL: http://doi.org/10.5334/dsj-2026-012
Title: Correction: Essential Aspects of Tools for Developing Scientific Data Management Plans
Author: Fabiano Couto Corrêa Silva, Sandra de Albuquerque Siebra, Laura Vilela Rodrigues Rezende, Alexandre Faria de Oliveira, Denise Oliveira de Araújo
URL: http://doi.org/10.5334/dsj-2026-011
Title: Aligning NASA Earth Science Data Stewardship with FAIR Principles: Outcomes, Recommendations, and Future Directions
Author: Ge Peng, Hampapuram K. Ramapriyan, Yaxing Wei, Bhaskar Ramachandran, Gao Chen, Zhong Liu, David F. Moroni, Edward M. Armstrong, Rudiger Gens
URL: http://doi.org/10.5334/dsj-2026-010
Title: Things Fall Apart: Lessons from a Defunded Data Repository
Author: Alex de Sherbinin
URL: http://doi.org/10.5334/dsj-2026-009

 

“Nothing about us without us”: reflections from the CODATA Task Group on Citizen-Generated Data for the SDGs

At IDW2025, a group of speakers from around the world spoke on ‘Bridging Data Gaps with Citizen Science for People and Policy’. Although communities have long been studied within research and had data collected about them, there is growing recognition that communities should have a voice in the data produced on them and the policies made downstream. Carolynne Hultquist, Co-Chair of the CODATA Task Group, set the stage on challenges and opportunities of incorporating Citizen Science in the United Nations Sustainable Development Goals (SDGs) and the global movement towards meaningful engagement in citizen data. Countries around the world are recognizing the value of involving communities within the official statistical process and learning approaches to address concerns on data quality, standards, and ethics in this new paradigm.

If you are interested in finding out more, come and join the session on Citizen Science, SDGs and FAIR data at the RDA Virtual Plenary on Thursday 19th March 2026 (07:00 – 08:30 UTC)!

Citizen-generated data for progress on the SDGs

The Copenhagen Framework on Citizen Data and its implementation – Haoyi Chen, United Nations Statistics Division (UNSD)

“Nothing about us without us.” The UNSD Collaborative on Citizen Data aims to empower citizens and turn them into agents of change. Empowering communities opens dialogue with public institutions, respects marginalized voices, and expands the power of data production to citizens. Citizen contributions to data are increasingly recognized as critical to societal wellbeing and in support of the ‘leave no one behind’ principles of the 2030 Agenda for Sustainable Development.

The Copenhagen Framework on Citizen Data has been developed to address challenges in defining citizen data, the roles that citizens and national statistical offices can play in data processes, and action points for the sustainable production and use of citizen data. Key responsibilities of National Statistical Offices include supporting quality standards and methodologies, building capacity and fostering partnerships, raising awareness of the potential of citizen data, and promoting its integration into official statistics. Ensuring the voices of communities are heard can help to address intersectional marginalization, hold institutions accountable, and ensure that data remain relevant and impactful.

Citizen science/citizen-generated data towards inclusive impact at local and global level – Maryam Rabbie, Sustainable Development Solutions Network (SDSN)

SDG 4 aims to ensure inclusive, equitable, quality education for all, yet access remains a major challenge. UNESCO identifies distance as a major access barrier for many primary and secondary learners, with the IIEP Education Policy Toolkit noting that schools should ideally be within 3 kilometers of children’s homes. Through My School Today, SDGs Today demonstrates the transformative role of citizen science in shaping policy. The initiative engages students, communities, governments, and other stakeholders in geo-referencing schools and education facilities across Africa, contributing to a living, up-to-date map of school locations. The initiative collaborates with education ministries and national statistical offices to complement and strengthen official data sources, demonstrating how citizen-generated data can help bridge information gaps and create more responsive, evidence-based education policies.

Regional landscapes of citizen-generated data

The session was made up of short talks from Africa, Asia, Oceania and Latin America on the use of citizen data for monitoring the SDGs. Some countries are leading efforts to prioritize inclusive community participation in monitoring through intentional engagement and subsequent civic outcomes with action to support progress.

Latin America & the Caribbean – Amanda Mayte Vilchez (Cornell University, USA) & Karen Soacha-Godoy (EMBIMOS Research Group, ICM-CSIC, Spain | Iberoamerican Participatory Science Network (RICAP))

In the Latin America and Caribbean region, data generated through participatory science projects have informed, either directly or indirectly, indicators related to twelve of the seventeen SDGs, highlighting both the diversity of data produced and the strong potential of these initiatives to help address critical data gaps. Among these, SDG 15 (Life on Land) was the most frequently informed, reflecting the strong presence of biodiversity-focused participatory studies. SDG 6 (Clean Water and Sanitation), SDG 14 (Life Below Water), SDG 3 (Good Health and Well-being), and SDG 5 (Gender Equality) are all well represented across multiple indicators.

A central critique of the data used to inform SDG indicators is the underrepresentation of minority groups, who often remain invisible in national-level statistics. In the Latin America and Caribbean region, approximately 75% of the mapped initiatives involved vulnerable and historically marginalized populations. Among the most frequent communities involved were rural (27%), Indigenous (16%), farmers (12%), and fishing communities (5%), as well as women, youth, Afro-descendant populations, and older adults. The particular attention given to marginalized communities by participatory initiatives reveals the capacity of citizen-generated data to address this challenge in data production for SDG monitoring in the region.

Finally, it is important to highlight the initiatives designed to address community information needs. Among the initiatives analyzed, 42% generate action-oriented data, meaning they are conceived not only to produce information but also to ensure that this data is relevant, responsive, and capable of supporting tangible, real-world change within communities in the region: producing information with them and for them.

Asia – Yaqian Wu (University College London, UK)

Priorities in Asia focus on the following SDGs: SDG 6 on water quality monitoring and access to water resources, SDG 3 with a focus on air pollution and health impact assessment, SDG 11 for urban planning and informal settlements, SDG 13 on climate action and disaster response, and SDG 16 on governance accountability.

There are challenges with data standardisation, representativeness and inclusiveness, institutional absorption paths, and financial sustainability. Regional efforts could support Asian countries to formally embed citizen data in official SDG indicator reporting.

Oceania – Carolynne Hultquist (University of Canterbury, NZ)

Projects in the region have a strong environmental focus, especially related to capturing large-scale negative changes alongside human impact. Priority areas involve SDG 11 Sustainable Cities and Communities, on 11.5 (reduce the adverse effects of disasters) and 11.6 (reduce the environmental impact of cities); SDG 13 Climate Action, on 13.3 (build knowledge and capacity); SDG 14 Life Below Water, on 14.1 (reduce marine pollution, notably marine litter) and 14.8 (increase scientific knowledge, research, and technology for ocean health and marine wildlife monitoring); and SDG 15 Life on Land, on 15.2 (end deforestation and restore degraded forests) and 15.8 (prevent invasive alien species on land and in water ecosystems).

Communities in the region grapple with issues on managing data ethically with appropriate cultural considerations. One of these considerations is data sovereignty, particularly for indigenous groups, as a principle of maintaining control, ownership, and usage of data. Many communities are concerned about potential misuse.

Africa – Kehinde Baruwa & Peter Elias (Co-Chair CODATA Task Group; University of Lagos, Nigeria) & Oluwatimilehin Adenike Shonowo (University of Glasgow)

Across Africa, communities are increasingly generating their own data to fill gaps in official statistics and support local decision-making. An earlier study mapped 53 citizen science initiatives showing the growing role of participatory data in sustainable development. A more recent study examines additional initiatives advancing SDGs 5, 6, 11, 13, and 15 across Kenya, Nigeria, Cameroon, South Africa, and Tanzania.

These initiatives engage youth, women, and residents of informal settlements and rural areas to monitor issues such as urban environments, water quality, gender and health barriers, climate resilience, and ecosystem restoration. While they generate valuable data for advocacy and planning, challenges remain around validation, institutional collaboration, and long-term sustainability. Strengthening partnerships with National Statistical Offices (NSOs) and academic institutions is therefore key to integrating citizen-generated data into national decision-making and SDG reporting.

How Citizen Science is Shaping Progress for SDGs – Oluwatimilehin Adenike Shonowo, presenting on behalf of Dilek Fraisl (Senior Research Scholar, IIASA & Managing Director, CSGP)

Ghana has become the first country to integrate existing citizen science data on marine plastic litter into its official statistics, as well as SDG monitoring and reporting. The results have been used in Ghana’s Voluntary National Review of the SDGs, and reported on the UN SDG Global Database for SDG 14.1.1 on Marine Litter. The results are also informing the integrated coastal and marine management policy in Ghana, currently under development. The initiative has helped to bridge local data collection efforts with global monitoring processes and policy agendas by leveraging the SDG framework.

“Nothing about us without us”

A motivation of organisations in implementing the Copenhagen Framework on Citizen Data is to highlight the voice of communities that are often left out or left behind. There is recognition of the importance of representation in reporting, especially on marginalized and vulnerable populations. Some countries are leading efforts to prioritize inclusive community participation in monitoring through intentional engagement and subsequent civic outcomes with action to support progress on the SDGs.


Our CODATA Task Group supports this global movement toward meaningful citizen engagement in data. There are local and regional needs which need to be addressed differently in some cases, but there is also a lot of commonality. We have a lot to learn from each other.

In the 2025-2027 iteration of our Task Group we are providing guidance to work between data and policy frameworks to further WorldFAIR+ with the Cross-Domain Interoperability Framework (CDIF) approach in the context of citizen data in alignment with CARE principles and the Copenhagen Framework. We aim to make progress towards interdisciplinary standards for citizen data and metadata across scales to have actionable globally comparable data for the SDGs. The Task Group is partnering with the Citizen Science Global Partnership (CSGP) Air Quality Community of Practice (CoP) for a case study for SDG 11.6.2: Annual mean levels of fine particulate matter (e.g., PM2.5 and PM10) in cities (population weighted). We are committed to using our networks to continue to highlight and promote the use of citizen data to make progress on the SDGs.
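
For readers less familiar with the "population weighted" qualifier in SDG 11.6.2, a minimal sketch of the idea follows. It assumes the indicator is a mean of city-level concentrations weighted by each city's population; the function name and the three example cities are hypothetical and are not Task Group data.

```python
# Minimal sketch of a population-weighted mean, as used in the wording of
# SDG indicator 11.6.2. All names and numbers here are illustrative only.

def population_weighted_mean(concentrations, populations):
    """Return the mean concentration weighted by population."""
    if len(concentrations) != len(populations):
        raise ValueError("concentrations and populations must have equal length")
    total_population = sum(populations)
    if total_population <= 0:
        raise ValueError("total population must be positive")
    weighted_sum = sum(c * p for c, p in zip(concentrations, populations))
    return weighted_sum / total_population


# Hypothetical annual mean PM2.5 (micrograms per cubic metre) for three cities
pm25 = [10.0, 20.0, 40.0]
population = [100_000, 300_000, 600_000]

weighted = population_weighted_mean(pm25, population)
unweighted = sum(pm25) / len(pm25)

print(f"Population-weighted mean: {weighted:.1f}")
print(f"Simple mean:              {unweighted:.1f}")
```

In this made-up case the weighted value is higher than the simple mean because the most populous city is also the most polluted, which is why the weighting matters when citizen-collected measurements come from cities of very different sizes.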

Bridging the Glacier Finance Gap in the Decade of Cryospheric Sciences

Bapon Fakhruddin, CODATA TG-FAIR DRR and Shaily Gandhi, IT:U Interdisciplinary Transformation University Austria

On March 3, 2026, the FAIR Data for Disaster Risk Research working group of CODATA convened a webinar titled “Glacier Adaptation and Financing,” bringing together leading experts to address the accelerating retreat of glaciers, the implications for water security and disaster risk, and the persistent gap in climate finance. Moderated by Dr. Shaily Gandhi, the panel featured Dr. Anil Mishra (UNESCO), Dr. Miriam Jackson (Norwegian Water Resources and Energy Directorate), Dr. Dhiraj Pradhananga (Tribhuvan University, Nepal), and Dr. Bapon Fakhruddin (Green Climate Fund). The discussion emphasized the urgency of translating scientific knowledge into institutional action and financial investment.

Follow these links to consult the slides presented and the webinar recording.

Accelerating glacier loss and its implications

Recent assessments underscore the rapid pace of glacier melt. The Glacier Mass Balance Intercomparison Exercise (GlaMBIE) team (2025) reported that between 2000 and 2023, glaciers globally lost an average of 273 ± 16 gigatonnes of ice annually, with a 36% acceleration in the latter half of the period. This cumulative loss of 6,542 gigatonnes contributed approximately 18 millimeters to global sea-level rise. Projections by Rounce et al. (2023) suggest that even under a 1.5°C warming scenario, global glacier mass could decline by 26 ± 6% by 2100, increasing to 41 ± 11% under a 4°C scenario.
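
The figures above can be cross-checked with simple arithmetic. The sketch below is a rough consistency check, not part of the GlaMBIE analysis. It assumes the period covers 24 calendar years (2000 to 2023 inclusive) and uses the common approximation that about 361.8 gigatonnes of water raise global mean sea level by 1 millimeter.

```python
# Back-of-the-envelope check of the glacier loss figures quoted in the text.
MEAN_LOSS_GT_PER_YEAR = 273   # average annual loss, from the text
REPORTED_TOTAL_GT = 6542      # cumulative loss, from the text

# Assumption: both 2000 and 2023 are counted, giving 24 years.
YEARS = 2023 - 2000 + 1

# Assumption: ~361.8 Gt of water corresponds to 1 mm of global mean
# sea-level rise (ocean area of roughly 3.618e8 square kilometres).
GT_PER_MM_SEA_LEVEL = 361.8

estimated_total_gt = MEAN_LOSS_GT_PER_YEAR * YEARS
sea_level_mm = REPORTED_TOTAL_GT / GT_PER_MM_SEA_LEVEL

print(f"Annual mean x years: {estimated_total_gt} Gt")
print(f"Reported cumulative: {REPORTED_TOTAL_GT} Gt")
print(f"Sea-level equivalent: {sea_level_mm:.1f} mm")
```

The small difference between the two cumulative values is consistent with rounding of the annual mean, and the sea-level equivalent agrees with the approximately 18 millimeters reported.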

In the Hindu Kush Himalaya (HKH) region, these global trends are mirrored by local observations. Dr. Pradhananga highlighted that snowpacks are thinning, springs are drying, and rainfall is increasingly replacing snowfall. These changes threaten the freshwater supply for billions of people who depend on glacier-fed river systems.

Glacial lake outburst floods: a growing hazard

Glacier retreat contributes to the formation of unstable glacial lakes, increasing the risk of glacial lake outburst floods (GLOFs). A study published in Nature Communications estimated that 15 million people globally live under threat from GLOFs, with more than half residing in India, Pakistan, Peru, and China (Carrivick et al., 2023). The 2023 South Lhonak Lake disaster in Sikkim, India, exemplified this risk. A cloudburst triggered a GLOF that destroyed infrastructure and resulted in at least 14 fatalities and over 100 missing persons (NDTV, 2023). Although the lake had been previously identified as high-risk (Sattar et al., 2021), early warning systems were not fully operational at the time of the event. This incident illustrates the critical need for governance systems that can act on scientific data.

The climate finance gap

Despite the clear risks, mountain regions receive only 3% of global climate finance, and less than 1% of adaptation funding is allocated to glacier protection (Fakhruddin, 2025). The United Nations Environment Programme (UNEP, 2023) estimated the global adaptation finance gap at $194–366 billion annually. The lack of private capital in glacier adaptation is attributed to the absence of direct financial returns, despite the essential role glaciers play in water security, energy production, and food systems.

Financing models and the role of the Green Climate Fund

The Green Climate Fund (GCF) has pioneered innovative financing models to address these challenges. Its approach includes de-risking mechanisms, capital mobilization, bankable project structures, and tailored financing models such as pay-for-success and impact investing strategies (Fakhruddin, 2025). The diagram below illustrates the GCF’s financing framework.

 

Opportunities for crowd-sourced and innovative financing

Crowd-sourced financing presents a complementary avenue for glacier adaptation. Fakhruddin (2025) proposed models where private capital is tied to successful glacier restoration outcomes. These could include glacier bonds, community investment funds, and climate crowdfunding platforms. While such mechanisms cannot replace large-scale public and multilateral funding, they can raise awareness and engage broader constituencies.

Recommendations for the Decade of Cryospheric Sciences

The panel concluded with a set of actionable recommendations:

  • Expand interoperable glacier and hydrological monitoring networks, including community-based systems.
  • Integrate cryosphere data into national water security and disaster risk management frameworks.
  • Establish legally binding early warning systems for GLOFs and related hazards.
  • Increase adaptation finance for glacier and mountain regions through dedicated funding windows and blended finance.
  • Promote public-private partnerships and innovative financial instruments such as resilience bonds and glacier adaptation funds.
  • Build local capacity through training, community engagement, and integration of traditional knowledge.
  • Use the Decade of Cryospheric Sciences (2025–2034) to set measurable targets and track progress.

Conclusion

The science of glacier change is unequivocal. The challenge lies in aligning institutional action and financial flows with this knowledge. The Decade of Cryospheric Sciences offers a critical window to bridge this gap. Every degree of warming and every dollar invested will shape the future of the world’s glaciers and the communities that depend on them.

 

References

Carrivick, J. L., Tweed, F. S., et al. (2023). Fifteen million people at risk of glacial lake outburst floods. Nature Communications, 14, 487. https://doi.org/10.1038/s41467-023-36033-x

Fakhruddin, B. (2025, April 18). Saving the cryosphere requires innovative financing. Green Climate Fund. https://www.greenclimate.fund/insights/saving-cryosphere-requires-innovative-financing

GlaMBIE Team. (2025). Community estimate of global glacier mass changes from 2000 to 2023. Nature, 639, 382–388. https://doi.org/10.1038/s41586-024-08545-z

NDTV. (2023, October 5). 14 dead, 102 missing in Sikkim flash flood. https://www.ndtv.com/india-news/10-dead-82-missing-14-bridges-collapsed-in-sikkim-flash-flood-4450410

Rounce, D. R., Hock, R., Maussion, F., Hugonnet, R., Kochtitzky, W., Huss, M., Berthier, E., Brinkerhoff, D., Compagno, L., Copland, L., Farinotti, D., Menounos, B., & McNabb, R. W. (2023). Global glacier change in the 21st century: Every increase in temperature matters. Science, 379(6627), 78–83. https://doi.org/10.1126/science.abo1324

Sattar, A., Allen, S., Frey, H., Huggel, C., & Mergili, M. (2021). Modeling glacial lake outburst flood process chains in Sikkim Himalaya: Hazard assessment of two potentially dangerous lakes. EGU General Assembly 2021. https://doi.org/10.5194/egusphere-egu21-10838

United Nations Environment Programme. (2023). Adaptation Gap Report 2023: Underfinanced. Underprepared. https://www.unep.org/resources/adaptation-gap-report-2023

 

Understanding contemporary digital preservation practice: the EOSC EDEN project reports survey findings

By Laura Molloy, CODATA Research Lead

With rising threats to the existence of essential data resources, and mendacious contesting of the historical record, the current moment clearly demonstrates the critical role of high-quality digital preservation practitioners, skills and services. Digital preservation is a complex and diverse profession, often underfunded and sometimes misunderstood. It is important that we understand the current digital preservation landscape as well as possible, in order to support those working around the world in the preservation professions and to provide project outputs that will be of relevance and value to them. Accordingly, CODATA is delighted to be a participant in the European Open Science Cloud (EOSC) project, ‘Enhancing Digital preservation strategies at European and National level’ (EDEN).

EDEN has published the results of a survey which was recently conducted to gather information from the digital preservation community worldwide. Our survey was specifically about the guidance to which preservation practitioners refer, and practices they use, when identifying, selecting, and appraising digital data objects for ‘long-term’ preservation.

This blog post provides an informal overview of how we went about the survey, and what we discovered. We hope this will be of interest to those working in contemporary digital preservation, including managers, practitioners, and those responsible for policy making and training within memory organisations. If you would like further detail on any aspect of this work, the full report can be downloaded at https://doi.org/10.5281/zenodo.17984753.

About the EOSC EDEN project

The EOSC EDEN project, funded by the European Commission [1], seeks to enhance digital preservation strategies at European and national levels. The project is creating a framework to identify what data are candidates for digital preservation. This involves setting standards and protocols for long-term data preservation, which will be determined through an assessment of data usage, quality, and the data’s benefits to science and society.

In addition to the framework, the EOSC EDEN project aims to develop a model for re-appraisal of data throughout its lifecycle. The model for re-appraisal will support the framework for digital preservation by ensuring that preservation efforts remain relevant over time.

The survey activity was led by Laura Molloy, CODATA research lead, who is leading EDEN Task 1.1, ‘Landscape analysis of existing frameworks, guidelines and practices for identification, selection and appraisal of data for long-term preservation’. This task contributes the majority of the landscaping activity in the project. Laura is a qualitative social science researcher by training, with experience in a number of digital preservation projects and initiatives, and has a track record in research and consultancy relating to digital decision-making and information behaviours in varied professional settings. Analytical power was added by other members of the EDEN task team, including work package leader and digital preservation expert Micky Lindlar, and quantitative analyst Maria Benauer, both of Technische Informationsbibliothek (TIB).

Survey design

Understanding contemporary digital preservation includes direct contact with as many current practitioners as possible to understand their real practices—and the reference materials that inform those practices. We also need to build communication with those working in preservation across different types of organisation and in different countries. Accordingly, the EOSC EDEN 2025 survey was carefully designed to be simple to interact with, and to make sense to digital preservation professionals across organisation types, staff levels, geographical locations, and any or no discipline focus [2].

Survey questions were arranged into four main sections:

  1. About your organisation and role;

  2. About frameworks and guidelines for identification, appraisal and selection of data for long-term digital preservation;

  3. About current practices in identification, appraisal and selection of data for long-term digital preservation;

  4. Discipline-specific requirements for long-term preservation of digital objects.

The survey ended with one further short section gathering voluntary contact details, to enable the identification of candidates for any follow-up inquiry.

The questions were a mixture of closed and open questions, i.e. those that can be answered by choosing yes or no (closed questions) or those that require a more discursive, free-text answer to be generated by the respondent (open questions). Accordingly, a mixture of qualitative and quantitative analysis was performed by the task team.

Survey respondents

We received 250 valid responses from 31 states/nations [3]. The majority of responses were from Western Europe, followed by North America, despite focused activities undertaken by the task team to solicit a more evenly-distributed global response.

The size of respondents’ organisations was approximately evenly distributed across micro/small, medium, and large sized organisations [4], each with around a third of the responses. In terms of staff level within the organisation, around two-thirds of respondents were practitioners; just under a third of respondents identified as middle management and a few identified as senior management. We received responses from eighteen organisation types, which we coded into nine wider groupings called ‘organisation classes’, as follows: Academic publisher, Archive, Digital preservation service, Library, Multifunctional, Museum or gallery, Repository, Research performing organisation, Research infrastructure, plus Other/unassigned. The most populous class was ‘archive’ with 67 responses; the least populous was ‘academic publisher’ with one.

Selected findings

There are a few selected findings that were of interest to the task team, and offer some food for thought. These are briefly set out here.

‘Long-term’ preservation

Firstly, the project itself—as well as its subsidiary work packages and tasks—frequently uses the phrase, “long-term preservation”. We were interested to note that this emerges from the data as an unstable concept. One of the most striking findings was the high proportion of respondents who are working at an organisation where there is no agreed or working definition of ‘long-term’ in the digital preservation context. Even those respondents who did have an agreed or working definition of ‘long-term’ offered a wide range of numerical definitions of what that means for them and their preservation work.

Quality checking behaviours

We were interested in investigating two sets of quality checks: quality checks upon ingest and subsequent quality checks throughout the data preservation period. We asked various questions about if and how exactly these checks are carried out. We found that a majority of respondents do carry out quality checks of various kinds upon ingest but that this drops dramatically when we examine the occurrence of subsequent quality checks throughout the preservation period. This is a complex area for analysis and we would like to investigate more through some follow-up interviewing during 2026.

Commonalities with FAIR data

We note with interest the existing connections indicated by respondents between the digital preservation realm and the set of ideas currently designated ‘FAIR data’. These connections appeared in two different places in the survey responses.

First, respondents were asked about their usual preservation period; that is to say, the length of time that the organisation usually initially commits to holding and maintaining a preservation copy of a given data object. Here, respondents introduced a recurring—and pretty passionate—discussion about the importance of maintaining findability and access, whatever the agreed preservation period; and we noted that maintenance of findability and access was a much more important issue for many respondents than the existence of any shared agreement about the length of the preservation period.

Second, we provided respondents with a list of frameworks, standards and guidelines that we had gathered from desk research and professional experience. These were presented as likely reference resources for practitioners when they were working on identification, selection and appraisal of digital data objects in their day-to-day work. We asked respondents to indicate whether they were aware of each document and/or used it in their preservation work. The FAIR Guiding Principles was one of these documents. Respondents reported a high level of awareness and use of the FAIR principles (ranking 4th of 15 options). This reminds us that some of the ideas now encapsulated in the FAIR principles have been, to some extent, bedrocks of preservation practice for years, and suggests that digital preservation practitioners are aware of recent events in the FAIR data movement. (It is worth noting, however, that there is no similar visibility at this time of the TRUST or CARE principles within the responses from our participants.)

Needs of designated communities/threats to FAIRness of data over time

We asked a question about the extent to which the respondent understands any unmet needs of their organisation’s designated community [5]. Elsewhere in the survey, we also asked a question on the respondent’s view of threats to the ‘FAIRness’ of their preserved data over time. Some common themes emerged from the responses to these two questions. This suggests that these common themes may be issues of cross-cutting importance for the digital preservation practitioner community.

The most frequently highlighted issues here were: issues around sensitive / protected data; the challenges of data volume; and issues around access provision. Two of the top three designated community needs—data volume and access issues—recur in the top answers around threats to FAIR over time. Sensitive data issues were flagged in three responses, and the other designated community needs—long-term provision of service; lack of useful policy/directive; software preservation; provenance issues and various format problems—also all recur at low rates in the threats to FAIR over time. This is not particularly surprising as these are clearly frequently experienced challenges in the practice of preservation. But it is interesting to see that they are considered by respondents both from the perspective of directly meeting the needs of the community i.e. user-centred approaches, and also the arguably more theoretical perspective introduced when considering keeping digital data objects FAIR. Ultimately, though, FAIR data are data that meet user needs. It is a useful piece of validation that these themes recur in the responses to these two questions.

To conclude…

The EDEN task team is delighted by the response to the survey and thanks all participants.

Next steps within the task include some follow-up interviewing with consenting respondents to further explore the relationship between different information behaviours: for example, how quality checking is monitored; whether designated community needs are monitored and if so whether this impacts preservation activity; the role of data policy; and the role of organisational acquisition strategy. This work will be reported upon by the end of 2026.

In addition, certain findings from this enquiry are potentially useful for future work by CODATA, specifically the upcoming EU-funded project, ‘Developing and Implementing the Cross-Domain Interoperability Framework for EOSC’ (CDIF4EOSC), and the CODATA Task Group on Research Data Quality Management.

A full breakdown of data analysis and the findings we have heretofore identified is beyond the scope of this blog post, and can be found in the full report which is freely available online at https://doi.org/10.5281/zenodo.17984753. Any questions or feedback can be directed to the task leader at laura @ codata.org. For more information about the EOSC EDEN project please visit the project website, https://eden-fidelis.eu/.

 

[1] EDEN has received funding from the EU’s Horizon Europe research and innovation programme under Grant Agreement no. 101188015.

[2] Although we note the use of English as the primary language of the survey may have been a limiting factor for some potential respondents.

[3] As defined by the United Nations member states available at the time of survey publication (May 2025).

[4] As defined by the European Commission.

[5] “Designated community” is defined in the EDEN Milestone 1.1 report (https://doi.org/10.5281/zenodo.16992452), based upon the OAIS definition (http://www.oais.info/), as: “A group of users, now or in the future, who can understand and use the Objects preserved. The designated community is whom the Objects are preserved for. It can be made of several user communities and the definition can change over time.”

Sustaining Research Data Capacity: Reflections from a CODATA Journey (2017–2025)

Felix Emeka Anyiam (Initial Co-Lead CODATA Connect 2019-2024)

In this post, Felix Emeka Anyiam, who was Initial Co-Lead of CODATA Connect, our Early Career Researcher initiative, from 2019-2024, reflects on his experiences over eight years of participating in CODATA activities. In particular, he emphasizes the benefits of sustained collaborations and connections: “long-term, networked training matters more than one-off workshops”. He also praises the CODATA Connect and CODATA Data Schools model, which allowed students to return in more responsible leadership roles. Felix’s story shows that CODATA Connect provided an environment and collaborations that benefited him in this journey. But it also shows how Felix’s open and generous character, and his enthusiasm to participate, brought rewards. Please enjoy this uplifting story! Simon HODSON, Executive Director, CODATA.

Welcome to Trieste: how it started, 2017

In August 2017, at the International Centre for Theoretical Physics (ICTP) in Trieste, Italy, I encountered research data science not merely as a set of analytical tools, but as a global public good. I arrived as a public health researcher from Nigeria, trained in epidemiology and biostatistics, seeking stronger quantitative approaches to interrogate health systems data. I left with something more enduring: an entry point into a global ecosystem shaped by CODATA’s commitment to open science, equity, and long-term capacity building.

The CODATA-RDA Research Data Science Summer School in Trieste offered more than technical instruction. It introduced a way of thinking about data: FAIR by design, ethically governed, and shared across disciplines and borders. Participants from low- and middle-income countries (LMICs) were not positioned as beneficiaries, but as peers and future contributors. CODATA functioned not as a sponsor, but as a convenor of people, ideas, and responsibility. That distinction would shape my professional trajectory in the years that followed.

Continuity as capacity: returning, deepening, expanding (2017–2018)

One year later, in August 2018, I returned to ICTP for the Climate Data Science Advanced Workshop, again under the CODATA-ICTP collaboration, with Clement Onime and Simon Hodson among the local organisers. This second invitation proved pivotal. It reinforced the idea that capacity building is most effective when it is iterative and cumulative, allowing participants to deepen expertise, cross disciplinary boundaries, and apply learning to new problem domains.

 

The 2018 programme expanded my analytical perspective beyond health to climate systems, environmental data, and computational modelling, skills that later proved essential for interdisciplinary work at the intersection of climate, urban systems, and public health. More importantly, it signalled something fundamental about the CODATA model: participation was not episodic. There was an intentional pathway for return, growth, and contribution.

From Participant to Contributor: Teaching, Networks, and Leadership (2018–2025)

Following my initial training through the CODATA-RDA Research Data Science programmes in Trieste, the relationships established during those early years began to translate into sustained international collaboration. These engagements laid the foundations for the first Urban Data Science Summer School in 2018, hosted by the Summer–Winter School at CEPT University, Ahmedabad, India, in collaboration with Dr Shaily Gandhi, then a member of the CEPT faculty. The school marked an important expansion of CODATA-enabled capacity building beyond the initial training context. My role as a co-instructor extended this work to undergraduate and postgraduate cohorts.

Building on this momentum, the programme evolved into a more structured and geographically diverse initiative. The second edition of the Urban Data Science Summer School took place from 13 to 23 May 2019 (https://shailygandhi.github.io/UrbanDataScience2019/). These successive schools reflected not only the maturation of an academic programme, but also the strength of the collaborative networks that had emerged from CODATA’s training ecosystem: networks sustained through shared curriculum development, co-teaching, and long-term professional exchange since those early connections in Trieste.

That same year, I was appointed inaugural co-lead of CODATA Connect, the organisation’s Early Career and Alumni Network, a role I held from 2019 to 2024 (https://codata.org/initiatives/data-skills/codata-connect/members/). CODATA Connect was established to address a persistent gap in global training initiatives: what happens after the workshop concludes. Rather than allowing capacity gains to dissipate, the network was designed as a continuity mechanism, enabling early-career researchers to remain engaged, visible, and supported within the wider CODATA ecosystem.

Working collaboratively with co-leads and core members from India, Costa Rica, Europe, Africa, Asia, Australia, and Latin America, CODATA Connect evolved into a distributed, peer-led platform for sustained skills development and exchange. Together, we coordinated a series of research skills webinars, thematic workshops, and podcast series that translated FAIR data principles, reproducibility, and ethical data stewardship into applied, domain-specific contexts. These activities included structured webinar series on research skills and reproducibility, smart and resilient cities, and open data practices, as well as hands-on technical workshops, such as training on distributed computing using Spark with R, explicitly targeted at early-career researchers in resource-constrained settings.

In parallel, CODATA Connect supported the development of cross-institutional podcast series, including Data for Resilient Cities, Data–Knowledge–Action for Urban Systems, Data for Disaster Risk Reduction, and Open GeoAI, which brought together researchers, practitioners, and policy actors to explore how open data, geospatial analytics, and AI can inform urban resilience, disaster risk reduction, health, and sustainable development. These initiatives not only expanded the reach of CODATA’s data-skills agenda but also created durable knowledge artefacts that continue to serve as learning resources beyond the immediate training context.

Throughout this period, my own contributions were embedded within this collective effort alongside colleagues such as Shaily R. Gandhi (Initial Lead-India), Mariana Cubero-Corella (Costa Rica), Anup Kumar Das (India), Neema Sumari (Tanzania), Kishore Sivakumar (Netherlands), Adenike Shonowo (Nigeria), Jacqueline Stephens (Australia), Jaime Rugeles (Colombia), Zhifang Tu (China), and others. We worked to ensure that CODATA Connect remained inclusive, interdisciplinary, and globally representative. The emphasis was consistently on peer mentorship, leadership development, and translation of open science principles into local research practice, particularly within low- and middle-income country contexts.

This trajectory reached a moment of continuity in August 2025, when I returned once again to ICTP, Trieste, this time not as a participant, but as a tutor and co-lead for the CODATA-RDA Advanced Workshop on Urban Data Science (https://indico.ictp.it/event/10990).

Having first attended the CODATA-RDA programmes as a student in 2017 and 2018, returning as a facilitator underscored the iterative nature of CODATA’s capacity-building model. Alongside colleagues Dr Shaily Gandhi (ITU Linz, Austria) and Dr Neema Sumari (Sokoine University of Agriculture, Tanzania), I contributed to hands-on sessions on geospatial analytics for urban planning and policy, predictive modelling for population dynamics, infrastructure, and health-risk assessment, and decision-support systems for resilient and sustainable cities.

The 2025 workshop brought together researchers from multiple regions to deepen expertise in big-data analytics, computational infrastructure, urban and environmental data science, and ocean-science data, all grounded in FAIR principles and ethical data stewardship. Contributing to the same platform that had shaped my own formation in research data science reinforced a central lesson of this journey: effective capacity building is not a single intervention, but a networked process sustained through collaboration, continuity, and shared responsibility, where today’s participants become tomorrow’s instructors, mentors, and stewards of the global data ecosystem.

Broadening horizons: global exposure through CODATA-enabled opportunities

Alongside teaching and network leadership, CODATA-enabled pathways opened doors to broader global engagement. I was selected to participate in the International Training Workshop on Open Science and the SDGs hosted by the Chinese Academy of Sciences in Beijing in 2023, contributing to discussions on ethical data reuse and sustainable development. These collaborations produced the peer-reviewed article “Statements on Open Science for Sustainable Development Goals” in the Data Science Journal, of which I was a co-author (https://doi.org/10.5334/dsj-2024-049). Earlier, I had been selected for Topics in Digital and Computational Demography at the Max Planck Institute for Demographic Research (Germany) and for the ALPSP Virtual Conference and Awards in the United Kingdom, one of only 20 global recipients.

Travel grants from CODATA supported participation in the ICTP Trieste programme (2018) and the Science for Development Workshop in South Africa (2020), underscoring CODATA’s practical commitment to inclusion. These experiences reinforced a consistent message: global capacity building is strongest when financial, intellectual, and institutional barriers are addressed together.

This period of sustained engagement and international collaboration was also marked by formal recognition from the wider scientific community. In 2025, I was inducted into Sigma Xi, The Scientific Research Honor Society, in recognition of my research contributions and commitment to advancing science in the public interest. While this honour is conferred independently, it reflects the cumulative impact of long-term investment in research training, open science practice, and global collaboration. The skills, networks, and values cultivated through CODATA’s capacity-building ecosystem were central to developing the kind of research profile and scholarly orientation that such recognition acknowledges.

SAIL 2025 as a milestone, not the destination

In 2025, I was invited to present at the Symposium on Artificial Intelligence for Learning Health Systems (SAIL 2025), co-hosted by Harvard Medical School and convened around a shared commitment to equity-driven, ethically grounded applications of artificial intelligence in healthcare. My presentation drew on doctoral research that applied machine-learning methods to examine inequities in HIV self-testing uptake across sub-Saharan Africa, using large-scale demographic health survey data from 24 countries (https://sail.health/event/sail-2025/program/).

The study employed Classification and Regression Tree (CART) and Random Forest models to identify socio-demographic predictors of willingness to self-test for HIV. Beyond methodological performance, the analysis foregrounded a persistent equity concern: rural populations, individuals with lower levels of education, and those in lower-income groups remain systematically underserved. The work demonstrated how predictive analytics, when designed transparently and interpreted responsibly, can inform targeted, community-embedded public health interventions rather than reinforce existing disparities.
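For readers unfamiliar with CART, the core step of a classification tree is choosing the split that leaves the resulting groups as "pure" as possible. The sketch below illustrates that single step with Gini impurity in plain Python. It is an illustration only and not the study's code: the records in `TOY_ROWS` are synthetic and invented for this example, and the names `gini`, `best_split`, `residence`, `education`, and `willing` are hypothetical. A Random Forest, broadly, repeats this kind of tree-building on resampled data and averages the results.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 means perfectly pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(rows, target):
    """Return (feature, value, weighted_gini) for the best binary split.

    Each candidate split separates rows where feature == value from all
    other rows, and is scored by the size-weighted Gini impurity of the
    two resulting groups (lower is better).
    """
    best = None
    features = [key for key in rows[0] if key != target]
    for feature in features:
        for value in sorted({row[feature] for row in rows}):
            left = [row[target] for row in rows if row[feature] == value]
            right = [row[target] for row in rows if row[feature] != value]
            if not left or not right:
                continue  # a split must put at least one row on each side
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
            if best is None or score < best[2]:
                best = (feature, value, score)
    return best


# Synthetic toy records, invented for illustration only (not study data).
TOY_ROWS = [
    {"residence": "urban", "education": "secondary", "willing": 1},
    {"residence": "urban", "education": "primary", "willing": 1},
    {"residence": "urban", "education": "secondary", "willing": 1},
    {"residence": "rural", "education": "secondary", "willing": 0},
    {"residence": "rural", "education": "primary", "willing": 0},
    {"residence": "rural", "education": "primary", "willing": 1},
]

if __name__ == "__main__":
    print(best_split(TOY_ROWS, "willing"))
```

In this toy data the split on residence separates the outcome more cleanly than the split on education, so it would be chosen first. A full tree would then apply the same procedure recursively within each group.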

What made participation in SAIL 2025 particularly significant, however, was not the event itself but the lineage that made meaningful engagement possible. The ability to work confidently across disciplinary boundaries, to interrogate data quality and representativeness, to foreground ethics and FAIR principles, and to communicate complex analytical approaches to diverse audiences was not acquired in isolation. These capacities were cultivated incrementally through long-term engagement with CODATA-led training programmes, teaching roles, and international peer networks.

Across plenary sessions, panels, and technical discussions at SAIL, a consistent message emerged: AI should not be framed as a luxury innovation for high-resource health systems, but as a practical, scalable tool for strengthening learning health systems where access, quality, and data infrastructure remain uneven. Conversations around AI-enabled clinical decision support in low- and middle-income countries, data governance for learning health systems, and patient-centred innovation resonated strongly with principles long emphasised within CODATA’s capacity-building ecosystem.

Several themes from the symposium were especially aligned with this trajectory. First, context is central: AI systems must be designed to work within real-world constraints rather than idealised data environments. Second, data quality and equity cannot be treated separately: AI systems trained on incomplete, biased, or poorly governed datasets are likely to reinforce existing health disparities rather than mitigate them. Third, trust, transparency, and explainability matter, particularly when deploying models in sensitive or high-stakes health domains. Finally, there was a strong emphasis on collaboration over competition, underscoring the need for interdisciplinary and cross-sector partnerships to advance AI for health responsibly.

Seen through this lens, SAIL 2025 was not a destination, but a convergence point, where years of sustained capacity building translated into frontier research engagement. It affirmed that long-term investment in data skills, ethical reasoning, and global research networks enables researchers, particularly those working in LMIC contexts, to contribute meaningfully to shaping emerging conversations at the intersection of AI and health.

Rather than standing apart from earlier stages of training and collaboration, SAIL 2025 illustrated the cumulative effect of CODATA’s model: a pathway in which early exposure evolves into leadership, stewardship, and the application of advanced methods to questions of equity and public value.

From skills to stewardship: governance and responsibility

More recently, my engagement with CODATA has extended beyond training and programme delivery into data governance, interoperability, and infrastructure stewardship. I currently serve as a member of the Cross-Domain Interoperability Framework (CDIF) Working Group and Advisory Group, where I contribute to the development and review of interoperability standards, emerging CDIF profiles, and strategic oversight for globally connected data ecosystems. This work involves close collaboration with an international body of senior experts, as well as ongoing technical discussions focused on enabling responsible data reuse across domains.

In parallel, I serve as a reviewer for the Data Science Journal and have contributed to CODATA’s Smart Cities Task Group and the Resilient and Healthy Cities Working Group, with a particular focus on data-driven approaches to urban health, climate resilience, and risk reduction. These roles reflect an increasing emphasis on stewardship, helping to shape not only how data are analysed, but how they are governed, shared, and translated into public value within complex socio-technical systems.

This evolution from skills acquisition to systems-level responsibility has been further strengthened through formal engagement with public-sector digital governance. In December 2025, I completed the AI and Digital Transformation in Government programme delivered by Saïd Business School, University of Oxford, in collaboration with UNESCO. The programme offered a rigorous, practice-oriented exploration of how governments can responsibly harness artificial intelligence and data-driven technologies to deliver inclusive, ethical, and effective public services.

Key areas of focus included AI ethics and governance, human-centred service design, digital leadership, cyber resilience, and the management of systemic change within public institutions. Importantly, the programme foregrounded the role of evidence, accountability, and institutional capacity in ensuring that digital transformation serves citizens rather than exacerbates existing inequalities.

Taken together, these governance, editorial, and policy-oriented engagements reflect a central lesson of sustained capacity building: technical competence must ultimately be matched by institutional responsibility. The transition from learning how to use data to helping shape the frameworks that govern its use represents a critical step in ensuring that data science and AI contribute to equitable, trustworthy, and socially grounded outcomes at scale.

What this journey tells us about sustaining capacity

Several lessons emerge from this journey. First, long-term, networked training matters more than one-off workshops. Skills persist when they are reinforced through return, teaching, and community. Second, effective capacity building produces leaders and stewards, not just analysts. Third, continuity, supported by mentorship, alumni networks, and governance roles, is essential for translating training into durable impact, particularly in LMIC contexts.

Looking ahead

As data science and artificial intelligence increasingly shape global responses to health, climate, and development challenges, CODATA’s model offers a compelling blueprint. Capacity building is not an event; it is a commitment sustained over time. For early-career researchers, particularly those working in resource-constrained settings, CODATA continues to demonstrate what is possible when openness, equity, and continuity are placed at the centre of scientific practice.

Short Biography of the Author

Felix Emeka Anyiam is a public health researcher and data scientist based at the University of Port Harcourt, Nigeria. His work focuses on the ethical and equitable application of data science and artificial intelligence to health systems, urban resilience, and development challenges in low- and middle-income countries. An alumnus of, and long-term contributor to, CODATA-led Research Data Science programmes, he has served as co-instructor in CODATA-RDA Advanced Workshops, inaugural co-lead of CODATA Connect (the Early Career and Alumni Network), and a member of multiple CODATA task and working groups. His research and teaching emphasise FAIR data principles, reproducibility, and responsible data governance within global and local research ecosystems.

February 2026: Publications in the Data Science Journal

Title: FIP Check: A Rubric-Based Tool for Assessing FAIR Implementation Profiles and Enabling Resources
Author: Sungha Kang, John Graybeal, Barbara Magagna, Erik Schultes, Nancy Hoebelheinrich, Chris Erdmann, Ismael Kherroubi Garcia, Julianne Christopher, Christine R. Kirkpatrick
URL: http://doi.org/10.5334/dsj-2026-008
Title: Development of Technology Convergence Assessment Framework for Poly crisis
Author: Rania Elsayed Ibrahim, Tshiamo Motshegwa, Abdelaziz Elfadaly, Alaa A. Elbiomy, Mai Ramadan Ibraheem
URL: http://doi.org/10.5334/dsj-2026-007
Title: Bridging the Data Discovery Gap: User-Centric Recommendations for Research Data Repositories
Author: Mingfang Wu, Felicitas Löffler, Brigitte Mathiak, Fotis Psomopoulos, Uwe Schindler, Amir Aryani, Jordi Bodera Sempere, Antica Culina, Andreas Czerniak, Chris Erdmann, Kathleen Gregory, Nick Juty, Allyson Lister, Ying-Hsang Liu, Samantha Pearman-Kanza
URL: http://doi.org/10.5334/dsj-2026-006
Title: Essential Aspects of Tools for Developing Scientific Data Management Plans
Author: Fabiano Couto Corrêa Silva, Sandra de Albuquerque Siebra, Laura Vilela Rodrigues Rezende, Alexandre Faria de Oliveira, Denise Oliveira de Araújo
URL: http://doi.org/10.5334/dsj-2026-005
Title: Data Management in a Community-Based Birth Cohort: What the SEMILLA Study Teaches Us
Author: Nataly Cadena, Fadya Orozco, Stephanie Montenegro, Fabián Muñoz, Alexis J. Handal
URL: http://doi.org/10.5334/dsj-2026-004
Title: Implementing the FAIR and CARE Principles Simultaneously: Emerging Insights from IPBES
Author: Renske M. Gudde, Rainer M. Krug, Yanina V. Sica, Howard P. Nelson, Félicie Françoise, Manuela Gómez-Suárez, Aidin Niamir
URL: http://doi.org/10.5334/dsj-2026-003