openwashdata

a community effort to bring open data practices to the WASH sector

Lars Schöbitz

Global Health Engineering, ETH Zurich

September 12, 2024

The Opportunity

  • We have a huge and missed opportunity in our sector
  • Very little data is shared publicly in a way that follows best practices for reuse

Journal articles

Data: R package washopenresearch to be published at https://github.com/openwashdata/washopenresearch

  • The first missed opportunity is journal articles and data from researchers

  • We looked at the Data Availability Statements in 924 articles published in the Journal of Water, Sanitation and Hygiene for Development from 2011 to 2023.

  • You can see the data availability statements on the vertical axis and the number of publications on the horizontal axis

  • Colors differentiate between papers published before 2020 and in 2020 or later, when a policy was introduced that requires authors to select one of the three data availability statements

  • After that policy was introduced, we still found 15% of papers without a data availability statement, while 60% of articles stated that data was available in the paper, which can also mean in the supplementary material

Journal articles

Supplementary Material

Take-away: Not a single file is in a machine-readable, non-proprietary file format that would comply with the FAIR principles for data sharing (Wilkinson et al. 2016).

Good practice: CSV file (comma-separated values), including a data dictionary for all variables/columns in the data
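
As a minimal sketch of this good practice, the following Python example serialises a small dataset and its data dictionary to CSV and checks that every column is documented. The column names, values, and dictionary fields are hypothetical and chosen only for illustration; the openwashdata workflow itself builds on R packages, so this is not the project's implementation.

```python
import csv
import io

# Hypothetical example records; names and values are invented for illustration.
rows = [
    {"sample_id": "S001", "plant": "A", "cod_mg_l": 1250.0},
    {"sample_id": "S002", "plant": "B", "cod_mg_l": 980.5},
]

# Data dictionary: one entry per variable/column in the data file.
dictionary = [
    {"variable_name": "sample_id", "variable_type": "character",
     "description": "Unique identifier of the sample"},
    {"variable_name": "plant", "variable_type": "character",
     "description": "Anonymised code of the treatment plant"},
    {"variable_name": "cod_mg_l", "variable_type": "numeric",
     "description": "Chemical oxygen demand in mg/L"},
]


def to_csv(records):
    """Serialise a list of dicts to CSV text, using the first record's keys as header."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(records[0].keys()),
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


def undocumented_columns(records, dictionary_entries):
    """Return the data columns that have no entry in the data dictionary."""
    documented = {entry["variable_name"] for entry in dictionary_entries}
    return [name for name in records[0] if name not in documented]


data_csv = to_csv(rows)
dictionary_csv = to_csv(dictionary)

print(data_csv)
print(dictionary_csv)
print("Undocumented columns:", undocumented_columns(rows, dictionary))
```

Keeping the dictionary as a second CSV file means the documentation is itself machine-readable and can be checked against the data automatically.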

Supplementary Material
Articles published 2020 or later

file type | n [1] | %
missing | 202 | 51.4
docx | 149 | 37.9
xlsx | 24 | 6.1
pdf | 13 | 3.3
pptx | 4 | 1.0
png | 1 | 0.3

[1] One article can have multiple files.

Data: R package washopenresearch to be published at https://github.com/openwashdata/washopenresearch
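
The percentages in the supplementary material table can be reproduced from the counts alone. The short Python sketch below uses only the numbers shown on the slide; the variable names are illustrative, and the `csv` lookup simply makes explicit that no such file type appears in the table.

```python
# Counts per file type as shown in the table (articles published 2020 or later).
counts = {"missing": 202, "docx": 149, "xlsx": 24, "pdf": 13, "pptx": 4, "png": 1}

total = sum(counts.values())

# Share of each file type in percent, rounded to one decimal place.
shares = {file_type: round(100 * n / total, 1) for file_type, n in counts.items()}

# CSV does not appear in the table at all, so its count is zero.
csv_files = counts.get("csv", 0)

print("total entries:", total)
for file_type, share in shares.items():
    print(f"{file_type}: {share}%")
print("csv files:", csv_files)
```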

  • We then looked at the Supplementary Material of all articles published in 2020 or later and found that half of the published articles still had no data published alongside the article

  • But the most insightful take-away is that not a single file was shared in a file format that would comply with the FAIR principles for data sharing.

  • That is something we are hoping to change, where sharing data as CSV files would already go a long way.

PDF reports

Screenshot from Soeters, Mukheibir, and Willetts (2021)

Another data sink is PDF reports. We love them. They are everywhere. They are typically designed by a graphic designer who receives content from us.

  • Unfortunately, these valuable reports never come with the underlying unprocessed raw data attached. In this particular report, we could get access to treatment performance data of eight faecal sludge treatment plants across Africa and Asia.

  • What we get are tables of ranges instead of the complete set of data points that were collected.

openwashdata community

  • To address this, we started the openwashdata community

Vision

An active global community that applies FAIR principles (Wilkinson et al. 2016) to data generated in the greater water, sanitation, and hygiene sector.

Mission

Empower WASH professionals to engage with tools and workflows for open data and code.

From: openwashdata.org/pages/gallery/vmost/

openwashdata publishing

We are doing this through our openwashdata publishing arm, for which we developed a workflow that takes data donated by WASH professionals or available online and republishes it following FAIR principles for data sharing.

openwashdata.github.io/fsmglobal/

fsmglobal documentation website by Greene et al. (2023) built with pkgdown R package

  • The product is an R package published as a website for each dataset. We assign a digital object identifier that enables the tracking of citations of the data package and list all contributors with their ORCID iD, so that contributions and citations are linked.

  • Data is documented in a way where all variables/columns are described in detail.

  • And for those who do not use R, we also share the data as a CSV and XLSX file.

  • To date, we have published 12 datasets following this workflow.

openwashdata academy

  • We also established the openwashdata academy through which we provide training to anyone interested in the greater WASH sector.

data science for openwashdata 001

  • free, live, online, 10-week programme
  • 200 registrations
  • 100 show-ups
  • 40 graduates
  • next iteration: September/October 2024, sign-up: https://forms.gle/MP5rNYZagBdfG2ZRA

ds4owd-001.github.io/website/

what’s coming

Read full proposal for Phase 2 at: openwashdata.org/pages/gallery/proposal-02/

Data stewardship (openwashdata phase II)

hands up

Who has an ORCID iD?

hands up

Who has published a scientific article in a journal?

Meet a data steward

I have:

  • 10+ years work experience (5 in research)
  • empathy, compassion, patience, persistence
  • an affinity for IT
  • teaching experience

I don’t have:

  • a doctoral degree
  • a qualification in computer science
  • a qualification in statistics
  • a lot of time

Job Description: Open Science Specialist

My role - Open Science Specialist

  • research data management
  • reproducible workflows
  • mindset for Open Science
  • research communication
  • teaching data science tools
  • proposal writing

Your turn: Think & Note

For 1 minute, think about these two questions and take some notes for later:

  1. How should I be rewarded scientifically?

  2. Which career paths are there for data stewards?

Research Data Management

Three terms for three stages

term | explanation | file format
unprocessed raw data | data that is not processed and remains in its original form and file type | often XLSX, also CSV and others
processed analysis-ready data | data that is processed to prepare for an analysis and is exported in its new form as a new file | CSV, R data package
final data underlying a publication | data that is the result of an analysis (e.g. descriptive statistics or data visualization) and shown in a publication, but then also exported in its new form as a new file | CSV
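
The three stages can be illustrated with a small Python sketch using only the standard library. The raw values, column names, and the choice to skip non-numeric entries are assumptions made for this example and do not come from an openwashdata dataset.

```python
import csv
import io
import statistics

# Stage 1: unprocessed raw data, kept exactly as received (hypothetical values).
RAW = "Plant,COD (mg/L)\nA,1200\nA,n/a\nB,900\nB,1100\n"


def process(raw_text):
    """Stage 2: analysis-ready data with clean column names and numeric types."""
    processed = []
    for row in csv.DictReader(io.StringIO(raw_text)):
        try:
            value = float(row["COD (mg/L)"])
        except ValueError:
            continue  # assumption: entries that are not numbers (e.g. "n/a") are skipped
        processed.append({"plant": row["Plant"], "cod_mg_l": value})
    return processed


def summarise(processed):
    """Stage 3: final data underlying a publication, here the mean per plant."""
    summary = []
    for plant in sorted({r["plant"] for r in processed}):
        values = [r["cod_mg_l"] for r in processed if r["plant"] == plant]
        summary.append({"plant": plant, "n": len(values),
                        "mean_cod_mg_l": statistics.mean(values)})
    return summary


def to_csv(records):
    """Export a list of dicts as CSV text so each stage is saved as its own file."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(records[0].keys()),
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


analysis_ready = process(RAW)
final = summarise(analysis_ready)
print(to_csv(analysis_ready))
print(to_csv(final))
```

The raw text is never modified; each later stage is derived from it and exported separately, which is what keeps the three terms distinct.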

Self-nomination for Swiss Reproducibility Award 2024: https://ghe-open.ch/blog/posts/2024-02-27-swissrn-award/

Data Management Strategy

Diagram at: https://github.com/Global-Health-Engineering/concept-maps

Data steward for WASH R&D Center

  • A fully funded 2-year position, hopefully extended to 5 years with third-party funding
  • Job announced soon (2024-10-25: due date for submission of application)
  • Start date 15th January 2025 (to be discussed)
  • Going through a 12-month programme together with data steward at NGO BASEflow in Malawi

Data steward activities (WP1)

  • Activity 1.3: Identify how ethical approval for data collection differs for types of organizations (university, NGO) and types of data (quantitative, qualitative).

  • Activity 1.4: Identify current data management practices and develop a draft data management strategy for organization.

  • Activity 1.5: Publish at least 10 datasets of two different types that are available to the organization, following openwashdata data publishing workflow.

Hands-on workshop (end Oct / beginning Nov)

At the end of the workshop, participants will be able to:

  1. Describe how data published using the washr package follows FAIR principles compared to data shared in an appendix of a PDF or DOCX document.

  2. Follow step-by-step instructions to create an R data package using the washr package.

  3. Understand the difference between human-readable and machine-readable documentation.

News

Support us: Sign up to our newsletter

Scan me!

https://buttondown.email/openwashdata



Thanks 🌻

This project was supported by the Open Research Data Program of the ETH Board.

The slides were created via revealjs and Quarto: https://quarto.org/docs/presentations/revealjs/

You can view source code of slides on GitHub

Or you can download slides in PDF format

This material is licensed under Creative Commons Attribution Share Alike 4.0 International.

References

Greene, Nicola, Sarah Hennessy, Tate W. Rogers, Jocelyn Tsai, Francis L. de los Reyes III, and Lars Schöbitz. 2023. “Fsmglobal. Global Faecal Sludge Emptying Services Demand.” https://doi.org/10.5281/zenodo.8208293.
Soeters, S, P Mukheibir, and J Willetts. 2021. “Treatment Technologies in Practice: On-the-Ground Experiences of Faecal Sludge and Wastewater Treatment.”
Wilkinson, Mark D., Michel Dumontier, IJsbrand Jan Aalbersberg, Gabrielle Appleton, Myles Axton, Arie Baak, Niklas Blomberg, et al. 2016. “The FAIR Guiding Principles for Scientific Data Management and Stewardship.” Scientific Data 3 (1). https://doi.org/10.1038/sdata.2016.18.

openwashdata.org/pages/gallery/slides/
