openwashdata

a community effort to bring open data practices to the WASH sector

Lars Schöbitz

Global Health Engineering, ETH Zurich

April 4, 2024

The Opportunity

  • We have a huge, missed opportunity in our sector
  • Very little data is shared publicly in a way that follows best practices for reuse

Journal articles

Data: R package washopenresearch to be published at https://github.com/openwashdata/washopenresearch

  • The first missed opportunity is journal articles and the data researchers collect for them

  • We looked at the Data Availability Statements in 924 articles published in the Journal of Water, Sanitation and Hygiene for Development from 2011 to 2023.

  • You can see the data availability statements on the vertical axis and the number of publications on the horizontal axis

  • Colors differentiate between papers published before 2020 and in 2020 or later, when a policy was introduced that requires authors to select one of the three data availability statements

  • After that policy was introduced, we still found 15% of papers without a data availability statement, while 60% of articles stated that data was available in the paper, which can include supplementary material

Journal articles

Supplementary Material

Take-away: Not a single file is in a machine-readable, non-proprietary file format that would comply with the FAIR principles for data sharing (Wilkinson et al. 2016).

Good practice: CSV file (comma-separated values), including a data dictionary for all variables/columns in the data

Supplementary Material
Articles published 2020 or later

file type    n [1]       %
missing        202    51.4
docx           149    37.9
xlsx            24     6.1
pdf             13     3.3
pptx             4     1.0
png              1     0.3

[1] One article can have multiple files.
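The percentage column and the take-away can be checked directly from the counts. The following is a minimal sketch in Python, not the analysis code from the washopenresearch package: the counts are copied from the table above, and the set of "open" file extensions is an assumption made here for illustration.

```python
# Counts copied from the table above (articles published 2020 or later).
counts = {"missing": 202, "docx": 149, "xlsx": 24, "pdf": 13, "pptx": 4, "png": 1}

# The denominator is the sum of all table rows. One article can have
# multiple files, so this is not the number of articles.
total = sum(counts.values())

# Percentage share per row, rounded to one decimal as in the table.
shares = {file_type: round(100 * n / total, 1) for file_type, n in counts.items()}

# Assumption for illustration: extensions treated here as machine-readable
# and non-proprietary. This list is not taken from the slides.
assumed_open_formats = {"csv", "tsv", "json"}

# File types from the table that fall into the assumed open set.
open_files_found = [ft for ft in counts if ft in assumed_open_formats]

for file_type, share in shares.items():
    print(f"{file_type:<8} {counts[file_type]:>4} {share:>5}")
print("open formats found:", open_files_found)
```

Under these assumptions the list of open formats found is empty, which matches the take-away on this slide.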


  • We then looked at the Supplementary Material of all articles published in 2020 or later and found that half of them still had no data published alongside the article

  • But the most insightful take-away is that not a single file was shared in a file format that would comply with the FAIR principles for data sharing.

  • That is something we are hoping to change, where sharing data as CSV files would already go a long way.
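What "a CSV file, including a data dictionary" can look like in practice is sketched below in Python, using only the standard library. All column names and values are invented for illustration, and the dictionary layout (variable_name, variable_type, description) is one possible convention assumed here, not a prescribed standard.

```python
import csv
import io

# Hypothetical records: names and values are invented for illustration only.
records = [
    {"sample_id": "S001", "date": "2023-05-02", "ph": 7.4, "turbidity_ntu": 3.1},
    {"sample_id": "S002", "date": "2023-05-03", "ph": 6.9, "turbidity_ntu": 5.8},
]

# Data dictionary: one row per column of the data, describing what it holds.
dictionary = [
    {"variable_name": "sample_id", "variable_type": "character",
     "description": "Unique identifier of the water sample"},
    {"variable_name": "date", "variable_type": "date",
     "description": "Sampling date in ISO 8601 format (YYYY-MM-DD)"},
    {"variable_name": "ph", "variable_type": "double",
     "description": "pH of the sample (unitless)"},
    {"variable_name": "turbidity_ntu", "variable_type": "double",
     "description": "Turbidity in nephelometric turbidity units (NTU)"},
]


def to_csv(rows):
    """Serialise a list of dicts to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()),
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def undocumented_columns(rows, dictionary_rows):
    """Return data columns that have no entry in the data dictionary."""
    documented = {entry["variable_name"] for entry in dictionary_rows}
    return [name for name in rows[0] if name not in documented]


data_csv = to_csv(records)              # would be saved as e.g. data.csv
dictionary_csv = to_csv(dictionary)     # would be saved as e.g. dictionary.csv
missing = undocumented_columns(records, dictionary)

print(data_csv)
print(dictionary_csv)
print("undocumented columns:", missing)
```

The check at the end is the useful part: it flags any column that was shared without a description, which is the gap a data dictionary is meant to close.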

PDF reports

Screenshot from Soeters, Mukheibir, and Willetts (2021)

Another data sink is PDF reports. We love them. They are everywhere. They are typically designed by a graphic designer who receives content from us.

  • Unfortunately, these valuable reports never come with the underlying raw, unprocessed data attached. This particular report could have given us access to treatment performance data from eight faecal sludge treatment plants across Africa and Asia.

  • What we get instead are tables of ranges rather than the complete set of data points that were collected.
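Why a table of ranges is a poor substitute for the data points can be shown with a small sketch in Python. The two series below are invented for illustration and do not come from the report: they share the same reported range but describe very different performance.

```python
from statistics import median

# Invented measurements for two hypothetical treatment plants
# (illustrative values only, not taken from any report).
plant_a = [120, 130, 135, 140, 900]
plant_b = [120, 700, 800, 850, 900]


def reported_range(values):
    """Return the (minimum, maximum) pair a summary table would show."""
    return (min(values), max(values))


# Both plants would appear identical in a table of ranges...
same_range = reported_range(plant_a) == reported_range(plant_b)

# ...although their typical values differ widely.
median_a = median(plant_a)
median_b = median(plant_b)

print("range A:", reported_range(plant_a), "range B:", reported_range(plant_b))
print("identical ranges:", same_range)
print("median A:", median_a, "median B:", median_b)
```

With only the range published, a reader cannot tell these two cases apart; with the data points shared, the difference is immediately visible.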

openwashdata community

  • To address this, we started the openwashdata community

Vision

An active global community that applies FAIR principles (Wilkinson et al. 2016) to data generated in the greater water, sanitation, and hygiene sector.

Mission

Empower WASH professionals to engage with tools and workflows for open data and code.

From: openwashdata.org/pages/gallery/vmost/

openwashdata publishing

We do this through our openwashdata publishing arm. We developed a workflow that takes data donated by WASH professionals or available online and republishes it following the FAIR principles for data sharing.

openwashdata.github.io/fsmglobal/

fsmglobal documentation website by Greene et al. (2023), built with the pkgdown R package

  • The product is an R package published as a website for each dataset. We assign a digital object identifier that enables the tracking of citations of the data package and list all contributors with their ORCID iD, so that contributions and citations are linked.

  • The data is documented so that every variable/column is described in detail.

  • And for those who do not use R, we also share the data as CSV and XLSX files.

  • To date, we have published 12 datasets following this workflow.

openwashdata academy

  • We also established the openwashdata academy through which we provide training to anyone interested in the greater WASH sector.

data science for openwashdata 001

  • free, live, online, 10-week programme
  • 200 registrations
  • 100 show-ups
  • 40 graduates
  • next iteration: September/October 2024, sign-up: https://forms.gle/MP5rNYZagBdfG2ZRA

ds4owd-001.github.io/website/

what’s next

Read full proposal for Phase 2 at: openwashdata.org/pages/gallery/proposal-02/

News

Sign up to our newsletter

https://buttondown.email/openwashdata



Thanks 🌻

This project was supported by the Open Research Data Program of the ETH Board.

The slides were created via revealjs and Quarto: https://quarto.org/docs/presentations/revealjs/

You can view source code of slides on GitHub

Or you can download slides in PDF format

This material is licensed under Creative Commons Attribution Share Alike 4.0 International.

References

Greene, Nicola, Sarah Hennessy, Tate W. Rogers, Jocelyn Tsai, Francis L. de los Reyes III, and Lars Schöbitz. 2023. “Fsmglobal. Global Faecal Sludge Emptying Services Demand.” https://doi.org/10.5281/zenodo.8208293.
Soeters, S, P Mukheibir, and J Willetts. 2021. “Treatment Technologies in Practice: On-the-Ground Experiences of Faecal Sludge and Wastewater Treatment.”
Wilkinson, Mark D., Michel Dumontier, IJsbrand Jan Aalbersberg, Gabrielle Appleton, Myles Axton, Arie Baak, Niklas Blomberg, et al. 2016. “The FAIR Guiding Principles for Scientific Data Management and Stewardship.” Scientific Data 3 (1). https://doi.org/10.1038/sdata.2016.18.

openwashdata.org/pages/gallery/slides/
