Publish with Ease
A Low-Cost, Open-Source Solution for Research Data Sharing
October 9, 2024
Introduction
- Data Steward @ Global Health Engineering (GHE), ETH Zürich
- “Addressing the determinants of health as a function of engineered interventions and systems”.
- One major component of health is WASH – water, sanitation and hygiene
WASH practitioners often lack skills in computational data management
openwashdata
- Established in 2021, applying FAIR principles (Wilkinson et al. 2016) to data generated in the WASH sector
- Empower WASH professionals to engage with tools and workflows for open data and code
- Support other organizations with their data management, and run data events (e.g., hackathons) and a free data science course
- Core team: Two data stewards and one intern at GHE; many collaborators
openwashdata academy
- 10-week free data science course to empower WASH professionals to engage with tools and workflows for open data
- 200 registrations from 46 countries
- 27 datasets submitted as final projects
How can we streamline the data publishing procedure?
washr
- An R package designed to simplify WASH data publishing
- User-friendly functions to ensure that data adheres to FAIR principles
- Easy to use, with a detailed guide and workflow visualization
So far:
- Almost a dozen datasets published
- Requires minimal computational power
- Easily generalizable to benefit the wider community
Preparing the data
- Start a local version-controlled folder, connect it to GitHub
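One way to do this from R is with the usethis package. The sketch below is an assumption about tooling, not a washr requirement: the helper name start_data_package and the folder name "mywashdata" are hypothetical, and the last step needs a GitHub personal access token to be configured.

```r
# Hypothetical helper; assumes the usethis package is installed.
start_data_package <- function(path = "mywashdata") {
  # Create a package skeleton in a new local folder
  usethis::create_package(path, open = FALSE)
  # Make the new folder the active project for the calls below
  usethis::local_project(path)
  # Put the folder under version control
  usethis::use_git()
  # Create a matching GitHub repository and push to it
  usethis::use_github()
}
```

Calling start_data_package("mywashdata") from an interactive session would run the four steps in order. usethis may ask for confirmation along the way.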
setup_rawdata()
- Creates data-raw, as suggested in the usethis R package
- Creates data_processing.R for data cleaning
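As an illustration of what a cleaning script might contain, here is a minimal base R sketch. It is not the template that setup_rawdata() generates. The raw data frame and its column names are invented for the example.

```r
# Hypothetical raw data standing in for a file kept in data-raw/
raw <- data.frame(
  Village.Name = c(" Village A", "Village B ", "Village A"),
  `Water Source` = c("borehole", "Borehole", NA),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

clean <- raw
# Use consistent snake_case column names
names(clean) <- tolower(gsub("[^A-Za-z0-9]+", "_", names(clean)))
# Trim stray whitespace and harmonise capitalisation
clean$village_name <- trimws(clean$village_name)
clean$water_source <- tolower(clean$water_source)
# Drop rows with a missing water source
clean <- clean[!is.na(clean$water_source), ]

# In a package project, the cleaned object could then be stored with
# usethis::use_data(clean, overwrite = TRUE)
```

Keeping every cleaning step in one script means the path from raw file to published dataset can be rerun and reviewed.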
Documenting the data
- Create roxygen skeletons
- Create README
- Codebook describing each variable
- Website with the pkgdown R package
- Add a license and author(s)
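To make the codebook and roxygen steps concrete, the base R sketch below derives both from a data frame. It is an assumption-laden illustration, not washr's implementation. The dataset, the codebook column names, and the skeleton wording are all hypothetical.

```r
# Hypothetical cleaned dataset
dataset <- data.frame(
  village_name = c("Village A", "Village B"),
  water_source = c("borehole", "tap"),
  users = c(120L, 45L),
  stringsAsFactors = FALSE
)

# Codebook: one row per variable, descriptions to be filled in by hand
codebook <- data.frame(
  variable_name = names(dataset),
  variable_type = unname(vapply(dataset, function(x) class(x)[1], character(1))),
  description = NA_character_,
  stringsAsFactors = FALSE
)

# Roxygen skeleton for documenting the dataset
items <- sprintf("#'   \\item{%s}{%s}", codebook$variable_name, codebook$variable_type)
skeleton <- c(
  "#' Title of the dataset (to be written)",
  "#'",
  sprintf("#' @format A data frame with %d rows and %d variables:",
          nrow(dataset), ncol(dataset)),
  "#' \\describe{",
  items,
  "#' }",
  "\"dataset\""
)
cat(skeleton, sep = "\n")
```

The printed skeleton can be pasted into a file under R/ and edited. The codebook can be saved as a CSV next to the data.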
FAIR principles from a data perspective
- Findable by having an ORCID
- Accessible by utilizing open-source software for both preparing and publishing data
- Interoperable by exporting harmonized data as tidy data
- Reusable by ensuring transparency in the data cleaning process (see data-raw/data_processing.R)
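A small sketch of the export step, assuming CSV as the open exchange format. The data frame and the output location are placeholders. A package would more likely write to a folder such as inst/extdata than to a temporary directory.

```r
# Hypothetical tidy dataset: one row per observation, one column per variable
clean <- data.frame(
  village_name = c("Village A", "Village B"),
  water_source = c("borehole", "tap"),
  stringsAsFactors = FALSE
)

# Write to a temporary folder for the sake of the example
out_dir <- file.path(tempdir(), "extdata")
dir.create(out_dir, showWarnings = FALSE, recursive = TRUE)
csv_path <- file.path(out_dir, "clean.csv")
write.csv(clean, csv_path, row.names = FALSE)

# Read the file back to confirm the export round-trips
roundtrip <- read.csv(csv_path, stringsAsFactors = FALSE)
```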
FAIR principles from a user perspective
- Findable by publishing tools on accessible platforms (e.g., CRAN, group wiki)
- Accessible by thoroughly documenting functions and use cases
- Interoperable by maximizing generalizability
- Reusable by consistently updating and revising
What now?
- Continue expanding the washr R package (more functions, tests)
- Publish washr on CRAN to make it more accessible
- Create meta package to easily download data published on openwashdata.org
- Enhance the guide for building and preparing data packages using washr
- Expand the package’s functionality to be applicable in diverse contexts
Sign up for the openwashdata newsletter!