openwashdata Data Package Hackathon

This hackathon is organized by Mian Zhong from the Global Health Engineering group at ETH Zurich. Participants will make an openwashdata R data package with the washr package developed by openwashdata.

🎯 Objectives

  • Deliver publishable openwashdata data packages
  • Beta-test the washr R package to receive feedback for the package release
  • Meet openwashdata friends and have fun

🌟 Showcase

Here we showcase the data packages developed during our hackathon. These datasets cover WASH data about Malawi 🇲🇼, Uganda 🇺🇬, Brazil 🇧🇷, Peru 🇵🇪, and Ghana. A big shoutout to all the participants for their hard work and dedication!

barplot from boreholefuncmwi data package

boreholefuncmwi

Data about the survey on borehole functionality in Karonga district, Malawi.

Go to Dataset

boxplot about the portawaterperu data package

portawaterperu

Data about a preliminary review of the portable water system in Peru.

Go to Dataset

ugabore

Data about borehole repair collected from two districts in central Uganda.

Go to Dataset

waschoolpiracema

Data about water supply, sewage disposal, waste collection and sanitary equipment of the schools in Piracema, Brazil.

Go to Dataset

watercostaccra

Data about the surveys on household water costs, coping mechanisms, and water point estimates in Accra, Ghana.

Go to Dataset

📆 Event Details

  • Date: Friday, June 21, 9:00 AM - 4:30 PM (CET)

  • Location: Zurich, Switzerland

    openwashdata hackathon participants

📝 Agenda

Time          | Title                                      | Remark
08:30 - 09:00 | Check in & Breakfast                       |
09:00 - 09:05 | Opening                                    | by Mian Zhong
09:05 - 09:50 | Introduction workshop for washr R package  | by Mian Zhong
10:00 - 12:00 | Coding                                     | Package Setup & Data Cleaning
12:00 - 12:40 | Lunch                                      |
12:45 - 14:45 | Coding                                     | Data Cleaning & README writing
14:45 - 15:00 | Break / Stretch                            |
15:00 - 15:30 | Final Polish                               | Write Feedback Survey
15:30 - 16:15 | Showcase                                   |
16:25 - 16:30 | Closing                                    |

🖼️ Slides

View slides in full screen | Download slides as PDF

🧑‍💻 Workflow

Initialize package repository

  1. Open GitHub

  2. Open RStudio IDE

  3. Create a new project following:

      • Scroll down, this option is usually at bottom

  4. Configure Git version control on your local computer.

    usethis::use_git_config(user.name = "Your Name", user.email = "Your GitHub Email")

  5. Add git version control to local directory

    • yes, commit

    • yes, restart

  6. Connect local computer with GitHub, please refer to https://happygitwithr.com/https-pat.html for more details.

    • usethis::create_github_token()
  7. Open Terminal (the tab next to Console), run commands suggested in your assigned GitHub repository. The commands should look like the following:

    # Change the URL to that of your assigned GitHub repo!
    git remote add origin "https://github.com/openwashdata/fssample.git"
    git branch -M main
    git push -u origin main

    • You can find and copy the commands in your assigned repo too.

    • If this is your FIRST TIME connecting RStudio with GitHub, you will be asked in the Terminal to log in. Enter your GitHub username and, for the password, paste the PAT generated in Step 6 (ghp_xxxxxxxxxxxxxxxxxxxxxx).

  8. Install washr R package:

    library(devtools)
    devtools::install_github("openwashdata-dev/washr")

    You might see a prompt to update packages; choose the option “All” to continue.

  9. Load required libraries:

    • library(devtools)

    • library(usethis)

    • library(washr)
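
Before loading the libraries, it can help to confirm that all three packages are actually installed. The following is a minimal sketch using only base R; it is not part of washr, and the variable names are illustrative.

```r
# Packages named in the steps above.
required <- c("devtools", "usethis", "washr")

# requireNamespace() returns TRUE if the package's namespace can be loaded,
# FALSE otherwise; quietly = TRUE suppresses the startup messages.
installed <- vapply(required, requireNamespace, logical(1), quietly = TRUE)

# Report anything that still needs to be installed.
missing_pkgs <- required[!installed]
if (length(missing_pkgs) > 0) {
  message("Not installed yet: ", paste(missing_pkgs, collapse = ", "))
} else {
  message("All required packages are available.")
}
```

If a package is reported as missing, repeat the installation step for it before continuing.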

Create Dataset

Data Processing

  1. Add directory for raw data to project
    • In Console, execute setup_rawdata()
  2. Move raw data files to the directory data-raw
    • Add, commit and push all changes to GitHub

      • Select the “Git” tab on the top-right panel
      • (Click “Pull” first for good practice)
      • Tick all files and click “Commit”
      • Enter a commit message and click “Push”

  3. Work on data-raw/data_processing.R to clean raw data and export tidy data.
    • You may need to modify or delete some code in data_processing.R
  4. Export the tidy data by executing the whole data_processing.R
    • Add, commit and push all changes to GitHub
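
As an illustration of the kind of cleaning that might go into data_processing.R, here is a minimal base R sketch. The file names, column names, and values are all hypothetical, a temporary folder stands in for data-raw, and the script generated by setup_rawdata() may be organised differently.

```r
# Stand-in for a file in data-raw/: write a tiny raw CSV to a temporary folder.
raw_path <- file.path(tempdir(), "raw_example.csv")
writeLines(c(
  "Village Name,Borehole Status,Depth (m)",
  "A,functional,45",
  "B,Non-Functional,",
  "C,FUNCTIONAL,60"
), raw_path)

# Read the raw data; check.names = FALSE keeps the original column headers.
raw <- read.csv(raw_path, check.names = FALSE, stringsAsFactors = FALSE)

# Tidy step 1: convert column names to snake_case.
clean_names <- function(x) {
  x <- tolower(x)
  x <- gsub("[^a-z0-9]+", "_", x)  # replace runs of non-alphanumerics
  gsub("^_|_$", "", x)             # trim leading/trailing underscores
}
tidy <- raw
names(tidy) <- clean_names(names(tidy))

# Tidy step 2: make category labels consistent.
tidy$borehole_status <- tolower(tidy$borehole_status)

# Export the tidy data as CSV (hypothetical output location).
tidy_path <- file.path(tempdir(), "tidy_example.csv")
write.csv(tidy, tidy_path, row.names = FALSE)
```

Missing values (the empty depth for village B) are kept as NA rather than dropped, so the tidy data still has one row per raw record.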

Dictionary

  1. Once data reaches tidy state, in console, execute setup_dictionary()
    • Go to data-raw/dictionary.csv
    • Fill in the “description” column in dictionary.csv for each dataset and variable
      • It might be easier to edit in spreadsheet software (e.g. Excel)
    • Save dictionary.csv
  2. Add, commit and push all changes to GitHub
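
If you prefer to fill in the descriptions in R rather than in a spreadsheet, the idea can be sketched as below. This is a toy example: only the “description” column is taken from the steps above, while the other column names, the variable names, and the description texts are assumptions, so check them against your actual dictionary.csv.

```r
# Toy dictionary; column names other than "description" are assumptions.
dictionary <- data.frame(
  file_name     = c("example.rda", "example.rda"),
  variable_name = c("village_name", "borehole_status"),
  description   = NA_character_,
  stringsAsFactors = FALSE
)

# Hypothetical descriptions, keyed by variable name.
descriptions <- c(
  village_name    = "Name of the village where the borehole is located.",
  borehole_status = "Whether the borehole was functional at the time of the survey."
)

# Look up each variable's description by name.
dictionary$description <- unname(descriptions[dictionary$variable_name])

# Variables still lacking a description (should be empty before saving).
missing_desc <- dictionary$variable_name[is.na(dictionary$description)]
```

Checking missing_desc before saving is a quick way to catch variables that were overlooked.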

Document Dataset

Roxygen

  1. Initiate and write documentation in R/ folder by executing in console: setup_roxygen()
    • Open each documentation file in R/ to write a human-readable title and description about the dataset

  2. Add, commit and push all changes to GitHub
  3. Use devtools to document, check and install the package
    • devtools::document()

    • devtools::check()

    • devtools::install()

      If there is any error or warning, please let me know and we can look at it together. You will see a warning about the license, which will be addressed in the next step.
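
For orientation, a roxygen documentation file for a dataset in R/ typically looks roughly like the sketch below. The dataset name, title, and variables here are hypothetical, and the file created by setup_roxygen() may contain additional fields.

```r
#' Example borehole survey (hypothetical title)
#'
#' A short, human-readable description of what the dataset contains,
#' where it comes from, and when it was collected.
#'
#' @format A data frame with 3 rows and 2 variables:
#' \describe{
#'   \item{village_name}{Name of the village where the borehole is located.}
#'   \item{borehole_status}{Whether the borehole was functional at survey time.}
#' }
"exampledata"
```

The first line becomes the title and the following paragraph the description on the dataset's help page after devtools::document() is run.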

DESCRIPTION

  1. Add yourself as the creator and author of the package

    use_author(given = "First Name", family = "Last Name",
               role = c("aut", "cre"), email = "Your email",
               comment = c(ORCID = "XXXX-XXXX-XXXX-XXXX"))

  2. On GitHub, create an issue listing the author information for the DESCRIPTION file

    • Contributors (name, email, role, ORCID)
      • Include everyone here
        • Roles
          • cre = maintainer
          • aut = significant contributions
          • ctb = contributor with smaller contributions
    • Add other author(s):

    use_author(given = "Second Author", family = "Second Author", role = "aut")

  3. Go to the DESCRIPTION file and write the Title and Description of the package. Then, in the console, run update_description() to update the other fields. Proofread the DESCRIPTION file to make sure that the fields are correct.

  4. Use devtools to document, check and install the package

    • devtools::document()

    • devtools::check()

    • devtools::install()

      If there is any error or warning, please let me know and we can look at it together.

Communicate Dataset

README

  1. In console, execute setup_readme().
    • If time allows, optionally set has_example=TRUE to create an example article for the package.
  2. Open README.Rmd and edit the sections.
    • Make at least one plot about the data in the “Example” section
  3. Once you finish writing README.Rmd, run build_readme().
  4. Add, commit and push all changes to GitHub
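
For the plot in the “Example” section of the README, something as simple as a grouped bar plot is enough. The sketch below uses base R graphics and made-up data; in your README you would load your own package and dataset instead.

```r
# Hypothetical data standing in for the package's dataset.
boreholes <- data.frame(
  district = c("North", "North", "South", "South", "South"),
  status   = c("functional", "non-functional", "functional",
               "functional", "non-functional")
)

# Count boreholes by status (rows) and district (columns).
counts <- table(boreholes$status, boreholes$district)

# Grouped bar plot with base R graphics.
barplot(
  counts,
  beside = TRUE,
  legend.text = rownames(counts),
  xlab = "District",
  ylab = "Number of boreholes",
  main = "Borehole functionality by district"
)
```

Placed in an R chunk in README.Rmd, the resulting figure is rendered into the README when it is built.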

Pkgdown Website

  1. In console, run setup_website() to create an openwashdata style pkgdown website
    • When prompted in the console, select “No” so that _pkgdown.yml is not overwritten

  2. Use devtools to document, check and install the package
    • devtools::document()

    • devtools::check()

    • devtools::install()

      If there is any error or warning, please let me know and we can look at it together.

  3. Open .gitignore, remove the docs entry, and save the file.
  4. Add, commit and push all changes to GitHub