13 changes: 13 additions & 0 deletions ..Rcheck/00check.log
@@ -0,0 +1,13 @@
* using log directory ‘/Users/lschoebitz/Documents/gitrepos/gh-org-openwashdata/data-repos/gdho/..Rcheck’
* using R version 4.5.0 (2025-04-11)
* using platform: aarch64-apple-darwin20
* R was compiled by
Apple clang version 14.0.0 (clang-1400.0.29.202)
GNU Fortran (GCC) 14.2.0
* running under: macOS Sequoia 15.5
* using session charset: UTF-8
* checking for file ‘./DESCRIPTION’ ... ERROR
Required fields missing or empty:
‘Author’ ‘Maintainer’
* DONE
Status: 1 ERROR
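
The ERROR above reports that `Author` and `Maintainer` are missing. One plausible explanation (an assumption; the log itself does not say) is that the check ran against the source directory, where only `Authors@R` is declared, since `R CMD build` is the step that derives both fields. Below is a minimal sketch for spotting this before running the check, using base R's `read.dcf()`. The helper name `missing_description_fields` and the throwaway DESCRIPTION are hypothetical.

```r
# Hypothetical helper: list the requested fields that are absent or blank
# in a DESCRIPTION file. read.dcf() is part of base R.
missing_description_fields <- function(path,
                                       fields = c("Author", "Maintainer")) {
  dcf <- read.dcf(path, fields = fields)
  values <- dcf[1, ]
  # Absent fields come back as NA; blank fields as empty strings.
  fields[is.na(values) | !nzchar(trimws(values))]
}

# Illustrative DESCRIPTION that declares Authors@R only (made-up content).
tmp <- tempfile()
writeLines(c(
  "Package: demo",
  "Version: 0.0.1",
  "Authors@R: person('A', 'B', email = 'a@example.org', role = c('aut', 'cre'))"
), tmp)

missing_fields <- missing_description_fields(tmp)
print(missing_fields)
```

If this is indeed the cause, checking the tarball produced by `R CMD build`, or running `devtools::check()`, which builds the package first, should avoid the error.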
199 changes: 199 additions & 0 deletions CLAUDE.md
@@ -0,0 +1,199 @@
# CLAUDE.md - OpenWashData R Package Review Guide

This guide helps Claude Code review R data packages for the openwashdata organization, ensuring consistency, quality, and completeness across all published datasets.

## Overview

The review process follows a PLAN → CREATE → TEST → DEPLOY workflow, triggered by a PR from the dev branch to the main branch. Each phase requires explicit user approval before proceeding.

## Review Workflow

### 1. PLAN Phase

When initiated via `/review-package [package-name]`, Claude will:

1. **Analyze Package Structure**
   - Verify package was created with `washr` template
   - Check for required directories: R/, data/, data-raw/, inst/extdata/, man/
   - Confirm presence of key files: DESCRIPTION, README.Rmd, _pkgdown.yml

2. **Create Review Issues** (5 GitHub issues)
   - Issue 1: General Information & Metadata
   - Issue 2: Data Content & Quality
   - Issue 3: Data Processing Script Review
   - Issue 4: Documentation
   - Issue 5: Tests & CI/CD

3. **Present Review Plan**
   - Summary of findings
   - List of issues to be addressed
   - Request user confirmation before proceeding
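
The structure check in step 1 can be sketched in base R. This is an illustration under stated assumptions, not part of the `washr` tooling: the function name `missing_structure` and the demo package directory are hypothetical, and only the directories and files named in step 1 are checked.

```r
# Directories and files listed in step 1 of the PLAN phase.
required_dirs <- c("R", "data", "data-raw", "inst/extdata", "man")
required_files <- c("DESCRIPTION", "README.Rmd", "_pkgdown.yml")

# Hypothetical helper: return the required entries that are absent
# from a package directory.
missing_structure <- function(pkg_path) {
  list(
    dirs = required_dirs[!dir.exists(file.path(pkg_path, required_dirs))],
    files = required_files[!file.exists(file.path(pkg_path, required_files))]
  )
}

# Throwaway package skeleton containing only R/ and DESCRIPTION.
pkg <- tempfile("demo-pkg")
dir.create(file.path(pkg, "R"), recursive = TRUE)
invisible(file.create(file.path(pkg, "DESCRIPTION")))

result <- missing_structure(pkg)
print(result)
```

An empty `dirs` and `files` entry would mean the basic layout is in place. It says nothing about whether the package was actually created from the template.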

### 2. CREATE Phase

After user approval, work through each issue systematically:

#### Issue 1: General Information & Metadata
- [ ] DESCRIPTION file completeness
  - Title (descriptive, <65 characters)
  - Description (clear purpose statement)
  - Authors with ORCID IDs
  - License: CC BY 4.0
  - Dependencies properly declared
  - Version follows semantic versioning
- [ ] CITATION.cff file present and valid
- [ ] Generate citation using `washr::compile_citation()`

#### Issue 2: Data Content & Quality
- [ ] Data files in data/ directory (.rda format)
- [ ] CSV/XLSX exports in inst/extdata/
- [ ] Main dataset accessible via function matching package name
- [ ] Data quality checks:
  - No unexpected missing values
  - Consistent data types
  - Reasonable value ranges
  - Proper encoding (UTF-8)
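
A minimal sketch of how the data quality checks above could be summarised per variable, using base R only. The function name `summarise_quality` and the toy data frame are hypothetical. What counts as an "unexpected" missing value or a "reasonable" range still needs a human decision per dataset.

```r
# Hypothetical helper: one row per variable with its storage type,
# number of missing values, and (for character columns) UTF-8 validity.
summarise_quality <- function(df) {
  data.frame(
    variable = names(df),
    type = vapply(df, typeof, character(1)),
    n_missing = vapply(df, function(x) sum(is.na(x)), integer(1)),
    valid_utf8 = vapply(df, function(x) {
      # validUTF8() is base R; non-character columns are reported as NA.
      if (is.character(x)) all(validUTF8(x[!is.na(x)])) else NA
    }, logical(1)),
    row.names = NULL,
    stringsAsFactors = FALSE
  )
}

# Toy data, not taken from any openwashdata package.
toy <- data.frame(
  id = 1:3,
  name = c("a", NA, "caf\u00e9"),
  stringsAsFactors = FALSE
)

report <- summarise_quality(toy)
print(report)
```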

#### Issue 3: Data Processing Script Review
- [ ] data_processing.R in data-raw/
- [ ] Script is reproducible and well-commented
- [ ] Raw data files preserved in data-raw/
- [ ] dictionary.csv with variable descriptions
- [ ] Uses tidyverse conventions
- [ ] Handles data cleaning transparently

#### Issue 4: Documentation
- [ ] README.Rmd follows openwashdata template:
  - Dynamic content generation
  - Installation instructions
  - Data overview with dimensions
  - Variable dictionary table
  - License and citation sections
- [ ] Roxygen documentation for all exported functions
- [ ] _pkgdown.yml configured with:
  ```yaml
  template:
    bootstrap: 5
    includes:
      in_header: |
        <script defer data-domain="openwashdata.github.io" src="https://plausible.io/js/script.js"></script>
  ```
- [ ] Package website builds without errors

#### Issue 5: Tests & CI/CD
- [ ] GitHub Actions workflow for R-CMD-check
- [ ] Package passes `devtools::check()` with no errors/warnings
- [ ] Examples run successfully
- [ ] Data loads correctly

**For each issue**: Present planned changes and request user confirmation before implementing.

### 3. TEST Phase

Run comprehensive package checks:
```r
devtools::check()
devtools::build()
pkgdown::build_site()
```

Verify:
- All tests pass
- No R CMD check issues
- Documentation renders correctly
- Website builds successfully

### 4. DEPLOY Phase

1. Build and deploy pkgdown website
2. Verify Plausible analytics tracking
3. Confirm all changes are committed
4. Approve PR merge to main branch

## Key Standards

### Required Files Structure
```
package-name/
├── DESCRIPTION
├── NAMESPACE
├── R/
│   └── package-name.R
├── data/
│   └── package-name.rda
├── data-raw/
│   ├── data_processing.R
│   └── dictionary.csv
├── inst/
│   ├── CITATION
│   └── extdata/
│       ├── package-name.csv
│       └── package-name.xlsx
├── man/
├── README.Rmd
├── README.md
├── CITATION.cff
├── _pkgdown.yml
└── .github/
    └── workflows/
        └── R-CMD-check.yaml
```

### Package Dependencies
Common dependencies for data packages:
- dplyr, tidyr (data manipulation)
- readr, readxl (data import)
- janitor (data cleaning)
- desc (DESCRIPTION parsing)
- gt, kableExtra (table formatting)

### Quality Criteria
1. **Reproducibility**: All data processing steps documented and runnable
2. **Transparency**: Raw data preserved with clear transformation pipeline
3. **Accessibility**: Multiple export formats (R, CSV, XLSX)
4. **Documentation**: Comprehensive variable descriptions and usage examples
5. **Consistency**: Follows openwashdata naming and structure conventions

## Commands

- `/review-package [package-name]` - Start package review
- `/review-status` - Check current review progress
- `/review-issue [number]` - Work on specific issue
- `/review-pr` - Create pull request for current issue

## Important Notes

- Always request user confirmation between phases
- Check in with user before implementing changes in CREATE phase
- Preserve existing git history and commits
- Follow tidyverse style guide for R code
- Use semantic versioning for package versions

## Project Management with GitHub CLI

- List issues: `gh issue list`
- View issue details: `gh issue view 80` (e.g., for issue #80 "Rename geographies parameter")
- Create branch for issue: `gh issue develop 80`
- Checkout branch: `git checkout 80-rename-geographies-parameter-to-entities`
- Create pull request: `gh pr create --title "Rename geographies parameter to entities" --body "Implements #80"`
- List pull requests: `gh pr list`
- View pull request: `gh pr view PR_NUMBER`

## Build/Test/Check Commands

- Build package: `R CMD build .`
- Install package: `R CMD INSTALL .`
- Run all tests: `R -e "devtools::test()"`
- Run single test: `R -e "devtools::test_file('tests/testthat/test-FILE_NAME.R', reporter = 'progress')"`
- Run R CMD check: `R -e "devtools::check()"`
- Build Roxygen2 documentation: `R -e "devtools::document()"`
- Build vignettes: `R -e "devtools::build_vignettes()"`
- Build README.md from README.Rmd: `R -e "devtools::build_readme()"`

## Code Style Guidelines

- Use 2 spaces for indentation (no tabs)
- Maximum 80 characters per line
- Use tidyverse style for R code (`dplyr`, `tidyr`, `purrr`)
- Use snake_case for function and variable names

7 changes: 5 additions & 2 deletions DESCRIPTION
@@ -11,9 +11,12 @@ Description: A dataset of global humanitarian organizations collected by Humanit
 License: CC BY 4.0
 Encoding: UTF-8
 Roxygen: list(markdown = TRUE)
-RoxygenNote: 7.3.1
+RoxygenNote: 7.3.2
 Depends:
-    R (>= 2.10)
+    R (>= 3.5)
 LazyData: true
 Config/Needs/website: rmarkdown
 Date: 2024-02-29
+Suggests:
+    testthat (>= 3.0.0)
+Config/testthat/edition: 3
5 changes: 4 additions & 1 deletion R/gdho.R
@@ -5,7 +5,7 @@
#' excludes the information about how many operational units a humanitarian organization has
#' in each country; that information is available in the dataset gdho_full.
#'
#' @format A tibble with 4556 rows and 33 variables
#' @format A tibble with 4548 rows and 35 variables
#'
#' \describe{
#' \item{id}{A unique Id for each organisation}
@@ -38,10 +38,13 @@
#' \item{ope/staff}{Percent of operational program expenditure per staff member}
#' \item{ope_inflation_adjusted}{Operational program expenditure adjusted for inflation}
#' \item{ope_original_currency}{Actual approximate operational program expenditure in original currency used by organisation}
#' \item{ope_original_amount}{Operational program expenditure amount in original currency}
#' \item{ope_original_currency_code}{Currency code for operational program expenditure}
#' \item{humexp_approx_usd}{Approximate humanitarian expenditure in USD}
#' \item{humexp_imputed}{Imputed approximate humanitarian expenditure in USD}
#' \item{humexp_inflation_adjusted}{Approximate humanitarian expenditure adjusted for inflation}
#' }
#'
#' @source Humanitarian Outcomes <https://www.humanitarianoutcomes.org/projects/gdho>

"gdho"
12 changes: 8 additions & 4 deletions R/gdho_full.R
@@ -3,10 +3,10 @@
#' This dataset collected by Humanitarian Outcomes provides insights about humanitarian
#' organizations, such as name, website, and headquarters information. This full version
#' includes the information about how many operational units a humanitarian organization has
#' in each country, which is represented as one column per country resulting in 273 variables
#' compared with the short version "gdho" (33 variables).
#' in each country, which is represented as one column per country resulting in 275 variables
#' compared with the short version "gdho" (35 variables).
#'
#' @format A tibble with 4556 rows and 273 variables
#' @format A tibble with 4548 rows and 275 variables
#'
#' \describe{
#' \item{id}{A unique Id for each organisation}
@@ -39,10 +39,14 @@
#' \item{ope/staff}{Percent of operational program expenditure per staff member}
#' \item{ope_inflation_adjusted}{Operational program expenditure adjusted for inflation}
#' \item{ope_original_currency}{Actual approximate operational program expenditure in original currency used by organisation}
#' \item{ope_original_amount}{Operational program expenditure amount in original currency}
#' \item{ope_original_currency_code}{Currency code for operational program expenditure}
#' \item{humexp_approx_usd}{Approximate humanitarian expenditure in USD}
#' \item{humexp_imputed}{Imputed approximate humanitarian expenditure in USD}
#' \item{humexp_inflation_adjusted}{Approximate humanitarian expenditure adjusted for inflation}
#' \item{countries}{Individual countries where organisations are operational}
#' \item{afghanistan..zimbabwe}{240 columns representing operational presence in individual countries}
#' }
#'
#' @source Humanitarian Outcomes <https://www.humanitarianoutcomes.org/projects/gdho>

"gdho_full"
79 changes: 64 additions & 15 deletions data-raw/data-processing.R
@@ -11,11 +11,16 @@ gdho_raw <- readr::read_csv("./data-raw/gdho.csv", skip = 1)
 gdho_full <- gdho_raw |>
   dplyr::rename_all(~stringr::str_replace_all(.x, "[\\(\\)]", "")) |>
   dplyr::rename_all(~stringr::str_replace_all(.x, " ", "_")) |>
-  dplyr::rename_all(tolower) #TODO: country names need more cleaning, remove ","
+  dplyr::rename_all(~stringr::str_replace_all(.x, ",", "")) |>
+  dplyr::rename_all(tolower)
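
To illustrate what the renaming chain does to a column name, here is a sketch of the same four steps with base R `gsub()` swapped in for `stringr::str_replace_all()`, so that it runs without extra packages. The function name `clean_names` and the example column names are hypothetical and are not taken from `gdho.csv`.

```r
# Same four steps as the rename chain: drop parentheses, replace spaces
# with underscores, drop commas, then lower-case.
clean_names <- function(x) {
  x <- gsub("[()]", "", x)
  x <- gsub(" ", "_", x, fixed = TRUE)
  x <- gsub(",", "", x, fixed = TRUE)
  tolower(x)
}

# Made-up column names for illustration only.
cleaned <- clean_names(c("OPE (approx USD)", "Korea, Republic of", "Year founded"))
print(cleaned)
```

Removing the comma only after spaces have become underscores means a name such as "Korea, Republic of" keeps a single underscore between its words.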

 ## Encoding UTF-8 --------------------------------------------------------------
 gdho_full <- gdho_full |>
-  mutate(across(where(is.character), stringi::stri_enc_toutf8, ))
+  mutate(across(where(is.character), \(x) stringi::stri_enc_toutf8(x)))

+## Remove duplicate rows -------------------------------------------------------
+gdho_full <- gdho_full |>
+  dplyr::distinct()

 ## Modify data types -----------------------------------------------------------
 ### to integer:
@@ -47,25 +52,69 @@ gdho_full <- gdho_full |>
 "ope_inflation_adjusted", "humexp_approx_usd",
 "humexp_inflation_adjusted"), as.double))

-### TODO: separate ope_original_currency into 2 columns
-### TODO:staff_, natl_, intl_imputed do not indicate numbers but some categories which are not documented
+### Separate ope_original_currency into amount and currency columns
+gdho_full <- gdho_full |>
+  tidyr::separate(ope_original_currency,
+                  into = c("ope_original_amount", "ope_original_currency_code"),
+                  sep = " ",
+                  fill = "right",
+                  remove = FALSE)
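
The effect of the split, including the right-fill for values without a second part, can be sketched in base R. This is an approximation of the `tidyr::separate()` call, not a replacement for it. The function name `split_amount_currency` is hypothetical, and the input format ("amount, space, currency code") is an assumption, since the diff does not show the raw values.

```r
# Split each value on spaces; the first piece is treated as the amount and
# the second as the currency code. Values with fewer than two pieces get NA
# on the right, mirroring fill = "right".
split_amount_currency <- function(x) {
  parts <- strsplit(x, " ", fixed = TRUE)
  data.frame(
    ope_original_amount = vapply(parts, function(p) p[1], character(1)),
    ope_original_currency_code = vapply(parts, function(p) p[2], character(1)),
    stringsAsFactors = FALSE
  )
}

# Assumed value format, for illustration only.
toy <- c("1500000 EUR", "250000", NA)
out <- split_amount_currency(toy)
print(out)
```

As in the script, the amount column is still character after the split and would need an explicit conversion before any arithmetic.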

+### Document imputed categories:
+# staff_imputed, natl_imputed, intl_imputed categories:
+# - "small" = estimated 1-50 staff
+# - "medium" = estimated 51-250 staff
+# - "large" = estimated 251-1000 staff
+# - "very large" = estimated 1000+ staff


 ## Build dataset gdho ----------------------------------------------------------
-gdho <- gdho_full[1:33] # a shorter version that does not include all country columns
+# Get column names before country columns start
+non_country_cols <- names(gdho_full)[1:which(names(gdho_full) == "afghanistan") - 1]
+gdho <- gdho_full |>
+  dplyr::select(all_of(non_country_cols))
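
A note on operator precedence when selecting every column before the first country column: R parses `1:n - 1` as `(1:n) - 1`, so the index starts at 0 rather than ending at `n - 1`. The selection still comes out right because index 0 is silently dropped, but `seq_len(n - 1)` states the intent directly. A small sketch with made-up column names:

```r
# Made-up column names; "afghanistan" marks the first country column.
cols <- c("id", "name", "afghanistan", "albania")
first_country <- which(cols == "afghanistan")

idx_original <- 1:first_country - 1         # parsed as (1:3) - 1
idx_explicit <- seq_len(first_country - 1)  # 1, 2

print(idx_original)
print(idx_explicit)

# Both index vectors select the same columns, because a 0 index is ignored.
same_selection <- identical(cols[idx_original], cols[idx_explicit])
print(same_selection)
```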

 ## Read and write dictionary ---------------------------------------------------
 original_dict <- read_excel("./data-raw/gdho_read_me.xlsx", skip = 2)
-gdho_full_dictionary <- tibble(directory = "data",
-                               file_name = "gdho_full.rda",
-                               variable_name = c(colnames(gdho_full)[1:33], "countries"),
-                               variable_type = c(sapply(gdho_full, typeof)[1:33], "integer"),
-                               description = original_dict$`Content description`)
-gdho_dict <- tibble(directory = "data",
-                    file_name = "gdho.rda",
-                    variable_name = colnames(gdho_full)[1:33],
-                    variable_type = sapply(gdho_full, typeof)[1:33],
-                    description = original_dict$`Content description`[1:33])

+# Get descriptions for new columns
+new_column_descriptions <- c(
+  ope_original_amount = "Operational program expenditure amount in original currency",
+  ope_original_currency_code = "Currency code for operational program expenditure"
+)

+# Build dictionary for gdho_full
+gdho_full_vars <- names(gdho_full)
+country_start <- which(gdho_full_vars == "afghanistan")
+country_end <- which(gdho_full_vars == "zimbabwe")

+# Create descriptions vector
+descriptions_full <- c(
+  original_dict$`Content description`[1:30], # Original descriptions up to ope_original_currency
+  new_column_descriptions["ope_original_amount"],
+  new_column_descriptions["ope_original_currency_code"],
+  original_dict$`Content description`[31:33], # Remaining original descriptions
+  rep("Country operational presence indicator", length(country_start:country_end))
+)

+gdho_full_dictionary <- tibble(
+  directory = "data",
+  file_name = "gdho_full.rda",
+  variable_name = gdho_full_vars,
+  variable_type = sapply(gdho_full, typeof),
+  description = descriptions_full
+)

+# Build dictionary for gdho
+gdho_vars <- names(gdho)
+gdho_dict <- tibble(
+  directory = "data",
+  file_name = "gdho.rda",
+  variable_name = gdho_vars,
+  variable_type = sapply(gdho, typeof),
+  description = descriptions_full[1:length(gdho_vars)]
+)

 dictionary <- rbind(gdho_full_dictionary, gdho_dict)
 write_csv(dictionary, "./data-raw/dictionary.csv")
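
The dictionary above pairs variable names with descriptions purely by position (`[1:30]`, `[31:33]`), so a column added or moved upstream would shift the descriptions. Below is a minimal sketch of a dictionary builder with an explicit length check, using a base `data.frame` in place of `tibble`. The function name `build_dictionary` and the toy data are hypothetical.

```r
# Hypothetical helper: build one dictionary block for a dataset and refuse
# to proceed if the descriptions do not line up with the columns.
build_dictionary <- function(df, file_name, descriptions) {
  stopifnot(length(descriptions) == ncol(df))
  data.frame(
    directory = "data",
    file_name = file_name,
    variable_name = names(df),
    variable_type = vapply(df, typeof, character(1)),
    description = unname(descriptions),
    row.names = NULL,
    stringsAsFactors = FALSE
  )
}

# Toy data, not taken from gdho.
toy <- data.frame(id = 1:2, name = c("a", "b"), stringsAsFactors = FALSE)
dict <- build_dictionary(toy, "toy.rda", c("Unique id", "Organisation name"))
print(dict)

# A mismatched description vector is rejected instead of being recycled.
mismatch <- tryCatch(
  build_dictionary(toy, "toy.rda", "Unique id"),
  error = function(e) "length mismatch"
)
print(mismatch)
```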
