Generate custom commands for data analysis workflows with AI assistants (Claude Code & Gemini CLI)
superzarathu is an R package that provides intelligent templates and functions to generate custom commands for data analysis workflows. It supports both Claude Code and Gemini CLI, offering predefined workflows for data preprocessing, labeling, statistical analysis, visualization, and Shiny applications.
- 🤖 AI-Driven Workflows: Templates optimized for AI assistants to understand and execute
- 📊 Data Processing: Advanced preprocessing with clinical trial data support
- 🩺 Data Doctor: Comprehensive data health check and diagnostics
- 📋 Excel Quality Check: Comprehensive Excel data quality assessment with AI-friendly output
- 🏷️ Smart Labeling: Automatic variable labeling with jstable integration
- 📈 Statistical Analysis: Templates for Korean medical statistics packages (jstable, jskm, jsmodule)
- 🎨 Visualization: Plot generation with ggplot2 and interactive graphics
- ⚡ Shiny Apps: Rapid Shiny application development templates
You can install the development version from GitHub:
# Using devtools
install.packages("devtools")
devtools::install_github("zarathucorp/superzarathu")
# Using remotes (lighter alternative)
install.packages("remotes")
remotes::install_github("zarathucorp/superzarathu")
# Using pak (modern approach)
install.packages("pak")
pak::pak("zarathucorp/superzarathu")
library(superzarathu)
# Setup for Claude Code
sz_setup("claude")
# Setup for Gemini CLI
sz_setup("gemini")
After setup, use natural language commands:
# Data preprocessing
"preprocess the data"
"handle clinical trial data with repeated measures"
# Data health check
"diagnose my data"
"check data health"
"find data problems"
# Data labeling
"label the data"
"apply jstable labeling"
# Statistical analysis
"create descriptive statistics table"
"perform survival analysis"
# Visualization
"create a forest plot"
"make an interactive plot"
# Shiny app
"create a shiny dashboard"
- sz:preprocess - Data cleaning and transformation
- sz:doctor - Data health check and diagnostics
- sz:label - Variable labeling and metadata management
- excel_health_check() - Comprehensive Excel data quality assessment
- sz:table - Descriptive and analytical tables with jstable
- sz:plot - Static and interactive plots with ggplot2 and jskm
- sz:rshiny - Shiny application templates with jsmodule
The excel_health_check() function provides comprehensive quality assessment for Excel files:
library(superzarathu)
# Check all Excel files in current directory
result <- excel_health_check()
# Check specific files
result <- excel_health_check(files = c("data1.xlsx", "data2.xlsx"))
# Generate only JSON output
result <- excel_health_check(output_format = "json")
- 19 Quality Check Types: Structural problems, representation inconsistencies, value errors, missing data, hidden issues
- AI-Friendly Output: JSON results with schema for better AI understanding
- Detailed Reports: Markdown reports with actionable recommendations
- Data Preservation: Read-only approach - never modifies original files
- R Standards: Converts empty strings to NA following R conventions
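The empty-string convention mentioned above can be illustrated in base R. This is a minimal sketch, not the package's own code: `blank_to_na()` is a hypothetical helper, and treating whitespace-only cells as blank is an assumption made for the example.

```r
# Hypothetical helper (not part of superzarathu): turn empty or
# whitespace-only strings into NA, leaving other column types untouched.
blank_to_na <- function(df) {
  df[] <- lapply(df, function(col) {
    if (is.character(col)) {
      is_blank <- !is.na(col) & trimws(col) == ""
      col[is_blank] <- NA_character_
    }
    col
  })
  df
}

# Made-up data as it might look after reading an Excel sheet
raw <- data.frame(
  id  = c("A1", "A2", "A3"),
  sex = c("M", "", "  "),
  age = c(34, 51, 47),
  stringsAsFactors = FALSE
)

clean <- blank_to_na(raw)
colSums(is.na(clean))  # missing-value count per column
```

The function returns a modified copy, so `raw` stays unchanged, which is in the same spirit as the read-only approach listed above.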
- sz_excel_results_YYYYMMDD_HHMMSS.json - Detailed results
- sz_excel_schema.json - JSON schema for AI interoperability
- sz_excel_report_YYYYMMDD_HHMMSS.md - Human-readable report
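Because the timestamp in these file names runs from year down to second, sorting the names alphabetically also sorts them chronologically. The sketch below shows one way to pick the most recent results file under that assumption. `stamp_name()` and the demo directory are hypothetical and exist only to make the example self-contained; they are not package functions.

```r
# Hypothetical helper: build a name following the documented
# <prefix>_YYYYMMDD_HHMMSS.<ext> pattern.
stamp_name <- function(prefix, ext, time = Sys.time()) {
  sprintf("%s_%s.%s", prefix, format(time, "%Y%m%d_%H%M%S"), ext)
}

# Create two empty placeholder files in a temporary directory
out_dir <- file.path(tempdir(), "sz_demo")
dir.create(out_dir, showWarnings = FALSE)
t0 <- as.POSIXct("2024-01-05 09:30:00", tz = "UTC")
older <- stamp_name("sz_excel_results", "json", t0)
newer <- stamp_name("sz_excel_results", "json", t0 + 3600)
invisible(file.create(file.path(out_dir, c(older, newer))))

# Find result files by pattern and take the last one in sorted order
found <- list.files(
  out_dir,
  pattern = "^sz_excel_results_[0-9]{8}_[0-9]{6}\\.json$"
)
latest <- tail(sort(found), 1)
latest
```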
- 📁 Automatic file detection in data/raw/
- 🔄 Clinical trial repeated measures handling (V1, V2, V3)
- 📅 Intelligent date conversion and age calculation
- 🧹 NA handling with multiple strategies
- 📌 pins package integration for S3/local storage
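To make two of the preprocessing items above concrete, here is a base R sketch of reshaping repeated measures from wide to long format and computing age from dates. The column names (`sbp_V1`, `birth_date`, and so on) and values are invented for illustration, and the generated templates may use data.table or another approach.

```r
# Made-up wide-format data: one row per subject, one column per visit
wide <- data.frame(
  id     = 1:2,
  sbp_V1 = c(120, 135),
  sbp_V2 = c(118, 130),
  sbp_V3 = c(115, 128)
)

# Wide to long: one row per subject and visit
long <- reshape(
  wide,
  direction = "long",
  varying   = c("sbp_V1", "sbp_V2", "sbp_V3"),
  v.names   = "sbp",
  timevar   = "visit",
  times     = c("V1", "V2", "V3"),
  idvar     = "id"
)

# Age in completed years: year difference, minus one if the birthday
# has not yet occurred in the visit year
birth_date <- as.Date(c("1980-03-15", "1975-11-02"))
visit_date <- as.Date(c("2024-03-14", "2024-11-02"))
age <- as.integer(format(visit_date, "%Y")) -
  as.integer(format(birth_date, "%Y")) -
  (format(visit_date, "%m%d") < format(birth_date, "%m%d"))

long
age
```

Comparing the month-day strings avoids dividing a day count by 365.25, which can be off by one year around birthdays.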
- 🎯 Data quality scoring (A+ to F grade)
- 🔍 Automatic pattern detection (repeated measures, clinical trials, surveys)
- ⚠️ Issue identification per column
- ❓ Intelligent question generation for data producers
- 📄 Markdown report generation with detailed diagnostics
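The scoring rules behind the A+ to F grade are not described here, so the following is only a sketch of what per-column issue detection and a letter grade could look like. Both functions are hypothetical, and the cut-offs (1, 5, 10, 20, 40 percent missing) are arbitrary assumptions, not the package's thresholds.

```r
# Hypothetical: summarise simple issues for each column
column_issues <- function(df) {
  data.frame(
    column      = names(df),
    missing_pct = vapply(df, function(x) 100 * mean(is.na(x)), numeric(1)),
    constant    = vapply(
      df,
      function(x) length(unique(x[!is.na(x)])) <= 1,
      logical(1)
    ),
    row.names = NULL,
    stringsAsFactors = FALSE
  )
}

# Hypothetical: map a missing-data percentage to a letter grade.
# The cut-offs are illustrative assumptions only.
grade_from_missing <- function(missing_pct) {
  as.character(cut(
    missing_pct,
    breaks = c(-Inf, 1, 5, 10, 20, 40, Inf),
    labels = c("A+", "A", "B", "C", "D", "F")
  ))
}

# Made-up example data
df <- data.frame(
  age   = c(30, NA, 45, 50),
  site  = c("A", "A", "A", "A"),
  score = c(1, 2, 3, 4),
  stringsAsFactors = FALSE
)

issues <- column_issues(df)
overall_grade <- grade_from_missing(mean(issues$missing_pct))
issues
overall_grade
```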
- 🏷️ jstable::mk.lev() integration
- 🔢 Automatic 0/1 to No/Yes conversion
- 📊 Factor/continuous variable classification
- 📖 Codebook detection and application
- 🌐 Multi-language label support
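The 0/1 conversion and the factor/continuous split listed above can be sketched in base R. The helpers below are hypothetical and do not call `jstable::mk.lev()`. The rule "at most `max_levels` distinct values counts as a factor" is an assumption chosen for the example.

```r
# Hypothetical: TRUE for numeric columns containing only 0, 1 and NA
is_binary01 <- function(x) {
  is.numeric(x) && any(!is.na(x)) && all(x[!is.na(x)] %in% c(0, 1))
}

# Hypothetical: recode 0/1 columns as No/Yes factors
label_binary <- function(df) {
  for (nm in names(df)) {
    if (is_binary01(df[[nm]])) {
      df[[nm]] <- factor(df[[nm]], levels = c(0, 1), labels = c("No", "Yes"))
    }
  }
  df
}

# Hypothetical: classify each column as "factor" or "continuous".
# max_levels = 5 is an illustrative threshold, not a package default.
classify_vars <- function(df, max_levels = 5) {
  vapply(df, function(x) {
    n_distinct <- length(unique(x[!is.na(x)]))
    if (is.factor(x) || is.character(x) || n_distinct <= max_levels) {
      "factor"
    } else {
      "continuous"
    }
  }, character(1))
}

# Made-up example data
df <- data.frame(
  diabetes = c(0, 1, 1, NA, 0, 1, 0),
  age      = c(34, 51, 47, 62, 29, 58, 41),
  stage    = c("I", "II", "II", "III", "I", "II", "III"),
  stringsAsFactors = FALSE
)

labeled <- label_binary(df)
var_types <- classify_vars(labeled)
table(labeled$diabetes, useNA = "ifany")
var_types
```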
Templates use a 2-stage approach:
- Exploration Stage (Direct execution)
  Rscript -e "str(data, list.len=5)"
- Processing Stage (Script generation)
  # Generated script for reproducibility
  source("scripts/preprocess_data.R")
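The two stages can be walked through end to end in a small sketch. It inspects a toy data frame directly, then writes the processing step to a script file and sources it, so the transformation can be rerun later. The script content and the temporary path are illustrative assumptions; in a real project the script would live at scripts/preprocess_data.R.

```r
# Stage 1: exploration. Inspect the data directly; nothing is saved.
data <- data.frame(id = 1:3, sbp = c(120, NA, 135))
str(data, list.len = 5)

# Stage 2: processing. Write the steps to a script, then source it.
# A temporary path is used here so the example leaves no files behind.
script_path <- file.path(tempdir(), "preprocess_data.R")
writeLines(
  c(
    "# Generated script for reproducibility (illustrative content)",
    "processed <- data[!is.na(data$sbp), ]"
  ),
  script_path
)
source(script_path, local = TRUE)  # run in the current environment

nrow(processed)
```

Keeping exploration as throwaway commands and processing as a saved script means only the steps that change the data end up in the project history.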
The package creates an organized project structure:
project/
├── data/
│ ├── raw/ # Original data files
│ └── processed/ # Cleaned data (RDS)
├── scripts/
│ ├── utils/ # Helper functions
│ ├── analysis/ # Analysis scripts
│ └── plots/ # Visualization scripts
├── output/
│ ├── tables/ # Generated tables
│ └── plots/ # Generated plots
└── app.R # Shiny application
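For reference, the directory part of this layout can be reproduced by hand in base R. This is a sketch only: `create_project_dirs()` is a hypothetical helper, not a package function, and the setup command may create additional files.

```r
# Hypothetical helper: create the folders shown in the tree above
create_project_dirs <- function(root) {
  dirs <- c(
    "data/raw", "data/processed",
    "scripts/utils", "scripts/analysis", "scripts/plots",
    "output/tables", "output/plots"
  )
  paths <- file.path(root, dirs)
  for (p in paths) {
    dir.create(p, recursive = TRUE, showWarnings = FALSE)
  }
  invisible(paths)
}

# Try it in a temporary location
root <- file.path(tempdir(), "project")
created <- create_project_dirs(root)
list.dirs(root, recursive = TRUE, full.names = FALSE)
```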
- R (≥ 3.5.0)
- data.table
- openxlsx
- jsonlite
- stringdist
- jstable (for medical statistics)
- jskm (for survival curves)
- jsmodule (for Shiny modules)
- pins (for data versioning)
- ggplot2 (for visualization)
Contributions are welcome! Please feel free to submit a Pull Request.
- Fork the repository
- Create your feature branch (git checkout -b feature/AmazingFeature)
- Commit your changes (git commit -m 'Add some AmazingFeature')
- Push to the branch (git push origin feature/AmazingFeature)
- Open a Pull Request
- Zarathu Corp - office@zarathu.com
- Jaewoong Heo - jwheo@zarathu.com
This project is licensed under the MIT License - see the LICENSE file for details.
- Built for seamless integration with Claude Code and Gemini CLI
- Optimized for medical and clinical research workflows
- Templates based on real-world data analysis patterns
For issues and questions: