superzarathu

Generate custom commands for data analysis workflows with AI assistants (Claude Code & Gemini CLI)

Overview

superzarathu is an R package that provides intelligent templates and functions to generate custom commands for data analysis workflows. It supports both Claude Code and Gemini CLI, offering predefined workflows for data preprocessing, labeling, statistical analysis, visualization, and Shiny applications.

Key Features

🤖 AI-Driven Workflows: Templates optimized for AI assistants to understand and execute
📊 Data Processing: Advanced preprocessing with clinical trial data support
🩺 Data Doctor: Comprehensive data health check and diagnostics
📋 Excel Quality Check: Comprehensive Excel data quality assessment with AI-friendly output
🏷️ Smart Labeling: Automatic variable labeling with jstable integration
📈 Statistical Analysis: Templates for Korean medical statistics packages (jstable, jskm, jsmodule)
🎨 Visualization: Plot generation with ggplot2 and interactive graphics
⚡ Shiny Apps: Rapid Shiny application development templates

Installation

You can install the development version from GitHub:

# Using devtools
install.packages("devtools")
devtools::install_github("zarathucorp/superzarathu")

# Using remotes (lighter alternative)
install.packages("remotes")
remotes::install_github("zarathucorp/superzarathu")

# Using pak (modern approach)
install.packages("pak")
pak::pak("zarathucorp/superzarathu")

Quick Start

Basic Setup

library(superzarathu)

# Setup for Claude Code
sz_setup("claude")

# Setup for Gemini CLI
sz_setup("gemini")

Command Structure

After setup, give the AI assistant natural-language instructions. The quoted lines below are prompts typed to the assistant, not R code:

# Data preprocessing
"preprocess the data"
"handle clinical trial data with repeated measures"

# Data health check
"diagnose my data"
"check data health"
"find data problems"

# Data labeling
"label the data"
"apply jstable labeling"

# Statistical analysis
"create descriptive statistics table"
"perform survival analysis"

# Visualization
"create a forest plot"
"make an interactive plot"

# Shiny app
"create a shiny dashboard"

Available Commands

Data Processing

sz:preprocess - Data cleaning and transformation
sz:doctor - Data health check and diagnostics
sz:label - Variable labeling and metadata management
excel_health_check() - Comprehensive Excel data quality assessment

Statistical Analysis

sz:table - Descriptive and analytical tables with jstable

Visualization

sz:plot - Static and interactive plots with ggplot2 and jskm

Shiny Development

sz:rshiny - Shiny application templates with jsmodule

Excel Data Quality Assessment

The excel_health_check() function provides comprehensive quality assessment for Excel files:

library(superzarathu)

# Check all Excel files in current directory
result <- excel_health_check()

# Check specific files
result <- excel_health_check(files = c("data1.xlsx", "data2.xlsx"))

# Generate only JSON output
result <- excel_health_check(output_format = "json")

Features

19 Quality Check Types: Structural problems, representation inconsistencies, value errors, missing data, hidden issues
AI-Friendly Output: JSON results with schema for better AI understanding
Detailed Reports: Markdown reports with actionable recommendations
Data Preservation: Read-only approach - never modifies original files
R Standards: Converts empty strings to NA following R conventions
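The empty-string convention in the last item can be illustrated in base R. This is a minimal sketch, not the package's implementation: `blank_to_na()` is a hypothetical helper, and treating whitespace-only cells as empty is an added assumption.

```r
# Hypothetical helper (not part of superzarathu): turn empty or
# whitespace-only strings into NA in every character column.
blank_to_na <- function(df) {
  is_chr <- vapply(df, is.character, logical(1))
  if (!any(is_chr)) return(df)
  df[is_chr] <- lapply(df[is_chr], function(x) {
    x[!is.na(x) & trimws(x) == ""] <- NA_character_
    x
  })
  df
}

# Small made-up example; the input data frame is left untouched.
raw <- data.frame(
  id = 1:3,
  sex = c("M", "", "F"),
  note = c("  ", "ok", NA),
  stringsAsFactors = FALSE
)
clean <- blank_to_na(raw)
colSums(is.na(clean))  # missing values per column after conversion
```

Working on a copy mirrors the read-only approach described above: the original values stay available for comparison.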

Output Files

sz_excel_results_YYYYMMDD_HHMMSS.json - Detailed results
sz_excel_schema.json - JSON schema for AI interoperability
sz_excel_report_YYYYMMDD_HHMMSS.md - Human-readable report
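Because the result files carry a timestamp in their names, a script can locate the newest one and load it. The sketch below assumes the naming pattern listed above; `latest_results_file()` is a hypothetical helper, and the internal layout of the JSON is not documented here, so the example only inspects the top level.

```r
# Hypothetical helper: pick the most recent results file by the timestamp
# embedded in its name (YYYYMMDD_HHMMSS sorts correctly as plain text).
latest_results_file <- function(dir = ".") {
  files <- list.files(
    dir,
    pattern = "^sz_excel_results_[0-9]{8}_[0-9]{6}\\.json$",
    full.names = TRUE
  )
  if (length(files) == 0) return(NA_character_)
  files[order(basename(files), decreasing = TRUE)][1]
}

path <- latest_results_file()
if (!is.na(path) && requireNamespace("jsonlite", quietly = TRUE)) {
  results <- jsonlite::fromJSON(path)
  str(results, max.level = 1)  # inspect the top-level structure only
}
```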

Template Features

Advanced Data Preprocessing

📁 Automatic file detection in data/raw/
🔄 Clinical trial repeated measures handling (V1, V2, V3)
📅 Intelligent date conversion and age calculation
🧹 NA handling with multiple strategies
📌 pins package integration for S3/local storage
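To make the repeated-measures and age items concrete, here is a small base R sketch. It is not the template's actual code: the column names (`sbp_V1`, `birth_date`), the sample values, and the helper `age_years()` are all made up for illustration.

```r
# Made-up wide data: one row per subject, one column per visit.
wide <- data.frame(
  id = c(1, 2),
  birth_date = c("1980-06-15", "1975-12-01"),
  sbp_V1 = c(120, 130),
  sbp_V2 = c(118, 128),
  sbp_V3 = c(115, 125),
  stringsAsFactors = FALSE
)

# Reshape to long format: one row per subject and visit.
long <- reshape(
  wide,
  direction = "long",
  varying = c("sbp_V1", "sbp_V2", "sbp_V3"),
  v.names = "sbp",
  timevar = "visit",
  times = c("V1", "V2", "V3"),
  idvar = "id"
)
rownames(long) <- NULL

# Hypothetical helper: age in completed years at a reference date.
age_years <- function(birth, ref) {
  b <- as.POSIXlt(as.Date(birth))
  r <- as.POSIXlt(as.Date(ref))
  had_birthday <- (r$mon > b$mon) | (r$mon == b$mon & r$mday >= b$mday)
  (r$year - b$year) - ifelse(had_birthday, 0L, 1L)
}

long$age <- age_years(long$birth_date, "2024-06-14")
long
```

Counting completed years avoids the off-by-one errors that come from dividing a day difference by 365.25.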

Data Health Check (Doctor)

🎯 Data quality scoring (A+ to F grade)
🔍 Automatic pattern detection (repeated measures, clinical trials, surveys)
⚠️ Issue identification per column
❓ Intelligent question generation for data producers
📄 Markdown report generation with detailed diagnostics
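The per-column diagnostics and letter grades can be pictured with a short sketch. The package's real scoring rules are not described in this README, so the cutoffs below are purely hypothetical, and `grade_missing()` and `column_report()` are illustrative names that grade columns on missingness alone.

```r
# Hypothetical cutoffs for illustration only: map a missing-value rate
# (0 to 1) to a letter grade.
grade_missing <- function(missing_rate) {
  cuts <- c(0, 0.01, 0.05, 0.10, 0.20, 0.40, 1)
  labels <- c("A+", "A", "B", "C", "D", "F")
  as.character(cut(missing_rate, breaks = cuts, labels = labels,
                   include.lowest = TRUE, right = TRUE))
}

# One row per column: share of missing values and the resulting grade.
column_report <- function(df) {
  rate <- vapply(df, function(x) mean(is.na(x)), numeric(1))
  data.frame(
    column = names(df),
    missing_rate = unname(rate),
    grade = grade_missing(rate),
    stringsAsFactors = FALSE
  )
}

# Made-up example data.
dat <- data.frame(
  age = c(34, NA, 51, 47),
  sex = c("M", "F", "F", "M"),
  lab = c(NA, NA, NA, 1.2),
  stringsAsFactors = FALSE
)
report <- column_report(dat)
report
```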

Smart Labeling System

🏷️ jstable::mk.lev() integration
🔢 Automatic 0/1 to No/Yes conversion
📊 Factor/continuous variable classification
📖 Codebook detection and application
🌐 Multi-language label support
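The 0/1 to No/Yes conversion is easy to sketch in base R. The helpers below (`is_binary01()`, `label_binary()`) are hypothetical and do not use jstable; they only show the idea of detecting binary columns and relabeling them as factors.

```r
# Hypothetical helper: TRUE when a numeric column holds only 0 and 1
# (ignoring NA) and has at least one non-missing value.
is_binary01 <- function(x) {
  vals <- x[!is.na(x)]
  is.numeric(x) && length(vals) > 0 && all(vals %in% c(0, 1))
}

# Convert every 0/1 column to a No/Yes factor; leave the rest unchanged.
label_binary <- function(df) {
  for (nm in names(df)) {
    if (is_binary01(df[[nm]])) {
      df[[nm]] <- factor(df[[nm]], levels = c(0, 1), labels = c("No", "Yes"))
    }
  }
  df
}

# Made-up example: dm is binary, stage and age are not.
dat <- data.frame(
  dm = c(0, 1, 1, NA),
  stage = c(1, 2, 3, 1),
  age = c(50, 61, 47, 58)
)
labelled <- label_binary(dat)
str(labelled)
```

A numeric column that happens to contain only 0 and 1 but is really a count would also be converted, so automatic detection of this kind is worth reviewing against a codebook.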

AI Workflow Approach

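Templates use a two-stage approach: the assistant first explores the data by running short commands directly, then writes the processing steps to a script so the work can be rerun and reviewed.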

Exploration Stage (Direct execution)
```
Rscript -e "str(data, list.len=5)"
```

Processing Stage (Script generation)

# Generated script for reproducibility
source("scripts/preprocess_data.R")
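For orientation, a generated processing script might look roughly like the sketch below. The actual scripts are written by the AI assistant and will differ; `preprocess_file()` and the file name `example.csv` are assumptions, and only the `data/raw` and `data/processed` folder names come from this README.

```r
# Hypothetical content for scripts/preprocess_data.R.
# Reads one raw CSV, applies minimal cleaning, and saves the result as RDS.
preprocess_file <- function(raw_path, out_dir) {
  # Treat empty cells as NA while reading.
  dat <- read.csv(raw_path, na.strings = c("", "NA"), stringsAsFactors = FALSE)
  names(dat) <- tolower(names(dat))  # consistent lower-case column names

  dir.create(out_dir, recursive = TRUE, showWarnings = FALSE)
  out_name <- paste0(tools::file_path_sans_ext(basename(raw_path)), ".rds")
  out_path <- file.path(out_dir, out_name)
  saveRDS(dat, out_path)
  invisible(out_path)
}

# Runs only if the (hypothetical) raw file is present.
if (file.exists("data/raw/example.csv")) {
  preprocess_file("data/raw/example.csv", "data/processed")
}
```

Keeping the logic in a function with explicit input and output paths makes the script easy to rerun on a new data delivery.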

Project Structure

The package creates an organized project structure:

project/
├── data/
│   ├── raw/        # Original data files
│   └── processed/  # Cleaned data (RDS)
├── scripts/
│   ├── utils/      # Helper functions
│   ├── analysis/   # Analysis scripts
│   └── plots/      # Visualization scripts
├── output/
│   ├── tables/     # Generated tables
│   └── plots/      # Generated plots
└── app.R           # Shiny application
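The same layout can be reproduced by hand with base R, which is handy for checking an existing project against it. This is a sketch only: `create_project_dirs()` is a hypothetical helper, not a function exported by the package.

```r
# Hypothetical helper: create the folder layout shown above under `root`.
create_project_dirs <- function(root = ".") {
  dirs <- file.path(root, c(
    "data/raw", "data/processed",
    "scripts/utils", "scripts/analysis", "scripts/plots",
    "output/tables", "output/plots"
  ))
  for (d in dirs) {
    dir.create(d, recursive = TRUE, showWarnings = FALSE)
  }
  invisible(dirs)
}

# Demonstrate in a temporary directory so nothing in the working
# directory is touched.
demo_root <- tempfile("project_")
created <- create_project_dirs(demo_root)
list.dirs(demo_root, recursive = TRUE, full.names = FALSE)
```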

Requirements

Core Dependencies

R (≥ 3.5.0)
data.table
openxlsx
jsonlite
stringdist

Recommended Packages

jstable (for medical statistics)
jskm (for survival curves)
jsmodule (for Shiny modules)
pins (for data versioning)
ggplot2 (for visualization)
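A quick way to see which of these packages are missing from the current library is sketched below. `missing_packages()` is a hypothetical helper; the package names are taken from the two lists above.

```r
core <- c("data.table", "openxlsx", "jsonlite", "stringdist")
recommended <- c("jstable", "jskm", "jsmodule", "pins", "ggplot2")

# Hypothetical helper: return the names of packages that cannot be loaded.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

message("Missing core packages: ",
        paste(missing_packages(core), collapse = ", "))
message("Missing recommended packages: ",
        paste(missing_packages(recommended), collapse = ", "))
```

Any names that are printed can be passed to `install.packages()`.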

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

1. Fork the repository
2. Create your feature branch (git checkout -b feature/AmazingFeature)
3. Commit your changes (git commit -m 'Add some AmazingFeature')
4. Push to the branch (git push origin feature/AmazingFeature)
5. Open a Pull Request

Authors

Zarathu Corp - office@zarathu.com
Jaewoong Heo - jwheo@zarathu.com

License

This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgments

Built for seamless integration with Claude Code and Gemini CLI
Optimized for medical and clinical research workflows
Templates based on real-world data analysis patterns

Support

For issues and questions:
