A practitioner's guide to data science on federal government platforms — and the first reference resource for a regulated domain designed to work natively with AI coding agents.
Read online · Agent Integration · Chapters · Platform Guides · Agent Commands · Docker Environment
We spend billions on federal data platforms. We hire analysts with degrees and certifications. And then we hand them a login and tell them to figure it out.
Commercial tutorials assume you have pip install, unrestricted internet access, and a cloud account you control. None of that is true in a DoD environment where CAC authentication, Impact Level restrictions, ATO processes, and air-gapped networks shape every technical decision.
The average federal data analyst spends their first six months learning the platform, not doing the mission work. That's a capability gap and an absurd waste of money.
This handbook closes it. Thirteen chapters covering the full data science lifecycle — environment setup through generative AI — grounded in how the work actually gets done on the five platforms where federal data science happens. Every code example is written to run in the constrained environments it describes, not on a local machine with unconstrained internet.
Clone this repo. Open it in Claude Code, Cursor, OpenCode, or Cline. Your AI agent now understands clearances, platform constraints, and compliance requirements — without you explaining them.
We believe this is the first reference resource for a regulated industry domain that ships native, multi-platform AI agent configuration — slash commands, workflow definitions, and structured context files — alongside the knowledge itself. If we're wrong, we want to know.
| Capability | What It Does |
|---|---|
| `/compliance-check` | Reviews your code against NIST 800-53, DoD AI Ethics, FedRAMP, and IL requirements. Outputs a structured severity table with remediation pointing to specific handbook sections. |
| `/generate-federal-code` | Generates platform-appropriate Python with correct headers, security patterns, and IL-level constraints. Knows that Databricks doesn't allow `pip install` in cells, that Foundry uses `palantir_models`, and that IL4+ means no external API calls. |
| `/teach` | Interactive tutor mode. Opens with the chapter's narrative hook, walks through concepts with code, and tracks which learning objectives you've covered. |
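To give a flavor of the kind of static review `/compliance-check` performs, here is a minimal, hypothetical sketch. The rule patterns, severities, and messages below are illustrative assumptions, not the command's actual implementation:

```python
import re

# Hypothetical rules in the spirit of /compliance-check: pattern -> finding.
# Real checks map to specific NIST 800-53 controls; these are toy examples.
RULES = [
    (r"(?m)^\s*!?pip install", "HIGH",
     "pip install in a cell is blocked on IL4+ Databricks"),
    (r"requests\.(get|post)\(\s*[\"']https?://(?!localhost)", "HIGH",
     "external API call prohibited at IL4+"),
    (r"(?i)password\s*=\s*[\"']", "CRITICAL", "hardcoded credential"),
]

def scan(source: str) -> list[tuple[str, str]]:
    """Return (severity, message) findings for a source string."""
    findings = []
    for pattern, severity, message in RULES:
        if re.search(pattern, source):
            findings.append((severity, message))
    return findings

cell = 'pip install xgboost\nrequests.get("https://api.example.com")'
for severity, message in scan(cell):
    print(f"{severity}: {message}")
```

The real command produces a severity table with remediation links into the handbook; the sketch only shows the pattern-scan core.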
| AI Coding Agent | Config File | Auto-loaded? |
|---|---|---|
| Claude Code | `CLAUDE.md` + `.claude/commands/` | Yes — on session start |
| Cursor | `.cursorrules` | Yes — on project open |
| OpenCode | `.opencode/config.yaml` + `.opencode/commands/` | Yes — on project open |
| Cline | `.clinerules/` + `.clinerules/workflows/` | Yes — on session start |
| Any agent | `AGENTS.md` | Read on first query |
How it works: The agent interface layer encodes 96,000 words of non-inferable federal domain knowledge into structured context files — what Impact Level means, why IL4+ prohibits external API calls, what CAC/PIV authentication requires, which packages are available on each platform. Your agent doesn't need to hallucinate or ask you to explain your environment. It reads the handbook.
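The point about "non-inferable" knowledge can be made concrete with a toy sketch: these are facts an agent cannot derive from code alone and must read from context. The handbook actually encodes them in `CLAUDE.md`, `AGENTS.md`, and the other config files above; the structures and field names below are simplified assumptions for illustration only:

```python
# Toy encoding of non-inferable federal context an agent must be told,
# not infer. Field names here are invented for demonstration.
PLATFORM_CONTEXT = {
    "databricks": {"pip_in_cells": False, "model_api": "mlflow"},
    "foundry":    {"pip_in_cells": False, "model_api": "palantir_models"},
}

# IL4 and above prohibit calls to external APIs; lower levels are permissive.
IL_RULES = {4: {"external_apis": False}, 5: {"external_apis": False}}

def allowed_external_api(impact_level: int) -> bool:
    """Look up whether an Impact Level permits external API calls."""
    return IL_RULES.get(impact_level, {"external_apis": True})["external_apis"]

print(allowed_external_api(2))  # IL2: external calls permitted
print(allowed_external_api(4))  # IL4: prohibited
```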
New to federal data science? Read chapters 1 through 4 in order. They cover the environment, access model, where data lives, and how to work with it.
Using an AI coding agent? Clone this repo, open it in your agent's IDE, and use the pre-built commands above. The agent picks up context automatically.
Switching platforms? Go directly to the platform guide for your new environment. Each is self-contained.
Need a specific capability? Jump to the relevant chapter (ML, MLOps, visualization, deployment, ethics). Each includes platform-specific implementation notes.
Building an AI/LLM application? Chapter 13 and the Palantir AIP guide cover the current landscape. Read chapter 12 (Ethics and Governance) in parallel.
- Junior analysts onboarding to any of the five platforms — skip the 18-month learning curve
- Team leads building data science practices inside DoD programs
- GovCon firms winning data task orders and needing to stand up teams fast
- AI coding agent users who want their agent to understand federal constraints without re-explaining them every session
- Anyone who's ever said "I can't find good training for [federal platform]"
| # | Title | What You'll Learn |
|---|---|---|
| 01 | Introduction to Data Science in Government | Clearances, CAC auth, Impact Levels, ATO — everything that shapes the work before you write code |
| 02 | Python and R Foundations | Air-gapped pip mirrors, conda on IL4/IL5, and the reality of getting a working environment on each platform |
| 03 | Data Acquisition | Where federal data lives — USASpending, SAM.gov, data.gov — and how to pull it programmatically |
| 04 | Data Wrangling and Cleaning | 47 million rows of procurement data that a program office called "analysis-ready" — pandas, Spark, and Delta Lake at scale |
| 05 | Exploratory Data Analysis | EDA without a data dictionary, on a platform that may not support interactive notebooks |
| 06 | Supervised Machine Learning | Building classifiers on DoD data: feature engineering on MILSTRIP, XGBoost on Databricks, and what accuracy means in a briefing |
| 07 | Unsupervised Machine Learning | Anomaly detection on GFEBS transactions, clustering readiness data, and turning unsupervised results into actionable findings |
| 08 | Deep Learning and Neural Networks | Object detection on drone video at 30fps with a 400ms inference budget — deep learning in constrained federal environments |
| 09 | MLOps and Production Pipelines | MLflow, model registries, drift detection, and the ATO implications of updating a production model |
| 10 | Visualization and Dashboards | Qlik, Advana dashboards, Databricks SQL — design principles that separate briefing-ready visuals from data art |
| 11 | Deployment and Scaling | Containers, artifact registries, API gateways, and an ATO process that treats every deployment as a risk event |
| 12 | Ethics, Governance, and Compliance | DoD AI Ethics Principles, NIST AI RMF, bias auditing, and what responsible AI governance looks like on an active program |
| 13 | Advanced Topics — GenAI, RAG, and LLMs | RAG at IL4/IL5, Palantir AIP Logic, fine-tuning on classified data, and the gap between commercial LLMs and federal deployments |
Every chapter includes working Python code examples and hands-on exercises with solutions.
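As a taste of that style, here is a hedged, stdlib-only sketch in the spirit of Chapter 4's cleaning work: stream procurement-style rows and normalize them without loading everything into memory. The column names and data are invented for illustration; the chapters themselves use pandas, Spark, and Delta Lake at scale:

```python
import csv
import io

# Invented "analysis-ready" sample: stray whitespace and a missing amount.
RAW = """award_id,amount,agency
A-001, 1200.50 ,DLA
A-002,,DLA
A-003,300,NAVSUP
"""

def clean_rows(fileobj):
    """Yield rows with trimmed fields and amounts parsed to float; skip rows
    with a missing amount."""
    for row in csv.DictReader(fileobj):
        amount = row["amount"].strip()
        if not amount:
            continue
        yield {"award_id": row["award_id"].strip(),
               "amount": float(amount),
               "agency": row["agency"].strip()}

rows = list(clean_rows(io.StringIO(RAW)))
print(len(rows), sum(r["amount"] for r in rows))  # → 2 1500.5
```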
| Platform | IL Levels | What It Covers |
|---|---|---|
| Advana | IL4, IL5 | DoD enterprise analytics — JupyterHub, Qlik, 100+ data sources, 100K+ users |
| Databricks | IL2–IL5 | Unity Catalog, Delta Lake, MLflow on AWS GovCloud and Azure Government |
| Navy Jupiter | IL4, IL5 | Department of the Navy — bronze/silver/gold data tiers, Navy-specific constraints |
| Palantir AIP / Foundry | IL4–IL6 | Ontology-based analytics, Pipeline Builder, AIP Logic for LLM workflows |
| Qlik | IL2, IL4 | Associative engine for federal BI — NIPRNet, Advana-hosted, and GovCloud |
Each guide is self-contained: access, setup, development environment, code patterns, and deployment.
This handbook ships with a Docker Compose stack that mirrors federal platform constraints locally:
| Service | Port | Purpose |
|---|---|---|
| Jupyter | 8888 | Development notebooks (+ Streamlit, Dash) |
| MLflow | 5000 | Experiment tracking and model registry |
| PostgreSQL | 5432 | Relational database |
| Redis | 6379 | Caching and session store |
| Nginx | 80/443 | Reverse proxy with TLS |
| Prometheus | 9090 | Metrics collection |
| Grafana | 3000 | Monitoring dashboards |
| Vault | 8200 | Secret management |
| CAC-auth | 8001 | CAC/PIV authentication simulator |
```bash
cp .env.example .env && docker compose up -d
```

See `docs/LOCAL_ENVIRONMENT.md` for full setup instructions.
The security-compliance/ directory contains reference implementations for federal security patterns — not toy examples, but working code for:
- CAC/PIV authentication with PKCS#11 smart card integration and OAuth bridging
- RBAC/ABAC with MAC enforcement (Bell-LaPadula), role hierarchies, and database-backed permission resolution
- FIPS 140-2 encryption with AES-256 at rest, TLS 1.3 in transit, and HSM key management
- NIST 800-53 compliance with automated control assessment, evidence collection, and reporting
- Audit logging with immutable trails and 7-year retention policies
See security-compliance/CLAUDE.md for a module-by-module guide.
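The immutable-trail idea behind the audit-logging module can be sketched with a hash chain: each entry's hash covers both the event and the previous entry's hash, so tampering with any record invalidates every later one. This is a minimal stdlib illustration of the technique, not the repository's implementation:

```python
import hashlib
import json

def append_entry(log: list, event: dict) -> None:
    """Append an audit entry whose hash commits to the event and the
    previous entry's hash (a genesis value for the first entry)."""
    prev_hash = log[-1]["hash"] if log else "0" * 64
    payload = json.dumps({"event": event, "prev": prev_hash}, sort_keys=True)
    log.append({"event": event, "prev": prev_hash,
                "hash": hashlib.sha256(payload.encode()).hexdigest()})

def verify(log: list) -> bool:
    """Recompute every hash in order; any modified entry breaks the chain."""
    prev_hash = "0" * 64
    for entry in log:
        payload = json.dumps({"event": entry["event"], "prev": prev_hash},
                             sort_keys=True)
        if (entry["prev"] != prev_hash or
                entry["hash"] != hashlib.sha256(payload.encode()).hexdigest()):
            return False
        prev_hash = entry["hash"]
    return True

log: list = []
append_entry(log, {"user": "analyst1", "action": "read", "table": "gfebs_txn"})
append_entry(log, {"user": "analyst1", "action": "export", "table": "gfebs_txn"})
print(verify(log))                        # → True
log[0]["event"]["action"] = "delete"      # tamper with the first record
print(verify(log))                        # → False
```

The production module pairs this kind of integrity check with the 7-year retention policy described above.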
Contributions that improve accuracy, add platform-specific detail, or extend coverage are welcome. See CONTRIBUTING.md for guidelines.
- Code (`.py`, `.sh`, `.yml`, `.yaml`, `Dockerfile*`): MIT License
- Written content (`chapters/`, `platform-guides/`, `docs/`): Creative Commons Attribution 4.0 International (CC BY 4.0)
Content is based on publicly available information. Nothing in this repository is classified or export-controlled. Platform-specific details reflect publicly documented capabilities as of early 2026.
Read online · Agent Integration Guide · Star this repo · Report an issue