9 changes: 9 additions & 0 deletions .gitignore
@@ -476,3 +476,12 @@ $RECYCLE.BIN/
# Windows shortcuts
*.lnk
*.db

.env
docs

# Local development overrides
docker-compose.override.yml

# Shadow mode metrics (generated at runtime)
shadow-mode-metrics.json
92 changes: 92 additions & 0 deletions CLAUDE.md
@@ -0,0 +1,92 @@
# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Build Commands

### Docker (Full Stack)
```bash
# Build both containers
docker-compose build

# Run the full stack
docker-compose up
```

### Orchestrator (.NET 8)
```bash
# Build
dotnet build orchestrator/ModelScanner.sln

# Run (development mode with in-memory job storage)
dotnet run --project orchestrator/ModelScanner

# Run tests
dotnet test orchestrator/ModelScanner.sln

# Run a specific test
dotnet test orchestrator/ModelScanner.sln --filter "FullyQualifiedName~HashTaskTests"
```

### Model Scanner Container
```bash
docker build -t civitai-model-scanner ./model-scanner/
docker run -it --rm civitai-model-scanner 'https://example.com/model.bin'
```

## Architecture

This is a distributed AI model scanning system with two main components:

### 1. Model Scanner Container (Python)
Located in `model-scanner/`. A Docker container running:
- **picklescan**: Detects dangerous pickle imports in PyTorch models
- **clamscan**: ClamAV malware detection
- Python ML libraries (PyTorch CPU, safetensors) for model processing

### 2. Orchestrator Service (.NET 8)
Located in `orchestrator/ModelScanner/`. An ASP.NET Core web API that:
- Receives scan requests via HTTP endpoints
- Queues jobs using Hangfire (SQLite persistent or in-memory storage)
- Executes the scanner container via Docker API
- Calculates file hashes (SHA256, Blake3, CRC32, AutoV1/V2/V3)
- Converts models between formats (CKPT ↔ SafeTensors)
- Uploads processed files to S3/R2 cloud storage
- Reports results via webhook callbacks
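
For illustration, a minimal Python sketch of a subset of the hash task using only the standard library (Blake3 needs the third-party `blake3` package and is omitted; AutoV2 is assumed here to be the first 10 hex characters of the SHA256 — treat that as an assumption, not a spec):

```python
import hashlib
import zlib

def compute_hashes(data: bytes) -> dict:
    """Illustrative subset of the hash task: SHA256 and CRC32 from the stdlib."""
    sha256 = hashlib.sha256(data).hexdigest()
    return {
        "SHA256": sha256,
        "CRC32": f"{zlib.crc32(data) & 0xFFFFFFFF:08X}",
        # Assumption: AutoV2 is the first 10 hex chars of the SHA256
        "AutoV2": sha256[:10],
    }
```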

### Processing Pipeline
Jobs are enqueued via `POST /enqueue` with configurable task flags:
- `Import` (1): Upload to cloud storage
- `Convert` (2): Format conversion
- `Scan` (4): Malware/pickle scanning
- `Hash` (8): Calculate cryptographic hashes
- `ParseMetadata` (16): Extract safetensors metadata
- `Default`: Import | Hash | Scan | ParseMetadata
- `All`: All tasks including Convert

Key flow: `FileProcessor.cs` downloads the model, runs requested tasks via `IJobTask` implementations, and POSTs results to the callback URL.
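
The flag values combine as a plain bitmask; a quick Python sketch (the `JobTasks` name is illustrative, the values are from the list above):

```python
from enum import IntFlag

class JobTasks(IntFlag):
    Import = 1
    Convert = 2
    Scan = 4
    Hash = 8
    ParseMetadata = 16

DEFAULT = JobTasks.Import | JobTasks.Hash | JobTasks.Scan | JobTasks.ParseMetadata
ALL = DEFAULT | JobTasks.Convert

# Decoding a request: check membership with the & operator
def wants_scan(tasks: int) -> bool:
    return bool(JobTasks(tasks) & JobTasks.Scan)
```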

### Job Queues (Priority Order)
- `default`: Normal priority
- `low-prio`: Lower priority processing
- `x-low-prio`: Lowest priority (conversions)
- `cleanup`: Storage cleanup
- `delete-objects`: Deletion jobs

## Key Files

- `orchestrator/ModelScanner/Program.cs`: API endpoints and DI setup
- `orchestrator/ModelScanner/FileProcessor.cs`: Main job processing logic
- `orchestrator/ModelScanner/Tasks/`: Individual task implementations (HashTask, ScanTask, ImportTask, ConvertTask, ParseMetadataTask)
- `orchestrator/ModelScanner/CloudStorageService.cs`: S3/R2 integration
- `orchestrator/ModelScanner/DockerService.cs`: Scanner container execution
- `model-scanner/scripts/`: Python conversion scripts (ckpt_to_safetensors.py, safetensors_to_ckpt.py)

## Configuration

Settings in `appsettings.json`:
- `ValidTokens`: API authentication tokens
- `CloudStorageOptions`: S3/R2 credentials and bucket names
- `LocalStorageOptions`: Temp folder path
- `ConnectionStrings:JobStorage`: SQLite path for Hangfire (omit for in-memory)
- `Concurrency`: Worker thread count (defaults to CPU count)
252 changes: 252 additions & 0 deletions README.md
@@ -0,0 +1,252 @@
# Civitai Model Scanner

A distributed AI model scanning system that detects malware and malicious code in machine learning model files.

## Architecture

The system consists of several components:

```
                                        +------------------+
                                        |  Cloud Storage   |
                                        |     (S3/R2)      |
                                        +--------^---------+
                                                 |
+----------------+     +-------------------+     |     +------------------+
|   HTTP POST    |     |   Orchestrator    |-----+-----|   Callback URL   |
|   /enqueue     +---->|     (.NET 8)      |           |    (Webhook)     |
+----------------+     |                   |           +------------------+
                       |  - Hangfire       |
                       |  - Job Queue      |
                       +--------+----------+
                                |
              +-----------------+------------------+
              |                 |                  |
    +---------v------+ +-------v--------+ +-----v------+
    | Legacy Scanner | | Unified Scanner| |   ClamAV   |
    |  (picklescan)  | |  (TensorTrap)  | |  Updater   |
    +----------------+ +----------------+ +------------+
```

### Components

1. **Orchestrator Service** (.NET 8 / ASP.NET Core)
- Receives scan requests via HTTP API
- Queues jobs using Hangfire (SQLite or in-memory)
- Downloads model files, executes scanners via Docker
- Reports results via webhook callbacks

2. **Legacy Scanner** (Python/Docker)
- Picklescan: Detects dangerous pickle imports in PyTorch models
- ClamAV: Malware signature scanning

3. **Unified Scanner** (Python/Docker)
- TensorTrap: ML security scanner supporting 13+ formats
- ClamAV: Integrated malware scanning
- Detects 11+ CVEs and security vulnerabilities

4. **ClamAV Updater** (Sidecar)
- Automatically updates virus definitions every 2 hours
- Shares definitions with scanner containers via Docker volume

## Quick Start

### Prerequisites

- Docker Desktop
- .NET 8 SDK (for local development)

### Running with Docker Compose

```bash
# Build all images
docker-compose build

# Start the stack
docker-compose up -d

# Check status
docker-compose ps
```

### API Endpoints

| Endpoint | Method | Description |
|----------|--------|-------------|
| `/enqueue` | POST | Submit a scan job |
| `/cleanup` | POST | Trigger temp storage cleanup |
| `/delete` | POST | Delete an object from storage |
| `/metrics/shadow` | GET | Get shadow mode metrics summary |
| `/metrics/shadow/full` | GET | Get full shadow mode metrics |
| `/metrics/shadow/reset` | POST | Reset shadow mode metrics |

### Submitting a Scan Job

```bash
curl -X POST "http://localhost/enqueue?token=YOUR_TOKEN&fileUrl=https://example.com/model.safetensors&callbackUrl=https://your-callback.com/result"
```

**Parameters:**
- `fileUrl` (required): URL of the model file to scan
- `callbackUrl` (required): Webhook URL for scan results
- `tasks` (optional): Bitmask of tasks to run (default: 28)
- Import = 1
- Convert = 2
- Scan = 4
- Hash = 8
- ParseMetadata = 16
- `lowPrio` / `extraLowPrio` (optional): Queue priority flags
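
As a sketch, the same request can be assembled in Python (parameter names from the list above; the helper itself is hypothetical):

```python
from urllib.parse import urlencode

def build_enqueue_url(base_url, token, file_url, callback_url, tasks=28, low_prio=False):
    """Assemble the /enqueue URL; POST it with any HTTP client."""
    params = {
        "token": token,
        "fileUrl": file_url,
        "callbackUrl": callback_url,
        "tasks": tasks,
    }
    if low_prio:
        params["lowPrio"] = "true"
    return f"{base_url}/enqueue?{urlencode(params)}"
```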

## Configuration

### Environment Variables

| Variable | Description | Default |
|----------|-------------|---------|
| `ValidTokens__0`, `__1`, etc. | API authentication tokens | - |
| `ScannerOptions__UseUnifiedScanner` | Use TensorTrap instead of picklescan | `false` |
| `ScannerOptions__ShadowMode` | Run both scanners for comparison | `false` |
| `CloudStorageOptions__*` | S3/R2 credentials | - |
| `ConnectionStrings__JobStorage` | SQLite path for Hangfire | (in-memory) |
| `Concurrency` | Worker thread count | CPU count |
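
The double-underscore names follow ASP.NET Core's convention of mapping `__` to nested configuration keys; a rough Python illustration of that mapping (the function is hypothetical):

```python
def env_to_config(env: dict) -> dict:
    """Expand ASP.NET Core style VAR__SUB__0 names into a nested dict."""
    config = {}
    for name, value in env.items():
        node = config
        *parents, leaf = name.split("__")
        for part in parents:
            node = node.setdefault(part, {})
        node[leaf] = value
    return config
```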

### Scanner Modes

1. **Legacy Mode** (default): Uses picklescan + ClamAV
2. **Unified Mode**: Uses TensorTrap + ClamAV
3. **Shadow Mode**: Runs both scanners, compares results, uses legacy for response

Shadow mode is useful for validating the new scanner before full migration.
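
The comparison itself can be pictured as a per-scan classification, roughly (names are illustrative, not the service's actual code):

```python
def classify_shadow_result(legacy_safe: bool, unified_safe: bool) -> str:
    """Bucket one shadow-mode scan pair for the metrics counters."""
    if legacy_safe == unified_safe:
        return "match"
    if legacy_safe and not unified_safe:
        return "unified_found_more_threats"
    return "legacy_found_more_threats"
```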

## Development

### Building the Orchestrator

```bash
cd orchestrator
dotnet build ModelScanner.sln
dotnet run --project ModelScanner
```

### Running Tests

```bash
dotnet test orchestrator/ModelScanner.sln
```

### End-to-End Testing

The `e2e/` directory contains a test script that runs a full scan workflow:

```bash
# Start the stack with test configuration
docker-compose -f docker-compose.yml -f docker-compose.test.yml up -d

# Run the e2e test
cd e2e
python e2e_test.py /path/to/model.ckpt --timeout 300

# Example with options
python e2e_test.py ./model.safetensors \
  --orchestrator-url http://localhost:80 \
  --token test-token \
  --tasks 28 \
  --timeout 300 \
  --json
```

**E2E Test Options:**
- `--orchestrator-url`: Orchestrator API URL (default: http://localhost:8080)
- `--token`: API token (default: test-token)
- `--tasks`: Task flags bitmask (default: 28 = Scan|Hash|ParseMetadata)
- `--timeout`: Timeout in seconds (default: 300)
- `--json`: Output raw JSON results

## Callback Response Format

```json
{
"url": "https://example.com/model.safetensors",
"fileExists": 1,
"picklescanExitCode": 0,
"picklescanOutput": "...",
"picklescanGlobalImports": ["torch", "collections"],
"picklescanDangerousImports": [],
"tensorTrapScanned": true,
"tensorTrapMaxSeverity": "info",
"tensorTrapIsSafe": true,
"tensorTrapFindings": [...],
"clamscanExitCode": 0,
"clamscanOutput": "OK",
"hashes": {
"SHA256": "...",
"Blake3": "...",
"CRC32": "...",
"AutoV1": "...",
"AutoV2": "...",
"AutoV3": "..."
}
}
```
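
A webhook receiver might read that payload conservatively, along these lines (a hypothetical consumer, not part of the service):

```python
def looks_safe(result: dict) -> bool:
    """Treat a scan as safe only when every scanner that ran agrees."""
    if result.get("clamscanExitCode") != 0:
        return False
    if result.get("picklescanDangerousImports"):
        return False
    if result.get("tensorTrapScanned") and not result.get("tensorTrapIsSafe"):
        return False
    return True
```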

## Shadow Mode Metrics

When running in shadow mode, metrics are collected comparing legacy and unified scanner results:

```bash
curl "http://localhost/metrics/shadow?token=YOUR_TOKEN"
```

```json
{
"totalScans": 1000,
"matches": 985,
"discrepancies": 15,
"agreementRate": 98.5,
"unifiedFoundMoreThreats": 12,
"legacyFoundMoreThreats": 3,
"bothSafe": 970,
"bothDangerous": 15,
"errors": 0,
"recommendation": "Unified scanner is finding MORE threats - safe to migrate"
}
```
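
The headline numbers are straightforward aggregates; a sketch of how they relate (field names from the JSON above, helper hypothetical):

```python
def summarize(outcomes: list) -> dict:
    """Aggregate per-scan shadow outcomes into the summary counters."""
    total = len(outcomes)
    matches = sum(1 for o in outcomes if o == "match")
    return {
        "totalScans": total,
        "matches": matches,
        "discrepancies": total - matches,
        "agreementRate": round(100.0 * matches / total, 1) if total else 0.0,
    }
```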

## Supported File Formats

### TensorTrap (Unified Scanner)
- PyTorch: `.pt`, `.pth`, `.bin`, `.ckpt`
- Pickle: `.pkl`, `.pickle`
- NumPy: `.npy`, `.npz`
- Safetensors: `.safetensors`
- ONNX: `.onnx`
- GGUF: `.gguf`

### Legacy Scanner (Picklescan)
- PyTorch pickle files
- Does NOT scan `.safetensors` (considered safe by design)
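
That split can be expressed as simple extension routing (illustrative sets based on the lists above, not the orchestrator's actual logic):

```python
from pathlib import Path

# Pickle-based formats the legacy scanner inspects
PICKLE_BASED = {".pt", ".pth", ".bin", ".ckpt", ".pkl", ".pickle"}

def legacy_scanner_applies(path: str) -> bool:
    """picklescan only inspects pickle-based files; .safetensors is skipped."""
    return Path(path).suffix.lower() in PICKLE_BASED
```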

## Security Considerations

- API endpoints require authentication via `token` query parameter
- Scanner containers run with memory limits (2GB default)
- ClamAV definitions are updated automatically
- Model files are deleted after scanning

## Troubleshooting

### Scanner Timeout
Large model files (>1GB) may take several minutes to scan. Adjust timeouts as needed.

### ClamAV Definitions Not Found
Ensure the clamav-updater container is running and has completed initial download:
```bash
docker logs model-scanner-clamav-updater-1
```

### Connection Refused on Callback
Ensure your callback URL is accessible from the Docker network. Use `host.docker.internal` for local development.

## License

[Your license here]
19 changes: 19 additions & 0 deletions clamav-updater/Dockerfile
@@ -0,0 +1,19 @@
FROM debian:bookworm-slim

RUN apt-get update && \
    apt-get install -y --no-install-recommends clamav clamav-freshclam ca-certificates && \
    apt-get clean && \
    rm -rf /var/lib/apt/lists/*

# Create directory for definitions
RUN mkdir -p /var/lib/clamav && \
    chown clamav:clamav /var/lib/clamav

COPY freshclam.conf /etc/clamav/freshclam.conf
COPY update-loop.sh /usr/local/bin/update-loop.sh
RUN chmod +x /usr/local/bin/update-loop.sh

# Run as clamav user for security
USER clamav

ENTRYPOINT ["/usr/local/bin/update-loop.sh"]