Kelleretoro · Copilot · Sep 28, 2025 · Sep 28, 2025
diff --git a/.github/copilot-instructions.md b/.github/copilot-instructions.md
@@ -28,15 +28,73 @@ Always reference these instructions first and fallback to search or bash command
   - Output defaults to `<input_file>-decrypted.zip` if not specified
   - Replay mode requires `scriptreplay` command to be available in PATH
 
+## Data Flow and Processing
+
+### Encryption/Decryption Flow
+1. **Key Generation**: `hpke.rs` creates P-256 ECDH + AES-GCM key pairs
+2. **Metadata Parsing**: `metadata.rs` extracts session info from encrypted header  
+3. **Data Decryption**: `hpke.rs` decrypts session data using HPKE context
+4. **Data Processing**: `data.rs` parses decrypted packets into client/server streams
+5. **Output Generation**: Either ZIP archive or PTY replay format
+
+### File Format Details
+- **Input**: Cloudflare SSH audit log (encrypted binary format)
+- **Metadata**: Base64-encoded JSON header with session details
+- **Keys**: 44-character base64 strings (P-256 private/public keys)
+- **Output ZIP**: Contains `data_from_client.txt`, `data_from_server.txt` for non-PTY
+- **PTY Output**: Contains `term_data.txt`, `term_times.txt` for scriptreplay
+
+### PTY vs Non-PTY Sessions
+- **PTY Sessions**: Interactive terminal sessions with allocated pseudo-terminal
+  - Generate timing data for replay with `scriptreplay`
+  - Include terminal dimensions, modes, and escape sequences
+- **Non-PTY Sessions**: Non-interactive sessions (file transfers, command execution)
+  - Generate separate client/server data streams
+  - No timing information or terminal control sequences
+
 ## Validation
 
+### Basic Validation (Required After Any Changes)
 - ALWAYS run through at least one complete end-to-end scenario after making changes:
   - Generate a key pair: `cargo run -- generate-key-pair -o /tmp/test_key`
   - Verify both `/tmp/test_key` and `/tmp/test_key.pub` files are created
   - Test help commands: `cargo run -- --help` and `cargo run -- decrypt --help`
 - ALWAYS run `cargo fmt --check` and `cargo clippy` before committing changes or the CI (.github/workflows/lint.yaml) will fail
 - Build artifacts are located in `target/` directory and should not be committed
 
+### Comprehensive Testing Checklist
+```bash
+# 1. Code quality checks
+cargo fmt --check          # Format validation
+cargo clippy               # Lint validation  
+cargo check                # Syntax validation
+
+# 2. Build verification
+cargo build                # Debug build (~27s)
+cargo build --release      # Release build (~45s)
+
+# 3. Functionality testing
+cargo run -- --help                        # Main help
+cargo run -- generate-key-pair --help      # Subcommand help
+cargo run -- decrypt --help                # Subcommand help
+cargo run -- generate-key-pair -o test_key # Key generation
+ls -la test_key test_key.pub               # Verify key files
+wc -c test_key test_key.pub                # Verify 44 chars each
+rm test_key test_key.pub                   # Cleanup
+
+# 4. Advanced testing (with real data)
+# cargo run -- decrypt -i <real_session.log> -k test_key -o output.zip
+# unzip -l output.zip  # Verify ZIP contents
+```
+
+### Performance Expectations
+- `cargo check`: ~1 second (incremental), ~48 seconds (fresh)
+- `cargo build`: ~27 seconds (debug), ~45 seconds (release)
+- `cargo test`: ~7 seconds (no actual tests currently)
+- `cargo clippy`: ~3 seconds
+- Key generation: Near-instant
+- File operations: Dependent on input file size
+
 ## Repository Structure
 
 ```
@@ -69,6 +127,43 @@ ssh-log-cli/
 - `thiserror` v1.0.20 - Error handling macros
 - `zip` v0.6.2 - ZIP file creation and manipulation
 
+## Key Modules and Responsibilities
+
+### `src/main.rs`
+- CLI entry point using `clap` with derive macros
+- Command routing between `decrypt` and `generate-key-pair` subcommands
+- File I/O coordination and temporary directory management
+- ZIP file creation orchestration
+
+### `src/hpke.rs`
+- HPKE (Hybrid Public Key Encryption) key pair generation and management
+- Encryption context creation and key derivation
+- Base64 encoding/decoding of keys (always 44 characters)
+- Key format validation
+
+### `src/data.rs`
+- Data packet decoding from encrypted session streams
+- Binary data parsing and transformation
+- Session data extraction and formatting
+- File generation for client/server data streams
+
+### `src/metadata.rs`
+- Session metadata parsing from encrypted headers
+- JSON deserialization of session information
+- PTY metadata handling (terminal settings, dimensions)
+- Exit status and timestamp processing
+
+### `src/pty.rs`
+- PTY (Pseudo-Terminal) session replay generation
+- Terminal data formatting for `scriptreplay` compatibility
+- Timing data calculation and output
+- Session header and footer generation
+
+### `src/zip.rs`
+- ZIP archive creation utilities
+- File compression and directory traversal
+- Archive organization and metadata preservation
+
 ## Build Times and Timeouts
 
 **CRITICAL**: All build and test commands can take significant time. NEVER CANCEL these operations:
@@ -137,4 +232,81 @@ The repository uses GitHub Actions with multiple workflows:
 - **release.yml**: Automated binary releases with attestations for tagged versions
 - **semgrep.yml**: Security vulnerability scanning
 
-All workflows must pass for successful CI. Pay special attention to formatting requirements enforced by lint.yaml.
+All workflows must pass for successful CI. Pay special attention to formatting requirements enforced by lint.yaml.
+
+## Tool Calling and Efficiency
+
+**CRITICAL**: For maximum efficiency, whenever you need to perform multiple independent operations, ALWAYS invoke all relevant tools simultaneously rather than sequentially. Especially when:
+
+- Exploring repository structure, reading files, viewing directories
+- Validating changes across multiple files
+- Running multiple build/test commands that don't depend on each other
+- Checking file contents and directory listings
+
+Examples:
+```bash
+# ❌ Sequential - inefficient
+cargo check
+cargo clippy
+cargo fmt --check
+
+# ✅ Parallel - efficient when using multiple tool calls
+# Call cargo check, cargo clippy, and cargo fmt --check simultaneously
+```
+
+## Code Style and Preferences
+
+- **Minimal Changes**: Make absolutely minimal modifications - change as few lines as possible to achieve the goal
+- **Preserve Existing Patterns**: Follow existing code patterns and conventions in the repository
+- **Error Handling**: Use `thiserror` for custom errors and `Result<T, E>` for error propagation
+- **Logging**: Avoid adding print statements; use proper error messages in Result types
+- **Memory Safety**: Prefer owned types over references when crossing API boundaries
+- **Performance**: Use `base64` crate for encoding/decoding, avoid string concatenation in loops
+
+## Testing Strategy
+
+- **No Unit Tests Currently**: The repository has `cargo test` set up but no actual tests yet
+- **Manual Testing Required**: Always test CLI functionality manually after code changes
+- **End-to-End Validation**: Run complete workflows (key generation → encryption → decryption)
+- **Cross-Platform Considerations**: Remember that replay functionality only works on Linux/macOS
+- **Error Cases**: Test error handling with invalid inputs, missing files, wrong formats
+
+## Common Issues and Solutions
+
+### Build Issues
+- **Missing Dependencies**: Run `cargo check` first to identify missing dependencies
+- **Clippy Warnings**: Address clippy warnings before committing - CI will fail otherwise
+- **Format Issues**: Always run `cargo fmt --check` - CI enforces strict formatting
+
+### Runtime Issues
+- **File Path Issues**: Use absolute paths when working with temporary files
+- **Platform Differences**: Replay mode requires `scriptreplay` (not available on Windows)
+- **Key Format**: Keys must be exactly 44 base64-encoded characters
+- **ZIP Output**: Output files must have `.zip` extension for decrypt operations
+
+### Development Workflow
+- **Incremental Testing**: Test changes frequently with `cargo check` → `cargo build` → manual testing
+- **CI Validation**: All four workflows (tests.yml, lint.yaml, release.yml, semgrep.yml) must pass
+- **Binary Location**: Compiled binaries are in `target/debug/` or `target/release/`
+
+## Contributing Guidelines
+
+### Making Changes
+1. **Start Small**: Make minimal, focused changes that solve specific problems
+2. **Test Early and Often**: Run validation checklist after each significant change
+3. **Follow Patterns**: Maintain consistency with existing code style and error handling
+4. **Document Changes**: Update README.md if CLI interface or functionality changes
+
+### Pull Request Requirements
+- All CI workflows must pass (formatting, linting, cross-platform builds, security scan)
+- Manual testing validation using the comprehensive checklist above
+- No new clippy warnings (existing warnings are acceptable for now)
+- Proper error handling using `Result<T, E>` and `thiserror` patterns
+
+### Release Process
+- Releases are automated via GitHub Actions when tags are created
+- Binary artifacts are built for Linux, macOS, and Windows
+- Release binaries include SLSA attestations for supply chain security
+- Version bumping should follow semantic versioning (major.minor.patch)
+
+This tool is designed to be a secure, efficient, and reliable solution for processing Cloudflare SSH audit logs. Always prioritize security, performance, and maintainability when making changes.