@terchris
Collaborator

No description provided.

terchris and others added 17 commits October 31, 2025 16:51
Implements comprehensive validation workflow improvements based on an
evaluation of the C# implementation. Establishes "validation-first" as a core
principle for all language implementations.

## Key Changes

### Documentation
- specification/09-development-loop.md: Add "Validation-First Development" section
  - Two-level validation strategy (TypeScript baseline + language-specific)
  - Clear timing guidance: validate AFTER implementing, not before
  - SDK-based connectivity testing approach

- specification/tools/README.md: Restructure validation sequence
  - Emphasize 8-step MANDATORY sequence
  - Clarify: validation checks OUTPUT of implementation
  - Document run-full-validation.sh as automated option
  - Make Step 8 (Grafana) clearly blocked until Steps 1-7 pass

### Templates (specification/llm-work-templates/)
- CLAUDE-template.md: LLM instructions with validation-first principle
- ROADMAP-template.md: 13-task implementation checklist template
- task-templates/: Detailed task files for all 13 tasks
  - task-06: Made OTLP connectivity validation MANDATORY
  - Subtasks 6.7, 6.8, 6.9 now blocking steps with backend verification

### Enforcement
- check-progress.sh: Validates ROADMAP.md progress before allowing validation
- init-language-workspace.sh: Initializes llm-work/ directory structure
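A minimal sketch of the kind of gate check-progress.sh applies, assuming progress is recorded as `done/total` markers (a later commit in this PR mentions decimal values such as "1.5/4"). The marker format, the function names and the "all tasks done" rule are illustrative assumptions, not the script's actual logic:

```typescript
// Hypothetical progress gate: parse "done/total" markers and only allow
// validation once every task in the phase is complete.

interface Progress {
  done: number;
  total: number;
}

// Accepts integer or decimal completed counts, e.g. "3/4" or "1.5/4".
function parseProgress(marker: string): Progress {
  const match = /^\s*(\d+(?:\.\d+)?)\s*\/\s*(\d+)\s*$/.exec(marker);
  if (!match) {
    throw new Error(`Unrecognized progress marker: "${marker}"`);
  }
  const done = Number(match[1]);
  const total = Number(match[2]);
  if (total === 0 || done > total) {
    throw new Error(`Inconsistent progress marker: "${marker}"`);
  }
  return { done, total };
}

// Validation may run only when the phase is fully complete.
function mayRunValidation(marker: string): boolean {
  const { done, total } = parseProgress(marker);
  return done === total;
}

console.log(mayRunValidation("1.5/4")); // false
console.log(mayRunValidation("4/4")); // true
```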

### Skills Updates
- Updated all .claude/skills/ to reference new documentation
- Cross-references to validation workflow throughout

## Principles Established

1. **Two-Level Validation**:
   - TypeScript validates the system (infrastructure health)
   - Your language validates its integration (SDK connectivity)

2. **Validation Timing**:
   - Implement code FIRST
   - Run tests to generate output
   - THEN validate the output

3. **8-Step Sequence**:
   - Steps 1-7: Automated via run-full-validation.sh (or manual)
   - Step 8: Manual Grafana visual check (REQUIRED)

4. **Task Completion Rule**:
   - Cannot claim task complete without running validation
   - Validation must pass before proceeding
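The 8-step rule can be read as a simple gate: Steps 1-7 must each have run and passed before the manual Grafana check (Step 8) is unblocked. A sketch under that reading; the `StepResult` shape and messages are assumptions for illustration, and real step outcomes would come from run-full-validation.sh or the manual sequence:

```typescript
// Hypothetical gate for the 8-step sequence: Step 8 stays blocked
// until Steps 1-7 have all been run and passed.

type StepResult = { step: number; passed: boolean };

function nextAction(results: StepResult[]): string {
  for (let step = 1; step <= 7; step++) {
    const result = results.find((r) => r.step === step);
    if (!result) return `Run Step ${step}`;
    if (!result.passed) return `Fix Step ${step} before continuing`;
  }
  return "Steps 1-7 passed: perform the manual Grafana check (Step 8)";
}

console.log(nextAction([{ step: 1, passed: true }, { step: 2, passed: false }]));
// Fix Step 2 before continuing
```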

## Related

- Plan document: terchris/plans-current/csharp-evaluation2.md (not committed)
- Issue identified: C# implementation deferred validation to end
- Solution: Make validation continuous and mandatory

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
This commit implements 7 critical improvements to prevent issues encountered
in C# implementation sessions 3 and 4, where the LLM required 4-5 user
corrections per session.

## Improvements Made

### Phase 0 (Planning) - Task Templates
1. **task-01-check-otel-maturity.md**: Add mandatory version checking
   - New subtask 1.5: Check latest stable version on package repositories
   - Prevents using outdated SDKs (C# Session 4: used 1.13.1, needed 1.14.0-rc.1)
   - Includes decision criteria for stable vs RC versions
   - Links to package repos for 7 languages

2. **task-03-research-otel-sdk.md**: Add instrument creation patterns research
   - New subtask 3.6: Research instrument lifecycle in official examples
   - Prevents initialization order issues (C# requires instruments BEFORE Build())
   - Includes GitHub search patterns for official SDK examples
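For subtask 1.5, the version check boils down to "newest stable, unless a pre-release is justified". A sketch of that selection, assuming versions are plain `MAJOR.MINOR.PATCH[-prerelease]` strings; pre-release ordering is simplified (plain string comparison), so this is not a full Semantic Versioning implementation, and the function names are hypothetical:

```typescript
// Hypothetical helper: pick the newest version, optionally including
// pre-releases such as "1.14.0-rc.1".

interface ParsedVersion {
  core: [number, number, number];
  pre: string | null;
  raw: string;
}

function parseVersion(v: string): ParsedVersion {
  const m = /^(\d+)\.(\d+)\.(\d+)(?:-([0-9A-Za-z.-]+))?$/.exec(v);
  if (!m) throw new Error(`Not a MAJOR.MINOR.PATCH version: ${v}`);
  return { core: [Number(m[1]), Number(m[2]), Number(m[3])], pre: m[4] ?? null, raw: v };
}

// Negative if a < b. A release sorts after its own pre-releases.
function compareVersions(a: ParsedVersion, b: ParsedVersion): number {
  for (let i = 0; i < 3; i++) {
    if (a.core[i] !== b.core[i]) return a.core[i] - b.core[i];
  }
  if (a.pre === b.pre) return 0;
  if (a.pre === null) return 1;
  if (b.pre === null) return -1;
  return a.pre < b.pre ? -1 : 1; // simplified pre-release comparison
}

function latestVersion(versions: string[], includePrerelease: boolean): string | undefined {
  const candidates = versions
    .map(parseVersion)
    .filter((p) => includePrerelease || p.pre === null)
    .sort(compareVersions);
  return candidates.length > 0 ? candidates[candidates.length - 1].raw : undefined;
}

const published = ["1.12.0", "1.13.1", "1.14.0-rc.1"];
console.log(latestVersion(published, false)); // 1.13.1
console.log(latestVersion(published, true)); // 1.14.0-rc.1
```

With the C# Session 4 versions, the stable-only answer is 1.13.1, which is why the template pairs the lookup with explicit decision criteria for when an RC is acceptable.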

### Phase 1 (Implementation) - Task Templates
3. **task-06-implement-otlp.md**: Add TypeScript baseline verification
   - New subtask 6.1: Check TypeScript reference implementation first
   - Verifies infrastructure health before debugging language-specific code
   - Prevents wasting time debugging code when infrastructure is broken
   - Renumbered subsequent subtasks (6.2-6.11)

4. **task-08-implement-api.md**: Add mandatory validation requirement
   - New 80-line section: "MANDATORY VALIDATION BEFORE CLAIMING COMPLETE"
   - Includes evidence from C# Session 3 (5 corrections, 3+ hours debugging)
   - Provides exact validation commands for all 4 steps
   - Emphasizes "The 'It Compiles' Trap" - compilation ≠ validation

### Template & Guidance Documents
5. **CLAUDE-template.md**: Add latest version policy principle
   - New principle: "Always use latest stable versions"
   - Explains why: bug fixes accumulate, outdated = debugging fixed issues
   - References C# Session 4 as example
   - Adds enforcement note (Task 1 now requires version check)

6. **validation-sequence.md**: Add TypeScript baseline emphasis
   - New section: "ALWAYS Verify TypeScript Baseline First"
   - 50-line section with decision tree
   - Explains when TypeScript passes vs fails → infrastructure vs code issue
   - Prevents debugging wrong layer of the stack
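The decision tree described for validation-sequence.md can be condensed into one function. This is a sketch of the reasoning only; the names and return labels are illustrative, and the boolean inputs stand for "the TypeScript reference validates" and "the new language's output validates":

```typescript
// Hypothetical condensation of the "TypeScript baseline first" decision tree.
type Diagnosis = "infrastructure" | "language-implementation" | "healthy";

function diagnose(typescriptBaselinePassed: boolean, languagePassed: boolean): Diagnosis {
  // Reference implementation fails: the stack is broken, so do not debug language code yet.
  if (!typescriptBaselinePassed) return "infrastructure";
  // Reference works but the new language does not: the bug is in the new code.
  if (!languagePassed) return "language-implementation";
  return "healthy";
}

console.log(diagnose(true, false)); // language-implementation
```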

### Skills
7. **implement-language/SKILL.md**: Add 6 critical process rules
   - New Step 2.5: Critical Process Rules (70 lines)
   - Rule 1: Always check latest stable version
   - Rule 2: Always verify TypeScript baseline before debugging
   - Rule 3: Never claim completion without validation
   - Rule 4: Research official SDK examples (instrument patterns)
   - Rule 5: Follow the development loop (6-step iterative workflow)
   - Rule 6: Consult TypeScript reference when unsure

## Impact

**Files modified**: 7
**Lines added**: ~265 lines of guidance
**Issues prevented**:
- Using outdated OpenTelemetry versions
- Wrong instrument initialization order
- Claiming completion without validation
- Debugging code when infrastructure is broken
- Missing critical SDK patterns

**Evidence base**: C# implementation sessions 3 & 4 evaluations
**Planning docs**: terchris/plans-current/csharp-evaluation3-plan.md (with ADDENDUM)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Fixes regression from commits 16e9c9b and d32fd41 where moving instructions
from inline (in skill) to reference-based (separate files) caused worse LLM
compliance. Analysis showed that the LLM skipped critical steps because they were
easy to ignore.

## Changes Made

### .claude/skills/implement-language/SKILL.md
- Changed Step 2 from "Read these files" to 4-step mandatory process
- Forces actual tool execution: "Execute this command NOW (use Bash tool)"
- Added Step 2.2: Explicitly update ROADMAP.md before any work
- Added Step 2.4: Checkpoint confirmations before proceeding
- Makes ROADMAP.md update non-optional with Edit tool requirement

### specification/llm-work-templates/CLAUDE-template.md
- Added visual box at top (lines 9-37): MANDATORY FIRST STEPS
- Added TypeScript reference box (lines 39-66) BEFORE other content
- Explicitly mentions .env file structure at line 59
- Creates "stop sign" effect with visual barriers
- Blocks reading further until Steps 1-3 complete

## Why This Fixes the Problem

**Previous approach (commits 16e9c9b, d32fd41):**
- Moved instructions OUT of automatic context (skill → separate files)
- Changed from inline (automatic) to reference (manual) delivery
- Added 6,500 lines but LLM followed instructions LESS
- Result: the LLM skipped ROADMAP.md updates, missed the .env file, and used TodoWrite instead of ROADMAP.md

**This fix:**
- Restores critical instructions IN context (skill file)
- Forces tool execution (not just "read this")
- Adds visual barriers (boxes) to catch attention
- Puts TypeScript .env reference at TOP (line 41, not line 510)
- Creates psychological checkpoints

## Evidence of Need

**C# Session 5 failures (with reference-based approach):**
- ❌ Did NOT read ROADMAP.md
- ❌ Did NOT update ROADMAP.md at start
- ❌ Used TodoWrite exclusively (ignored ROADMAP.md)
- ❌ Did NOT check TypeScript reference
- ❌ Missed .env file pattern (user had to point it out)
- User corrections needed: 1+

**Predicted improvements (with inline approach):**
- ✅ Read ROADMAP.md (Step 2.1 forces Bash tool)
- ✅ Update ROADMAP.md (Step 2.2 forces Edit tool)
- ✅ Check TypeScript (visual box at line 41)
- ✅ Find .env pattern (explicitly mentioned in box)
- Predicted user corrections: 0-1

## Related

- Analysis doc: Session 5 postmortem in conversation
- Problem identified: Indirection kills compliance
- Solution: Make critical steps HARDER to skip

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
- Add install-ai-claudecode.sh for Claude Code AI setup
- Rename install-cline-ai.sh to install-ai-cline.sh for consistency
- Update all language installation scripts (C#, Go, Java, PHP, Python, Rust, TypeScript)
- Update kubectl, PowerShell, and data analytics installations
- Add new dev-setup utility
- Update dev-template.sh

These changes are unrelated to the task-management-system feature and
should be merged to main separately.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
- Add auto-approval for ./run-full-validation.sh (validation script)
- Add auto-approval for WebFetch to www.nuget.org (C# package lookups)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
This feature branch implements the architectural change from host-based
LLM execution (using in-devcontainer.sh wrapper) to in-container LLM
execution (Claude Code running directly inside DevContainer).

Changes in this commit:
- Make install-ai-claudecode.sh executable (chmod +x)

This is the initial commit. Subsequent commits will update:
- Documentation (architecture diagrams, command patterns)
- Skills (remove wrapper teaching)
- Templates (direct execution examples)
- Permissions system (direct command auto-approval)

See terchris/plans-current/environmentchange-plan.md for complete
migration plan (8 phases, 15-23 hours estimated).

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
This commit addresses user feedback to make task templates "shorter and more clear" by removing non-actionable information and clarifying task relationships.

## Major Changes

### Time Estimates Removed (102 lines)
- Removed time estimates from 9 task template files
- Kept estimates in ROADMAP.md for user planning
- Rationale: time estimates don't help LLM execution; they only add maintenance burden

### Task Template Improvements
- **Task 3/4 relationship**: Clarified that Task 3 creates initial research notes, Task 4 completes and structures the document
- **Task 7 subtasks**: Fixed numbering from 8.x to 7.x (was incorrectly numbered)
- **Task 2**: Added TypeScript reference structure section (Makefile, run-test.sh)
- **Task 6**: Fixed blocking point typos, added linting check, removed 09-development-loop.md references
- **Task 7**: Removed line count "(995 lines)", fixed subtask numbering
- **Task 9**: Removed redundant phrase and 09-development-loop.md reference
- **Task 12**: Simplified from 10 subtasks to 3 (556 → 227 lines, 59% reduction)
  - Now uses actual validation scripts (run-full-validation.sh)
  - Removed non-existent script references (check-otel-backend.sh)
  - Made more concise while preserving all functionality

### Anti-Pattern Updates (07-anti-patterns.md)
- Removed obsolete "host vs DevContainer" anti-pattern (wrapper-era content)
- Reframed kubectl anti-pattern: "Use Grafana Instead" (not "fix kubectl")
- Removed 23 lines of obsolete content

### Cross-Cutting Changes
- Removed line count references across all files
- Removed 09-development-loop.md references from task templates (too meta for execution)
- Changed "DevContainer environment" to "working directory and network endpoints" (more specific)

## Files Modified (24 files)
- 9 task templates updated (task-01 through task-12)
- 1 task template renamed (task-08 → task-07, fixed numbering)
- 1 anti-patterns doc updated
- 1 validation sequence doc updated
- Multiple supporting docs updated

## Total Line Reduction
- Task templates: ~600+ lines removed
- Time estimates: 102 lines
- task-12: 329 lines (59% reduction)
- Anti-patterns: 23 lines
- Other cleanup: ~146 lines
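The figures above are internally consistent; a quick arithmetic check of the category breakdown and the task-12 reduction:

```typescript
// Recomputes the line-reduction figures quoted in this commit message.
const removed = {
  timeEstimates: 102,
  task12: 556 - 227, // 329 lines
  antiPatterns: 23,
  otherCleanup: 146,
};

const total = Object.values(removed).reduce((sum, n) => sum + n, 0);
const task12ReductionPercent = Math.round(((556 - 227) / 556) * 100);

console.log(total, removed.task12, task12ReductionPercent); // 600 329 59
```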

## Testing
- Verified all referenced scripts exist
- Verified all cross-references are valid
- Checked skills are still correct (no updates needed)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Clean up task templates and remove documentation bloat

This merge brings major documentation improvements based on user feedback
to make task templates "shorter and more clear" by removing non-actionable
information and fixing structural issues.

Key improvements:
- Removed 102 lines of time estimates from 9 task templates
- Simplified task-12 by 59% (556 → 227 lines)
- Clarified Task 3/4 relationship
- Fixed task numbering and references
- Removed obsolete wrapper-era content

Total: 1,039 lines removed across 24 files

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
…ed-through

## Summary
This PR consolidates validation tools to use combined flags, reducing queries
from 12 to 6 (50% improvement), renames validators for consistency, and fixes
a critical bug in TypeScript where ended spans were bleeding into subsequent
log entries.

## Changes

### 1. TypeScript Bug Fix: Span Bleed-Through (typescript/src/logger.ts)
**Problem:** Ended spans were still being attached to subsequent log entries
even after sovdev_end_span() was called, causing incorrect trace_id values.

**Solution:** Added WeakSet to track ended spans and prevent their reuse:
- Added `endedSpans` WeakSet to track explicitly ended spans
- Modified sovdev_log() to check if span has been ended before using it
- Prevents ended spans from bleeding into subsequent operations

**Impact:** Ensures log entries only get trace IDs from active spans, not
ended ones. Critical for correct trace correlation in distributed systems.
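A minimal sketch of the WeakSet technique, assuming a simplified `Span` type rather than the OpenTelemetry API. `startSpan`, `endSpan`, `log` and `currentSpan` are hypothetical stand-ins for the real logger.ts code; `currentSpan` deliberately stays set after the span ends to mimic a stale context lookup:

```typescript
interface Span {
  traceId: string;
}

interface LogEntry {
  message: string;
  traceId?: string;
}

// Spans that were explicitly ended. A WeakSet does not keep them alive for GC.
const endedSpans = new WeakSet<Span>();

// Stand-in for a context lookup that can still return a span after it ended.
let currentSpan: Span | undefined;

function startSpan(traceId: string): Span {
  currentSpan = { traceId };
  return currentSpan;
}

function endSpan(span: Span): void {
  // currentSpan is intentionally left untouched here to reproduce the bleed-through scenario.
  endedSpans.add(span);
}

function log(message: string): LogEntry {
  const span = currentSpan;
  if (span && !endedSpans.has(span)) {
    return { message, traceId: span.traceId };
  }
  return { message }; // no active span: do not attach a stale trace_id
}

const span = startSpan("trace-abc");
console.log(log("inside span").traceId); // trace-abc
endSpan(span);
console.log(log("after end").traceId); // undefined
```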

### 2. Combined Validation Flags (all query-*.sh scripts)
Added --validate and --compare-with flags to all 6 query scripts:
- query-loki.sh, query-prometheus.sh, query-tempo.sh (direct backends)
- query-grafana-loki.sh, query-grafana-prometheus.sh, query-grafana-tempo.sh

**Three validation modes:**
- Mode 1: Query only (basic check)
- Mode 2: Query + schema validation (--validate)
- Mode 3: Query + schema + consistency (--validate --compare-with FILE)

**Benefit:** Reduces from 2 queries per backend to 1 query per backend
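The point of the combined flags is that a single query result feeds every requested check. A sketch of how the flags map to checks; the flag names come from this PR, but the parsing is an illustration, not the query-*.sh implementation:

```typescript
// Hypothetical mapping from CLI flags to the checks run on ONE query result.
type Check = "query" | "schema" | "consistency";

function planChecks(args: string[]): { checks: Check[]; compareFile?: string } {
  const checks: Check[] = ["query"]; // Mode 1: query only
  if (args.includes("--validate")) checks.push("schema"); // Mode 2: + schema validation
  const i = args.indexOf("--compare-with");
  if (i === -1) return { checks };
  const compareFile = args[i + 1];
  if (!compareFile || compareFile.startsWith("--")) {
    throw new Error("--compare-with requires a FILE argument");
  }
  checks.push("consistency"); // Mode 3 when combined with --validate
  return { checks, compareFile };
}

console.log(planChecks(["SERVICE", "--validate", "--compare-with", "FILE"]).checks);
// [ 'query', 'schema', 'consistency' ]
```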

### 3. Orchestration Script Updates
**run-full-validation.sh:**
- Removed ROADMAP.md progress check (no longer needed)
- Updated all validation steps to use combined flags
- Reduced from 12 queries to 6 queries (50% efficiency improvement)
- Updated header and summary documentation

**run-grafana-validation.sh:**
- Already updated to use combined flags (3 queries instead of 6)

### 4. File Renames for Consistency
**Validators:**
- validate-log-consistency.py → validate-loki-consistency.py
- validate-metrics-consistency.py → validate-prometheus-consistency.py
- validate-trace-consistency.py → validate-tempo-consistency.py

**Tools:**
- query-grafana.sh → validate-grafana-datasources.sh
  (clarifies it validates config, not queries data)

**Updated alias in in-devcontainer.sh:**
- grafana → validate-grafana-datasources.sh

### 5. Documentation Updates
**specification/tools/README.md:**
- Removed DevContainer Toolbox references
- Updated Prerequisites section
- Added three validation modes to Steps 2-7 (manual validation sequence)
- Simplified Step 8 (removed unnecessary checklist)
- Updated validation script comparison tables
- Updated composable workflows examples
- Documented combined validation approach

**specification/tests/README.md:**
- Updated all validator references to new names
- Added note about recommended vs manual validation approaches
- Updated "Complete Validation Workflow" to show combined approach first
- Updated tool integration table and examples
- Shows current run-full-validation.sh implementation

## Testing
- Tested all three validation modes for each query script
- Verified run-full-validation.sh works with combined flags
- Verified run-grafana-validation.sh works with combined flags
- All 6 backends validate correctly with reduced query count

## Breaking Changes
None. The new flags are optional, and existing usage patterns still work.

## Migration Guide
**Recommended:** Update to combined validation approach:
```bash
# Old approach (2 queries):
./query-loki.sh SERVICE --validate
./query-loki.sh SERVICE --compare-with FILE

# New approach (1 query):
./query-loki.sh SERVICE --validate --compare-with FILE
```

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
feat: Improve validation tools efficiency and fix TypeScript span ble…
…ct structure

This change addresses confusion in the C# implementation where the .env file was skipped because it wasn't clearly marked as mandatory in the specification.

Changes to specification/06-test-scenarios.md:
- Added "(MUST exist)" to .env file in project structure diagram
- Changed "run-test.sh Script" to "run-test.sh Script (MUST EXIST)"
- Changed ".env Configuration" to ".env Configuration (MUST EXIST)"
- Added "REQUIRED file" emphasis to both sections
- Added Makefile to project structure as "(optional but recommended)"

Changes to specification/08-testprogram-company-lookup.md:
- Added critical files warning box highlighting run-test.sh and .env as REQUIRED
- Added "(MUST exist)" to both files in project structure diagram
- Added "**REQUIRED - MUST exist**" emphasis to implementation checklist
- Added Makefile to project structure as "(optional but recommended)"

Changes to .claude/settings.local.json:
- Added auto-approval rules for validation and development tools
- Includes query scripts, make targets, dotnet commands, and validation tools

Rationale:
- .env file contains OTLP endpoint configuration and is essential for tests
- run-test.sh is the standardized entry point used by all validation tools
- Without these files, validation tools fail with unclear errors
- Makefile provides consistent interface (documented in 10-code-quality.md)
- Makefile is optional but recommended for consistency across languages
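The required/optional split above lends itself to a pre-validation check. A sketch that works on a directory listing (for example, the result of `readdirSync(projectDir)` from `node:fs`); the function name and messages are hypothetical:

```typescript
// Hypothetical pre-validation check: run-test.sh and .env are required,
// Makefile is optional but recommended.
const REQUIRED_FILES = ["run-test.sh", ".env"] as const;
const RECOMMENDED_FILES = ["Makefile"] as const;

function checkProjectFiles(present: string[]): { missing: string[]; warnings: string[] } {
  const have = new Set(present);
  return {
    missing: REQUIRED_FILES.filter((f) => !have.has(f)),
    warnings: RECOMMENDED_FILES.filter((f) => !have.has(f)).map(
      (f) => `${f} not found (optional but recommended)`,
    ),
  };
}

console.log(checkProjectFiles(["run-test.sh", "src"]));
// { missing: [ '.env' ], warnings: [ 'Makefile not found (optional but recommended)' ] }
```

Failing fast on `missing` gives a clear error before any validation tool runs, instead of the unclear downstream failures described above.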

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Clarifies .env and run-test.sh as REQUIRED files and adds Makefile to project structure.

This merge includes:
- Enhanced documentation marking .env and run-test.sh as mandatory
- Added Makefile to project structure as optional but recommended
- Updated auto-approval rules for validation tools

Fixes issue where C# implementation skipped creating .env file due to unclear requirements.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
…_otlp_connection

Add optional diagnostic functions to help validate OTLP configuration and test
connectivity during development and deployment.

Features:
- sovdev_validate_config(): Validates environment variables are set correctly
- sovdev_test_otlp_connection(): Tests connectivity to all 3 OTLP endpoints
  by sending properly formatted OTLP JSON payloads

Benefits:
- Early infrastructure validation (before implementing exporters)
- Distinguishes configuration errors from SDK bugs
- Estimated to reduce debugging time by ~75% (from 4+ hours to about 1 hour)
- Helps troubleshoot 404, connection refused, timeout errors

Implementation:
- Uses Node's native http/https modules (which allow setting the Host header for Traefik routing)
- Sends valid OTLP JSON payloads for logs, metrics, traces
- Returns structured results with HTTP status codes
- Non-blocking (warns only, never exits process)
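A sketch of building the kind of logs probe described here. The payload follows the OTLP/JSON logs shape; the endpoint URL, Host value, scope name and service name are placeholders, and `buildLogsProbe` is a hypothetical helper, not the actual sovdev_test_otlp_connection() code. The request could then be sent with Node's `http.request` using the returned headers:

```typescript
// Hypothetical probe builder: a minimal, valid OTLP/JSON logs payload plus
// headers including an explicit Host for hostname-based (e.g. Traefik) routing.

interface ProbeRequest {
  url: string;
  headers: Record<string, string>;
  body: string;
}

function buildLogsProbe(endpoint: string, hostHeader: string, serviceName: string): ProbeRequest {
  const payload = {
    resourceLogs: [
      {
        resource: {
          attributes: [{ key: "service.name", value: { stringValue: serviceName } }],
        },
        scopeLogs: [
          {
            scope: { name: "connectivity-probe" },
            logRecords: [
              {
                // Nanoseconds as a decimal string; BigInt avoids precision loss.
                timeUnixNano: (BigInt(Date.now()) * 1_000_000n).toString(),
                severityNumber: 9, // INFO
                severityText: "INFO",
                body: { stringValue: "OTLP connectivity test" },
              },
            ],
          },
        ],
      },
    ],
  };
  return {
    url: endpoint,
    headers: { "Content-Type": "application/json", Host: hostHeader },
    body: JSON.stringify(payload),
  };
}

const probe = buildLogsProbe("http://localhost:4318/v1/logs", "otel.example.local", "demo-typescript");
console.log(probe.headers.Host); // otel.example.local
```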

Related: Addresses recommendations from C# implementation failure evaluation
regarding early signal validation and infrastructure testing.

🤖 Generated with Claude Code (https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Update TypeScript README with comprehensive documentation for the new
diagnostic functions, including examples, use cases, and troubleshooting.

Changes:
- README.md: Added 'Optional Diagnostic Functions' section with detailed
  documentation for sovdev_validate_config() and sovdev_test_otlp_connection()
- company-lookup.ts: Added pre-flight checks demonstrating usage of
  diagnostic functions before initialization

Documentation includes:
- Function signatures and return types
- Practical usage examples
- When to use / when NOT to use guidance
- Common errors and troubleshooting (404, connection refused, timeouts)
- Explanation of why three separate OTLP endpoints exist
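A sketch of turning a probe outcome into a troubleshooting hint for the errors listed above. `ECONNREFUSED` and `ETIMEDOUT` are standard Node.js network error codes, and `/v1/logs`, `/v1/metrics` and `/v1/traces` are the default OTLP/HTTP paths (one per signal, hence three endpoints); the `ProbeOutcome` shape and the messages are assumptions:

```typescript
// Hypothetical interpretation of a connectivity probe result.
const OTLP_PATHS = { logs: "/v1/logs", metrics: "/v1/metrics", traces: "/v1/traces" } as const;

type ProbeOutcome = { status: number } | { errorCode: string };

function interpretProbe(outcome: ProbeOutcome): string {
  if ("errorCode" in outcome) {
    switch (outcome.errorCode) {
      case "ECONNREFUSED":
        return "Connection refused: nothing is listening at that host/port";
      case "ETIMEDOUT":
        return "Timed out: check the network path, DNS and firewall";
      default:
        return `Network error: ${outcome.errorCode}`;
    }
  }
  if (outcome.status >= 200 && outcome.status < 300) return "OK";
  if (outcome.status === 404) {
    const paths = Object.values(OTLP_PATHS).join(", ");
    return `Not found: check the signal path (${paths}) and the Host header used for routing`;
  }
  return `Unexpected HTTP status ${outcome.status}`;
}

console.log(interpretProbe({ status: 200 })); // OK
```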

🤖 Generated with Claude Code (https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
…ement

Add optional diagnostic functions to specification and enforce .env file
validation to prevent configuration issues discovered in C# implementation.

Specification changes (01-api-contract.md):
- Add 'Optional Diagnostic Functions' section documenting sovdev_validate_config()
  and sovdev_test_otlp_connection()
- Include OTLP payload examples for logs, metrics, traces
- Document when to use, troubleshooting, HTTP status code interpretation
- Update document version from v1.0.0 to v1.1.0
- Update overview to list 8 mandatory + 2 optional functions

Enforcement changes (check-progress.sh):
- Add .env file validation (required after Task 6 - OTLP exporters)
- Validate all required OTLP environment variables present
- Check service name includes language identifier
- Fixed: Support decimal progress values (e.g., "1.5/4")
- Fixed: Arithmetic error in count_phase_tasks (double-zero output bug)
- Prevents "missing .env" issue that cost 4+ hours in C# implementation
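A sketch of the .env checks described above. The variable names are the standard OpenTelemetry ones and are an assumption about what this project's .env must contain; the "service name includes the language identifier" rule is implemented as a simple case-insensitive substring test, which is also an assumption:

```typescript
// Hypothetical .env validation: required OTLP variables present and non-empty,
// and the service name containing the language identifier.
const REQUIRED_VARS = [
  "OTEL_SERVICE_NAME",
  "OTEL_EXPORTER_OTLP_LOGS_ENDPOINT",
  "OTEL_EXPORTER_OTLP_METRICS_ENDPOINT",
  "OTEL_EXPORTER_OTLP_TRACES_ENDPOINT",
];

// Minimal KEY=VALUE parser: skips blank lines and # comments, strips matching quotes.
function parseDotEnv(text: string): Map<string, string> {
  const vars = new Map<string, string>();
  for (const raw of text.split(/\r?\n/)) {
    const line = raw.trim();
    if (line === "" || line.startsWith("#")) continue;
    const eq = line.indexOf("=");
    if (eq <= 0) continue;
    const key = line.slice(0, eq).trim();
    const value = line.slice(eq + 1).trim().replace(/^(["'])(.*)\1$/, "$2");
    vars.set(key, value);
  }
  return vars;
}

function checkEnv(text: string, language: string): string[] {
  const vars = parseDotEnv(text);
  const problems = REQUIRED_VARS.filter((k) => !vars.get(k)).map((k) => `${k} is missing or empty`);
  const service = vars.get("OTEL_SERVICE_NAME") ?? "";
  if (service && !service.toLowerCase().includes(language.toLowerCase())) {
    problems.push(`OTEL_SERVICE_NAME "${service}" should include "${language}"`);
  }
  return problems;
}

console.log(checkEnv("OTEL_SERVICE_NAME=demo", "csharp").length); // 4
```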

Template updates:
- ROADMAP-template.md: Add .env file as mandatory checkpoint in Task 5
- CLAUDE-template.md: Add prominent .env file checkpoint warning
- README.md: Document check-progress.sh enhancements
- task-05-setup-project.md: New file with detailed .env setup instructions

Benefits:
- Prevents entire class of configuration errors (.env missing)
- Early infrastructure validation before implementing exporters
- Estimated to reduce debugging time by ~75% (from 4+ hours to about 1 hour)
- Clear distinction between infrastructure/config/SDK issues

Related: Addresses recommendations from C# implementation failure evaluation
regarding early signal validation and .env file enforcement.

🤖 Generated with Claude Code (https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
@terchris terchris self-assigned this Nov 13, 2025
@terchris terchris merged commit 5f5e7f2 into norwegianredcross:main Nov 13, 2025
1 of 5 checks passed
@terchris terchris deleted the feature/diagnostic-functions-and-env-validation branch November 27, 2025 11:57
Labels

None yet

Projects

None yet

1 participant