feat: Onboarding Phase 2 — chat-first setup + forced password change by MiltonSilvaJr · Pull Request #733 · nextlevelbuilder/goclaw

MiltonSilvaJr · 2026-04-06T20:47:02Z

Summary

OnboardingStore PostgreSQL — 4 methods (UpdateTenantSettings, UpdateTenantBranding, GetOnboardingStatus, CompleteOnboarding) with UPSERT on-demand, tenant isolation
Migration 000031 — setup_progress table + must_change_password column on users
Gateway wiring — wireOnboardingTools() registers 8 tools + group:onboarding after stores init
Forced password change — POST /v1/auth/change-password endpoint (PCI DSS, history check, audit), JWT mcp claim, Radix Dialog blocking modal, i18n in 8 languages
E2E tests — full onboarding flow, tenant isolation, change password

Test plan

go vet ./internal/... — zero warnings
go test ./internal/auth/... — all passing
go test ./internal/http/... — 29 tests passing (6 new for change-password)
go test ./internal/tools/... — 38 onboarding tool tests passing (pre-existing path failures on Windows)
go build — cross-compile for Linux succeeds
TypeScript compiles without errors (tsc --noEmit)
Integration tests with real DB (go test -tags integration ./internal/store/pg/... ./tests/onboarding_e2e/...)
Manual: login with temporary password → modal appears → change password → modal closes → normal access

🤖 Generated with Claude Code

Independent fork of GoClaw, maintained by Vellus for the ARGO product.

Complete rename across 444 files:
- Go module: github.com/nextlevelbuilder/goclaw → github.com/vellus-ai/argoclaw
- Env vars: GOCLAW_* → ARGOCLAW_*
- Headers: X-GoClaw-User-Id → X-ArgoClaw-User-Id
- Frontend UI strings: GoClaw → ARGO (public brand)
- Docker, scripts, configs: goclaw → argoclaw
- OpenAPI spec updated

Naming rule:
- Backend/code: ArgoClaw
- Frontend/public: ARGO
- Company: Vellus

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Sprint 0: security hardening before feature development.

HIGH fixes:
- #1: Whitelist table names in execMapUpdate() to prevent SQL injection via a dynamic table name (store/pg/helpers.go)
- #2: Log invalid groupBy values in snapshot queries (store/pg/snapshot.go)
- #3: Validated shellEscape(): single-quote wrapping is correct; added PBT tests for shell injection (tools/dynamic_tool_security_test.go)

MEDIUM fixes:
- #4-5: Log security warnings for no-token and viewer-fallback auth (gateway/router.go)
- #6: Restrict CORS on the OpenAPI endpoint: removed the wildcard, allow only localhost origins (http/openapi.go)
- #7: Add CheckSSRFWithPinning() for DNS rebinding TOCTOU prevention (tools/web_shared.go)
- #8: Log a warning when TLS verification is disabled (tracing/otelexport/exporter.go)
- #9: Pin all Python package versions in the Dockerfile to prevent supply chain attacks via unpinned dependencies
- #10: Change the HOME fallback from /tmp to /app to prevent temp dir abuse (tools/credentialed_exec.go)

Also fixes the "arargoclaw" double-rename bug in 356 Go import paths.

Tests: PBT tests for table whitelist and shell escaping (testing/quick).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
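A minimal sketch of the table-name whitelist idea behind fix #1, assuming a map-based allow list. The names `allowedTables` and `buildUpdate` and the listed tables are hypothetical stand-ins, not the actual contents of store/pg/helpers.go.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// allowedTables is a hypothetical whitelist; the real list is not shown in the PR.
var allowedTables = map[string]bool{
	"agents":        true,
	"llm_providers": true,
}

// buildUpdate returns a parameterized UPDATE for a whitelisted table.
// Values travel as placeholders; only the table name is interpolated, and
// only after the whitelist check. Column names are assumed to come from
// trusted code, never from user input.
func buildUpdate(table, id string, updates map[string]any) (string, []any, error) {
	if !allowedTables[table] {
		return "", nil, fmt.Errorf("table %q is not whitelisted", table)
	}
	if len(updates) == 0 {
		return "", nil, fmt.Errorf("no columns to update")
	}
	cols := make([]string, 0, len(updates))
	for c := range updates {
		cols = append(cols, c)
	}
	sort.Strings(cols) // deterministic placeholder order

	sets := make([]string, 0, len(cols))
	args := make([]any, 0, len(cols)+1)
	for i, c := range cols {
		sets = append(sets, fmt.Sprintf("%s = $%d", c, i+1))
		args = append(args, updates[c])
	}
	args = append(args, id)
	query := fmt.Sprintf("UPDATE %s SET %s WHERE id = $%d",
		table, strings.Join(sets, ", "), len(args))
	return query, args, nil
}

func main() {
	q, args, err := buildUpdate("agents", "a1", map[string]any{"name": "Captain"})
	fmt.Println(q, args, err)

	_, _, err = buildUpdate("agents; DROP TABLE users", "a1", map[string]any{"name": "x"})
	fmt.Println(err)
}
```

Rejecting unknown table names before any string formatting keeps the one non-parameterizable part of the statement under the code's control.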

security: fix 10 AppSec audit findings (3 HIGH, 7 MEDIUM)

TDD + PBT implementation of email+password authentication:

Migration 026:
- users table (per-tenant, email unique, Argon2id hash, lockout)
- password_history (last 4 passwords, PCI DSS reuse prevention)
- user_sessions (JWT refresh tokens, SHA-256 hashed)
- login_audit (success/failure/lockout logging)

Password validation (PCI DSS):
- Minimum 12 characters
- Requires: uppercase, lowercase, digit, special character
- Rejects passwords containing the email local part
- History check: prevents reuse of the last 4 passwords
- Argon2id hashing (OWASP params: 64MB, 3 iterations, 4 threads)
- Constant-time hash comparison (crypto/subtle)

JWT tokens:
- Access token: HS256, 15min expiry, contains uid/email/tid/role
- Refresh token: 32 random bytes, SHA-256 hash stored in DB
- Round-trip validation with PBT (1000 iterations)

Tests (TDD + PBT):
- 13 unit tests for password validation
- PBT: strong passwords always accepted (5k iterations)
- PBT: alpha-only passwords always rejected (5k iterations)
- PBT: hash always verifiable (200 iterations)
- PBT: JWT round-trip (1k iterations)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
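The password rules listed above can be sketched with the standard library alone. This is an illustration under assumptions: `validatePassword` is a hypothetical name, "special character" is taken to mean any Unicode punctuation or symbol, and the history check and Argon2id hashing are left out because they need a store and a third-party package.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
	"unicode"
)

// validatePassword applies the length, character-class, and email-local-part
// rules from the commit message. It is a sketch, not the project's validator.
func validatePassword(password, email string) error {
	if len([]rune(password)) < 12 {
		return errors.New("password must be at least 12 characters")
	}
	var upper, lower, digit, special bool
	for _, r := range password {
		switch {
		case unicode.IsUpper(r):
			upper = true
		case unicode.IsLower(r):
			lower = true
		case unicode.IsDigit(r):
			digit = true
		case unicode.IsPunct(r) || unicode.IsSymbol(r):
			special = true // assumption: punctuation or symbol counts as special
		}
	}
	if !(upper && lower && digit && special) {
		return errors.New("password needs uppercase, lowercase, digit, and special character")
	}
	// Reject passwords that embed the part of the email before the "@".
	local, _, _ := strings.Cut(email, "@")
	if local != "" && strings.Contains(strings.ToLower(password), strings.ToLower(local)) {
		return errors.New("password must not contain the email local part")
	}
	return nil
}

func main() {
	fmt.Println(validatePassword("Str0ng!Passw0rd", "milton@example.com"))
	fmt.Println(validatePassword("Milton!2026Secure", "milton@example.com"))
}
```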

Complete auth flow implementation:

Store layer (internal/store):
- UserStore interface: CRUD users, password history, sessions, audit
- PGUserStore: PostgreSQL implementation with parameterized queries

HTTP handlers (POST /v1/auth/*):
- /register: email+password, PCI DSS validation, Argon2id hash
- /login: constant-time user enumeration prevention, lockout (5 attempts, 30min), audit logging, JWT issuance
- /refresh: token rotation (revoke old, issue new)
- /logout: session revocation

JWT middleware:
- Extracts Bearer JWT from the Authorization header
- Validates and injects claims into context
- Sets the X-ArgoClaw-User-Id header for backward compatibility
- Pass-through for gateway tokens (no dots = not JWT)
- RequireUserAuth() wrapper for JWT-only endpoints

Security:
- Constant-time password check (Argon2id + subtle.ConstantTimeCompare)
- User enumeration prevention (burn time on non-existent email)
- Account lockout with audit trail
- Refresh token rotation (old token revoked on use)
- IP + User-Agent logged on all auth events

Tests (TDD):
- 4 middleware tests (valid JWT, invalid JWT, no token, gateway token)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
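Refresh token rotation (revoke old, issue new) can be sketched as follows. This assumes an in-memory map in place of the user_sessions table and hex encoding for the 32 random bytes; `sessionStore`, `issue`, and `rotate` are hypothetical names. Only the SHA-256 hash of the token is stored, as the earlier commit describes.

```go
package main

import (
	"crypto/rand"
	"crypto/sha256"
	"encoding/hex"
	"errors"
	"fmt"
	"sync"
)

// generateRefreshToken returns a random 32-byte token (hex-encoded here, an
// assumption) together with the SHA-256 hash that would be persisted.
func generateRefreshToken() (raw, hash string, err error) {
	buf := make([]byte, 32)
	if _, err = rand.Read(buf); err != nil {
		return "", "", err
	}
	raw = hex.EncodeToString(buf)
	return raw, hashRefreshToken(raw), nil
}

func hashRefreshToken(raw string) string {
	sum := sha256.Sum256([]byte(raw))
	return hex.EncodeToString(sum[:])
}

var errInvalidToken = errors.New("invalid or revoked refresh token")

// sessionStore is an in-memory stand-in for the sessions table.
type sessionStore struct {
	mu     sync.Mutex
	active map[string]string // token hash -> user ID
}

func newSessionStore() *sessionStore {
	return &sessionStore{active: map[string]string{}}
}

func (s *sessionStore) issue(userID string) (string, error) {
	raw, hash, err := generateRefreshToken()
	if err != nil {
		return "", err
	}
	s.mu.Lock()
	defer s.mu.Unlock()
	s.active[hash] = userID
	return raw, nil
}

// rotate revokes the presented token and issues a fresh one, so a replayed
// old token is rejected.
func (s *sessionStore) rotate(raw string) (string, error) {
	hash := hashRefreshToken(raw)
	s.mu.Lock()
	userID, ok := s.active[hash]
	if ok {
		delete(s.active, hash)
	}
	s.mu.Unlock()
	if !ok {
		return "", errInvalidToken
	}
	return s.issue(userID)
}

func main() {
	store := newSessionStore()
	first, _ := store.issue("user-1")
	second, err := store.rotate(first)
	fmt.Println(second != first, err)
	_, err = store.rotate(first) // replay of the revoked token
	fmt.Println(err)
}
```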

feat(auth): PCI DSS email+password authentication

…-label

Tenant = company/client (Vellus, Axis, Pitflow), NOT an individual user.

Migration 027:
- tenants table (slug, name, plan, status, Stripe customer ID)
- tenant_users (N:N link with role: owner/admin/member)
- tenant_branding (logo, favicon, primary color, WCAG AA palette, custom domain, sender email, product name)
- Added tenant_id column to: agents, llm_providers, sessions, channel_instances, agent_teams, cron_jobs, custom_tools, mcp_servers, skills
- Indexes on all tenant_id columns

Store layer:
- TenantStore interface: CRUD tenants, membership, branding
- PGTenantStore: PostgreSQL with parameterized queries
- Updated allowedTables + tablesWithUpdatedAt whitelists

Tenant middleware:
- Extracts tenant_id from JWT claims
- Injects it into the request context for downstream isolation
- RequireTenant() wrapper for tenant-only endpoints
- Pass-through for gateway token mode (backward compat)

Tests (TDD):
- 5 tests: tenant injection, no-JWT pass-through, require tenant rejects/allows, nil when empty

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

feat(tenancy): enterprise multi-tenancy + white-label branding

Three features in one commit:

1. White-label branding (HTTP endpoints):
- GET /v1/branding: get tenant branding (logo, colors, domain)
- PUT /v1/branding: update branding config
- GET /v1/branding/domain/{domain}: resolve branding by custom domain
- WCAG AA palette support (JSON field for AI-generated colors)

2. i18n: 5 new backend locales (124 keys each):
- pt (Brazilian Portuguese): primary market
- es (Spanish)
- fr (French)
- it (Italian)
- de (German)

Total: 8 locales (en, vi, zh + 5 new)

3. ARGO personality presets (replacing GoClaw defaults):
- 🚀 Captain (Capitão): strategic advisor, executive
- ⚡ Helmsman (Timoneiro): operations, project management
- 🔍 Lookout (Vigia): research, analysis
- 🎯 Gunner (Artilheiro): data, finance, KPIs
- 🧭 Navigator (Navegador): legal, compliance, governance
- 🛠️ Smith (Ferreiro): technical, engineering, DevOps

Full pt-BR translations included.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

feat: white-label branding + i18n 8 locales + ARGO presets

* merge(upstream): PR nextlevelbuilder#226 from GoClaw community * merge(upstream): PR nextlevelbuilder#314 from GoClaw community * merge(upstream): PR nextlevelbuilder#356 from GoClaw community * merge(upstream): PR nextlevelbuilder#352 from GoClaw community * merge(upstream): GoClaw PR nextlevelbuilder#339 — add curl to Docker runtime image * docs: CHANGELOG ArgoClaw — upstream merges + internal history Track all modifications: 5 upstream GoClaw PRs merged, 3 pending conflict resolution, 6 under review, 2 rejected/skipped. Plus internal Sprint 0 features. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Milton Silva <milton@vellus.tech> Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Daily automated check (05:00 BRT) that: - Fetches all open PRs from nextlevelbuilder/goclaw - Classifies by type (security/bug, feature, build/docs) - Tests patch applicability against our ArgoClaw fork - Creates/updates tracking issue with report - Optional Telegram notification (commented, enable later) Co-authored-by: Milton Silva <milton@vellus.tech> Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Send daily report via Resend to milton@vellus.tech. This workflow ONLY monitors and reports — no auto-merge. All merges require manual Code Review + AppSec approval. Co-authored-by: Milton Silva <milton@vellus.tech> Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

… + TDD/PBT (#8)

Upstream GoClaw PR nextlevelbuilder#316: project-scoped MCP isolation with env overrides.

Security hardening (ArgoClaw):
- Env var blocklist: blocks 50+ dangerous vars (LD_PRELOAD, PATH, HOME, SHELL, NODE_OPTIONS, PYTHONPATH, GOCLAW_*, POSTGRES_*, etc.)
- Prefix blocklist: LD_*, DYLD_*, GOCLAW_*, ARGOCLAW_*, POSTGRES_*
- Case-insensitive validation
- Immutable field protection: id, created_by, created_at, tenant_id cannot be modified via UpdateProject
- tenant_id added to projects table (multi-tenancy)
- UNIQUE constraint scoped by tenant_id

Tests (TDD + PBT):
- 47 unit tests covering all security controls
- Property-Based Testing: 2500+ random prefix tests, 1000+ random safe var tests using testing/quick
- All tests PASS (verified on VM goclaw-pilot)

Co-authored-by: Milton Silva <milton@vellus.tech>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
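A rough sketch of the case-insensitive name and prefix blocklist described above. The exact-name set here holds only the six variables the commit names (the real list is said to have 50+), and `isBlockedEnvVar` and `validateEnvOverrides` are hypothetical names.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// blockedNames lists only the variables named in the commit message.
var blockedNames = map[string]bool{
	"LD_PRELOAD": true, "PATH": true, "HOME": true,
	"SHELL": true, "NODE_OPTIONS": true, "PYTHONPATH": true,
}

// blockedPrefixes mirrors the prefix blocklist from the commit message.
var blockedPrefixes = []string{"LD_", "DYLD_", "GOCLAW_", "ARGOCLAW_", "POSTGRES_"}

// isBlockedEnvVar compares case-insensitively by upper-casing the name first.
func isBlockedEnvVar(name string) bool {
	upper := strings.ToUpper(strings.TrimSpace(name))
	if blockedNames[upper] {
		return true
	}
	for _, p := range blockedPrefixes {
		if strings.HasPrefix(upper, p) {
			return true
		}
	}
	return false
}

// validateEnvOverrides rejects the whole override set if any name is blocked.
func validateEnvOverrides(overrides map[string]string) error {
	var rejected []string
	for name := range overrides {
		if isBlockedEnvVar(name) {
			rejected = append(rejected, name)
		}
	}
	if len(rejected) == 0 {
		return nil
	}
	sort.Strings(rejected) // stable error message
	return fmt.Errorf("blocked environment variables: %s", strings.Join(rejected, ", "))
}

func main() {
	fmt.Println(validateEnvOverrides(map[string]string{"MY_API_URL": "https://example.com"}))
	fmt.Println(validateEnvOverrides(map[string]string{"ld_preload": "/tmp/x.so", "Postgres_DSN": "x"}))
}
```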

…le system prompt + PBT (#9) Upstream GoClaw PR nextlevelbuilder#343: Anthropic OAuth setup token support. ArgoClaw enhancements: - OAuth system prompt now configurable via oauthSystemPrompt field (default: Claude Code identifier, overridable per-provider) - Prevents forced persona degradation in ARGO agents Tests (TDD + PBT): - 6 tests: 2500+ random inputs via testing/quick - PBT: valid setup tokens always accepted (500 random) - PBT: valid API keys always accepted (500 random) - PBT: random strings always rejected (1000 random) - PBT: short tokens always rejected (500 random) - All tests PASS (verified on VM goclaw-pilot) Co-authored-by: Milton Silva <milton@vellus.tech> Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…ting + PBT (#10) Upstream GoClaw PR nextlevelbuilder#202: preserve @mentions with underscores in Telegram markdown conversion + bot-to-bot mention routing. ArgoClaw security note: bot routing needs tenant-scoped auth (documented for future sprint). Tests (TDD + PBT): - PBT: single mention preserved (500 random usernames) - PBT: multiple mentions preserved (300 random combinations) - PBT: no false @ injection (500 random texts) - Original: mention preservation test - All tests PASS (verified on VM goclaw-pilot) Conflict resolved: cmd/gateway_consumer.go (handoff + reset) Co-authored-by: Milton Silva <milton@vellus.tech> Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…E fix + Zalo QR restart + TDD/PBT (#11)

PR nextlevelbuilder#182 (cherry-pick core fixes, without Party Mode):
- Sort non-contiguous SSE tool_call indices (prevents nil pointer panic)
- Log truncated tool call arguments instead of silently discarding them
- extractDefaultModel from provider settings JSONB

PR nextlevelbuilder#346:
- Zalo QR session restart: cancel the previous session instead of blocking

Tests (TDD + PBT):
- Non-contiguous indices: 1000+ PBT random inputs
- Truncated JSON arguments: 6 edge cases
- All tests PASS (verified on VM goclaw-pilot)

PR nextlevelbuilder#350: SKIPPED. The core fix (generateId) is already in PR nextlevelbuilder#352. Provider listing UX improvements deferred.

Co-authored-by: Milton Silva <milton@vellus.tech>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
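One way to handle non-contiguous tool_call indices is to collect the streamed fragments in a map keyed by index and then emit them in sorted order, skipping nil entries. The sketch below assumes that shape; the `toolCall` type and `orderToolCalls` are hypothetical, and the actual provider code may accumulate fragments differently.

```go
package main

import (
	"fmt"
	"sort"
)

// toolCall is a simplified stand-in for an accumulated streaming tool call.
type toolCall struct {
	Name string
	Args string
}

// orderToolCalls returns the accumulated calls ordered by stream index.
// Indexing a pre-sized slice by the raw index would leave nil holes when the
// provider skips indices (for example 0, 2, 5); sorting the keys avoids that.
func orderToolCalls(byIndex map[int]*toolCall) []toolCall {
	indices := make([]int, 0, len(byIndex))
	for i := range byIndex {
		indices = append(indices, i)
	}
	sort.Ints(indices)

	calls := make([]toolCall, 0, len(indices))
	for _, i := range indices {
		if tc := byIndex[i]; tc != nil { // defensive: ignore empty slots
			calls = append(calls, *tc)
		}
	}
	return calls
}

func main() {
	calls := orderToolCalls(map[int]*toolCall{
		5: {Name: "search"},
		0: {Name: "read_file"},
		2: {Name: "write_file"},
	})
	for _, c := range calls {
		fmt.Println(c.Name)
	}
}
```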

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…d missing symbols - Rename 000027_projects → 000028_projects (conflict with 000027_multi_tenancy) - Bump RequiredSchemaVersion to 28 - Replace all github.com/nextlevelbuilder/goclaw imports with github.com/vellus-ai/argoclaw - Fix missing sessions/providers imports in gateway_consumer.go - Fix LaneDelegate → LaneTeam (renamed in refactor commit 49441f7) - Run go mod tidy to clean upstream dependency Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Critical AppSec fix: adds multi-tenant isolation to the PostgreSQL store layer. Previously, the middleware injected tenant_id into context but store queries did not filter by it, allowing cross-tenant data access.

Changes:
- Add WithTenantID/TenantIDFromContext to store context helpers
- Refactor tenant_middleware to use store.WithTenantID (single source)
- Add tenantIDFromCtx() and execMapUpdateTenant() helpers in pg package
- Fix 7 store files (60+ methods) to filter by tenant_id:
  - agents.go: all CRUD + shares + access checks (12 methods)
  - providers.go: all CRUD (6 methods), API key isolation
  - channel_instances.go: all CRUD + credentials (8 methods)
  - mcp_servers.go: all CRUD (5 methods), server credential isolation
  - custom_tools.go: all CRUD + list variants (9 methods)
  - teams.go: CRUD methods (5 methods)
  - helpers.go: new execMapUpdateTenant with tenant WHERE clause

Backwards-compatible: when tenant_id is not in context (uuid.Nil), filters are skipped (single-tenant / gateway token mode).

Stores NOT yet fixed (lower priority, no credentials):
- cron_crud.go (methods lack ctx parameter; interface change needed)
- sessions*.go (session key encodes context, lower risk)
- skills*.go (deferred to next sprint)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Prevents injection and malformed data in tenant branding configuration:
- primary_color: must match ^#[0-9A-Fa-f]{6}$ (hex color)
- logo_url / favicon_url: must use the https:// scheme (prevents javascript: XSS)
- sender_email: validated with net/mail.ParseAddress
- product_name: max 100 characters

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
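These four rules map directly onto the standard library. The sketch below assumes that empty fields are allowed and that "characters" means runes; the `branding` struct and `validateBranding` are hypothetical names, not the handler's actual types.

```go
package main

import (
	"errors"
	"fmt"
	"net/mail"
	"net/url"
	"regexp"
	"unicode/utf8"
)

// hexColor is the pattern quoted in the commit message.
var hexColor = regexp.MustCompile(`^#[0-9A-Fa-f]{6}$`)

// branding is a simplified stand-in for the tenant branding payload.
type branding struct {
	PrimaryColor string
	LogoURL      string
	FaviconURL   string
	SenderEmail  string
	ProductName  string
}

// validateBranding checks each non-empty field (skipping empty fields is an
// assumption; the PR does not say which fields are required).
func validateBranding(b branding) error {
	if b.PrimaryColor != "" && !hexColor.MatchString(b.PrimaryColor) {
		return errors.New("primary_color must be a #RRGGBB hex color")
	}
	for _, raw := range []string{b.LogoURL, b.FaviconURL} {
		if raw == "" {
			continue
		}
		u, err := url.Parse(raw)
		if err != nil || u.Scheme != "https" || u.Host == "" {
			return errors.New("logo_url and favicon_url must use https://")
		}
	}
	if b.SenderEmail != "" {
		if _, err := mail.ParseAddress(b.SenderEmail); err != nil {
			return errors.New("sender_email is not a valid address")
		}
	}
	if utf8.RuneCountInString(b.ProductName) > 100 {
		return errors.New("product_name must be at most 100 characters")
	}
	return nil
}

func main() {
	fmt.Println(validateBranding(branding{
		PrimaryColor: "#1A2B3C",
		LogoURL:      "https://cdn.example.com/logo.png",
		SenderEmail:  "noreply@example.com",
		ProductName:  "ARGO",
	}))
	fmt.Println(validateBranding(branding{LogoURL: "javascript:alert(1)"}))
}
```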

fix: resolve duplicate migration 000027 + fix upstream imports

fix(security): enforce tenant_id filtering in all store queries

fix(security): add input validation to branding handler

- Triggers on push to main and manual dispatch - Builds with ENABLE_PYTHON=true for skill support - Pushes to ghcr.io/vellus-ai/argoclaw:latest + SHA tag - Uses Docker layer caching via GitHub Actions cache - Fix Dockerfile ldflags: nextlevelbuilder → vellus-ai Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

ci: Docker build + push to GHCR on main

- jwt_test.go: GenerateRefreshToken returns 3 values (raw, hash, err), not 2 — fix destructuring in TestGenerateRefreshToken_Unique - provider-form-dialog.tsx: add missing isEdit constant (create-only dialog, always false) to fix TS2304 compilation error Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

tenant_middleware_test.go referenced undefined ctxKeyTenantID — replaced with store.WithTenantID() which is the actual API used by the middleware. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

fix: resolve CI build errors (Go test + TypeScript)

…kflow - Remove DOCKERHUB_IMAGE env var and all Docker Hub login steps (we use GHCR exclusively, Docker Hub secrets were never configured) - Remove notify-discord job (DISCORD_WEBHOOK_URL secret not configured) - Remove Docker Hub image refs from metadata extraction - Fix ldflags import path: nextlevelbuilder → vellus-ai Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

ci: remove Docker Hub login and Discord notification

* test: add E2E tenant isolation test suite Comprehensive E2E tests for multi-tenancy data isolation covering: - Store-level CRUD cross-tenant isolation (agents, branding, membership) - JWT auth boundary tests (tampering, algorithm confusion, wrong secret) - HTTP API header injection prevention (X-ArgoClaw-User-Id, X-Tenant-Id) - Privilege escalation (admin cross-tenant, self-add, immutable tenant_id) - WebSocket connection isolation (connect, event leak, forged tenant param) - SQL injection payloads against tenant-filtered queries - Property-based testing (PBT) for isolation invariants - Suspended/expired tenant data access policies Includes CI workflow (ci-tenant-isolation.yml) and docker-compose for local test execution with pgvector/pg18. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: address code review findings in tenant isolation tests - Remove dead code mustGenerateExpiredToken (misleading: generated valid tokens) - Fix PBT TestPBT_AgentKeyNeverLeaksCrossTenant logic: verify against known Tenant A agent IDs set instead of reverse-querying Tenant A context - Rename TestHTTP_NoAuth_Returns401 to TestHTTP_NoAuth_NoTenantDataLeaked to match actual behavior (gateway token mode may return 200) - Remove local min() function (redundant with Go 1.21+ builtin) - Fix comment httpClientWithToken → httpReqWithToken Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: address all remaining code review findings - Add multi-store isolation tests: LLM providers (list, get-by-name), agent teams (list), custom tools (list-all), agents (List method) - Add t.Parallel() to all independent tests for faster CI execution - Fix defer-in-loop in TestWS_MultipleConnections (use t.Cleanup) - Improve TestJWT_InvalidUUID_TenantID assertions: verify ALL invalid payloads fail uuid.Parse, not just "not-a-uuid" - Update migrate image version in docker-compose (v4.17.0 → v4.18.2) Co-Authored-By: Claude Opus 4.6 (1M context) 
<noreply@anthropic.com> --------- Co-authored-by: Milton Silva <milton@vellus.tech> Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…#21) * fix(store): add tenant isolation to SessionStore/CronStore/SkillStore Add context.Context parameter to SessionStore and CronStore interface methods that perform DB operations, enabling tenant_id filtering via tenantIDFromCtx(ctx). This prevents cross-tenant data leakage in multi-tenant deployments. SessionStore changes: - GetOrCreate, Delete, List, ListPaged, ListPagedRich, Save, LastUsedChannel now accept ctx as first parameter - All DB queries add AND tenant_id = $N when tid != uuid.Nil - INSERT includes tenant_id column - buildSessionFilter accepts tid for consistent filtering CronStore changes: - AddJob, GetJob, ListJobs, RemoveJob, UpdateJob, EnableJob, GetRunLog, RunJob now accept ctx as first parameter - AddJob INSERT includes tenant_id column - scanJobTenant adds tenant_id filter to single-row lookups - GetRunLog JOINs with cron_jobs for tenant verification - UpdateJob uses execMapUpdateTenant when tenant is present SkillStore changes: - CreateSkillManaged INSERT includes tenant_id column - Added CreateSkillWithCtx, UpdateSkillWithCtx, DeleteSkillWithCtx, ToggleSkillWithCtx for tenant-aware operations - DeleteSkillWithCtx adds tenant_id to SELECT and UPDATE queries BackfillAgentEmbeddings: - Added tenant_id filter to SELECT query when tenant is in context All callers updated to propagate ctx: agent loop, tools, gateway methods, heartbeat ticker, consumer handlers. Backward compatible: when tid == uuid.Nil, no tenant filter is applied. 
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: address review findings in tenant store isolation - scanJobTenant now accepts context.Context and uses QueryRowContext instead of QueryRow, ensuring query cancellation propagation - EnableJob now checks RowsAffected when tenant_id is set, consistent with RemoveJob and session Delete (prevents silent cross-tenant no-op) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Milton Silva <milton@vellus.tech> Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…stores (#22) * fix(security): resolve all 6 tenant isolation blockers from code review Fixes all blockers flagged in PR #22 review (score 100/100 each): 1. CronStore.RemoveJob/EnableJob: add ctx + AND tenant_id filter + RowsAffected check 2. CronStore.UpdateJob: add ctx + use execMapUpdateTenant when tenant present 3. SessionStore.Delete: use tenantIDFromCtx(ctx) instead of cache lookup — prevents cross-tenant deletion when session is not in local cache (restart, different node) 4. SessionStore.List: add ctx + filter by tenant via buildSessionFilter 5. sessions.loadFromDB: add tenantID param + AND tenant_id=$2 — prevents cross-tenant session reads via GetOrCreate with a known session key 6. DeleteSkill: add tenant filter to is_system SELECT + RowsAffected check on UPDATE Also fixes 2 warnings (score 75): - testutil: 10*000*1000*1000 = 0ns timeout → 10*time.Second - TestIsolation_Sessions_Delete_CrossTenant: ctxB was unused; now tests adversarial cross-tenant delete (must not delete) followed by same-tenant delete (must delete) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * fix(security): resolve 2 AppSec blockers from code review - sessions.go: getOrInit no longer INSERTs without tenant_id. Sessions without ctx are kept in-memory only; GetOrCreate(ctx) must be called first to persist with the correct tenant_id. Inserting without tenant_id would create orphaned rows that bypass multi-tenant isolation. - sessions_ops.go: Delete now executes the DB DELETE and verifies RowsAffected before evicting cache and calling OnDelete. Previously, cache eviction and media cleanup ran before the DB tenant check, leaving inconsistent state on cross-tenant attempts. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> --------- Co-authored-by: Milton Silva <milton@vellus.tech> Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
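The timeout warning above comes down to constant arithmetic: `000` is an octal literal equal to zero, so the product collapses to zero nanoseconds. A short illustration follows, assuming the expression was used as a time.Duration; the constant names are hypothetical.

```go
package main

import (
	"fmt"
	"time"
)

// brokenTimeout reproduces the literal from the commit message. The factor
// 000 is zero, so the whole product is zero nanoseconds.
const brokenTimeout = time.Duration(10 * 000 * 1000 * 1000)

// fixedTimeout states the unit explicitly instead of multiplying raw numbers.
const fixedTimeout = 10 * time.Second

func main() {
	fmt.Println(brokenTimeout, fixedTimeout)
}
```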

…#23) * feat(argoclaw): --non-interactive onboard + OpenTelemetry GenAI instrumentation - Add --non-interactive flag to 'argoclaw onboard' (CI/automated deploy mode) - Reads all inputs from env vars: ARGOCLAW_POSTGRES_DSN (required), ARGOCLAW_GATEWAY_TOKEN and ARGOCLAW_ENCRYPTION_KEY (auto-generated if absent) - Skips all interactive prompts; safe to run with stdin closed - Idempotent: migrations use 'no change' guard, seed is upsert-safe - Add Gemini (gemini_native) to default provider seed list - Add internal/telemetry package: - Setup() — OTLP gRPC exporter, tracer + meter provider - GenAI semantic conventions (AttrGenAI* constants, RecordLLMCall helper) - Graceful noop when OTEL_EXPORTER_OTLP_ENDPOINT not configured - Initialize OTel in gateway startup with deferred graceful shutdown - TDD: 5 unit tests for non-interactive mode + 4 OTel/GenAI tests Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * fix(otel,onboard): resolve code review blockers for PR #23 - Add nil guard on otelShutdown to prevent panic on Setup failure - Handle error from telemetry.InitMetrics() instead of silently discarding - Read ARGOCLAW_ENVIRONMENT env var for OTel deployment.environment attribute - Document non-overlap between internal/telemetry and internal/tracing/otelexport - Make OTel TLS configurable via OTEL_EXPORTER_OTLP_INSECURE (standard env var) - Use errors.Is(err, migrate.ErrNoChange) instead of string comparison - Protect OTel metric globals with sync.Once; remove metricsInitialized bool - Change RecordLLMCall attrs parameter to pointer so callers can update tokens post-call - Handle errors in onboardWriteEnvFile (return error instead of silently ignoring) - Single-quote all env var values in .env.local to prevent bash special-char expansion - Return (string, error) from onboardGenerateToken; update all callers - Add PBT tests (pgregory.net/rapid) and metric coverage tests for 90% coverage target Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> 
--------- Co-authored-by: Milton Silva <milton@vellus.tech> Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
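Single-quoting env values, as the review fix above describes for .env.local, can be sketched like this. Escaping an embedded single quote as `'\''` is the usual POSIX shell idiom and an assumption here, since the commit only states that values are single-quoted; `quoteEnvValue` and `envLine` are hypothetical names.

```go
package main

import (
	"fmt"
	"strings"
)

// quoteEnvValue wraps v in single quotes so a shell sourcing the file does
// not expand $, backticks, or !. An embedded single quote closes the quoted
// string, adds an escaped quote, and reopens it.
func quoteEnvValue(v string) string {
	return "'" + strings.ReplaceAll(v, "'", `'\''`) + "'"
}

// envLine formats one KEY='value' line for an env file.
func envLine(key, value string) string {
	return key + "=" + quoteEnvValue(value)
}

func main() {
	fmt.Println(envLine("ARGOCLAW_GATEWAY_TOKEN", "p@$$w0rd!`id`"))
	fmt.Println(envLine("EXAMPLE_NOTE", "it's fine"))
}
```

This form is meant for files that bash sources; a dotenv parser with its own quoting rules may treat the `'\''` sequence differently.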

* ci: add Google Artifact Registry to docker-publish workflow - Add GAR as third registry alongside GHCR and Docker Hub - Authenticate via Workload Identity Federation (google-github-actions/auth) - Add id-token: write permission for OIDC - Both build-and-push and build-and-push-web jobs publish to GAR - Requires GCP_WORKLOAD_IDENTITY_PROVIDER and GCP_SERVICE_ACCOUNT secrets Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * sec: fix 3 CVEs — gRPC, jsonparser, imaging - CVE-2026-33186 (CRITICAL): google.golang.org/grpc v1.78.0 -> v1.79.3 Authorization bypass via missing leading slash in :path - GHSA-6g7g-w4f8-9c9x (HIGH): github.com/buger/jsonparser v1.1.1 -> v1.1.2 Denial of service vulnerability - CVE-2023-36308 (LOW): Replace github.com/disintegration/imaging v1.6.2 with golang.org/x/image/draw (stdlib). Panic on malformed images. Rewrote SanitizeImage using image.Decode + draw.CatmullRom. Trivy scan: 0 vulnerabilities across all severities. Closes vellus-ai/vellus-ai-agents-platform#21 Closes vellus-ai/vellus-ai-agents-platform#22 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Milton Silva <milton@vellus.tech> Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

DevSecOps pipeline completed:
- Go build (linux/amd64, CGO=0): ✅ 5.1MB
- go vet: ✅ no issues
- WebUI tests (7/7): ✅ ServesIndexHTML, SecurityHeaders, FallbackToIndexHTML...
- Image: us-central1-docker.pkg.dev/vellus-ai-agent-platform/argoclaw/argoclaw:v1.79.0-webui
- K8s deploy: ✅ rollout completed
- Health check: ✅ https://argo-vellus.consilium.tec.br/ → HTTP 200

CI checks OK: go ✅, web ✅

CI checks ignored (pre-existing, unrelated to this PR):
- Tenant Isolation E2E: column 'external_id' does not exist in CI (schema gap)
- claude-review: Claude Code GitHub App not installed in the repo

Resolves vellus-ai/vellus-ai-agents-platform#33: rebuild of the Docker image combining the security patches (v0.1.1-sec) with the embedded React SPA (v1.79.0-webui).

Analysis:
- main HEAD (402d322) already contains EVERYTHING: appsec patches (#14, #21, #22) + embed-web-ui
- v0.1.1-sec was built before the embed-web-ui merge (the SPA is missing)
- v1.79.0-webui was built from the branch (pre-squash), without the differences from the final commit
- The correct image requires a build of main HEAD with ENABLE_WEB_UI=true

Changes:
- .github/workflows/docker-publish.yaml: adds the "webui" variant (-webui suffix) with ENABLE_WEB_UI=true; adds the enable_web_ui field to all existing variants
- .github/workflows/rebuild-webui-hardened.yml: dedicated workflow for an immediate rebuild (trigger: push to this branch or workflow_dispatch); produces the v1.79.1-webui tag in GAR; documents the included security patches in the job summary

Next step: after the merge, run the workflow and update the K8s deployment to v1.79.1-webui.

Co-authored-by: Milton Silva <milton@vellus.tech>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>

google-github-actions/auth@v2 does not generate access_token by default when using Workload Identity Federation. The docker/login-action step requires an access_token to authenticate against GAR. Also configures the GCP WIF pool (github-actions) and SA (sa-github-ci) which were missing from the repository secrets. Co-authored-by: Milton Silva <milton@vellus.tech> Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>

…nboard race safety (#28) * fix(store): add SeedOnboardProvider with ON CONFLICT DO NOTHING for onboard race safety Resolves vellus-ai/vellus-ai-agents-platform#32. With replicas >= 2, two initContainers can race to seed placeholder providers. The previous code called CreateProvider (plain INSERT) and silently swallowed duplicate-key errors via slog.Debug — fragile and misleading. Changes: - Add store.ProviderStore.SeedOnboardProvider interface method with doc comment explaining the intentional ON CONFLICT (name, tenant_id) DO NOTHING semantics - Implement SeedOnboardProvider in PGProviderStore with the idempotent INSERT; no DO UPDATE clause ensures user-configured values are never overwritten - Extract seedPlaceholdersWithStore(ctx, store.ProviderStore) from seedOnboardPlaceholders for dependency injection and unit testing - Update both mockProviderStore stubs (internal/http, internal/oauth) to satisfy the updated interface - Add onboard_managed_test.go with 5 tests covering: full seeding, idempotency, api_base skip, error resilience, and PBT never-panics Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * fix(migration): add UNIQUE(name, tenant_id) constraint on llm_providers Migration 027 added tenant_id to llm_providers but did not update the UNIQUE constraint from single-column (name) to composite (name, tenant_id). This caused ON CONFLICT (name, tenant_id) DO NOTHING in SeedOnboardProvider to fail at runtime (Issue nextlevelbuilder#43). - Drop old llm_providers_name_key constraint - Create regular UNIQUE index on (name, tenant_id) for arbiter inference - Create partial UNIQUE index on (name) WHERE tenant_id IS NULL for legacy rows - Bump RequiredSchemaVersion to 29 Resolves vellus-ai/vellus-ai-agents-platform#43 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Milton Silva <milton@vellus.tech> Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>

* feat(auth): wire user auth endpoints to gateway server Connect existing auth infrastructure (UserAuthHandler, UserStore, JWT) to the running gateway server. No new logic — pure wiring: - Add JWTSecret to GatewayConfig (env ARGOCLAW_JWT_SECRET, never persisted) - Add Users UserStore to Stores struct + PGUserStore in factory - Add SetUserAuthHandler + route registration in BuildMux - Wire handler creation in cmd/gateway.go (conditional on JWT secret) - Add unit tests for config loading, JWT roundtrip, password validation, password history detection (Gap G2) Endpoints activated: POST /v1/auth/{register,login,refresh,logout} Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat(gateway): wire JWT middleware globally + add security headers - Apply JWTMiddleware globally in Start() — falls through when no JWT is present, preserving gateway token backward compat - Add securityHeadersMiddleware (Gap G4/RNF-16): HSTS, X-Content-Type-Options, X-Frame-Options, Referrer-Policy, X-XSS-Protection (disabled per OWASP) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * test(auth): add handler tests for register/login/logout endpoints 8 test cases covering UserAuthHandler: - Register: success (201 + tokens), duplicate email (409), weak password (400) - Login: success (200 + JWT), wrong password (401 + counter), non-existent (401), lockout (429) - Logout: session revoked (200) All tests use in-memory stubUserStore — no database dependency. 
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * test(auth): add coverage tests for RefreshToken, HashRefreshToken, VerifyPassword Boost internal/auth coverage from 86.3% to 90.4%: - TestHashRefreshToken: deterministic SHA-256, different inputs - TestGenerateRefreshToken: unique tokens, hash matches - TestVerifyPassword_MalformedHash: malformed and empty hash Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * simplify: remove redundant code from auth-wiring changes - Remove duplicate security headers from WebUIHandler (already set by global securityHeadersMiddleware) - Remove task-tracking comment from securityHeadersMiddleware - Remove redundant pgStores.Users != nil guard (factory always initializes Users) - Move user_auth_test to package http_test for proper black-box isolation Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * test(auth): resolve code review blockers — PBT, refresh tests, edge cases, t.Parallel - Fix os.Unsetenv → t.Setenv for proper test isolation (Blocker #1) - Add 3 tests for handleRefresh: success, invalid token, revoked token reuse (Blocker #2) - Add 3 PBT tests: ValidatePassword properties, JWT roundtrip, HashRefreshToken determinism (Blocker #3) - Add 7 edge case tests: malformed JSON (register/login/refresh/logout), empty body, missing email, email normalization (Blocker #4) - Add t.Parallel() to all independent tests in both files (Blocker #5) - Fix stubUserStore.GetSessionByToken to filter revoked sessions (matches production SQL behavior) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Milton Silva <milton@vellus.tech> Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat(plugins): Fase 0 — Plugin Host infrastructure (WP-1 through WP-9) Implements the complete plugin host infrastructure for ArgoClaw: ## Schema (WP-1) - Migration 000030: 5 tables (plugin_catalog, tenant_plugins, agent_plugins, plugin_data, plugin_audit_log) with tenant isolation, FK CASCADE, indexes - Bumps RequiredSchemaVersion 29 → 30 ## Store Layer (WP-2, WP-3) - store.PluginStore interface (~20 methods): catalog, lifecycle, agent overrides, KV data, audit log - PGPluginStore implementation with parameterized SQL only (no ORM) - G2 blocker enforced: tenantID always from context, never from caller params - G1 enforced: UninstallPlugin cascades to delete all plugin data atomically - Atomic transactions for Install/Enable/Disable/Uninstall with inline audit - Compile-time interface check: var _ store.PluginStore = (*pg.PGPluginStore)(nil) ## Manifest + Permissions (WP-4) - ParseManifest: validates name (kebab-case), version (semver), transport whitelist - G4 blocker: ValidatePermissions rejects any core:* write scope - PBT via testing/quick: random core:* writes always rejected ## In-Memory Registry (WP-5) - Thread-safe Registry (sync.RWMutex) for runtime plugin state - Names(), ActiveNames(), Count(), List(), Register(), Unregister() ## Data Proxy (WP-6) - DataProxy validates tenant context, collection (max 100), key (max 500) - Enforces plugin-installed check before any store operation - G2: context tenant always wins; never trusts caller-supplied values ## REST API (WP-7, WP-8) - PluginHandler: catalog CRUD, install/uninstall/enable/disable, agent grants - PluginDataHandler: KV data CRUD (list/get/put/delete) - G4 at HTTP boundary: POST /v1/plugins/catalog validates manifest permissions - Auth required on all endpoints (requireAuth pattern) - Conflict 409 on duplicate install, 404 on not found ## Gateway Integration (WP-9) - Lifecycle controller: LoadAll (startup), RegisterPlugin, UnregisterPlugin, Stop - Tool groups registered via 
tools.RegisterToolGroup("plugin:{name}", ...) - gateway/server.go: SetPluginHandler, SetPluginDataHandler, route registration - cmd/gateway.go: wires plugin handlers when store.Plugins != nil - allowedTables whitelist updated with 5 plugin tables Test summary: - internal/plugins: 95.5% coverage, 50+ tests - internal/http (plugin files): all handlers covered, auth enforced - internal/store/pg: compile-time interface + integration test suite (build tag) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix(plugins): resolve code review blockers and AppSec advisories - Replace err.Error() with generic messages in HTTP 500 responses (BLOCKER-1) - Use errors.Is() for sql.ErrNoRows comparison at 6 sites (BLOCKER-2) - Rewrite isUniqueViolation() with errors.As + pgconn.PgError (BLOCKER-2) - Fix UNIQUE constraint on agent_plugins to include tenant_id (ADVISORY-A) - Escape LIKE metacharacters in ListDataKeys prefix (ADVISORY-B) - Add plugin name validation on all HTTP handlers (ADVISORY-C) - Export IsValidPluginName() from plugins package for handler reuse Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix(plugins): resolve Round 2 code review blockers - Apply TenantMiddleware to all plugin routes for JWT multi-tenant isolation (BLOCKER-1) - Wire DataProxy into PluginDataHandler, replacing direct store access (BLOCKER-2) - Escape backslash in LIKE replacer for ListDataKeys (BLOCKER-3) - Require admin role for POST /v1/plugins/catalog (BLOCKER-4) - Validate plugin_name with IsValidPluginName in handleGrantAgent/handleInstallPlugin (BLOCKER-5) - Fix checkPluginInstalled to distinguish ErrPluginNotFound from transient errors (BLOCKER-6) - Verify plugin state is "enabled" in checkPluginInstalled (BLOCKER-7) - Add collection/key length validation in HTTP data handlers (BLOCKER-8) - Update tests: inject tenant context, use DataProxy-aware stubs Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Milton 
Silva <milton@vellus.tech> Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
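The two LIKE-escaping fixes above (ADVISORY-B and Round 2 BLOCKER-3) can be illustrated with a minimal sketch. The helper name `escapeLikePrefix` is hypothetical, and the sketch assumes PostgreSQL's default LIKE escape character (backslash); the PR's actual ListDataKeys code is not shown in this page.

```go
package main

import (
	"fmt"
	"strings"
)

// likeEscaper escapes the escape character itself plus both LIKE wildcards.
// strings.Replacer scans the input once, so an inserted backslash is never
// escaped a second time.
var likeEscaper = strings.NewReplacer(`\`, `\\`, `%`, `\%`, `_`, `\_`)

// escapeLikePrefix (hypothetical helper) turns a user-supplied key prefix
// into a LIKE pattern that matches it literally, followed by anything.
func escapeLikePrefix(prefix string) string {
	return likeEscaper.Replace(prefix) + "%"
}

func main() {
	// The result is meant to be passed as a bound parameter, e.g.
	//   ... WHERE key LIKE $1
	// and never concatenated into the SQL text.
	fmt.Println(escapeLikePrefix(`50%_off\`))
}
```

Without the backslash rule, a prefix ending in `\` would escape the trailing `%` and turn a prefix search into an exact match.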

…dentity * feat(providers): add Vertex AI as LLM provider with OAuth2/Workload Identity Add Google Vertex AI as a native provider using the OpenAI-compatible endpoint. Authentication via Application Default Credentials (ADC) enables zero-secret auth on GKE via Workload Identity. Changes: - OpenAIProvider: add TokenSource support for dynamic OAuth2 tokens - New vertex_ai.go: factory + gcpTokenSource (ADC auto-refresh) - Config: VertexAIConfig struct (project_id, region, default_model) - Store: ProviderVertexAI type for DB-based provider registration - gateway_providers.go: register from config and DB - Thought signature detection for Vertex AI endpoints Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix(security): resolve 3 code review blockers for Vertex AI provider 1. OAuth2 scope: cloud-platform → aiplatform (least privilege) 2. Tenant isolation: block DB registration — Vertex AI uses host SA, only host operator can configure via config.json 3. Input validation: regex validation on projectID and region to prevent SSRF via URL injection 4. Tests: rewrite to exercise NewVertexAIProvider end-to-end, add PBT for URL construction invariants, add SSRF validation tests 5. Mutex: release lock before Token() call — oauth2.ReuseTokenSource is already thread-safe, avoids serialization bottleneck Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * refactor(providers): simplify Vertex AI — sync.Once, constants, remove false log - Replace sync.Mutex with sync.Once for one-time ADC init (idiomatic, no log-under-lock) - Extract VertexAIDefaultRegion and VertexAIProviderType constants (DRY) - Remove false-positive slog.Info("registered provider") after security block - Use exported constants instead of string literals Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Milton Silva <milton@vellus.tech> Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat(providers): add Vertex AI as LLM provider with OAuth2/Workload Identity Add Google Vertex AI as a native provider using the OpenAI-compatible endpoint. Authentication via Application Default Credentials (ADC) enables zero-secret auth on GKE via Workload Identity. Changes: - OpenAIProvider: add TokenSource support for dynamic OAuth2 tokens - New vertex_ai.go: factory + gcpTokenSource (ADC auto-refresh) - Config: VertexAIConfig struct (project_id, region, default_model) - Store: ProviderVertexAI type for DB-based provider registration - gateway_providers.go: register from config and DB - Thought signature detection for Vertex AI endpoints Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix(security): harden auth endpoints — rate limit, tenant isolation, status check, JWT aud 4 AppSec vulnerabilities fixed before exposing /v1/auth/* in production: 1. Rate limiting per-IP on auth endpoints (login: 10/min, register: 5/min, refresh: 20/min) with 429 + Retry-After header. Prevents brute-force. 2. WithTenantID now correctly calls store.WithTenantID(ctx, uuid) instead of store.WithUserID. Fixes critical tenant isolation bypass. 3. Login handler now verifies user.Status == "active" before issuing tokens. Disabled/suspended/pending accounts return 403. Also checked on refresh. 4. JWT tokens now include aud:"argoclaw" claim, validated on parse. Prevents token reuse across unintended services. All fixes include TDD tests (11 new test cases). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix(security): resolve 3 code review blockers for Vertex AI provider 1. OAuth2 scope: cloud-platform → aiplatform (least privilege) 2. Tenant isolation: block DB registration — Vertex AI uses host SA, only host operator can configure via config.json 3. Input validation: regex validation on projectID and region to prevent SSRF via URL injection 4. 
Tests: rewrite to exercise NewVertexAIProvider end-to-end, add PBT for URL construction invariants, add SSRF validation tests 5. Mutex: release lock before Token() call — oauth2.ReuseTokenSource is already thread-safe, avoids serialization bottleneck Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * refactor(providers): simplify Vertex AI — sync.Once, constants, remove false log - Replace sync.Mutex with sync.Once for one-time ADC init (idiomatic, no log-under-lock) - Extract VertexAIDefaultRegion and VertexAIProviderType constants (DRY) - Remove false-positive slog.Info("registered provider") after security block - Use exported constants instead of string literals Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Milton Silva <milton@vellus.tech> Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

1. ipLimiter.allow(): replace time.Time with atomic.Int64 for lastSeen to prevent data race between allow() and cleanupLoop() goroutine
2. Tenant isolation test: use migration 026 users schema (email, password_hash) instead of legacy external_id column

Co-authored-by: Milton Silva <milton@vellus.tech>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…nagement (#36) /v1/plugins/{name}/data/{collection} conflicted with /v1/plugins/installed/{name}/audit in Go 1.22+ ServeMux. Renamed data proxy routes to /v1/plugin-data/{name}/{collection}[/{key}]. Co-authored-by: Milton Silva <milton@vellus.tech> Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

gRPC expects host:port format without a URL scheme. When the endpoint is configured as http://host:4317, gRPC appends :443, resulting in "too many colons in address" errors.

- Add stripEndpointSchema() to telemetry.Setup() and otelexport.New()
- Fix K8s ConfigMap to use host:port without http:// prefix
- Fix stale doc comments in plugins_data.go (old /v1/plugins/ paths)

Co-authored-by: Milton Silva <milton@vellus.tech>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
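The endpoint normalization described above might be sketched as follows. The function name `stripScheme` is hypothetical and its behavior is a guess at what stripEndpointSchema() does; the PR's real implementation is not shown here.

```go
package main

import (
	"fmt"
	"strings"
)

// stripScheme (hypothetical) removes a leading http:// or https:// so that
// the remainder can be handed to a gRPC dialer as plain host:port.
func stripScheme(endpoint string) string {
	for _, prefix := range []string{"http://", "https://"} {
		if strings.HasPrefix(endpoint, prefix) {
			return strings.TrimPrefix(endpoint, prefix)
		}
	}
	return endpoint
}

func main() {
	for _, in := range []string{"http://host:4317", "https://host:4317", "host:4317"} {
		fmt.Printf("%s -> %s\n", in, stripScheme(in))
	}
}
```

Applying the same normalization at both call sites keeps the ConfigMap value and the code tolerant of either spelling.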

* feat(auth): add email/password login UI + AppSec hardening Onboarding Phase 2: WebUI email+password authentication. Backend auth was already complete (PR #30). This PR adds the React frontend and hardens the HTTP layer. WebUI: - Email login/signup form with PCI DSS password requirements checklist - JWT auth store (access + refresh tokens in localStorage) - Auth API client (login, register, refresh, logout) - HTTP client auto token refresh on 401 with dedup - Login page defaults to Email tab (Token + Pairing kept as fallback) - i18n: all keys for en, vi, zh locales - Vitest testing infrastructure (23 tests passing) AppSec: - Health endpoint no longer leaks protocol version - General rate limiter (60 rpm per IP) on all HTTP routes - JWT audience comment documenting cross-service binding Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * refactor(auth): simplify review findings - Extract shared INPUT_CLASS/BUTTON_CLASS to form-styles.ts (DRY) - Add email substring check to password requirements (parity with backend) - Inline JWT audience comment to preserve field alignment - Add reqNoEmail i18n key to all 3 locales Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Milton Silva <milton@vellus.tech> Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

) * feat(auth): wire JWT auto-refresh + i18n login for pt/es/fr/it/de Phase 2 completion: JWT session management wiring and i18n expansion. - Wire HttpClient.setRefreshFn in WsProvider for silent 401 → refresh → retry - Wire onTokenRefreshed to persist new tokens to auth store - Add useJwtRefresh hook: proactive token renewal 2 min before expiry - Add login.json translations for pt-BR, es-ES, fr-FR, it-IT, de-DE - Register 5 new ARGO product languages in i18n config with EN fallback - Update getInitialLanguage to detect all 8 supported browser languages Build: pnpm build OK | Tests: 23/23 pass | TypeScript strict: OK Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix(auth): resolve 5 code review blockers for PR #39 1. Race condition: centralize JWT refresh in singleton (token-refresh.ts) — both proactive timer and reactive 401 handler share same Promise, preventing duplicate refresh calls with rotate-on-refresh tokens. 2. Base64 padding: add padding before atob() in getJwtExp to prevent Firefox failures on JWT payloads with non-multiple-of-4 length. 3. setTimeout overflow: cap timer delay at MAX_SAFE_TIMEOUT (2^31-1 ms) to prevent immediate firing for long-lived tokens. 4. German locale: restore all umlauts (ä, ö, ü, ß) in de/login.json. 5. French locale: restore all accents (é, è, ê, à, ç) in fr/login.json. Tests: 17 new tests (12 getJwtExp + 5 refreshTokenSingleton). Build: pnpm build OK | Tests: 40/40 pass | TypeScript strict: OK Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix(i18n): restore diacritics in es/pt/it locales + fix exp falsy check Code review round 2 — resolve 4 blockers: 1. es/login.json: add ñ, ó, é, á, ú, ¿, ¡ (sesión, contraseña, etc.) 2. pt/login.json: add ã, ç, é, á, õ (não, exibição, possível, etc.) 3. it/login.json: add è, à, ù (è già, più, Verrà, etc.) 4. 
use-jwt-refresh.ts: change `if (!exp)` to `if (exp === null)` to avoid treating exp=0 as falsy (gateway token misdetection) Tests: 40/40 pass | Build: OK Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Milton Silva <milton@vellus.tech> Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat(tools): add 8 onboarding tools for Imediato chat-first setup

Add tools that the Imediato (Chief of Staff) agent uses during conversational onboarding to configure the workspace in real-time:

1. configure_workspace: set account type (personal/business), name, industry
2. set_branding: primary color, product name
3. configure_llm_provider: provider + API key + model selection
4. test_llm_connection: validate API key format
5. create_agent: create agent with preset (captain, helmsman, etc.)
6. configure_channel: webchat, telegram, whatsapp, discord, slack
7. complete_onboarding: mark setup done, transition Imediato to CoS mode
8. get_onboarding_status: check what has been configured

Architecture:
- All tools implement the Tool interface (Name, Description, Parameters, Execute)
- OnboardingStore interface for tenant settings persistence
- OnboardingStoreAware setter for dependency injection
- 15 unit tests covering all tools (TDD)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(tools): resolve 9 code review blockers in onboarding tools

1. Fix tenantIDFromCtx: use store.TenantIDFromContext (was AgentIDFromContext). All 6 store-dependent tools now correctly isolate by tenant.
2. Phantom successes eliminated: configure_llm_provider, create_agent, and configure_channel now clearly state they collect info only; the user must complete setup via dashboard. No false "encrypted at rest" claims.
3. API key masking: keys are masked to first 4 chars + "***" in all tool results. Full keys never appear in LLM context.
4. Channel tool no longer accepts bot_token as parameter; it directs the user to dashboard Settings > Channels for secure credential entry.
5. SetBrandingTool: hex color validated via regex (^#[0-9A-Fa-f]{3,6}$), prevents CSS injection. Partial updates preserve unset fields.
6. ConfigureWorkspaceTool: validates account_type against enum, enforces max 255 chars on account_name, trims whitespace.
7. GetOnboardingStatusTool: json.MarshalIndent error now handled explicitly.
8. Tests rewritten with mock OnboardingStore: 38 tests including happy paths, store errors, tenant isolation (2 tenants independent, empty context rejected), PBT for hex color validation (5000 iterations) and API key masking (5000).
9. Removed unused agentStore field from ConfigureLLMProviderTool and CreateAgentTool.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Milton Silva <milton@vellus.tech>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
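Blockers 3 and 5 above (API key masking and hex color validation) can be sketched as below. The regex is the one quoted in the commit message; the function names `maskAPIKey` and `isHexColor`, and the choice to fully mask keys of four characters or fewer, are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"regexp"
)

// hexColorRe is the pattern quoted in the commit message. Note that {3,6}
// also admits 4- and 5-digit values such as "#12345".
var hexColorRe = regexp.MustCompile(`^#[0-9A-Fa-f]{3,6}$`)

// isHexColor (hypothetical name) rejects anything that is not a bare hex
// color, which keeps CSS syntax out of the stored branding value.
func isHexColor(s string) bool {
	return hexColorRe.MatchString(s)
}

// maskAPIKey (hypothetical name) keeps the first 4 characters and replaces
// the rest with "***". Very short keys are masked entirely (an assumption).
func maskAPIKey(key string) string {
	const visible = 4
	r := []rune(key)
	if len(r) <= visible {
		return "***"
	}
	return string(r[:visible]) + "***"
}

func main() {
	fmt.Println(isHexColor("#1A2B3C"), isHexColor("#fff"), isHexColor("red;}"))
	fmt.Println(maskAPIKey("sk-abc123"), maskAPIKey("abc"))
}
```

Slicing by rune rather than by byte avoids cutting a multi-byte character in half if a key ever contains non-ASCII input.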

Add favicon.svg to ui/web/public/ so it's included in the Vite build and served at /favicon.svg by the embedded SPA handler. Co-authored-by: Milton Silva <milton@vellus.tech> Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

MiltonSilvaJr · 2026-04-06T21:04:21Z

Code Review

Found 4 issue(s):

[BLOCKER] 1. Missing rate limiting on POST /v1/auth/change-password endpoint — Security

The AuthRateLimiter wraps login, register, and refresh but NOT change-password. An attacker with a stolen JWT could brute-force current_password at unlimited speed.

Fix: Add WrapChangePassword to AuthRateLimiter and wrap the handler in RegisterRoutes, matching the pattern of the other auth endpoints.

https://github.com/vellus-ai/argoclaw/blob/5f6a970c822cde6d1aed1f26f920aa62566fde5a/internal/http/user_auth.go#L56-L58

[BLOCKER] 2. Frontend catches HTTP 422 but backend returns 400 for weak passwords — Bug

ChangePasswordModal.tsx maps case 422 to weak password error, but handleChangePassword returns http.StatusBadRequest (400) for all validation errors. Users with weak passwords will see a generic "Server error" instead of the specific validation message.

Fix: Either change the frontend to catch 400 (and distinguish by error message body), or change the backend to return 422 Unprocessable Entity for validation errors to match REST conventions.

https://github.com/vellus-ai/argoclaw/blob/5f6a970c822cde6d1aed1f26f920aa62566fde5a/ui/web/src/components/shared/ChangePasswordModal.tsx#L85-L90

[BLOCKER] 3. PII (email) logged in plaintext on password change — Security/LGPD

slog.Info("security.password_changed", "user_id", user.ID, "email", user.Email) logs email in cleartext. Per CLAUDE.md and LGPD: "Never log passwords, tokens, or PII." The user_id alone is sufficient for correlation.

Fix: Remove "email", user.Email from the slog call, or mask it (user.Email[:3]+"***").

https://github.com/vellus-ai/argoclaw/blob/5f6a970c822cde6d1aed1f26f920aa62566fde5a/internal/http/user_auth.go#L427-L429
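If masking is preferred over removal, note that the expression `user.Email[:3]` panics when the string is shorter than three bytes. A bounds-safe variant could look like the sketch below; the helper name `maskEmail` and the choice to mask short values entirely are assumptions, not part of the PR.

```go
package main

import (
	"fmt"
	"log/slog"
)

// maskEmail (hypothetical helper) keeps at most the first three characters
// and never slices past the end of the input.
func maskEmail(email string) string {
	const visible = 3
	r := []rune(email)
	if len(r) <= visible {
		return "***"
	}
	return string(r[:visible]) + "***"
}

func main() {
	userID := "u-123" // placeholder value for the example

	// Preferred form per the review: the user ID alone, no email at all.
	slog.Info("security.password_changed", "user_id", userID)

	// Masked alternative, shown on sample inputs.
	fmt.Println(maskEmail("alice@example.com"))
	fmt.Println(maskEmail("ab"))
}
```

Dropping the field is the simpler option, since even a three-character prefix can narrow down an identity in a small tenant.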

[BLOCKER] 4. Silent failure on RevokeAllSessions and AddPasswordHistory — Security

_ = h.users.RevokeAllSessions(...) and _ = h.users.AddPasswordHistory(...) discard errors silently. If RevokeAllSessions fails, old sessions remain valid after password change. If AddPasswordHistory fails, password reuse prevention is weakened (PCI DSS).

Fix: At minimum, log errors with slog.Error. For AddPasswordHistory, consider aborting the password change if history storage fails. For RevokeAllSessions, log at slog.Warn since the password is already changed.

https://github.com/vellus-ai/argoclaw/blob/5f6a970c822cde6d1aed1f26f920aa62566fde5a/internal/http/user_auth.go#L394-L409

Verdict: REQUEST CHANGES
Merge blocked until the 4 items above are resolved. Issues #1 and #3 are security (AppSec) issues.

Generated with Claude Code

MiltonSilvaJr · 2026-04-06T21:08:03Z

Code Review — Re-check after fixes

Verified commit d86ad8df62a90e0c5f145b6ac1877410ea5f0e44 resolving all 4 blockers:

✅ Rate limiting — WrapChangePassword added to AuthRateLimiter, wired in RegisterRoutes. Same rate as login (10 RPM, burst 3).
✅ Status code mismatch — Frontend now catches 400 (not 422) and displays the server error message directly.
✅ PII in log — Email removed from slog.Info("security.password_changed"). Only user_id logged.
✅ Silent error handling — AddPasswordHistory failure now aborts the operation. ClearMustChangePassword and RevokeAllSessions failures logged at WARN level.

All 6 change-password tests pass. All 23 existing auth tests pass. Build succeeds.

No issues found. Checked: CLAUDE.md compliance, architecture, code quality, tests, security (AppSec), and historical context.

Verdict: APPROVED
Merging into dev.

Generated with Claude Code

…rd change

Onboarding Phase 2 connects the 8 existing onboarding tools to PostgreSQL and enables the Imediato agent to guide new customers through workspace setup via chat.

Backend:
- Migration 000031: setup_progress table + must_change_password on users
- PGOnboardingStore: 4 methods (UpdateTenantSettings, UpdateTenantBranding, GetOnboardingStatus, CompleteOnboarding) with UPSERT on-demand
- wireOnboardingTools() in gateway bootstrap: registers 8 tools + group
- POST /v1/auth/change-password endpoint (PCI DSS, history check, audit)
- JWT TokenClaims.MustChangePassword (claim "mcp") for frontend detection
- User struct + PG queries updated for must_change_password field

Frontend:
- ChangePasswordModal (Radix Dialog, blocking, no escape/close)
- Auth store decodes JWT mcp claim for mustChangePassword state
- auth-client.ts with changePassword() API call
- i18n translations in 8 languages (en, pt, es, fr, de, it, vi, zh)

Tests:
- 12 integration tests for OnboardingStore (TDD, PBT, tenant isolation)
- 6 unit tests for change-password endpoint
- 4 E2E tests (full flow, tenant isolation, change password)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

1. Add rate limiting to POST /v1/auth/change-password endpoint (WrapChangePassword on AuthRateLimiter, same rate as login)
2. Fix frontend status code mismatch: catch 400 (not 422) for validation errors, display server error message directly
3. Remove PII (email) from password change log line (LGPD)
4. Handle errors from AddPasswordHistory (abort if fails to preserve PCI DSS reuse prevention), ClearMustChangePassword and RevokeAllSessions (log warnings instead of silent ignore)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

MiltonSilvaJr · 2026-04-06T21:11:01Z

Closing: this PR should be in vellus-ai/argoclaw, not the upstream repository.

Milton Silva and others added 30 commits March 21, 2026 23:10

Merge pull request #1 from vellus-ai/security/appsec-audit-fixes

34015ad

security: fix 10 AppSec audit findings (3 HIGH, 7 MEDIUM)

Merge pull request #2 from vellus-ai/feat/auth-pci-dss

1fdd0eb

feat(auth): PCI DSS email+password authentication

Merge pull request #3 from vellus-ai/feat/multi-tenancy

4ff0043

feat(tenancy): enterprise multi-tenancy + white-label branding

Merge pull request #4 from vellus-ai/feat/whitelabel-i18n-presets

4b0e4a4

feat: white-label branding + i18n 8 locales + ARGO presets

docs: update CHANGELOG with PRs #8,#9,#10 (TDD/PBT verified)

faaa07c

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

docs: CHANGELOG — all conflict PRs resolved (#11)

761161c

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Merge pull request #12 from vellus-ai/fix/migration-duplicate-027

9f46eb4

fix: resolve duplicate migration 000027 + fix upstream imports

Merge pull request #13 from vellus-ai/fix/appsec-tenant-isolation

8e97029

fix(security): enforce tenant_id filtering in all store queries

Merge pull request #14 from vellus-ai/fix/appsec-branding-validation

09aaeca

fix(security): add input validation to branding handler

Merge pull request #15 from vellus-ai/ci/docker-build-push

4151f55

ci: Docker build + push to GHCR on main

fix: use store.WithTenantID in tenant middleware test

cde2ac8

tenant_middleware_test.go referenced undefined ctxKeyTenantID — replaced with store.WithTenantID() which is the actual API used by the middleware. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Merge pull request #17 from vellus-ai/claude/fix/ci-build-errors

a177282

fix: resolve CI build errors (Go test + TypeScript)

Milton Silva and others added 22 commits March 25, 2026 12:05

Merge pull request #19 from vellus-ai/claude/fix/release-workflow

b3a911b

ci: remove Docker Hub login and Discord notification

MiltonSilvaJr changed the base branch from dev to main April 6, 2026 21:08

Milton Silva and others added 2 commits April 6, 2026 18:08

MiltonSilvaJr force-pushed the claude/feat/onboarding-phase2-impl branch from d86ad8d to a71badd Compare April 6, 2026 21:08

MiltonSilvaJr closed this Apr 6, 2026

MiltonSilvaJr wants to merge 59 commits into nextlevelbuilder:main from vellus-ai:claude/feat/onboarding-phase2-impl