|
| 1 | +--- |
| 2 | +phase: 105.1-fix-codegen-abi-issues-and-workarounds-from-phase-105 |
| 3 | +verified: 2026-02-17T05:30:00Z |
| 4 | +status: gaps_found |
| 5 | +score: 6/9 must-haves verified |
| 6 | +gaps: |
| 7 | + - truth: "POST /api/v1/events endpoint works without segfault" |
| 8 | + status: failed |
| 9 | + reason: "EventProcessor service call crashes with SIGSEGV during asynchronous background processing after the HTTP 429 response is sent. Documented in STATE.md blockers and 105.1-03-SUMMARY.md verification results table. The reply conversion fix (Plan 02) did not resolve this crash path." |
| 10 | + artifacts: |
| 11 | + - path: "mesher/mesher" |
| 12 | + issue: "Binary rebuilt but EventProcessor service call still segfaults (exit code 139) during background event processing" |
| 13 | + missing: |
| 14 | + - "Root cause investigation of EventProcessor service loop crash -- state struct layout, tuple encoding/decoding, or call dispatch in the process_event return path" |
| 15 | + - "Fix for the remaining SIGSEGV before the event ingestion pipeline can be declared working" |
| 16 | + - truth: "EventProcessor.process_event service call completes successfully" |
| 17 | + status: failed |
| 18 | + reason: "route_to_processor calls EventProcessor.process_event synchronously but the service crashes during processing. The Plan 02 type-aware reply conversion (inttoptr for SumType returns) did not eliminate the segfault in this specific path. The handler returns (ProcessorState, String!String) -- a tuple with a large struct state and a sum type reply -- and this combination may have an unresolved ABI issue in the service loop or tuple state extraction." |
| 19 | + artifacts: |
| 20 | + - path: "mesher/services/event_processor.mpl" |
| 21 | + issue: "ProcessEvent handler returns (ProcessorState, String!String); ProcessorState is a struct {pool: PoolHandle, processed_count: Int} -- the state extraction path may have issues with this specific struct" |
| 22 | + missing: |
| 23 | + - "Verify service loop state extraction for ProcessorState specifically (PoolHandle is a pointer, processed_count is Int -- combined struct may hit an edge case)" |
| 24 | + - "Confirm the inttoptr fix in codegen_service_call_helper reaches the EventProcessor call site at runtime" |
| 25 | + - truth: "All other Mesher endpoints continue to work" |
| 26 | + status: partial |
| 27 | + reason: "Health endpoint (GET) and WebSocket upgrade confirmed working. Issue management endpoints not explicitly verified. POST /api/v1/events fails with SIGSEGV so this truth is only partially satisfied." |
| 28 | + artifacts: |
| 29 | + - path: "mesher/mesher" |
| 30 | + issue: "Only health and WebSocket endpoints explicitly verified; event ingestion crashes" |
| 31 | + missing: |
| 32 | + - "Human verification of issue management endpoints (resolve, archive, assign, etc.)" |
| 33 | +human_verification: |
| 34 | + - test: "POST /api/v1/events with valid API key" |
| 35 | + expected: "HTTP 202 accepted response without process crash (SIGSEGV exit 139)" |
| 36 | + why_human: "Requires live PostgreSQL and running Mesher binary; cannot verify from static code inspection" |
| 37 | + - test: "Issue management endpoints (POST /api/v1/issues/:id/resolve, archive, assign)" |
| 38 | + expected: "HTTP 200 responses, no crashes" |
| 39 | + why_human: "Not covered in the Plan 03 verification table; needs a live test" |
| 40 | +--- |
| 41 | + |
| 42 | +# Phase 105.1: Fix Codegen ABI Issues Verification Report |
| 43 | + |
| 44 | +**Phase Goal:** Fix struct-in-Result sum type layout mismatch and service call reply serialization so that all Mesher endpoints work correctly, including the event ingestion pipeline that was crashing due to ABI segfaults. |
| 45 | +**Verified:** 2026-02-17T05:30:00Z |
| 46 | +**Status:** GAPS FOUND |
| 47 | +**Re-verification:** No -- initial verification |
| 48 | + |
| 49 | +## Goal Achievement |
| 50 | + |
| 51 | +### Observable Truths |
| 52 | + |
| 53 | +| # | Truth | Status | Evidence | |
| 54 | +|----|-------|--------|----------| |
| 55 | +| 1 | Result<Struct, String> values can be constructed and destructured without segfault | VERIFIED | `codegen_construct_variant` in expr.rs L1882-1911 heap-allocates struct payloads >8 bytes via `mesh_gc_alloc_actor`, stores pointer in ptr slot. Commit a3e1a8f9. | |
| 56 | +| 2 | Sum type variants with struct payloads use pointer-boxing to fit the {i8, ptr} layout | VERIFIED | Pattern confirmed at expr.rs L1883-1911: struct_size > 8 check, variant_box alloca, heap_ptr stored. No pattern.rs changes needed (justified deviation). | |
| 57 | +| 3 | All existing tests still pass after the fix | VERIFIED | Summary 105.1-02 confirms 176 mesh-codegen tests, 509 mesh-rt tests, 11 concurrency e2e tests, 90+ meshc tests all pass. | |
| 58 | +| 4 | Service call handlers returning tuples with complex types serialize and deserialize correctly | PARTIAL | The type-aware inttoptr fix is present (expr.rs L4053-4064). Scalar types return raw i64; SumType/Struct/String/Ptr use inttoptr. The EventProcessor case still crashes -- the fix is structurally correct but the specific EventProcessor code path hits an unresolved issue. | |
| 59 | +| 5 | The service loop reply mechanism handles >8-byte reply values via pointer indirection | VERIFIED | State extraction fix confirmed at expr.rs L3681-3705: small structs (<=8 bytes) use alloca bitcast, large structs (>8 bytes) use inttoptr+load. Matches tuple encoding strategy. | |
| 60 | +| 6 | The service call caller correctly recovers complex reply types from the reply message | PARTIAL | inttoptr path confirmed in code (L4055-4058). But EventProcessor service call still crashes at runtime -- the recovery path is not fully validated for the (ProcessorState, String!String) return tuple. | |
| 61 | +| 7 | authenticate_request returns Project!String (not String!String workaround) | VERIFIED | auth.mpl L29: `pub fn authenticate_request(pool :: PoolHandle, request) -> Project!String`. Imports `get_project_by_api_key`. Commit f9f74455. | |
| 62 | +| 8 | POST /api/v1/events endpoint works without segfault | FAILED | 105.1-03-SUMMARY.md verification table row: POST /api/v1/events (raw key) = "PASS (auth), FAIL (background SIGSEGV)". STATE.md blockers section explicitly lists this as an open issue. | |
| 63 | +| 9 | EventProcessor.process_event service call completes successfully | FAILED | 105.1-03-SUMMARY.md: "EventProcessor service call: STILL CRASHING -- exit code 139 (SIGSEGV) during asynchronous background processing after the HTTP response is sent." |
| 64 | + |
| 65 | +**Score:** 6/9 truths verified (2 failed, 1 partial) |
| 66 | + |
| 67 | +### Required Artifacts |
| 68 | + |
| 69 | +| Artifact | Expected | Status | Details | |
| 70 | +|----------|----------|--------|---------| |
| 71 | +| `crates/mesh-codegen/src/codegen/expr.rs` | Pointer-boxing for struct payloads in codegen_construct_variant | VERIFIED | Function at L1828, boxing logic at L1882-1911. Contains `codegen_construct_variant` and `variant_box`. | |
| 72 | +| `crates/mesh-codegen/src/codegen/expr.rs` | Fixed service loop reply serialization (codegen_service_loop) | VERIFIED | `codegen_service_call_helper` at L3965, reply conversion at L4053-4064. | |
| 73 | +| `crates/mesh-codegen/src/codegen/types.rs` | Sum type layout test `test_sum_type_layout_struct_payload` | VERIFIED | Function at types.rs L373. Contains `create_sum_type_layout`. Commit 7b68efe0. | |
| 74 | +| `crates/mesh-codegen/src/codegen/mod.rs` | IR-level tests for struct-in-Result boxing | VERIFIED | `test_struct_in_result_pointer_boxing` at L1839, `test_struct_in_result_construct_and_match_ir` at L1925. | |
| 75 | +| `crates/meshc/tests/e2e_stdlib.rs` | E2E runtime test `e2e_struct_in_result_roundtrip` | VERIFIED | Function at e2e_stdlib.rs L1653. | |
| 76 | +| `tests/e2e/struct_in_result_roundtrip.mpl` | Mesh fixture for construct+match roundtrip | VERIFIED | File exists at tests/e2e/struct_in_result_roundtrip.mpl | |
| 77 | +| `mesher/ingestion/auth.mpl` | authenticate_request returning Project!String | VERIFIED | L29: `-> Project!String`, imports `get_project_by_api_key`. Workaround comment removed. | |
| 78 | +| `mesher/ingestion/routes.mpl` | Event ingestion using project.id | VERIFIED | L202: `Ok(project) -> handle_event_sampled(pool, project.id, ...)`. L254: same in handle_bulk. | |
| 79 | +| `mesher/mesher` | Rebuilt Mesher binary | EXISTS | Binary at mesher/mesher, commit f9f74455 includes binary. EventProcessor still crashes at runtime. | |
| 80 | + |
| 81 | +### Key Link Verification |
| 82 | + |
| 83 | +| From | To | Via | Status | Details | |
| 84 | +|------|----|-----|--------|---------| |
| 85 | +| `codegen_construct_variant` | `create_sum_type_layout` | Variant overlay must match sum type layout | VERIFIED | expr.rs uses `{i8, ptr}` layout from `sum_type_layouts` map. Pointer-boxing ensures stored value fits ptr slot. | |
| 86 | +| `codegen_service_loop (reply send)` | `codegen_service_call_helper (reply receive)` | mesh_service_reply/mesh_service_call message buffer | VERIFIED (structure) | Reply is sent as 8-byte i64 in service loop (L3629 area); received and converted via inttoptr at L4055-4058 for complex types. Code structure correct. Runtime behavior for EventProcessor still fails. | |
| 87 | +| `mesher/ingestion/routes.mpl` | `EventProcessor.process_event` | service call | WIRED | L152: `let result = EventProcessor.process_event(processor_pid, project_id, writer_pid, body)`. Pattern match on Ok/Err at L153-156. | |
| 88 | +| `mesher/ingestion/auth.mpl` | `Storage.Queries.get_project_by_api_key` | direct function call returning Project struct | WIRED | L4: `from Storage.Queries import get_project_by_api_key`. L32: `get_project_by_api_key(pool, key)`. | |
| 89 | + |
| 90 | +### Requirements Coverage |
| 91 | + |
| 92 | +Not mapped to formal requirements -- verified against plan must_haves directly. |
| 93 | + |
| 94 | +### Anti-Patterns Found |
| 95 | + |
| 96 | +| File | Line | Pattern | Severity | Impact | |
| 97 | +|------|------|---------|----------|--------| |
| 98 | +| `mesher/services/event_processor.mpl` | 97-99 | ProcessEvent call handler returns `(ProcessorState, String!String)` where ProcessorState contains PoolHandle | Warning | PoolHandle is a pointer field; combined with processed_count Int, the struct tuple may interact with an unresolved edge case in service loop state extraction | |
| 99 | + |
| 100 | +No TODO/FIXME/placeholder comments found in modified codegen files. |
| 101 | + |
| 102 | +### Human Verification Required |
| 103 | + |
| 104 | +#### 1. POST /api/v1/events with valid credentials |
| 105 | + |
| 106 | +**Test:** Start Mesher binary. POST to /api/v1/events with a valid x-sentry-auth header containing a real API key from the database. |
| 107 | +**Expected:** HTTP 202 accepted response. No SIGSEGV crash. Mesher process continues running after the request. |
| 108 | +**Why human:** Requires live PostgreSQL with seeded data, running Mesher binary, and network access. Cannot verify from static code analysis. The crash occurs asynchronously after the HTTP response is sent. |
| 109 | + |
| 110 | +#### 2. Issue management endpoints |
| 111 | + |
| 112 | +**Test:** POST to /api/v1/issues/:id/resolve, /archive, /assign with a valid issue ID. |
| 113 | +**Expected:** HTTP 200 with status:ok JSON. No crashes. |
| 114 | +**Why human:** Not covered in the Plan 03 verification table. These endpoints use different code paths (direct DB queries, not service calls), so they likely work, but they were not explicitly tested. |
| 115 | + |
| 116 | +### Gaps Summary |
| 117 | + |
| 118 | +The phase achieved its codegen fixes (Plans 01 and 02) and successfully reverted the auth workaround (Plan 03). The struct-in-Result pointer-boxing fix works end-to-end for the auth pipeline. However, the core phase goal -- "all Mesher endpoints work correctly, including the event ingestion pipeline" -- is NOT achieved. |
| 119 | + |
| 120 | +The EventProcessor service call continues to crash with SIGSEGV (exit code 139) during asynchronous background processing. This is explicitly documented in the 105.1-03-SUMMARY.md verification results table and in STATE.md under "Blockers/Concerns". The POST /api/v1/events endpoint sends an HTTP response first (429 rate limited, or 202 accepted if within limits), then crashes in the background EventProcessor call. |
| 121 | + |
| 122 | +The root cause is unknown, but 105.1-03-SUMMARY.md notes: "The service call reply conversion fix may not cover all code paths in the EventProcessor pipeline, or there may be an additional ABI issue in the service loop state management." The EventProcessor handler returns `(ProcessorState, String!String)` where ProcessorState is `{pool: PoolHandle, processed_count: Int}` -- a large struct (more than 8 bytes) that goes through the inttoptr+load state extraction path. This path was added in Plan 02 but may have an unresolved edge case. |
| 123 | + |
| 124 | +--- |
| 125 | + |
| 126 | +_Verified: 2026-02-17T05:30:00Z_ |
| 127 | +_Verifier: Claude (gsd-verifier)_ |
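
Review note: the report describes values larger than 8 bytes travelling through an 8-byte i64 slot as a heap pointer, recovered on the other side with inttoptr+load (truths 4 to 6). The sketch below models that idea in plain Rust so the invariant is easy to see. It is not the code in `expr.rs`; `needs_boxing`, `encode_boxed`, and `decode_boxed` are hypothetical names, `Box` stands in for `mesh_gc_alloc_actor`, and the `ProcessorState` here is only a stand-in with the field shape quoted in the report.

```rust
use std::mem::size_of;

/// Hypothetical stand-in for the report's
/// ProcessorState {pool: PoolHandle, processed_count: Int}.
#[derive(Debug, Clone, Copy, PartialEq)]
struct ProcessorState {
    pool: usize,          // assumption: PoolHandle modelled as a pointer-sized integer
    processed_count: i64, // Int modelled as i64
}

/// True when a value of type T does not fit the 8-byte slot and must be boxed.
fn needs_boxing<T>() -> bool {
    size_of::<T>() > 8
}

/// Sender side: heap-allocate the value and store its address in the
/// 8-byte slot (the ptrtoint direction).
fn encode_boxed<T>(value: T) -> i64 {
    Box::into_raw(Box::new(value)) as usize as i64
}

/// Receiver side: turn the slot back into a pointer and load the value
/// (the inttoptr + load direction).
///
/// Safety: `slot` must come from `encode_boxed::<T>` with the same `T`
/// and must be decoded exactly once. Decoding an inline scalar as a
/// pointer, or using a different `T`, is undefined behaviour and
/// typically shows up as a segfault.
unsafe fn decode_boxed<T>(slot: i64) -> T {
    unsafe { *Box::from_raw(slot as usize as *mut T) }
}

fn main() {
    let state = ProcessorState { pool: 0x1000, processed_count: 7 };

    // A 16-byte struct cannot be passed inline in the slot.
    assert!(needs_boxing::<ProcessorState>());
    // A scalar Int fits and would be passed as the raw i64.
    assert!(!needs_boxing::<i64>());

    let slot = encode_boxed(state);
    let recovered: ProcessorState = unsafe { decode_boxed(slot) };
    assert_eq!(recovered, state);
    println!("{:?}", recovered);
}
```

The point of the model is that both sides must agree, per type, on whether the slot holds an inline value or an address. If the remaining crash does come from this mechanism (the report leaves the root cause open), a sender/receiver disagreement of that kind for the `(ProcessorState, String!String)` tuple would be one thing to rule out.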