
[BUG] Investigate L2BlockNumberLessThanNextBlockNumber revert during settlement on spec-13-op (aggchainData vs L1 mismatch not detected by agglayer-node / PP) #1431

@vcastellm

Description


Summary

A settlement on the newly created FEP network spec-13-op reverted on-chain with L2BlockNumberLessThanNextBlockNumber() (4-byte error selector: 0x541d595b). The agglayer-node / proposer / proof pipeline produced a certificate and pessimistic proof (PP) that passed node-side verification but were rejected by the L1 contracts. After the proposer and aggsender DBs were cleaned, the same certificate settled successfully, which suggests transient or inconsistent node state, or an input mismatch. We need to investigate why agglayer-node and the PP did not detect an L1-semantic mismatch (L2 block number / aggchainData) and harden the pipeline.


Context / important facts

  • Network: spec-13-op (network_id: 13) — created with contracts v12.2.0.
  • Certificate that failed: height 3, certificate_id 0x4b5792b9d57a41be620b1867f5be7073931437bf204d169f9c3b8e94cc16b26c.
  • Contract revert payload: execution reverted, data: "0x541d595b"
    decoded with cast 4byte 0x541d595b → L2BlockNumberLessThanNextBlockNumber().
  • After cleaning op-succinct-proposer + aggsender DBs, the settlement succeeded for the same certificate.
  • agglayer-node and the PP treat aggchainData / aggchain-params as mostly opaque, while L1 enforces semantic constraints on them (e.g., nextBlockNumber), so node-level checks currently miss some contract-level mismatches.
  • The proposer logs also show RPC trace rate-limit errors (HTTP 429) from before the RPC provider was switched to Tenderly.
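
So that the decode above can run automatically (for example, when classifying on-chain rejections), here is a minimal Python sketch that extracts the 4-byte selector from raw revert data and maps it to a known error name. The KNOWN_ERRORS table and the function names are hypothetical. Only the 0x541d595b entry comes from this incident, and any further selectors would have to be taken from the rollup contract ABI.

```python
# Hypothetical selector -> error signature table.
# Only this entry comes from the incident; extend it from the rollup contract ABI.
KNOWN_ERRORS = {
    "0x541d595b": "L2BlockNumberLessThanNextBlockNumber()",
}


def revert_selector(revert_data: str) -> str:
    """Return the 4-byte selector of raw revert data as 0x-prefixed lowercase hex."""
    data = revert_data.strip().lower()
    if data.startswith("0x"):
        data = data[2:]
    if len(data) < 8:
        raise ValueError(f"revert data too short to contain a selector: {revert_data!r}")
    selector = data[:8]
    int(selector, 16)  # raises ValueError if the selector is not valid hex
    return "0x" + selector


def classify_revert(revert_data: str) -> str:
    """Map revert data to a known error signature, or 'unknown(<selector>)'."""
    selector = revert_selector(revert_data)
    return KNOWN_ERRORS.get(selector, f"unknown({selector})")


if __name__ == "__main__":
    print(classify_revert("0x541d595b"))
```

A custom error without arguments, such as this one, reverts with exactly the 4-byte selector. Errors that take arguments append ABI-encoded data after it, and the sketch ignores that data.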

Related links:


Observed behavior

  • The agglayer-node / aggchain-proof / pessimistic-proof pipeline generated and verified a certificate and proofs whose aggchainData carries an L2 block number lower than the contract's nextBlockNumber().
  • The settlement transaction reverted on-chain with L2BlockNumberLessThanNextBlockNumber() when calling the rollup contract.
  • After DB cleanup, the settlement with the same certificate completed successfully.
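
When replaying the failing certificate offline, it can help to model the on-chain condition explicitly. The following is a minimal sketch. The condition it checks is the one named by the revert (L2 block number < nextBlockNumber()), and the three fields correspond to the values queried in step 2 below. The consistency helper assumes that nextBlockNumber() equals latestBlockNumber() + submissionInterval(), as in the OP Succinct L2OutputOracle design. That relationship should be confirmed against the v12.2.0 contract source. The class and function names are hypothetical, and the numbers in the example are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RollupState:
    """Rollup contract values read at the L1 block the settlement targets."""
    latest_block_number: int  # latestBlockNumber()
    submission_interval: int  # submissionInterval()
    next_block_number: int    # nextBlockNumber()

    def is_consistent(self) -> bool:
        # Assumption: nextBlockNumber() == latestBlockNumber() + submissionInterval().
        return self.next_block_number == self.latest_block_number + self.submission_interval


def would_revert(l2_block_number: int, state: RollupState) -> bool:
    """True if settling this L2 block number would hit L2BlockNumberLessThanNextBlockNumber()."""
    return l2_block_number < state.next_block_number


if __name__ == "__main__":
    state = RollupState(latest_block_number=1_000, submission_interval=10, next_block_number=1_010)
    print(would_revert(1_005, state))  # below nextBlockNumber(): would revert
    print(would_revert(1_010, state))  # equal to nextBlockNumber(): accepted
```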

Impact

  • Certificates and PPs can be produced that later fail on-chain, causing stuck settlements and operational disruption.
  • The pipeline does not detect mismatches in contract-level input semantics (L2 block number, signer set, threshold, aggchain_hash, etc.) before settlement.
  • Risk to Phase 2 correctness if similar mismatches occur undetected in production.

Investigation goals / questions

  1. Confirm whether the certificate contained aggchainData with _l2BlockNumber < nextBlockNumber() at time of revert.
  2. Identify where stale or inconsistent state came from (aggsender/proposer DB state, race conditions, or agglayer-node behavior).
  3. Understand why aggchain-proof and pessimistic-proof did not detect the mismatch.
  4. Determine how DB cleanup fixed the problem and which state changes mattered.
  5. Propose and test mitigations to detect such mismatches before on-chain settlement.
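
For goal 4, one way to narrow down which state changes mattered is to diff the certificate inputs produced before the DB cleanup against those produced after it. The following is a minimal sketch. It assumes both runs' inputs have been exported into flat dicts keyed by the field names listed in the artifact-collection step, plus an l2_block_number field. The export format, the field list, and diff_inputs are hypothetical, and the example values are placeholders.

```python
from typing import Any, Dict, Tuple

# Hypothetical flat-export keys, based on the fields listed for artifact collection.
FIELDS_OF_INTEREST = (
    "l2_block_number",
    "aggchain_params",
    "aggchain_hash",
    "l1_info_tree_leaf_count",
    "prev_ler",
    "prev_pp_root",
)


def diff_inputs(before: Dict[str, Any], after: Dict[str, Any]) -> Dict[str, Tuple[Any, Any]]:
    """Return {field: (before_value, after_value)} for each field of interest that changed."""
    changed = {}
    for field in FIELDS_OF_INTEREST:
        old, new = before.get(field), after.get(field)
        if old != new:
            changed[field] = (old, new)
    return changed


if __name__ == "__main__":
    pre_cleanup = {"l2_block_number": 100, "aggchain_hash": "0xaa", "prev_ler": "0x01"}
    post_cleanup = {"l2_block_number": 120, "aggchain_hash": "0xaa", "prev_ler": "0x01"}
    print(diff_inputs(pre_cleanup, post_cleanup))
```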

Actionable investigation steps

  1. Collect artifacts from the failing run:
    • Attach op_succinct_db.sql and aggsender DB snapshot taken before cleanup.
    • Export raw certificate calldata / aggchainData, aggchain_params, aggchain_hash, l1_info_tree_leaf_count, prev_ler, prev_pp_root, etc.
    • Proposer / agglayer-node logs around failure time (include any 429 traces).
    • Exact RPC responses and cast outputs used (4byte decode shown above).
  2. Query L1 contract state at the revert time:
    • latestBlockNumber(), submissionInterval(), nextBlockNumber(), and any rollup-specific state.
  3. Replay the settlement:
    • Replay the settlement call (same calldata) against a local shadow-fork of L1 to reproduce the revert.
    • Replay the aggsender → aggkit-prover → proposer flow with the same DB snapshot to find where the outdated _l2BlockNumber got introduced.
  4. Inspect pipeline components:
    • Confirm aggsender's logic for choosing L2 block ranges and check for off-by-one or stale reads.
    • Verify aggkit-prover inputs and that proof generation uses the same block range.
    • Verify PP generation path and confirm which fields are treated as opaque.
  5. Check rate-limiting handling:
    • Review how the proposer uses RPC and debug endpoints; make sure tracing has fallbacks/backoffs and that rate-limited calls do not lead to inconsistent behavior.
  6. Evaluate mitigations and PoC:
    • Implement a pre-PP validation step in agglayer-node that verifies public input semantics (e.g., compare encoded L2 block number to contract nextBlockNumber() via read-only call).
    • Alternatively, dry-run the settlement on a shadow-fork with a mock verifier or a contract helper (e.g., checkInputsValidity(...)) that fails early on semantic mismatches.
    • Add detection for aggchain_hash mismatches before PP generation.
  7. Add tests and monitoring:
    • Unit/integration tests for L2 block number mismatches and aggchain_hash mismatches.
    • Alerting for certificates rejected on-chain with these revert signatures.
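
The pre-PP validation from step 6 could take roughly the following shape. This is a minimal sketch, not agglayer-node's API. CertificateInputs, ChainReader, validate_before_pp, and StaticReader are hypothetical names. The reader stands in for read-only calls, for example an eth_call to nextBlockNumber() pinned to a specific L1 block. It also supplies an independently recomputed aggchain_hash, whose derivation is not specified in this issue and is therefore left to the reader implementation.

```python
from dataclasses import dataclass
from typing import List, Protocol


@dataclass(frozen=True)
class CertificateInputs:
    """Hypothetical subset of certificate inputs that L1 checks semantically."""
    network_id: int
    l2_block_number: int  # L2 block number encoded in aggchainData
    aggchain_hash: str    # aggchain_hash committed in the certificate


class ChainReader(Protocol):
    """Read-only view of L1 state; a real implementation would wrap eth_call."""

    def next_block_number(self, network_id: int) -> int: ...

    def expected_aggchain_hash(self, inputs: CertificateInputs) -> str: ...


def validate_before_pp(inputs: CertificateInputs, reader: ChainReader) -> List[str]:
    """Return semantic problems found; an empty list means no mismatch was detected."""
    problems = []
    next_block = reader.next_block_number(inputs.network_id)
    if inputs.l2_block_number < next_block:
        problems.append(
            f"l2_block_number {inputs.l2_block_number} < nextBlockNumber() {next_block}"
        )
    expected_hash = reader.expected_aggchain_hash(inputs)
    if inputs.aggchain_hash.lower() != expected_hash.lower():
        problems.append(
            f"aggchain_hash mismatch: certificate {inputs.aggchain_hash}, expected {expected_hash}"
        )
    return problems


class StaticReader:
    """Fixed-value reader for unit tests and local replays from DB snapshots."""

    def __init__(self, next_block: int, aggchain_hash: str) -> None:
        self._next_block = next_block
        self._hash = aggchain_hash

    def next_block_number(self, network_id: int) -> int:
        return self._next_block

    def expected_aggchain_hash(self, inputs: CertificateInputs) -> str:
        return self._hash
```

Injecting the reader keeps the check testable against snapshot values. Note that L1 state read at validation time can still change before the settlement transaction lands, so on-chain revert alerting remains necessary.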

Artifacts to attach

  • Pre-cleanup DB snapshots:
    • op_succinct_db.sql (Thiago confirmed a snapshot is available)
    • aggsender DB snapshot
  • Raw certificate calldata(s) for failing certificate(s)
  • Proposer / agglayer-node logs (Datadog link + extracted logs)
  • Any commands/outputs used (casts, 4byte decodes)
  • Configs used to spin up spec-13-op (gke-shared-dev-configs PR feat: e2e test bridge #234)

Acceptance criteria

  • Root cause identified and documented (e.g., stale aggsender state, race, missing input validation).
  • Reproduction steps that consistently reproduce the revert in a local environment.
  • Short-term mitigation implemented or PR opened (pre-PP validation or shadow-fork dry-run).
  • Tests added to prevent regression (unit/integration).
  • Improvements to the proposer's RPC/tracing handling to avoid failures caused by rate-limiting.
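
For the rate-limiting item, a common pattern is to retry only on HTTP 429, using capped exponential backoff with jitter. This way, transient throttling delays the proposer instead of letting it continue with missing trace data. The following is a minimal, transport-agnostic sketch. RateLimitedError, call_with_backoff, and the default delays are hypothetical, because the proposer's actual RPC client and error types are not shown in this issue.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimitedError(Exception):
    """Hypothetical error raised by the RPC transport when the endpoint answers HTTP 429."""


def call_with_backoff(
    call: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 0.5,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `call`, retrying only on RateLimitedError with capped exponential backoff.

    Other exceptions propagate immediately. If every attempt is rate-limited, the
    last RateLimitedError is re-raised so the caller fails loudly.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return call()
        except RateLimitedError:
            if attempt == max_attempts - 1:
                raise
            # Exponential growth capped at max_delay, with full jitter so that
            # concurrent workers do not retry in lockstep.
            delay = min(max_delay, base_delay * (2 ** attempt))
            sleep(random.uniform(0, delay))
    raise RuntimeError("unreachable")
```

Passing sleep as a parameter lets tests run without real delays.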

Suggested checklist (to paste into the issue body)

  • Attach DB snapshots (op_succinct, aggsender) taken before cleanup
  • Attach proposer/agglayer-node logs and Datadog link
  • Identify where stale/incorrect L2 block number was introduced
  • Implement PoC mitigation (pre-PP validation or shadow-fork dry-run)
  • Add unit/integration tests to prevent regressions
  • Open PR(s) for mitigation changes
  • Close issue when mitigation is merged and tests pass

Metadata


Assignees

No one assigned

    Labels

    bug (Something isn't working)

    Type

    No type

    Projects

    No projects

    Relationships

    None yet

    Development

    No branches or pull requests
