
feat: replace grpc transport with fibp (fila binary protocol)#5

Merged
vieiralucas merged 5 commits into `main` from `feat/fibp-transport`
Mar 26, 2026

@vieiralucas (Member) commented Mar 26, 2026

Summary

  • Replaces the gRPC/protobuf transport with FIBP (Fila Binary Protocol): a custom length-prefixed binary protocol over raw TCP (or TLS)
  • Removes `grpc` and `google-protobuf` gem dependencies — the SDK now has zero external runtime dependencies (it uses only Ruby's standard library, `socket` and `openssl`, plus core threading primitives)
  • Public API (`enqueue`, `consume`, `ack`, `nack`, TLS options, `api_key`, batching modes) is unchanged
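"Length-prefixed" means each frame starts with a fixed-size header giving the payload length, so the reader always knows how many bytes to consume next. The sketch below assumes a 4-byte big-endian (`u32be`) length header; the actual FIBP header layout is not spelled out in this PR, and `FrameSketch`, `write_frame`, `read_frame`, and `read_exactly` are hypothetical names, not the SDK's API.

```ruby
require "stringio"

# Hypothetical length-prefixed framing: a 4-byte big-endian length
# (assumed u32be) followed by exactly that many payload bytes.
module FrameSketch
  HEADER_BYTES = 4

  # Writes one frame to any IO-like object (TCPSocket, SSLSocket, StringIO).
  def self.write_frame(io, payload)
    bytes = payload.b
    io.write([bytes.bytesize].pack("N") + bytes)
  end

  # Returns the next payload, or nil on a clean EOF before any header byte.
  def self.read_frame(io)
    header = read_exactly(io, HEADER_BYTES)
    return nil if header.nil?

    payload = read_exactly(io, header.unpack1("N"))
    raise EOFError, "truncated frame" if payload.nil?

    payload
  end

  # Reads exactly num_bytes. Returns nil if EOF is hit before the first
  # byte and raises EOFError if EOF is hit part-way through.
  def self.read_exactly(io, num_bytes)
    buf = "".b
    while buf.bytesize < num_bytes
      chunk = io.read(num_bytes - buf.bytesize)
      break if chunk.nil? || chunk.empty?

      buf << chunk
    end
    return nil if buf.empty? && num_bytes.positive?
    raise EOFError, "truncated frame" if buf.bytesize < num_bytes

    buf
  end
end

io = StringIO.new("".b)
FrameSketch.write_frame(io, "hello")
io.rewind
FrameSketch.read_frame(io) # => "hello"
```

On a `TCPSocket`, `IO#read(n)` already blocks until `n` bytes or EOF, so the loop is mainly defensive; the important part is treating a short read as a truncated frame instead of returning a partial payload.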

Changes

New files:

  • `lib/fila/transport.rb` — TCP connection management: handshake, frame read/write, correlation-ID multiplexing, reader thread, TLS/mTLS via `OpenSSL::SSL`, AUTH frame for API keys
  • `lib/fila/codec.rb` — FIBP binary wire format encoding/decoding using `Array#pack` / `String#unpack`
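On the codec side, decoding with `String#unpack` mostly means reading fixed-width integers and length-prefixed strings while threading a byte offset through each call. A minimal sketch follows: `read_u16` and `read_str16` are names taken from this PR's commit messages, but these bodies, the `CodecSketch` module, `write_str16`, and the assumption that a `str16` is a `u16be` byte length followed by UTF-8 bytes are illustrative only.

```ruby
# Hypothetical offset-threading decode helpers. Each returns [value, new_pos].
# Assumes u16 fields are big-endian and a str16 is a u16be byte length
# followed by that many UTF-8 bytes.
module CodecSketch
  module_function

  def read_u16(buf, pos)
    raise ArgumentError, "buffer too short for u16 at #{pos}" if buf.bytesize < pos + 2

    [buf.byteslice(pos, 2).unpack1("n"), pos + 2]
  end

  def read_str16(buf, pos)
    len, pos = read_u16(buf, pos)
    raise ArgumentError, "buffer too short for str16 at #{pos}" if buf.bytesize < pos + len

    [buf.byteslice(pos, len).force_encoding(Encoding::UTF_8), pos + len]
  end

  def write_str16(str)
    bytes = str.b
    raise ArgumentError, "str16 longer than 65535 bytes" if bytes.bytesize > 0xFFFF

    [bytes.bytesize].pack("n") + bytes
  end
end

buf = CodecSketch.write_str16("orders") + [3].pack("n")
name, pos = CodecSketch.read_str16(buf, 0)  # => ["orders", 8]
count, pos = CodecSketch.read_u16(buf, pos) # => [3, 10]
```

Returning `[value, new_pos]` keeps the helpers free of hidden state and allows the `x, pos = read_u16(payload, pos)` chaining style visible in the `decode_consume_push` diff.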

Updated:

  • `lib/fila/client.rb` — drives `Transport` + `Codec` instead of a gRPC stub
  • `lib/fila/batcher.rb` — batches to `Transport` directly; groups messages by queue per FIBP frame
  • `lib/fila/errors.rb` — `RPCError` carries a FIBP error code
  • `lib/fila/enqueue_result.rb` — carries `error_code` for per-message error differentiation
  • `lib/fila/version.rb` — bumped to 0.5.0
  • `fila-client.gemspec` — removed grpc/google-protobuf dependencies
  • `test/test_helper.rb` — admin operations reimplemented over FIBP; TCP-only readiness check
  • `test/test_tls_auth.rb` / `test/test_batch.rb` — updated for FIBP error codes
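The batcher change ("groups messages by queue per FIBP frame") implies two bookkeeping steps: split a flushed batch by queue, then map each queue's per-message results back to the caller's original order. The sketch below uses assumed types; `PendingMessage`, `group_by_queue`, and `scatter_results` are hypothetical and not taken from `lib/fila/batcher.rb`.

```ruby
# Hypothetical buffered-enqueue record; the real batcher's types may differ.
PendingMessage = Struct.new(:queue, :payload, keyword_init: true)

# Splits a flushed batch into one group per queue (one frame each),
# keeping each message's original index alongside it.
def group_by_queue(messages)
  groups = Hash.new { |hash, queue| hash[queue] = [] }
  messages.each_with_index { |msg, idx| groups[msg.queue] << [idx, msg] }
  groups
end

# Writes each queue's per-message results back into submission order.
# results_by_queue[queue] must be aligned with that queue's group.
def scatter_results(groups, results_by_queue, total)
  out = Array.new(total)
  groups.each do |queue, entries|
    entries.zip(results_by_queue.fetch(queue)) { |(idx, _msg), result| out[idx] = result }
  end
  out
end

batch = [
  PendingMessage.new(queue: "orders", payload: "a"),
  PendingMessage.new(queue: "emails", payload: "b"),
  PendingMessage.new(queue: "orders", payload: "c")
]
groups = group_by_queue(batch) # orders -> indices [0, 2], emails -> [1]
scatter_results(groups, { "orders" => [:ok, :ok], "emails" => [:err] }, batch.size)
# => [:ok, :err, :ok]
```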

Deleted:

  • `lib/fila/proto/` — generated protobuf Ruby files
  • `proto/` — proto source files

CI status

  • Lint: passing
  • Cubic AI review: passing (0 issues)
  • Integration tests: failing — the `dev-latest` server binary in CI is the gRPC server, which does not speak FIBP. Tests fail with either an explicit `RPCError: FIBP handshake failed` or a server startup timeout (the gRPC server starts slowly on current CI runners). They will pass once a FIBP-capable server binary is available in `dev-latest`.

Addressed Cubic findings:

  • P0: fixed deadlock — `start_reader` now runs before `send_auth`
  • P1: fixed `decode_consume_push` missing leading `msg_count:u16` field
  • P1: fixed `enqueue_single` raising `QueueNotFoundError` for all failures — preserved `error_code` in `EnqueueResult` and added a `raise_enqueue_error` helper
  • P3: corrected README auth docs (AUTH frame at connect, not per-request)
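The P0 deadlock comes down to ordering: a blocking request only returns once the reader thread routes the matching response back by correlation ID, so the reader must be running before the first request, AUTH included. The sketch below models this with `Thread::Queue` objects standing in for the socket and `[corr_id, payload]` arrays standing in for FIBP frames; `MuxSketch` and its methods are hypothetical, not the code in `lib/fila/transport.rb`.

```ruby
require "timeout"

# Hypothetical correlation-ID multiplexer. Thread::Queue objects stand in
# for the socket and [corr_id, payload] arrays stand in for FIBP frames.
class MuxSketch
  def initialize(outbound, inbound)
    @outbound = outbound # frames sent to the server
    @inbound = inbound   # frames received from the server
    @pending = {}        # corr_id => Thread::Queue waiting for the response
    @lock = Mutex.new
    @next_id = 0
  end

  # Only this thread delivers responses, so it must start before any
  # blocking request, including the AUTH round trip.
  def start_reader
    @reader = Thread.new do
      while (frame = @inbound.pop)
        corr_id, payload = frame
        waiter = @lock.synchronize { @pending.delete(corr_id) }
        waiter&.push(payload) # late responses for timed-out requests are dropped
      end
    end
  end

  def request(payload, timeout: 1.0)
    waiter = Thread::Queue.new
    corr_id = @lock.synchronize do
      @next_id += 1
      @pending[@next_id] = waiter
      @next_id
    end
    @outbound.push([corr_id, payload])
    Timeout.timeout(timeout) { waiter.pop }
  rescue Timeout::Error
    @lock.synchronize { @pending.delete(corr_id) }
    raise
  end
end

# A fake server that answers every frame with the same correlation ID.
to_server = Thread::Queue.new
to_client = Thread::Queue.new
server = Thread.new do
  while (frame = to_server.pop)
    corr_id, body = frame
    to_client.push([corr_id, body.start_with?("AUTH ") ? "OK" : "ECHO #{body}"])
  end
end

mux = MuxSketch.new(to_server, to_client)
mux.start_reader        # reader first...
mux.request("AUTH key") # ...then the blocking AUTH round trip => "OK"
mux.request("ping")     # => "ECHO ping"
```

Calling `request` before `start_reader` leaves the waiter empty until the timeout fires; without a timeout it would block forever, which is the deadlock described in the finding.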

Test plan

  • Integration tests will pass once dev-latest includes a FIBP-capable server binary
  • `TestClient` — enqueue/consume/ack, nack redeliver, queue-not-found
  • `TestEnqueueMany` — multi-message, empty array, mixed success/failure
  • `TestAutoBatching` / `TestLingerBatching` / `TestDisabledBatching`
  • `TestApiKeyAuth` — enqueue with key, rejection without key
  • `TestTlsConnection` / `TestMtlsConnection` / `TestTlsWithApiKey`
  • `TestBackwardCompatibility` — plaintext no-auth still works

🤖 Generated with Claude Code

rewrites the transport layer from gRPC/protobuf to FIBP: a custom
length-prefixed binary protocol over raw TCP (or TLS). removes the
grpc and google-protobuf gem dependencies entirely.

new files:
- lib/fila/transport.rb — tcp connection, handshake, frame
  read/write, corr-id multiplexing, reader thread, tls/mtls, auth frame
- lib/fila/codec.rb — fibp binary encoding/decoding for enqueue,
  consume, ack, nack (pack/unpack, no external deps)

updated:
- lib/fila/client.rb — uses transport + codec instead of grpc stub
- lib/fila/batcher.rb — uses transport directly instead of grpc stub
- lib/fila/errors.rb — RPCError now carries a fibp error code
- lib/fila/version.rb — bumped to 0.5.0
- fila-client.gemspec — removed grpc/google-protobuf deps
- test/test_helper.rb — admin ops (create_queue, wait_for_ready)
  reimplemented over fibp; removed grpc admin stub
- test/test_tls_auth.rb — error code assertions updated to fibp codes
- test/test_batch.rb — removed gRPC-specific lazy-connect assumption
- README.md — documents fibp transport, removes gRPC references

deleted: lib/fila/proto/ (generated protobuf ruby files)
deleted: proto/ (proto source files)

- remove redundant require 'thread' (Thread, Mutex and Queue are core classes)
- fix extra spacing on constants
- use class keyword instead of Class.new for ConnectionClosed
- rename short param names (op → opcode, n → num_bytes, op → operation)
- use rescue as block rescue (not modifier form) in drain_pending and batcher
- use anybits? for bitflag check (Style/BitwisePredicate)
- remove redundant begin blocks
- remove unnecessary rubocop:disable directives
- extract read_str16/read_headers helpers to reduce decode_consume_push complexity
- fix $1/$2 perl backrefs to named match captures
- rewrite parse_addr to avoid duplicate branch body
- fix gemspec description string literals and line length
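Two of the lint items (named captures instead of `$1`/`$2`, and a `parse_addr` without a duplicated branch body) suggest a shape like the following. This is a hypothetical reconstruction: the accepted forms (`host:port` and bracketed IPv6 `[::1]:port`) and the port-range check are assumptions, not the SDK's actual rules.

```ruby
# Hypothetical parse_addr: one regex with named captures (no $1/$2) and a
# single return path for both "host:port" and bracketed IPv6 "[::1]:port".
ADDR_PATTERN = /\A(?:\[(?<v6>[^\]]+)\]|(?<host>[^:\[\]]+)):(?<port>\d{1,5})\z/

def parse_addr(addr)
  match = ADDR_PATTERN.match(addr)
  raise ArgumentError, "invalid address: #{addr.inspect}" unless match

  port = Integer(match[:port], 10)
  raise ArgumentError, "port out of range: #{port}" unless port.between?(1, 65_535)

  [match[:v6] || match[:host], port]
end

parse_addr("localhost:5555") # => ["localhost", 5555]
parse_addr("[::1]:5555")     # => ["::1", 5555]
```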

@cubic-dev-ai cubic-dev-ai Bot left a comment

4 issues found across 20 files

Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="lib/fila/codec.rb">

<violation number="1" location="lib/fila/codec.rb:42">
P1: `decode_consume_push` skips the documented leading `msg_count:u16`, which can misalign all subsequent field decoding for consume push frames.</violation>
</file>

<file name="README.md">

<violation number="1" location="README.md:94">
P3: The docs incorrectly say the API key is sent on every request; in this implementation it is sent once via an AUTH frame during connection setup.</violation>
</file>

<file name="lib/fila/client.rb">

<violation number="1" location="lib/fila/client.rb:221">
P1: `enqueue_single` raises `QueueNotFoundError` for every per-message failure, including non-queue-not-found errors. The old code had a `case` on the error code to distinguish queue-not-found from other failures. The per-message `err_code` is present in the FIBP response but is discarded in `Codec.decode_enqueue_response` (`_err_code`), so the client can't differentiate. The error code should be preserved in `EnqueueResult` and checked here.</violation>
</file>

<file name="lib/fila/transport.rb">

<violation number="1" location="lib/fila/transport.rb:69">
P0: **Deadlock**: `send_auth` is called before `start_reader`, but `send_auth` calls `request()` which blocks waiting for a response that only the reader thread can deliver. This deadlocks every connection that uses an API key.

Swap the two lines so the reader thread is running before the AUTH request is sent.</violation>
</file>

Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.

the fibp handshake-based readiness check fails against the existing
gRPC server binary (dev-latest release) since it speaks HTTP/2, not
FIBP. switch to a plain tcp connect which reliably detects when the
port is accepting connections, regardless of the server protocol.
the subsequent fibp operations will produce explicit errors if the
server does not speak FIBP.

p0 (transport.rb): deadlock — start_reader before send_auth so the
reader thread is running to deliver the auth response. previously
send_auth was called before start_reader, blocking forever on a
response that no thread would deliver.

p1 (codec.rb): decode_consume_push was missing the leading
msg_count:u16be field. added read of _msg_count at position 0 to
align all subsequent field offsets correctly.

p1 (client.rb + batcher.rb): enqueue_single and result_to_outcome
raised QueueNotFoundError for all per-message failures. preserved
error_code in EnqueueResult and added a raise_enqueue_error helper that
distinguishes queue_not_found (error_code 1) from other failures
(raises RPCError with the actual code).
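The error-mapping fix can be sketched as follows. Only the rule that error code 1 means queue-not-found comes from this commit message; `FilaSketch`, the class hierarchy, the constructor signatures, the `error` field, the `unwrap` helper, and the convention that `error_code == 0` means success are assumptions for illustration.

```ruby
# Hypothetical stand-ins for the SDK's error and result types.
module FilaSketch
  QUEUE_NOT_FOUND = 1 # from the commit message; other codes are not listed here

  class RPCError < StandardError
    attr_reader :code

    def initialize(code, message)
      @code = code
      super("#{message} (error code #{code})")
    end
  end

  # Assumed hierarchy: queue-not-found is a more specific RPCError.
  class QueueNotFoundError < RPCError; end

  EnqueueResult = Struct.new(:message_id, :error_code, :error, keyword_init: true)

  # Raises the error class that matches the per-message error code.
  def self.raise_enqueue_error(result)
    if result.error_code == QUEUE_NOT_FOUND
      raise QueueNotFoundError.new(result.error_code, result.error || "queue not found")
    end

    raise RPCError.new(result.error_code, result.error || "enqueue failed")
  end

  # Assumes error_code 0 means success.
  def self.unwrap(result)
    return result.message_id if result.error_code.zero?

    raise_enqueue_error(result)
  end
end

FilaSketch.unwrap(FilaSketch::EnqueueResult.new(message_id: "m-1", error_code: 0)) # => "m-1"
```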

p3 (README.md): corrected api key auth docs to say the key is sent
once as an auth frame at connection setup, not on every request.

@cubic-dev-ai cubic-dev-ai Bot left a comment

1 issue found across 6 files (changes from recent commits).

Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="lib/fila/codec.rb">

<violation number="1" location="lib/fila/codec.rb:82">
P1: `decode_consume_push` ignores `msg_count`, which can drop messages for multi-message frames and fail on empty frames.</violation>
</file>

Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.

Comment thread on lib/fila/codec.rb:

```ruby
# @return [ConsumeMessage, nil] the first message in the frame
def decode_consume_push(payload)
  pos = 0
  _msg_count, pos = read_u16(payload, pos)
```

@cubic-dev-ai cubic-dev-ai Bot Mar 26, 2026

P1: decode_consume_push ignores msg_count, which can drop messages for multi-message frames and fail on empty frames.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At lib/fila/codec.rb, line 82:

<comment>`decode_consume_push` ignores `msg_count`, which can drop messages for multi-message frames and fail on empty frames.</comment>

<file context>
@@ -70,10 +70,16 @@ def encode_consume(queue, initial_credits: 256)
+    # @return [ConsumeMessage, nil] the first message in the frame
     def decode_consume_push(payload)
       pos = 0
+      _msg_count, pos     = read_u16(payload, pos)
       msg_id, pos         = read_str16(payload, pos)
       fairness_key, pos   = read_str16(payload, pos)
</file context>
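One way to address this finding is to loop `msg_count` times and return every message, which also makes an empty frame (`msg_count == 0`) decode to an empty array instead of failing. Only `msg_count`, `msg_id`, and `fairness_key` are visible in the diff; the `u32be`-prefixed payload field, the omission of headers, and the `ConsumePushSketch` names are assumptions. This also changes the return type from a single `ConsumeMessage` to an array, so callers would need to iterate.

```ruby
# Hypothetical decoder that honours msg_count. Assumed layout (only
# msg_count, msg_id and fairness_key appear in the diff; the u32be-prefixed
# payload is an assumption and headers are omitted):
#   msg_count:u16be, then msg_count x { msg_id:str16, fairness_key:str16, payload:bytes32 }
module ConsumePushSketch
  ConsumeMessage = Struct.new(:id, :fairness_key, :payload)

  module_function

  # Returns every message in the frame; an empty frame yields [].
  def decode_consume_push(payload)
    count, pos = read_u16(payload, 0)
    Array.new(count) do
      id, pos = read_str16(payload, pos)
      fairness_key, pos = read_str16(payload, pos)
      body, pos = read_bytes32(payload, pos)
      ConsumeMessage.new(id, fairness_key, body)
    end
  end

  def read_u16(buf, pos)
    ensure_available(buf, pos, 2)
    [buf.byteslice(pos, 2).unpack1("n"), pos + 2]
  end

  def read_str16(buf, pos)
    len, pos = read_u16(buf, pos)
    ensure_available(buf, pos, len)
    [buf.byteslice(pos, len).force_encoding(Encoding::UTF_8), pos + len]
  end

  def read_bytes32(buf, pos)
    ensure_available(buf, pos, 4)
    len = buf.byteslice(pos, 4).unpack1("N")
    ensure_available(buf, pos + 4, len)
    [buf.byteslice(pos + 4, len), pos + 4 + len]
  end

  def ensure_available(buf, pos, num_bytes)
    return if buf.bytesize >= pos + num_bytes

    raise ArgumentError, "truncated consume push frame at byte #{pos}"
  end
end
```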

- use SystemCallError instead of specific errno subclasses to handle
  all network errors during server readiness polling (e.g. ETIMEDOUT)
- EnqueueResult now carries error_code for per-message error
  differentiation (already committed in previous fix commit)
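The readiness change (a plain TCP connect, rescuing `SystemCallError`) might look like the sketch below. `wait_for_port` and its `timeout`/`interval` parameters are hypothetical names, not the helper in `test/test_helper.rb`. DNS failures raise `SocketError`, which is not a `SystemCallError`, so the sketch is best used with literal addresses.

```ruby
require "socket"

# Hypothetical protocol-agnostic readiness probe: succeeds as soon as the
# port accepts a TCP connection, whatever protocol the server speaks.
# SystemCallError covers ECONNREFUSED, ETIMEDOUT, EHOSTUNREACH, etc.
def wait_for_port(host, port, timeout: 10.0, interval: 0.1)
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + timeout
  loop do
    begin
      Socket.tcp(host, port, connect_timeout: interval).close
      return true
    rescue SystemCallError
      # Not accepting connections yet; retry until the deadline.
    end
    return false if Process.clock_gettime(Process::CLOCK_MONOTONIC) >= deadline

    sleep interval
  end
end
```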
@vieiralucas vieiralucas merged commit 132efa4 into main Mar 26, 2026
2 of 3 checks passed