feat(fibre): implement Fibre Module #4892

Open
vgonkivs wants to merge 12 commits into main from add_fibre_module
@vgonkivs
Member

@vgonkivs vgonkivs commented Mar 25, 2026

@vgonkivs vgonkivs self-assigned this Mar 25, 2026
@vgonkivs vgonkivs requested a review from a team as a code owner March 25, 2026 16:59
@vgonkivs vgonkivs requested a review from walldiss March 25, 2026 16:59
devin-ai-integration[bot]

This comment was marked as resolved.

@vgonkivs vgonkivs force-pushed the add_fibre_module branch 3 times, most recently from 53ba2fa to cbe0341 Compare March 25, 2026 17:18
Co-authored-by: devin-ai-integration[bot] <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Contributor

@devin-ai-integration devin-ai-integration bot left a comment

Devin Review found 2 new potential issues.

View 18 additional findings in Devin Review.

Comment on lines +230 to +237
PreRunE: func(_ *cobra.Command, args []string) error {
    if !strings.HasPrefix(args[0], "0x") {
        args[0] = "0x" + args[0]
    }
    if !strings.HasPrefix(args[1], "0x") {
        args[1] = "0x" + args[1]
    }
    return nil
Contributor

🟡 submitFibreCmd PreRunE corrupts non-hex blob data by prepending '0x'

The submitFibreCmd.PreRunE unconditionally prepends "0x" to args[1] (blob data) if it doesn't already have that prefix. Later in RunE, cmdnode.DecodeToBytes(args[1]) attempts hex decoding, and on failure falls back to []byte(args[1]). Because PreRunE already mutated the argument, the fallback raw bytes now include the spurious "0x" prefix. For example, a user running submit-fibre <ns> "hello" ends up submitting []byte("0xhello") instead of []byte("hello"). This matches the same pre-existing pattern in submitCmd, but submitFibreCmd is entirely new code introduced in this PR.

Prompt for agents
In nodebuilder/blob/cmd/blob.go, the submitFibreCmd.PreRunE (lines 230-237) should NOT prepend '0x' to args[1] (the blob data). The '0x' prefix should only be added for args[0] (the namespace), since blob data might be plain text. Remove the lines 233-235 that add '0x' to args[1]. The same bug also exists in the regular submitCmd's PreRunE and should be fixed there too for consistency.
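
A minimal sketch of the fix described above, assuming the decode helper behaves as the comment says (hex first, raw bytes as fallback). `normalizeArgs` and `decodeToBytes` are hypothetical stand-ins for `PreRunE` and `cmdnode.DecodeToBytes`; they are not the code from the PR.

```go
package main

import (
	"encoding/hex"
	"fmt"
	"strings"
)

// normalizeArgs is a hypothetical stand-in for submitFibreCmd.PreRunE.
// It prefixes only the namespace (args[0]) and leaves the blob data untouched.
func normalizeArgs(args []string) {
	if len(args) > 0 && !strings.HasPrefix(args[0], "0x") {
		args[0] = "0x" + args[0]
	}
}

// decodeToBytes is an assumed approximation of cmdnode.DecodeToBytes:
// try hex (with an optional 0x prefix), otherwise fall back to the raw bytes.
func decodeToBytes(s string) []byte {
	if b, err := hex.DecodeString(strings.TrimPrefix(s, "0x")); err == nil {
		return b
	}
	return []byte(s)
}

func main() {
	args := []string{"0102", "hello"}
	normalizeArgs(args)
	fmt.Println(args[0])
	fmt.Printf("%q\n", decodeToBytes(args[1]))
}
```

With this shape, plain-text blob data reaches the fallback path unmodified. Blob data that happens to be valid hex is still decoded as hex, which is a separate ambiguity the sketch does not address.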
Member

@Wondertan Wondertan left a comment

There are two competing fibre blob submission paths, and one of them is incomplete

  • modfibre.Upload

The newly introduced API, however, is incomplete as it doesn't send the PFF transactions. The comment mentions that users should use modblob.SubmitFibreBlob to send TXs; however, doing so requires reuploading the entire blob over the API, and modblob.SubmitFibreBlob reuploads it again.

Essentially, there is no way for fibre users of Upload to pay for their uploads

  • modblob.SubmitFibreBlob

The purpose of this method is unclear. It duplicates the entire fibre blob submission on the old blob service, but the intention was clearly to introduce a new API, as seen in the new fibre module, so why duplicate? If the intention was to allow sending PFFs through this module, it is unclear why the blob module would be responsible for that and not fibre.

Too many layers of indirection

With this PR we have:

  • modfibre.Module
  • fibre.Client
  • fibre.client
  • txclient fibre methods
  • appfibre.Client

This is extremely messy and a pure pain to review. We can easily squash a bunch of them with no repercussions into:

  • modfibre.Module
  • fibre.Service
  • appfibre.Client.

Besides, for whatever reason, out of all those layers, the txclient turned out to be responsible for actually uploading. TxClient, Carl! It is supposed to send transactions and not call appfibre.Client.Upload.

Subscriptions

There is no way to subscribe to blobs from fibre in the order they've landed through consensus.

// It encodes the blob, constructs a payment promise, uploads encoded rows to FSPs,
// and aggregates validator availability signatures. It does NOT submit MsgPayForFibre on-chain.
// Use blob.SubmitFibreBlob for the full submit flow.
Upload(ctx context.Context, ns libshare.Namespace, data []byte) (*fibre.UploadResult, error)
Member

For modules, the interface definition should live together with the types it uses when those types are API-level helper types (Result/Response), as opposed to internal types (Blob).

// It encodes the blob, constructs a payment promise, uploads encoded rows to FSPs,
// and aggregates validator availability signatures. It does NOT submit MsgPayForFibre on-chain.
// Use blob.SubmitFibreBlob for the full submit flow.
Upload(ctx context.Context, ns libshare.Namespace, data []byte) (*fibre.UploadResult, error)
Member

This is UploadResult, but GetBlobResponse? Consistency?

Comment on lines +74 to +88
func (m *blobMetrics) observeUpload(ctx context.Context, dur time.Duration, blobSize int, err error) {
    if m == nil {
        return
    }
    m.uploadDuration.Record(ctx, dur.Seconds(), blobAttrs(blobSize, err))
}

func (m *blobMetrics) observeSubmit(ctx context.Context, dur time.Duration, blobSize int, err error) {
    if m == nil {
        return
    }
    m.submitDuration.Record(ctx, dur.Seconds(), blobAttrs(blobSize, err))
}

func (m *blobMetrics) observeGet(ctx context.Context, dur time.Duration, err error) {
Member

Do we really need these metrics if they are already part of appfibre.Client? The only exception is Submit, but that is just a tx submission metric.

Member Author

These metrics track different layers. fibre/metrics.go measures end-to-end node-level latency (upload, submit, get), txclient measures tx submission and gas estimation, and appfibre.Client tracks FSP networking internals. They complement each other — knowing that a submit took 5s total but only 200ms on tx submission tells you the bottleneck is in the upload phase.

Comment on lines +104 to +111
cl, err := appfibre.NewClient(c.keyring, appfibre.DefaultClientConfig())
if err != nil {
    return fmt.Errorf("failed to setup fibre client: %w", err)
}
if err := cl.Start(c.ctx); err != nil {
    return fmt.Errorf("failed to start fibre client: %w", err)
}
c.fibreClient = cl
Member

Please provide the fibre client as a DI component with full lifecycle management. Its Start method is blocking and is meant to be called while the node starts, so that a misconfiguration breaks node start, rather than being called at runtime.
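
The request above can be sketched without any DI framework. This is only an illustration under stated assumptions: `component`, `fakeFibreClient`, `startNode`, and `errBadConfig` are hypothetical names, and the real wiring would go through the node's DI container and its lifecycle hooks rather than a hand-written loop.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// component is a hypothetical lifecycle interface, standing in for the
// start/stop hooks a DI framework would invoke.
type component interface {
	Start(context.Context) error
	Stop(context.Context) error
}

// fakeFibreClient stands in for appfibre.Client; startErr simulates misconfiguration.
type fakeFibreClient struct {
	startErr error
	running  bool
}

func (c *fakeFibreClient) Start(context.Context) error {
	if c.startErr != nil {
		return c.startErr
	}
	c.running = true
	return nil
}

func (c *fakeFibreClient) Stop(context.Context) error {
	c.running = false
	return nil
}

// errBadConfig is a hypothetical error used only for this demo.
var errBadConfig = errors.New("fibre client misconfigured")

// startNode starts components in order. On the first failure it stops the
// components that already started and aborts node start.
func startNode(ctx context.Context, comps ...component) error {
	for i, c := range comps {
		if err := c.Start(ctx); err != nil {
			for j := i - 1; j >= 0; j-- {
				_ = comps[j].Stop(ctx)
			}
			return fmt.Errorf("node start aborted: %w", err)
		}
	}
	return nil
}

func main() {
	good := &fakeFibreClient{}
	bad := &fakeFibreClient{startErr: errBadConfig}
	err := startNode(context.Background(), good, bad)
	fmt.Println(err, good.running)
}
```

The point of the design is that a failing Start surfaces during node start, before any request can reach the client, instead of on the first submission at runtime.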

@vgonkivs
Member Author

vgonkivs commented Apr 2, 2026

This PR was done in accordance with the ADR made by @cmwaters.

There are two competing fibre blob submission paths, and one of them is incomplete
modfibre.Upload
The newly introduced API, however, is incomplete as it doesn't send the PFF transactions. The comment mentions that users should use modblob.SubmitFibreBlob to send TXs; however, doing so requires reuploading the entire blob over the API, and modblob.SubmitFibreBlob reuploads it again.

Link1

Link2

Subscriptions
There is no way to subscribe to blobs from fibre in the order they've landed through consensus.

Auto-fetching full fibre data from FSPs needs design work first. Feel free to open an ADR update with a proposal

@Wondertan
Member

@vgonkivs, I acknowledge that the PR implements the ADR. However, it does not change the fact that users can't pay for the uploads they make and we should fix this.

Auto-fetching full fibre data from FSPs needs design work first. Feel free to open an ADR update with a proposal

The subscription does not imply listening for data from FSPs, but listening to new fibre-blobs in the square and fetching respective fibre blobs by commitment.

@vgonkivs
Member Author

vgonkivs commented Apr 2, 2026

Feel free to open an ADR update with a proposal

@vgonkivs
Member Author

vgonkivs commented Apr 2, 2026

However, it does not change the fact that users can't pay for the uploads they make and we should fix this.

https://github.com/celestiaorg/celestia-node/pull/4892/changes#diff-575205cc93599bc2a9d28e62e576697e9fcf39733970a61aeebfade20493f1dbR50

@Wondertan
Member

Feel free to open an ADR update with a proposal

Nothing is stopping us from modifying the ADR in this PR. That's a normal feedback process where during implementation issues are discovered and spec/adr is updated accordingly.

@vgonkivs vgonkivs requested a review from Wondertan April 3, 2026 14:57
@vgonkivs
Member Author

vgonkivs commented Apr 3, 2026

There is no final decision yet on the upload, so no changes were made to this part.

Contributor

@devin-ai-integration devin-ai-integration bot left a comment

Devin Review found 2 new potential issues.

View 13 additional findings in Devin Review.

Comment on lines +219 to +227
allFibre := slices.IndexFunc(blobs, func(b *Blob) bool { return !b.IsFibreBlob() }) == -1
anyFibre := slices.IndexFunc(blobs, func(b *Blob) bool { return b.IsFibreBlob() }) != -1

if anyFibre && !allFibre {
    return 0, ErrMixedBlobTypes
}

if allFibre {
    return s.submitFibreBlobs(ctx, blobs, txConfig)
Contributor

🟡 Empty blobs slice is silently routed through fibre path, returning success with height 0

When Submit is called with an empty blobs slice, slices.IndexFunc returns -1 for both lambda predicates, making allFibre = (-1 == -1) = true and anyFibre = (-1 != -1) = false. This causes the empty submission to enter submitFibreBlobs, which iterates zero times and returns (0, nil) — a silent success at height 0. Previously, the code would pass the empty slice directly to blobSubmitter.SubmitPayForBlob, which would handle it through the regular consensus path (likely returning an error). The new behavior silently succeeds with a meaningless height, which could confuse callers.

Suggested change
-allFibre := slices.IndexFunc(blobs, func(b *Blob) bool { return !b.IsFibreBlob() }) == -1
-anyFibre := slices.IndexFunc(blobs, func(b *Blob) bool { return b.IsFibreBlob() }) != -1
-if anyFibre && !allFibre {
-    return 0, ErrMixedBlobTypes
-}
-if allFibre {
-    return s.submitFibreBlobs(ctx, blobs, txConfig)
+allFibre := len(blobs) > 0 && slices.IndexFunc(blobs, func(b *Blob) bool { return !b.IsFibreBlob() }) == -1
+anyFibre := slices.IndexFunc(blobs, func(b *Blob) bool { return b.IsFibreBlob() }) != -1
+if anyFibre && !allFibre {
+    return 0, ErrMixedBlobTypes
+}
+if allFibre {
+    return s.submitFibreBlobs(ctx, blobs, txConfig)
+}
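
The empty-slice behaviour can be reproduced in isolation. The sketch below uses a hypothetical `blob` struct with a `fibre` flag in place of the PR's `Blob` and `IsFibreBlob`; only `slices.IndexFunc` is the real standard-library call.

```go
package main

import (
	"fmt"
	"slices"
)

// blob is a hypothetical stand-in for the PR's Blob type.
type blob struct{ fibre bool }

// allFibreUnguarded reproduces the original check: it is vacuously true for
// an empty slice because no non-fibre element can be found.
func allFibreUnguarded(blobs []blob) bool {
	return slices.IndexFunc(blobs, func(b blob) bool { return !b.fibre }) == -1
}

// classify applies the suggested len(blobs) > 0 guard.
func classify(blobs []blob) (allFibre, anyFibre bool) {
	allFibre = len(blobs) > 0 && allFibreUnguarded(blobs)
	anyFibre = slices.IndexFunc(blobs, func(b blob) bool { return b.fibre }) != -1
	return allFibre, anyFibre
}

func main() {
	fmt.Println(allFibreUnguarded(nil))
	fmt.Println(classify(nil))
	fmt.Println(classify([]blob{{fibre: true}, {fibre: true}}))
	fmt.Println(classify([]blob{{fibre: true}, {fibre: false}}))
}
```

With the guard, an empty input is neither "all fibre" nor "any fibre", so it falls through to whatever the regular path does with an empty slice.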
Comment on lines +241 to +255
func (s *Service) submitFibreBlobs(ctx context.Context, blobs []*Blob, txConfig *SubmitOptions) (uint64, error) {
    if s.fibreSubmitter == nil {
        return 0, fmt.Errorf("fibre submitter is not available")
    }

    var height uint64
    for _, blob := range blobs {
        res, _, err := s.fibreSubmitter.Submit(ctx, blob.Namespace(), blob.Data(), txConfig)
        if err != nil {
            return 0, err
        }
        height = res.Height
    }
    return height, nil
}
Contributor

🔴 Non-atomic fibre blob submission loses track of successfully submitted blobs on partial failure

submitFibreBlobs submits each fibre blob individually in a loop via s.fibreSubmitter.Submit, which performs a full on-chain MsgPayForFibre transaction per blob (fibre/service.go:83). If the first blob succeeds (is permanently committed on-chain) but the second blob fails, the function returns (0, err), discarding the height of the successfully submitted first blob. The caller receives an error and has no way to discover that one blob was already included on-chain. This violates the "atomically" contract documented on Submit and can lead to lost blob tracking.

Loop that discards partial results on failure
var height uint64
for _, blob := range blobs {
    res, _, err := s.fibreSubmitter.Submit(ctx, blob.Namespace(), blob.Data(), txConfig)
    if err != nil {
        return 0, err  // discards previously successful submissions
    }
    height = res.Height
}
return height, nil
Prompt for agents
The submitFibreBlobs function in blob/service.go submits fibre blobs one-by-one in a loop. Each call to s.fibreSubmitter.Submit performs a full on-chain MsgPayForFibre transaction (see fibre/service.go Submit method). If a blob in the middle of the loop fails, previously submitted blobs are already on-chain and cannot be rolled back, but the function returns (0, err) discarding any successfully submitted heights.

The Submit method's doc comment promises atomic submission, but fibre blobs are inherently submitted individually. There are several approaches to fix this:

1. Return partial results: change the return type or use a structured error that includes the heights of successfully submitted blobs.
2. Validate that only a single fibre blob is allowed per Submit call, since atomicity cannot be guaranteed for multiple fibre blobs.
3. Update the documentation to explicitly state that fibre blob submission is not atomic when multiple blobs are provided.
4. Collect all results and return the last successful height even on error, so the caller at least knows some blobs landed.

Option 2 is the safest approach since it maintains the atomicity guarantee by restricting the input.
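
For comparison, option 1 from the list above could look roughly like the following. Everything here is hypothetical: `PartialSubmitError`, `submitAll`, and `errSubmitFailed` are illustrative names, and the `submit` callback stands in for `s.fibreSubmitter.Submit`.

```go
package main

import (
	"errors"
	"fmt"
)

// PartialSubmitError is a hypothetical error type; it is not part of the PR.
type PartialSubmitError struct {
	Heights []uint64 // heights of blobs that already landed on-chain
	Err     error    // failure that stopped the loop
}

func (e *PartialSubmitError) Error() string {
	return fmt.Sprintf("submitted %d blob(s) before failure: %v", len(e.Heights), e.Err)
}

func (e *PartialSubmitError) Unwrap() error { return e.Err }

// errSubmitFailed is a hypothetical failure used only for this demo.
var errSubmitFailed = errors.New("submit failed")

// submitAll calls submit once per blob and keeps the heights of the blobs
// that succeeded, even when a later blob fails.
func submitAll(blobs [][]byte, submit func([]byte) (uint64, error)) ([]uint64, error) {
	heights := make([]uint64, 0, len(blobs))
	for _, b := range blobs {
		h, err := submit(b)
		if err != nil {
			return heights, &PartialSubmitError{Heights: heights, Err: err}
		}
		heights = append(heights, h)
	}
	return heights, nil
}

func main() {
	next := uint64(10)
	submit := func(b []byte) (uint64, error) {
		if len(b) == 0 {
			return 0, errSubmitFailed // simulate a rejected blob
		}
		next++
		return next, nil
	}
	_, err := submitAll([][]byte{[]byte("a"), nil, []byte("c")}, submit)
	var pe *PartialSubmitError
	if errors.As(err, &pe) {
		fmt.Println(pe.Heights, errors.Is(err, errSubmitFailed))
	}
}
```

Callers that only check `err != nil` behave as before, while callers that care can use `errors.As` to recover the heights that were already committed. Option 2 would instead be a length check before the loop.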
2 participants