S3 streaming has a bug #11

@maoueh

It appears that our S3 streaming code (or possibly the S3 library itself) has a bug that leads to files whose content is not read fully.

The bug happens intermittently rather than systematically, which makes me think it could be incorrect "error" handling when the stream closes unexpectedly.

This problem has been reported a few times over the past 1-2 years, against Ceph, S3 directly and SeaweedFS. The current workaround is to set DSTORE_S3_BUFFERED_READ=true, which reads the whole file into memory in one shot and then serves it as an io.Reader. This, however, creates memory pressure, because the full file is held in memory before being streamed.

See streamingfast/firehose-core#15 for some details and some logs from SeaweedFS. They show SeaweedFS reporting internal failures, which later lead to Firehose trying to read corrupted blocks:

panic: unable to decode block #16651067 (fb80c53f0b9ad8a026d21cf9aab801e42ea6db209de86053fead6a751f8f6477) payload (kind: ETH, version: 3, size: 1047315, sha256: 0614c58482dfdd1ebfd10abda656531bd8b81e15852dc54138ad8e0f592e9f3c): unable to decode payload: proto: cannot parse invalid wire-format data

Payload: [OMITTING HUGE LINE OF BINARY DATA]

goroutine 345531 [running]:
github.com/streamingfast/bstream.(*Block).ToProtocol(0xc02376c500)
	/home/runner/go/pkg/mod/github.com/streamingfast/bstream@v0.0.2-0.20230510131449-6b591d74130d/block.go:246 +0x6e9
github.com/streamingfast/firehose-ethereum/transform.(*CombinedFilter).Transform(0xc0003b05f8?, 0x681d52?, {0x64c0f1?, 0x3476c60?})
	/home/runner/work/firehose-ethereum/firehose-ethereum/transform/combined_filter.go:185 +0x36
github.com/streamingfast/bstream/transform.(*Registry).BuildFromTransforms.func1(0xc01fe9be00)
	/home/runner/go/pkg/mod/github.com/streamingfast/bstream@v0.0.2-0.20230510131449-6b591d74130d/transform/builder.go:82 +0x1d3
github.com/streamingfast/bstream.(*FileSource).preprocess(0xc001bcf5c0, 0xc01fe9be00, 0xc0404bb860)
	/home/runner/go/pkg/mod/github.com/streamingfast/bstream@v0.0.2-0.20230510131449-6b591d74130d/filesource.go:506 +0x5b
created by github.com/streamingfast/bstream.(*FileSource).streamReader
	/home/runner/go/pkg/mod/github.com/streamingfast/bstream@v0.0.2-0.20230510131449-6b591d74130d/filesource.go:495 +0x68a

This means that somehow the "consumer" saw an end of the stream, but the actual reading code failed due to some missing bytes.
