Conversation

@JamesPiechota JamesPiechota commented Jan 19, 2026

Add support for indexing all transactions and bundled ANS-104 data items in a block. The index maps a TX or data item ID to an offset in the weave. When loading the TX or item, hb_store_arweave will query that range of weave data from the configured chunk node and deserialize it.

New options:

  • arweave_index_ids: when true, dev_copycat_arweave will index the transactions and ANS-104 items in a block.
  • arweave_index_store: the store to use for maintaining the index.
  • routes => #{ <<"template">> => <<"/chunk">> }: the gateway to use for GET /chunk requests.
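
For illustration, a minimal configuration sketch combining the new options. The store descriptor shape and the hb_http_server:start_node/1 usage are assumptions here, not code from this PR; the route fields mirror the neo-arweave route shown later in this conversation:

```erlang
%% Sketch only: enable ID indexing, point the index at an fs-backed
%% store (descriptor shape assumed), and route GET /chunk requests.
Opts = #{
    arweave_index_ids => true,
    arweave_index_store =>
        [#{ <<"store-module">> => hb_store_fs,
            <<"name">> => <<"cache/arweave-index">> }],
    routes => [
        #{
            <<"template">> => <<"/chunk">>,
            <<"node">> =>
                #{
                    <<"match">> => <<"^/arweave">>,
                    <<"with">> => <<"https://neo-arweave.zephyrdev.xyz">>
                }
        }
    ]
},
hb_http_server:start_node(Opts).
```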

Index format:

  • <<"ID">> -> <<"IsTX:Offset:Length">>
  • The boolean "IsTX" indicates whether the indexed item is an L1 TX or an L2 DataItem. The distinction is needed because we must query the TX header to get the tags for an L1 TX, but that is not needed for an L2 DataItem.
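
For concreteness, a minimal sketch of writing and parsing an entry in this format. The helper names and the "1"/"0" encoding of the IsTX boolean are illustrative assumptions, not code from this PR:

```erlang
%% Sketch only; not code from this PR.
-module(index_entry_sketch).
-export([encode_entry/3, parse_entry/1]).

%% Encode an entry of the form <<"IsTX:Offset:Length">>. The "1"/"0"
%% encoding of the IsTX boolean is an assumption.
encode_entry(IsTX, Offset, Length) when is_boolean(IsTX) ->
    IsTXBin = case IsTX of true -> <<"1">>; false -> <<"0">> end,
    <<IsTXBin/binary, ":",
        (integer_to_binary(Offset))/binary, ":",
        (integer_to_binary(Length))/binary>>.

%% Parse an entry back into {IsTX, Offset, Length}.
parse_entry(Entry) ->
    [IsTXBin, OffsetBin, LengthBin] =
        binary:split(Entry, <<":">>, [global]),
    {IsTXBin =:= <<"1">>,
        binary_to_integer(OffsetBin),
        binary_to_integer(LengthBin)}.
```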

Questions/Notes

  • ~copycat@1.0: I updated how it iterates through the range of blocks to be indexed. Let me know if I should revert.
    • Old behavior: Count was exclusive and would keep going if `from` was less than `to`. e.g. `from=1000001&to=1000000` will index only block `1000001`; `from=999999&to=1000000` will index all blocks 999999 and lower.
    • New behavior: Count is inclusive and stops when `from` is less than `to`. e.g. `from=1000001&to=1000000` will index blocks `1000001` and `1000000`; `from=999999&to=1000000` will index no blocks.
  • I'm still not sure when to use hb_ao:resolve vs. hb_ao:get. This PR primarily uses hb_ao:resolve, and only uses hb_ao:get when querying a key from a map.
  • When should opt keys be atoms (e.g. arweave_index_store) vs. binaries (e.g. <<"arweave-index-store">>)? I tried to mimic the conventions already in use.
  • I added an <<"exclude-data">> arg to dev_arweave to allow it to query only the TX header without also downloading the data. I had initially omitted the flag and forced the data download to be a separate operation, but this created some complexity around the overlap between L2 and L1 IDs: an L2 ID always maps to the full data item, while an L1 ID would map only to the TX header, and the client would have to do a second resolve to get the data payload. The current approach keeps legacy behavior the same (both L2 and L1 IDs map to the full payload), with the option of querying only the TX header where needed.
  • In order to validate an L1 TX we need to recompute the data_root. This computation depends on how the serialized data was "chunked". Unfortunately, this information is not currently preserved in HB messages. The majority of transactions likely follow the arweave-js chunking scheme, so this PR implements that scheme as the default (see the sketch after this list). In the future we may need to either track chunk boundaries (e.g. as commitment fields) or support multiple chunking schemes (and track those as commitment fields).
  • When dev_arweave queries the gateway's /chunk endpoint it assumes the gateway is running a recent commit from the arweave repo (4de096e20028df01f61002620bd7d39297064a5b). This commit has not yet (as of Jan 25, 2026) been included in any formal arweave releases.
  • There are still some types of data items that are not supported in HB (e.g. any data item that is not signed with RSA). Those items will be indexed, but HB will fail when it tries to read and deserialize them. This is an existing limitation not addressed by this PR; just calling it out.
  • The block indexing logic will currently not recurse into nested bundles. It will index the top-level L1 bundle, and then all data items within that bundle, but it won't recurse further.
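
As referenced above, a sketch of the arweave-js chunk-boundary rule this PR adopts as the default: take 256 KiB chunks, but when splitting off a full chunk would leave a final chunk under 32 KiB, balance the remaining bytes across the last two chunks. This illustrates the balancing rule only and ignores edge cases (e.g. data sizes that are exact 256 KiB multiples):

```erlang
-module(chunk_sketch).
-export([chunk_sizes/1]).

-define(MAX_CHUNK_SIZE, 262144). % 256 KiB
-define(MIN_CHUNK_SIZE, 32768).  % 32 KiB

%% Compute the list of chunk sizes for a payload of Remaining bytes.
chunk_sizes(Remaining) when Remaining =< ?MAX_CHUNK_SIZE ->
    [Remaining];
chunk_sizes(Remaining) ->
    Rest = Remaining - ?MAX_CHUNK_SIZE,
    case Rest < ?MIN_CHUNK_SIZE of
        true ->
            % A full chunk here would leave the final chunk under the
            % minimum, so split the remainder evenly across the last two.
            Half = (Remaining + 1) div 2, % ceil(Remaining / 2)
            [Half, Remaining - Half];
        false ->
            [?MAX_CHUNK_SIZE | chunk_sizes(Rest)]
    end.
```

For example, chunk_sizes(270000) yields [135000, 135000] rather than [262144, 7856], since the 7856-byte tail would fall under the 32 KiB minimum.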

ocrybit pushed a commit to ocrybit/HyperBEAM that referenced this pull request Jan 24, 2026
Incorporated official announcement from Jan 22, 2026:

Completed Milestones:
- ✅ M1: AO Core
- ✅ M2: Native Execution & TEE Support
- ✅ M3: LegacyNet Migration (100x performance gains)

M4 Official Features:
- Decentralized Schedulers
- LiveNet Staking Marketplace
- Streaming Token Distributions

Added comprehensive branch-to-PR mapping:
- 57 open PRs with owners and status
- 70+ merged PRs since release
- Branch ownership for all active development

Key contributors working on M4:
- samcamwilliams: Core protocol, native tokens (expr/1.5, feat/native-tokens)
- speeddragon: Cryptography, fixes (feat/ecdsa_support, PR permaweb#574)
- JamesPiechota: Indexing (feat/arweave-id-offset-indexing, PR permaweb#616)
- noahlevenson: Security testing (impr/secure-actions)
- PeterFarber: TEE attestation (feat/c_snp)
@JamesPiechota JamesPiechota force-pushed the feat/arweave-id-offset-indexing branch from c1a32eb to bfce13b on January 25, 2026 02:55
@JamesPiechota JamesPiechota marked this pull request as ready for review January 26, 2026 02:21
%% it).
TestStore = hb_test_utils:test_store(),
StoreOpts = #{ <<"index-store">> => [TestStore] },
Store = [
JamesPiechota (Collaborator, Author) commented:

Is there a better way to have a test use a test store for all stores? If I don't do this, the test will use the default (mainnet) store some of the time and the test store other times, which breaks the test.
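
For illustration, one possible shape of "use the test store for all stores", assuming a top-level store opt controls the node's default store list (both the key and its semantics are assumptions here):

```erlang
%% Sketch only: point the default store and the index store at the
%% same test store so no code path falls back to the mainnet store.
TestStore = hb_test_utils:test_store(),
Opts = #{
    store => [TestStore],
    <<"index-store">> => [TestStore]
}.
```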

<<"node">> =>
#{
<<"match">> => <<"^/arweave">>,
<<"with">> => <<"https://neo-arweave.zephyrdev.xyz">>,
JamesPiechota (Collaborator, Author) commented:

Route GET /chunk to neo-arweave for now as it is more reliable for this specific endpoint.

Comment on lines +20 to +24
% TODO:
% - should this return composite for any indexed L1 bundles?
% - if so, I guess we need to implement list/2?
% - for now we don't index nested bundle children, but once we
%   do we may also need to return composite for them.
JamesPiechota (Collaborator, Author) commented:

Calling this TODO out. Not sure whether some of it must be addressed before we merge, or whether it can all wait for a future PR.
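
For context, a sketch of the shape the type callback takes, since the TODO is about returning composite for indexed L1 bundles (which would then also require list/2). The index_lookup/2 helper is hypothetical:

```erlang
%% Sketch only: every indexed ID currently resolves to a simple entry.
%% Returning composite for indexed L1 bundles would additionally
%% require implementing list/2 to enumerate the bundle's children.
type(Opts, Key) ->
    case index_lookup(Opts, Key) of % hypothetical helper
        {ok, _IsTX, _Offset, _Length} -> simple;
        not_found -> not_found
    end.
```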

Main change was implementing hb_store_arweave:type/2
@JamesPiechota JamesPiechota force-pushed the feat/arweave-id-offset-indexing branch from 353a1eb to 73b4fad on January 26, 2026 21:08
…indexed

- Old behavior: Count was exclusive and would keep going if `from` was less than `to`. e.g. `from=1000001&to=1000000` will index only block `1000001`, `from=999999&to=1000000` will index all blocks 999999 and lower.
- New behavior: Count is inclusive and stops when `from` is less than `to`. e.g. `from=1000001&to=1000000` will index blocks `1000001` and `1000000`, `from=999999&to=1000000` will index no blocks.
Comment on lines 40 to 60
 fetch_blocks(Req, Current, To, _Opts) when Current < To ->
     ?event(copycat_arweave,
         {arweave_block_indexing_completed,
-            {reached_target, Current},
+            {reached_target, To},
             {initial_request, Req}
         }
     ),
-    {ok, Current};
+    {ok, To};
 fetch_blocks(Req, Current, To, Opts) ->
     BlockRes =
         hb_ao:resolve(
             <<
                 ?ARWEAVE_DEVICE/binary,
                 "/block=",
                 (hb_util:bin(Current))/binary
             >>,
             Opts
         ),
     process_block(BlockRes, Req, Current, To, Opts),
     fetch_blocks(Req, Current - 1, To, Opts).

JamesPiechota (Collaborator, Author) commented Jan 26, 2026:

  • Old behavior: Count was exclusive and would keep going if `from` was less than `to`. e.g. `from=1000001&to=1000000` will index only block `1000001`; `from=999999&to=1000000` will index all blocks 999999 and lower.
  • New behavior: Count is inclusive and stops when `from` is less than `to`. e.g. `from=1000001&to=1000000` will index blocks `1000001` and `1000000`; `from=999999&to=1000000` will index no blocks.
