Conversation

@JamesPiechota JamesPiechota commented Jan 19, 2026

Add support for indexing all transactions and bundled ANS-104 data items in a block. The index maps a TX or data item ID to an offset in the weave. When loading the TX or item, hb_store_arweave will query that range of weave data from the configured chunk node and deserialize it.

New options:

  • arweave_index_ids: when true, dev_copycat_arweave will index the transactions and ANS-104 items in a block.
  • arweave_index_store: the store to use for maintaining the index.
  • routes => #{ <<"template">> => <<"/chunk">> }: the gateway to use for GET /chunk requests.
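
For illustration, a minimal configuration sketch combining the new options. The store descriptor shape and the hb_http_server:start_node/1 usage are assumptions here, not code from this PR; the route fields mirror the neo-arweave route shown later in this conversation:

```erlang
%% Sketch only: enable ID indexing, point the index at an fs-backed
%% store (descriptor shape assumed), and route GET /chunk requests.
Opts = #{
    arweave_index_ids => true,
    arweave_index_store =>
        [#{ <<"store-module">> => hb_store_fs,
            <<"name">> => <<"cache/arweave-index">> }],
    routes => [
        #{
            <<"template">> => <<"/chunk">>,
            <<"node">> =>
                #{
                    <<"match">> => <<"^/arweave">>,
                    <<"with">> => <<"https://neo-arweave.zephyrdev.xyz">>
                }
        }
    ]
},
hb_http_server:start_node(Opts).
```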

Index format:

  • <<"ID">> -> <<"IsTX:Offset:Length">>
  • The boolean "IsTX" indicates whether the indexed item is an L1 TX or an L2 DataItem. The distinction is needed because we must query the TX header to get the tags for an L1 TX, but that is not needed for an L2 DataItem.
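
For concreteness, a minimal sketch of writing and parsing an entry in this format. The helper names and the "1"/"0" encoding of the IsTX boolean are illustrative assumptions, not code from this PR:

```erlang
%% Sketch only; not code from this PR.
-module(index_entry_sketch).
-export([encode_entry/3, parse_entry/1]).

%% Encode an entry of the form <<"IsTX:Offset:Length">>. The "1"/"0"
%% encoding of the IsTX boolean is an assumption.
encode_entry(IsTX, Offset, Length) when is_boolean(IsTX) ->
    IsTXBin = case IsTX of true -> <<"1">>; false -> <<"0">> end,
    <<IsTXBin/binary, ":",
        (integer_to_binary(Offset))/binary, ":",
        (integer_to_binary(Length))/binary>>.

%% Parse an entry back into {IsTX, Offset, Length}.
parse_entry(Entry) ->
    [IsTXBin, OffsetBin, LengthBin] =
        binary:split(Entry, <<":">>, [global]),
    {IsTXBin =:= <<"1">>,
        binary_to_integer(OffsetBin),
        binary_to_integer(LengthBin)}.
```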

Questions/Notes

  • ~copycat@1.0: I updated how it iterates through the range of blocks to be indexed. Let me know if I should revert.
    • Old behavior: Count was exclusive and would keep going if `from` was less than `to`. e.g. `from=1000001&to=1000000` will index only block `1000001`; `from=999999&to=1000000` will index all blocks 999999 and lower.
    • New behavior: Count is inclusive and stops when `from` is less than `to`. e.g. `from=1000001&to=1000000` will index blocks `1000001` and `1000000`; `from=999999&to=1000000` will index no blocks.
  • I'm still not sure when to use hb_ao:resolve vs. hb_ao:get. This PR primarily uses hb_ao:resolve, and only uses hb_ao:get when querying a key from a map.
  • When should opt keys be atoms (e.g. arweave_index_store) vs. binaries (e.g. <<"arweave-index-store">>)? I tried to mimic the conventions already in use.
  • I added an <<"exclude-data">> arg to dev_arweave to allow it to query only the TX header without also downloading the data. I had initially omitted the flag and forced the data download to be a separate operation, but this created some complexity around the overlap between L2 and L1 IDs: an L2 ID always maps to the full data item, while an L1 ID would map only to the TX header, and the client would have to do a second resolve to get the data payload. The current approach keeps legacy behavior the same (both L2 and L1 IDs map to the full payload), with the option of querying only the TX header where needed.
  • In order to validate an L1 TX we need to recompute the data_root. This computation depends on how the serialized data was "chunked". Unfortunately, this information is not currently preserved in HB messages. The majority of transactions likely follow the arweave-js chunking scheme, so this PR implements that scheme as the default (see the sketch after this list). In the future we may need to either track chunk boundaries (e.g. as commitment fields) or support multiple chunking schemes (and track those as commitment fields).
  • When dev_arweave queries the gateway's /chunk endpoint it assumes the gateway is running a recent commit from the arweave repo (4de096e20028df01f61002620bd7d39297064a5b). This commit has not yet (as of Jan 25, 2026) been included in any formal arweave releases.
  • There are still some types of data items that are not supported in HB (e.g. any data item that is not signed with RSA). Those items will be indexed, but HB will fail when it tries to read and deserialize them. This is an existing limitation not addressed by this PR; just calling it out.
  • The block indexing logic will currently not recurse into nested bundles. It will index the top-level L1 bundle, and then all data items within that bundle, but it won't recurse further.
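
As referenced above, a sketch of the arweave-js chunk-boundary rule this PR adopts as the default: take 256 KiB chunks, but when splitting off a full chunk would leave a final chunk under 32 KiB, balance the remaining bytes across the last two chunks. This illustrates the balancing rule only and ignores edge cases (e.g. data sizes that are exact 256 KiB multiples):

```erlang
-module(chunk_sketch).
-export([chunk_sizes/1]).

-define(MAX_CHUNK_SIZE, 262144). % 256 KiB
-define(MIN_CHUNK_SIZE, 32768).  % 32 KiB

%% Compute the list of chunk sizes for a payload of Remaining bytes.
chunk_sizes(Remaining) when Remaining =< ?MAX_CHUNK_SIZE ->
    [Remaining];
chunk_sizes(Remaining) ->
    Rest = Remaining - ?MAX_CHUNK_SIZE,
    case Rest < ?MIN_CHUNK_SIZE of
        true ->
            % A full chunk here would leave the final chunk under the
            % minimum, so split the remainder evenly across the last two.
            Half = (Remaining + 1) div 2, % ceil(Remaining / 2)
            [Half, Remaining - Half];
        false ->
            [?MAX_CHUNK_SIZE | chunk_sizes(Rest)]
    end.
```

For example, chunk_sizes(270000) yields [135000, 135000] rather than [262144, 7856], since the 7856-byte tail would fall under the 32 KiB minimum.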

ocrybit pushed a commit to ocrybit/HyperBEAM that referenced this pull request Jan 24, 2026
Incorporated official announcement from Jan 22, 2026:

Completed Milestones:
- ✅ M1: AO Core
- ✅ M2: Native Execution & TEE Support
- ✅ M3: LegacyNet Migration (100x performance gains)

M4 Official Features:
- Decentralized Schedulers
- LiveNet Staking Marketplace
- Streaming Token Distributions

Added comprehensive branch-to-PR mapping:
- 57 open PRs with owners and status
- 70+ merged PRs since release
- Branch ownership for all active development

Key contributors working on M4:
- samcamwilliams: Core protocol, native tokens (expr/1.5, feat/native-tokens)
- speeddragon: Cryptography, fixes (feat/ecdsa_support, PR permaweb#574)
- JamesPiechota: Indexing (feat/arweave-id-offset-indexing, PR permaweb#616)
- noahlevenson: Security testing (impr/secure-actions)
- PeterFarber: TEE attestation (feat/c_snp)
@JamesPiechota JamesPiechota force-pushed the feat/arweave-id-offset-indexing branch from c1a32eb to bfce13b on January 25, 2026 02:55
@JamesPiechota JamesPiechota marked this pull request as ready for review January 26, 2026 02:21
%% it).
TestStore = hb_test_utils:test_store(),
StoreOpts = #{ <<"index-store">> => [TestStore] },
Store = [
JamesPiechota (Collaborator, Author) commented:

Is there a better way to have a test use a test store for all stores? If I don't do this, the test will use the default (mainnet) store some of the time and the test store other times, which breaks the test.
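
For illustration, one possible shape of "use the test store for all stores", assuming a top-level store opt controls the node's default store list (both the key and its semantics are assumptions here):

```erlang
%% Sketch only: point the default store and the index store at the
%% same test store so no code path falls back to the mainnet store.
TestStore = hb_test_utils:test_store(),
Opts = #{
    store => [TestStore],
    <<"index-store">> => [TestStore]
}.
```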

<<"node">> =>
#{
<<"match">> => <<"^/arweave">>,
<<"with">> => <<"https://neo-arweave.zephyrdev.xyz">>,
JamesPiechota (Collaborator, Author) commented:

Route GET /chunk to neo-arweave for now as it is more reliable for this specific endpoint.

Comment on lines +20 to +24
% TODO:
% - should this return composite for any indexed L1 bundles?
% - if so, I guess we need to implement list/2?
% - for now we don't index nested bundle children, but once we
%   do we may also need to return composite for them.
JamesPiechota (Collaborator, Author) commented:

Calling this TODO out. Not sure whether some of it must be addressed before we merge, or whether it can all wait for a future PR.
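
For context, a sketch of the shape the type callback takes, since the TODO is about returning composite for indexed L1 bundles (which would then also require list/2). The index_lookup/2 helper is hypothetical:

```erlang
%% Sketch only: every indexed ID currently resolves to a simple entry.
%% Returning composite for indexed L1 bundles would additionally
%% require implementing list/2 to enumerate the bundle's children.
type(Opts, Key) ->
    case index_lookup(Opts, Key) of % hypothetical helper
        {ok, _IsTX, _Offset, _Length} -> simple;
        not_found -> not_found
    end.
```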

Main change was implementing hb_store_arweave:type/2
@JamesPiechota JamesPiechota force-pushed the feat/arweave-id-offset-indexing branch from 353a1eb to 73b4fad on January 26, 2026 21:08
…indexed

- Old behavior: Count was exclusive and would keep going if `from` was less than `to`. e.g. `from=1000001&to=1000000` will index only block `1000001`, `from=999999&to=1000000` will index all blocks 999999 and lower.
- New behavior: Count is inclusive and stops when `from` is less than `to`. e.g. `from=1000001&to=1000000` will index blocks `1000001` and `1000000`, `from=999999&to=1000000` will index no blocks.
Comment on lines 40 to 60
 fetch_blocks(Req, Current, To, _Opts) when Current < To ->
     ?event(copycat_arweave,
         {arweave_block_indexing_completed,
-            {reached_target, Current},
+            {reached_target, To},
             {initial_request, Req}
         }
     ),
-    {ok, Current};
+    {ok, To};
 fetch_blocks(Req, Current, To, Opts) ->
     BlockRes =
         hb_ao:resolve(
             <<
                 ?ARWEAVE_DEVICE/binary,
                 "/block=",
                 (hb_util:bin(Current))/binary
             >>,
             Opts
         ),
     process_block(BlockRes, Req, Current, To, Opts),
     fetch_blocks(Req, Current - 1, To, Opts).

JamesPiechota (Collaborator, Author) commented Jan 26, 2026:

  • Old behavior: Count was exclusive and would keep going if `from` was less than `to`. e.g. `from=1000001&to=1000000` will index only block `1000001`; `from=999999&to=1000000` will index all blocks 999999 and lower.
  • New behavior: Count is inclusive and stops when `from` is less than `to`. e.g. `from=1000001&to=1000000` will index blocks `1000001` and `1000000`; `from=999999&to=1000000` will index no blocks.
