Feat/arweave id offset indexing #616
base: edge
Conversation
Incorporated official announcement from Jan 22, 2026.

Completed Milestones:
- ✅ M1: AO Core
- ✅ M2: Native Execution & TEE Support
- ✅ M3: LegacyNet Migration (100x performance gains)

M4 Official Features:
- Decentralized Schedulers
- LiveNet Staking Marketplace
- Streaming Token Distributions

Added comprehensive branch-to-PR mapping:
- 57 open PRs with owners and status
- 70+ merged PRs since release
- Branch ownership for all active development

Key contributors working on M4:
- samcamwilliams: Core protocol, native tokens (expr/1.5, feat/native-tokens)
- speeddragon: Cryptography, fixes (feat/ecdsa_support, PR permaweb#574)
- JamesPiechota: Indexing (feat/arweave-id-offset-indexing, PR permaweb#616)
- noahlevenson: Security testing (impr/secure-actions)
- PeterFarber: TEE attestation (feat/c_snp)
… (i.e. true TX headers). Specify exclude-data=1 to exclude the data.
…ests neo-arweave has a round-robin scheme where it will try several nodes looking for a chunk. arweave.net delegates to a single node regardless of whether or not it has the chunk, which can yield unreliable results (the same query sometimes returns data, sometimes 404s).
Force-pushed from c1a32eb to bfce13b.
src/dev_copycat_arweave.erl (outdated)
```erlang
%% it).
TestStore = hb_test_utils:test_store(),
StoreOpts = #{ <<"index-store">> => [TestStore] },
Store = [
```
Is there a better way to have a test use a test store for all stores? If I don't do this, the test will use the default (mainnet) store some of the time and the test store other times which breaks the test.
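For reference, a minimal sketch of the workaround being discussed, assuming `hb_test_utils:test_store/0` from the diff above and a node option key `store` for the default store list (the key name and value shape are assumptions, not taken from this PR):

```erlang
%% Sketch only: point every store the test touches at the in-memory test
%% store, so the default (mainnet) store is never consulted.
TestStore = hb_test_utils:test_store(),
Opts = #{
    <<"index-store">> => [TestStore],  % index store, as in the diff above
    store => [TestStore]               % assumed key for the node's default store list
},
```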
```erlang
<<"node">> =>
    #{
        <<"match">> => <<"^/arweave">>,
        <<"with">> => <<"https://neo-arweave.zephyrdev.xyz">>,
```
Route GET /chunk to neo-arweave for now, as it is more reliable for this specific endpoint.
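For illustration, a hypothetical route entry combining the `<<"template">> => <<"/chunk">>` key from the PR description with the `<<"node">>` map shown in the diff above; the exact surrounding structure of the `routes` option is assumed:

```erlang
%% Sketch: proxy GET /chunk requests to the neo-arweave gateway by rewriting
%% the matched path prefix. Not a verbatim copy of the PR's configuration.
#{
    <<"template">> => <<"/chunk">>,
    <<"node">> =>
        #{
            <<"match">> => <<"^/arweave">>,
            <<"with">> => <<"https://neo-arweave.zephyrdev.xyz">>
        }
}
```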
```erlang
% TODO:
% - should this return composite for any indexed L1 bundles?
% - if so, I guess we need to implement list/2?
% - for now we don't index nested bundle children, but once we
%   do we may also need to return composite for them.
```
Calling this TODO out. Not sure whether some of this must be addressed before we merge or whether it can all wait for a future PR.
The main change was implementing hb_store_arweave:type/2.
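As a rough illustration, a sketch of the shape such a clause could take; the `composite`/`simple` return atoms are inferred from the TODO above and the mention of `list/2`, and `is_indexed_l1_bundle/2` is a made-up placeholder:

```erlang
%% Sketch only: report indexed L1 bundles as composite (listable) entries and
%% everything else as simple values. Argument order and return atoms are
%% assumptions based on the thread above, not the actual hb_store contract.
type(Key, Opts) ->
    case is_indexed_l1_bundle(Key, Opts) of  % hypothetical helper
        true  -> composite;
        false -> simple
    end.
```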
Force-pushed from 353a1eb to 73b4fad.
…indexed
- Old behavior: the count was exclusive and would keep going if `from` was less than `to`, e.g. `from=1000001&to=1000000` will index only block `1000001`, and `from=999999&to=1000000` will index all blocks 999999 and lower.
- New behavior: the count is inclusive and stops when `from` is less than `to`, e.g. `from=1000001&to=1000000` will index blocks `1000001` and `1000000`, and `from=999999&to=1000000` will index no blocks.
```diff
 fetch_blocks(Req, Current, To, _Opts) when Current < To ->
     ?event(copycat_arweave,
         {arweave_block_indexing_completed,
-            {reached_target, Current},
+            {reached_target, To},
             {initial_request, Req}
         }
     ),
-    {ok, Current};
+    {ok, To};
 fetch_blocks(Req, Current, To, Opts) ->
     BlockRes =
         hb_ao:resolve(
             <<
                 ?ARWEAVE_DEVICE/binary,
                 "/block=",
                 (hb_util:bin(Current))/binary
             >>,
             Opts
         ),
     process_block(BlockRes, Req, Current, To, Opts),
     fetch_blocks(Req, Current - 1, To, Opts).
```
- Old behavior: the count was exclusive and would keep going if `from` was less than `to`, e.g. `from=1000001&to=1000000` will index only block `1000001`, and `from=999999&to=1000000` will index all blocks `999999` and lower.
- New behavior: the count is inclusive and stops when `from` is less than `to`, e.g. `from=1000001&to=1000000` will index blocks `1000001` and `1000000`, and `from=999999&to=1000000` will index no blocks.
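To make the new semantics concrete, a tiny standalone sketch of the inclusive countdown (not the real `fetch_blocks/4`, which also resolves and processes each block):

```erlang
%% Heights indexed for a given From/To under the inclusive rule: count down
%% from From and stop as soon as the current height drops below To.
index_range(From, To) when From < To -> [];
index_range(From, To) -> [From | index_range(From - 1, To)].

%% index_range(1000001, 1000000) -> [1000001, 1000000]
%% index_range(999999, 1000000)  -> []
```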
Add hb_opts `arweave_index_retries` to control the number of retries; 0 disables retrying.
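A hypothetical sketch of how such an option might be consumed; only the option name and the "0 disables retrying" semantics come from the commit message, while the `hb_opts:get/3` call and the error shapes are assumptions:

```erlang
%% Sketch: run Fun up to 1 + arweave_index_retries times, returning the first
%% {ok, _} result or the last error.
with_retries(Fun, Opts) ->
    attempt(Fun, hb_opts:get(arweave_index_retries, 0, Opts)).

attempt(Fun, RetriesLeft) ->
    case Fun() of
        {ok, _} = Ok -> Ok;
        Error when RetriesLeft =< 0 -> Error;
        _Error -> attempt(Fun, RetriesLeft - 1)
    end.
```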
Add support for indexing all transactions and bundled ANS-104 data items in a block. The index maps the TX or item ID to an offset in the weave. When loading the TX or item, `hb_store_arweave` will query that range of weave data from the configured chunk node and deserialize it.

New options:
- `arweave_index_ids`: when `true`, `dev_copycat_arweave` will index the transactions and ANS-104 items in a block
- `arweave_index_store`: configure the store to use for maintaining the index
- `routes => #{ <<"template">> => <<"/chunk">> }`: configure the gateway to use for `GET /chunk` requests
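Putting these options together, a hypothetical node configuration; the option names come from the list above, while the value shapes (store list, routes entry) are assumptions for illustration:

```erlang
%% Sketch only: enable ID/offset indexing and send GET /chunk requests to a
%% dedicated gateway. IndexStore stands in for any configured store.
Opts = #{
    arweave_index_ids => true,
    arweave_index_store => [IndexStore],
    routes =>
        [
            #{
                <<"template">> => <<"/chunk">>,
                <<"node">> => #{ <<"with">> => <<"https://neo-arweave.zephyrdev.xyz">> }
            }
        ]
},
```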
Index format: `<<"ID">> -> <<"IsTX:Offset:Length">>`
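A hypothetical reader for that value format, to make the encoding concrete; only the `ID -> IsTX:Offset:Length` shape comes from the description above, and the `<<"1">>` flag encoding is an assumption:

```erlang
%% Sketch: split an index value of the form <<"IsTX:Offset:Length">>.
parse_index_entry(Value) ->
    [IsTX, Offset, Length] = binary:split(Value, <<":">>, [global]),
    #{
        is_tx  => IsTX =:= <<"1">>,             % assumed flag encoding
        offset => binary_to_integer(Offset),    % absolute weave offset
        length => binary_to_integer(Length)     % byte length of the TX/item
    }.
```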
Questions/Notes

- `~copycat@1.0`: I updated how it iterates through the range of blocks to be indexed. Let me know if I should revert.
  - Old behavior: the count was exclusive and would keep going if `from` was less than `to`, e.g. `from=1000001&to=1000000` will index only block `1000001`, and `from=999999&to=1000000` will index all blocks 999999 and lower.
  - New behavior: the count is inclusive and stops when `from` is less than `to`, e.g. `from=1000001&to=1000000` will index blocks `1000001` and `1000000`, and `from=999999&to=1000000` will index no blocks.
- `hb_ao:resolve` vs. `hb_ao:get`: this PR primarily uses `hb_ao:resolve` and only uses `hb_ao:get` when querying a key from a map.
- Atoms (e.g. `arweave_index_store`) vs. binaries (e.g. `<<"arweave-index-store">>`)? I tried to mimic the conventions already in use.
- Added an `<<"exclude-data">>` arg to `dev_arweave` to allow it to query only the TX header without also downloading the data. I had initially omitted the flag and just forced the data download to be a separate operation, but this created some complexity around the overlap between L2 and L1 IDs. An L2 ID always maps to the full data item, but an L1 ID would only map to the TX header, and then the client would have to do a second `resolve` to get the data payload. The current approach keeps legacy behavior the same (both L2 and L1 IDs map to the full payload), with the option of only querying the TX header where needed.
- `data_root`: this computation depends on how the serialized `data` was "chunked". Unfortunately this information is not currently preserved in HB messages. The majority of transactions likely follow the arweave-js chunking scheme, and this PR implements that chunking scheme as the default (a rough sketch of it follows this list). In the future we may need to either track chunk boundaries (e.g. as commitment fields) or support multiple chunking schemes (and track those as commitment fields).
- When `dev_arweave` queries the gateway's `/chunk` endpoint it assumes the gateway is running a recent commit from the `arweave` repo (4de096e20028df01f61002620bd7d39297064a5b). This commit has not yet (as of Jan 25, 2026) been included in any formal arweave releases.
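For the `data_root` point above, a rough sketch of the arweave-js chunk-boundary rule, recalled from the arweave-js client rather than extracted from this PR; edge cases (e.g. data sizes that are exact multiples of 256 KiB) are glossed over:

```erlang
-define(MAX_CHUNK, 256 * 1024).
-define(MIN_CHUNK, 32 * 1024).

%% Chunk sizes for a payload of Size bytes: cut 256 KiB chunks, except that
%% when the remainder after a cut would be smaller than 32 KiB, the remaining
%% bytes are split into two roughly equal chunks instead.
chunk_sizes(Size) when Size =< ?MAX_CHUNK ->
    [Size];
chunk_sizes(Size) ->
    case Size - ?MAX_CHUNK < ?MIN_CHUNK of
        true ->
            First = (Size + 1) div 2,  % ceil(Size / 2)
            [First, Size - First];
        false ->
            [?MAX_CHUNK | chunk_sizes(Size - ?MAX_CHUNK)]
    end.
```

The `data_root` is the Merkle root computed over chunks cut at these boundaries, which is why the chunking scheme matters when reconstructing a transaction's data from weave ranges.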