Commit bb9b5c8

feat(spec): add spec for RequiredBlockState
1 parent 5a9bccb commit bb9b5c8

File tree

1 file changed

+343
-0
lines changed

specs/required_block_state.md

Lines changed: 343 additions & 0 deletions
@@ -0,0 +1,343 @@
1+
## `RequiredBlockState` specification
2+
3+
Specification of a data format that contains state required to
4+
trace a single Ethereum block.
5+
6+
This is the format of the data returned by the `eth_getRequiredBlockState` JSON-RPC method.
7+
8+
## Table of Contents
9+
10+
- [`RequiredBlockState` specification](#requiredblockstate-specification)
11+
- [Table of Contents](#table-of-contents)
12+
- [Abstract](#abstract)
13+
- [Motivation](#motivation)
14+
- [Overview](#overview)
15+
- [General Structure](#general-structure)
16+
- [Notation](#notation)
17+
- [Endianness](#endianness)
18+
- [Constants](#constants)
19+
- [Variable-size type parameters](#variable-size-type-parameters)
20+
- [Definitions](#definitions)
21+
- [`RequiredBlockState`](#requiredblockstate)
22+
- [`CompactEip1186Proof`](#compacteip1186proof)
23+
- [`Contract`](#contract)
24+
- [`TrieNode`](#trienode)
25+
- [`RecentBlockHash`](#recentblockhash)
26+
- [`CompactStorageProof`](#compactstorageproof)
27+
- [Algorithms](#algorithms)
28+
- [`construct_required_block_state`](#construct_required_block_state)
29+
- [`get_state_accesses`](#get_state_accesses)
30+
- [`get_proofs`](#get_proofs)
31+
- [`get_block_hashes`](#get_block_hashes)
32+
- [`use_required_block_state`](#use_required_block_state)
33+
- [`verify_required_block_state`](#verify_required_block_state)
34+
- [`trace_block_locally`](#trace_block_locally)
35+
- [`compression_procedure`](#compression_procedure)
36+
- [Security](#security)
37+
- [Future protocol changes](#future-protocol-changes)
38+
- [Canonicality](#canonicality)
39+
- [Post-block state root](#post-block-state-root)
40+
41+
42+
## Abstract
43+
44+
An Ethereum block returned by `eth_getBlockByNumber` can be considered a program that executes
45+
a state transition. The input to that program is the state immediately prior to that block.
46+
Only a small part of that state is required to run the program (re-execute the block).
47+
The state values can be accompanied by merkle proofs to prevent tampering.
48+
49+
The specification of that state (values and proofs as `RequiredBlockState`) facilitates
50+
data transfer between two parties. The transfer represents the minimum amount of data
51+
required for the holder of an Ethereum block to re-execute that block.
52+
53+
Re-execution is required for basic accounting (examination of the history of the global
54+
shared ledger). Trustless accounting of single Ethereum blocks allows for lightweight
55+
distributed block exploration.
56+
57+
58+
## Motivation
59+
60+
State is rooted in the header. A merkle multiproof for all state required for all
61+
transactions in one block is sufficient to trace that block, however old it is.
62+
63+
In addition to the proof, BLOCKHASH opcode reads are also included.
64+
65+
Together, anyone with an ability to verify that a historical block header is canonical
66+
can trustlessly trace a block without possession of an archive node.
67+
68+
The format of the data is deterministic, so that two peers creating the same
69+
data will produce identical structures.
70+
71+
The primary motivation is that data may be distributed in a peer-to-peer content delivery network.
72+
This would represent the state for a sharded archive node, where users may host subsets of the
73+
data useful to them.
74+
75+
A secondary benefit is that traditional node providers could serve users the ability to
76+
re-execute a block, rather than provide the result of re-execution. Transfer
77+
of `RequiredBlockState` is approximately 167 kB/Mgas (~2.5 MB per block). Transfer of
78+
a `debug_traceBlock` result is on the order of hundreds of megabytes per block with memory
79+
disabled, and with memory enabled can be tens of gigabytes. Local re-execution with an EVM
80+
implementation of choice can produce the identical re-execution (including memory or custom
81+
tracers), and can be processed and discarded on the fly.
82+
83+
## Overview
84+
85+
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT",
86+
"RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted
87+
as described in RFC 2119 and RFC 8174.
88+
89+
### General Structure
90+
91+
The `RequiredBlockState` consists of account state values as Merkle proofs, contract bytecode
92+
and recent block hashes.
93+
94+
### Notation
95+
Code snippets appearing in `this style` are to be interpreted as Python 3 pseudocode. The
96+
style of the document is intended to be readable by those familiar with the
97+
Ethereum consensus [https://github.com/ethereum/consensus-specs](https://github.com/ethereum/consensus-specs)
98+
and Simple Serialize (SSZ) ([https://github.com/ethereum/consensus-specs/blob/dev/ssz/simple-serialize.md](https://github.com/ethereum/consensus-specs/blob/dev/ssz/simple-serialize.md))
99+
specifications.
100+
101+
Where a list/vector is said to be sorted, it indicates that the elements are ordered
102+
lexicographically when in hexadecimal representation (e.g., `[0x12, 0x3e, 0xe3]`) prior
103+
to conversion to SSZ format. For elements that are containers, the ordering is determined by
104+
the first element in the container.
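
The ordering rule above can be sketched as follows. This is a minimal illustration, not part of the specification: the helper names `sort_key`, `sort_items` and `sort_containers` are hypothetical, and containers are modelled as plain tuples whose first element is a byte string.

```python
from typing import List, Tuple


def sort_key(item: bytes) -> str:
    # Hexadecimal representation with a "0x" prefix, compared as text.
    return "0x" + item.hex()


def sort_items(items: List[bytes]) -> List[bytes]:
    # Order byte strings lexicographically by their hexadecimal form.
    return sorted(items, key=sort_key)


def sort_containers(containers: List[Tuple]) -> List[Tuple]:
    # Containers are ordered by their first element only.
    return sorted(containers, key=lambda container: sort_key(container[0]))
```

For example, `sort_items([bytes([0xe3]), bytes([0x12]), bytes([0x3e])])` yields the order `0x12, 0x3e, 0xe3` used in the text.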
105+
106+
### Endianness
107+
108+
Big endian form is used as most data relates to the Ethereum execution context.
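
As an illustration of big-endian encoding for the variable-length integer fields (such as `nonce` or `block_number`), the sketch below converts between integers and bytes. It assumes that leading zero bytes are dropped so that the value fits the shortest possible list; the specification text does not state this, and the function names are hypothetical.

```python
def to_minimal_big_endian(value: int) -> bytes:
    # Assumption: leading zero bytes are omitted, so zero encodes as b"".
    if value < 0:
        raise ValueError("only non-negative integers can be encoded")
    length = (value.bit_length() + 7) // 8
    return value.to_bytes(length, "big")


def from_big_endian(data: bytes) -> int:
    # Most significant byte first.
    return int.from_bytes(data, "big")
```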
109+
110+
## Constants
111+
112+
### Variable-size type parameters
113+
114+
Helper values for SSZ operations. SSZ variable-size elements require a maximum length field.
115+
116+
Most values are chosen to be approximately the smallest possible value.
117+
118+
| Name | Value | Description |
| - | - | - |
| MAX_ACCOUNT_NODES_PER_BLOCK | uint16(32768) | - |
| MAX_BLOCKHASH_READS_PER_BLOCK | uint16(256) | A BLOCKHASH opcode may read up to 256 recent blocks |
| MAX_BYTES_PER_NODE | uint16(32768) | - |
| MAX_BYTES_PER_CONTRACT | uint16(32768) | - |
| MAX_CONTRACTS_PER_BLOCK | uint16(2048) | - |
| MAX_NODES_PER_PROOF | uint16(64) | - |
| MAX_STORAGE_NODES_PER_BLOCK | uint16(32768) | - |
| MAX_ACCOUNT_PROOFS_PER_BLOCK | uint16(8192) | - |
| MAX_STORAGE_PROOFS_PER_ACCOUNT | uint16(8192) | - |
129+
130+
## Definitions
131+
132+
### `RequiredBlockState`
133+
134+
The entire `RequiredBlockState` data format is represented by the following (SSZ-encoded and
135+
snappy-compressed) container.
136+
137+
As proofs sometimes have common internal nodes, all internal nodes for proofs are aggregated
138+
for deduplication. They are located in the `account_nodes` and `storage_nodes` members.
139+
Proofs refer to those nodes by index. A "compact" proof consists of a list of indices, indicating
140+
which node is used.
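
A consumer can turn a compact proof back into a conventional list of nodes by looking up each index. The sketch below is a minimal illustration with a hypothetical function name; `nodes` stands for either `account_nodes` or `storage_nodes`, modelled here as a plain list of byte strings.

```python
from typing import List


def expand_proof(indices: List[int], nodes: List[bytes]) -> List[bytes]:
    # Replace each index with the node it refers to, preserving proof order.
    expanded = []
    for index in indices:
        if index < 0 or index >= len(nodes):
            raise IndexError(f"proof refers to missing node {index}")
        expanded.append(nodes[index])
    return expanded
```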
141+
142+
The proof data represents values in the historical chain immediately prior to the execution of
143+
the block (sometimes referred to as "prestate"). That is, `RequiredBlockState` for block `n`
144+
contains proofs rooted in the state root of block `n - 1`.
145+
146+
```python
class RequiredBlockState(Container):
    # sorted (by address)
    compact_eip1186_proofs: List[CompactEip1186Proof, MAX_ACCOUNT_PROOFS_PER_BLOCK]
    # sorted
    contracts: List[Contract, MAX_CONTRACTS_PER_BLOCK]
    # sorted
    account_nodes: List[TrieNode, MAX_ACCOUNT_NODES_PER_BLOCK]
    # sorted
    storage_nodes: List[TrieNode, MAX_STORAGE_NODES_PER_BLOCK]
    # sorted (by block number)
    block_hashes: List[RecentBlockHash, MAX_BLOCKHASH_READS_PER_BLOCK]
```
159+
The `RequiredBlockState` is compressed using snappy encoding (see algorithms section). The
160+
`eth_getRequiredBlockState` JSON-RPC method returns the SSZ-encoded container with snappy encoding.
161+
162+
### `CompactEip1186Proof`
163+
164+
Represents the proof data whose root is the state root in the block header of the preceding block.
165+
166+
The `account_proof` member consists of indices that refer to items in the `account_nodes` member
167+
of the `RequiredBlockState` container.
168+
169+
```python
class CompactEip1186Proof(Container):
    address: Vector[uint8, 20]
    balance: List[uint8, 32]
    code_hash: Vector[uint8, 32]
    nonce: List[uint8, 8]
    storage_hash: Vector[uint8, 32]
    # sorted: node nearest to root first
    account_proof: List[uint16, MAX_NODES_PER_PROOF]
    # sorted
    storage_proofs: List[CompactStorageProof, MAX_STORAGE_PROOFS_PER_ACCOUNT]
```
181+
182+
### `Contract`
183+
184+
An alias for contract bytecode.
185+
```python
Contract = List[uint8, MAX_BYTES_PER_CONTRACT]
```
188+
189+
### `TrieNode`
190+
191+
An alias for a node in a merkle patricia proof.
192+
193+
Merkle Patricia Trie (MPT) proofs consist of a list of witness nodes, one for each trie node in the proof. Each trie node consists of various data elements depending on the type of node (e.g. blank, branch, extension, leaf). When serialized, each witness node is represented as an RLP-serialized list of its component elements.
194+
195+
```python
TrieNode = List[uint8, MAX_BYTES_PER_NODE]
```
198+
199+
### `RecentBlockHash`
200+
201+
A block hash accessed by the "BLOCKHASH" opcode.
202+
```python
class RecentBlockHash(Container):
    block_number: List[uint8, 8]
    block_hash: Vector[uint8, 32]
```
207+
208+
### `CompactStorageProof`
209+
210+
The `proof` member consists of indices that refer to items in the `storage_nodes` member
211+
of the `RequiredBlockState` container.
212+
213+
The proof consists of a list of indices, one per node. Each index refers to a `TrieNode` in `storage_nodes`.
214+
```python
class CompactStorageProof(Container):
    key: Vector[uint8, 32]
    # a storage value is a single EVM word of up to 32 bytes
    value: List[uint8, 32]
    # sorted: node nearest to root first
    proof: List[uint16, MAX_NODES_PER_PROOF]
```
221+
222+
## Algorithms
223+
224+
This section contains descriptions of procedures relevant to `RequiredBlockState`, including their
225+
production (`construct_required_block_state`) and use (`use_required_block_state`).
226+
227+
### `construct_required_block_state`
228+
229+
For a given block, `RequiredBlockState` can be constructed using existing JSON-RPC methods by
230+
using the following algorithms/steps:
231+
1. `get_state_accesses` algorithm
232+
2. `get_proofs`
233+
3. `get_block_hashes`
234+
4. Create the `RequiredBlockState` SSZ container
235+
5. Use `compression_procedure` to compress the `RequiredBlockState`
236+
237+
### `get_state_accesses`
238+
239+
Call `debug_traceBlock` with the prestate tracer and record key/value pairs where
240+
they are first encountered in the block.
241+
242+
```
curl -X POST -H "Content-Type: application/json" --data '{"jsonrpc": "2.0", "method": "debug_traceBlock", "params": ["finalized", {"tracer": "prestateTracer"}], "id":1}' http://127.0.0.1:8545 | jq
```
245+
This will return state objects consisting of a key (account address), and value (state, which
246+
may include contract bytecode and storage key/value pairs). See two objects for reference:
247+
```json
[
  "0x58803db3cc22e8b1562c332494da49cacd94c6ab": {
    "balance": "0x13befe42b38a40",
    "nonce": 54
  },
  "0xae7ab96520de3a18e5e111b5eaab095312d7fe84": {
    "balance": "0x4558214a60e751c3a",
    "code": "0x608060/* Snip (entire contract bytecode) */410029",
    "nonce": 1,
    "storage": {
      "0x1b6078aebb015f6e4f96e70b5cfaec7393b4f2cdf5b66fb81b586e48bf1f4a26": "0x0000000000000000000000000000000000000000000000000000000000000000",
      "0x4172f0f7d2289153072b0a6ca36959e0cbe2efc3afe50fc81636caa96338137b": "0x000000000000000000000000b8ffc3cd6e7cf5a098a1c92f48009765b24088dc",
      "0x644132c4ddd5bb6f0655d5fe2870dcec7870e6be4758890f366b83441f9fdece": "0x0000000000000000000000000000000000000000000000000000000000000001",
      "0xd625496217aa6a3453eecb9c3489dc5a53e6c67b444329ea2b2cbc9ff547639b": "0x3ca7c3e38968823ccb4c78ea688df41356f182ae1d159e4ee608d30d68cef320"
    }
  },
  ...
]
```
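
Recording values "where they are first encountered" could be sketched as below. This assumes the tracer output has already been split into one address-to-state mapping per transaction, in block order; that input shape and the function name `merge_prestate` are assumptions made for illustration, not something defined by this specification.

```python
from typing import Any, Dict, List


def merge_prestate(per_tx_states: List[Dict[str, Dict[str, Any]]]) -> Dict[str, Dict[str, Any]]:
    # Keep the first value seen for every account field and storage slot,
    # since later transactions observe state already modified by the block.
    merged: Dict[str, Dict[str, Any]] = {}
    for tx_state in per_tx_states:
        for address, account in tx_state.items():
            entry = merged.setdefault(address, {"storage": {}})
            for field, value in account.items():
                if field == "storage":
                    for slot, slot_value in value.items():
                        entry["storage"].setdefault(slot, slot_value)
                else:
                    entry.setdefault(field, value)
    return merged
```

The merged mapping then lists every address and storage key for which a proof is needed.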
267+
268+
### `get_proofs`
269+
270+
Call the `eth_getProof` JSON-RPC method for each state key (address) returned by the
271+
`get_state_accesses` algorithm, including
272+
storage keys if appropriate.
273+
274+
The block number used is the block prior to the block of interest (state is stored as post-block
275+
state).
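
The request bodies for this step might be assembled as in the following sketch. The input `state_accesses` is assumed to be a mapping from address to account state with an optional `storage` mapping (the shape shown in the `get_state_accesses` example); the function name is hypothetical. The parameter order (address, storage keys, block) follows the `eth_getProof` method.

```python
from typing import Any, Dict, List


def build_get_proof_requests(state_accesses: Dict[str, Dict[str, Any]], block_number: int) -> List[Dict[str, Any]]:
    # Proofs are requested against the block before the block of interest.
    prior_block = hex(block_number - 1)
    requests = []
    for request_id, address in enumerate(sorted(state_accesses), start=1):
        storage_keys = sorted(state_accesses[address].get("storage", {}))
        requests.append({
            "jsonrpc": "2.0",
            "method": "eth_getProof",
            "params": [address, storage_keys, prior_block],
            "id": request_id,
        })
    return requests
```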
276+
277+
For all account proofs, aggregate and sort the proof nodes and represent each proof as a list of
278+
indices to those nodes. Repeat for all storage proofs.
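
The aggregation step can be sketched as follows, assuming each proof has already been decoded into a list of node byte strings (root first). The function name `compact_proofs` is hypothetical, and sorting by hexadecimal form follows the Notation section.

```python
from typing import List, Tuple


def compact_proofs(proofs: List[List[bytes]]) -> Tuple[List[bytes], List[List[int]]]:
    # Collect every distinct node once, then sort for a deterministic layout.
    unique_nodes = {node for proof in proofs for node in proof}
    nodes = sorted(unique_nodes, key=lambda node: node.hex())
    index_of = {node: index for index, node in enumerate(nodes)}
    # Each proof becomes a list of indices into the shared node list,
    # keeping the original root-first order within the proof.
    compact = [[index_of[node] for node in proof] for proof in proofs]
    return nodes, compact
```

Two proofs that share a root node then store that node once and both refer to it by the same index.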
279+
280+
### `get_block_hashes`
281+
282+
Call `debug_traceBlock` with the default tracer and record any use of the "BLOCKHASH" opcode.
283+
Record the block number (top of stack in the "BLOCKHASH" step), and the block hash (top
284+
of stack in the subsequent step).
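
A sketch of this extraction is shown below. It assumes the trace for a transaction is available as a list of step dictionaries with an `op` name and a `stack` of hexadecimal strings whose last element is the top of the stack; that layout and the function name are assumptions for illustration and may differ between clients.

```python
from typing import Any, Dict, List, Tuple


def extract_block_hashes(struct_logs: List[Dict[str, Any]]) -> List[Tuple[int, str]]:
    reads: Dict[int, str] = {}
    # Pair every step with the step that follows it.
    for step, next_step in zip(struct_logs, struct_logs[1:]):
        if step["op"] != "BLOCKHASH":
            continue
        # The requested block number is on top of the stack at the BLOCKHASH step.
        block_number = int(step["stack"][-1], 16)
        # The resulting hash is on top of the stack in the subsequent step.
        block_hash = next_step["stack"][-1]
        reads.setdefault(block_number, block_hash)
    # Sorted by block number, as required for `block_hashes`.
    return sorted(reads.items())
```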
285+
286+
### `use_required_block_state`
287+
288+
1. Obtain `RequiredBlockState`, for example by calling `eth_getRequiredBlockState`
289+
2. Use `compression_procedure` to decompress the `RequiredBlockState`
290+
3. `verify_required_block_state`
291+
4. `trace_block_locally`
292+
293+
### `verify_required_block_state`
294+
295+
Check that block hashes are canonical, for example against a trusted node or against an accumulator of canonical
296+
block hashes. Check merkle proofs in the required block state.
297+
298+
### `trace_block_locally`
299+
300+
Obtain a block (`eth_getBlockByNumber` JSON-RPC method) with transaction bodies. Use an EVM
301+
and load it with the `RequiredBlockState` and the block. Execute
302+
the transactions in the block and observe the trace.
303+
304+
### `compression_procedure`
305+
306+
The `RequiredBlockState` returned by the `eth_getRequiredBlockState` JSON-RPC method is
307+
compressed. Snappy compression is used ([https://github.com/google/snappy](https://github.com/google/snappy)).
308+
309+
The encoding and decoding procedures are the same as those used in the Ethereum consensus specifications
310+
([https://github.com/ethereum/consensus-specs/blob/dev/specs/phase0/p2p-interface.md#ssz-snappy-encoding-strategy](https://github.com/ethereum/consensus-specs/blob/dev/specs/phase0/p2p-interface.md#ssz-snappy-encoding-strategy)).
311+
312+
For encoding (compression), data is first SSZ-encoded and then snappy-encoded.
313+
For decoding (decompression), data is first snappy-decoded and then SSZ-decoded.
314+
315+
## Security
316+
317+
### Future protocol changes
318+
319+
Merkle patricia proofs may be replaced by verkle proofs after some hard fork.
320+
This would not invalidate `RequiredBlockState` data prior to that fork.
321+
The new proof format could be added to this specification for data after that fork.
322+
323+
### Canonicality
324+
325+
A recipient of `RequiredBlockState` must check that the blockhashes are part of the real
326+
Ethereum chain history. Failure to verify (`verify_required_block_state`) can result in invalid
327+
re-execution (`trace_block_locally`).
328+
329+
### Post-block state root
330+
331+
A user that has access to canonical block hashes and a sound EVM implementation has strong
332+
guarantees about the integrity of the block re-execution (`trace_block_locally`).
333+
334+
However, there is no guarantee that a new block state root can be computed for this post-execution
335+
state, for example with the aim of checking it against the state root in the block header of that block
336+
and thereby audit the state changes that were applied.
337+
338+
This is because the state changes may involve an arbitrary number of state deletions. State
339+
deletions may change the structure of the merkle trie in a way that requires knowledge of
340+
internal nodes that are not present in the proofs obtained by the `eth_getProof` JSON-RPC method.
341+
Hence, while the complete post-block trie can sometimes be created, it is not guaranteed.
342+
343+
