Commit 81a1ec6

doc: add spec for address appearances
1 parent 12e649e commit 81a1ec6

1 file changed

+287
-0
lines changed

src/eth/addresses.md

@@ -0,0 +1,287 @@
# Eth API - Address appearance specification

Specification for address appearances returned by `eth_getAddressesInBlock`.

## Introduction

An address may "appear" in an Ethereum transaction. If an address appears in a transaction, that
transaction could be meaningful to examine as a historical record.

For example, an appearance may be "an address that is a recipient of a transfer of Ether during
the EVM execution". Such an appearance (the presence of the address in that transaction in that
way) makes that transaction meaningful in an examination of the historical balances of that
address.

A collection of "address appearances" (defined in a subsequent section) constitutes a set of transactions
that is sufficient to form a complete historical analysis of "activity" for that address.
This "activity" may take many meanings (programs in the EVM may do arbitrary things), but can
be identified structurally, as will be shown.

## Overview

An address may appear in different parts of a transaction. This might include being the sender or
recipient of a transfer, a block reward recipient, or other categories. One main category
is the address of a piece of code that was run during the transaction. This code address can be
readily identified in the transaction without a need to understand the purpose or nature of
the code.

The identification of an appearance solves a discovery problem. Once the important transactions
for a particular address have been found, an analysis of what each appearance means can be performed,
although this is beyond the scope of this specification.

## Specification

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119 and RFC 8174.

### Type aliases

|Name|Example|Description|
|-|-|-|
|Hex-string|`0x0abc`|Hex encoded, 0x-prefixed, leading 0's permitted|
|Hex-number|`0xabcd`|Hex encoded, 0x-prefixed, leading 0's omitted|

### Design parameters
|Name|Value|Description|
|-|-|-|
|`MAX_VANITY_ZERO_CHARS`|`8`|Maximum number of leading or trailing '0' characters permitted in an address|

### Derived parameters
|Name|Definition|Value|Description|
|-|-|-|-|
|`MIN_NONZERO_BYTES`|`20 - MAX_VANITY_ZERO_CHARS // 2`|`16`|Smallest possible nonzero component of an address, in bytes|

## Address definition

An address is informally defined as 40 hexadecimal characters that
- May include some leading or trailing zeros (for vanity addresses)
- Appear in the EVM environment in a 32 byte section with left padding.
  For example, transaction calldata is trimmed to a 32 byte multiple and
  divided into 32 byte sections, each of which is checked separately for "address" or "not address" classification.

An address has the following formal definition:
- MUST be 20 bytes
- MUST begin or end with `MIN_NONZERO_BYTES` non-zero bytes.
  This allows for vanity address inclusion of up to `MAX_VANITY_ZERO_CHARS` zero characters.
- MUST NOT be a known precompile
- When detected within more than 20 bytes, the address MUST appear in a 32 byte section with left padding (12 leading zero-bytes).
  The address that meets the criteria is extracted from the source bytes.
- When inspecting more than 32 bytes, the bytes MUST first be trimmed to a multiple of 32 bytes and divided
  into 32 byte sections (discarding the remaining length modulo 32 bytes), which are examined separately.

The following examples show the detection and extraction of one or more addresses
from a sequence of bytes.

### Example "address"
Deposit contract address in a 32 byte hex string with left padding:
```
"0x00000000000000000000000000000000219ab540356cBB839Cbe05303d7705Fa"
> ["0x00000000219ab540356cBB839Cbe05303d7705Fa"]
```

Deposit contract address bytes, but shifted left by 4 bytes so that 12 zero-bytes of left padding remain. A valid address is detected,
but it is not the deposit contract address:
```
"0x000000000000000000000000219ab540356cBB839Cbe05303d7705Fa00000000"
> ["0x219ab540356cBB839Cbe05303d7705Fa00000000"]
```

Data from the two examples above, concatenated:
```
"0x00000000000000000000000000000000219ab540356cBB839Cbe05303d7705Fa000000000000000000000000219ab540356cBB839Cbe05303d7705Fa00000000"
> ["0x219ab540356cBB839Cbe05303d7705Fa00000000", "0x00000000219ab540356cBB839Cbe05303d7705Fa"]
```

Data from the example above, with two additional trailing bytes (`abcd`) that are truncated because the length is not a multiple of 32 bytes:
```
"0x00000000000000000000000000000000219ab540356cBB839Cbe05303d7705Fa000000000000000000000000219ab540356cBB839Cbe05303d7705Fa00000000abcd"
> ["0x219ab540356cBB839Cbe05303d7705Fa00000000", "0x00000000219ab540356cBB839Cbe05303d7705Fa"]
```

### Example "not address"

Deposit contract address in a 32 byte hex string with right padding. No address is detected because the leftmost 24 characters (12 bytes) must all be zeros:
```
"0x00000000219ab540356cBB839Cbe05303d7705Fa000000000000000000000000"
> []
```

No address is detected because, once the leftmost 24 characters (12 bytes) are ignored, the
remaining string has too many trailing 0's (`0x9Cbe05303d7705Fa000000000000000000000000`):
```
"0x0000000000000000000000009Cbe05303d7705Fa000000000000000000000000"
> []
```

The nonzero characters are long enough for an address, but are rejected for spanning a 32 byte boundary
(read as `0x...000219ab540356cBB83` and `0x9Cbe05303d7705Fa000...`):
```
"0x000000000000000000000000000000000000000000000000219ab540356cBB839Cbe05303d7705Fa000000000000000000000000000000000000000000000000"
> []
```

## Appearance definition

An address appearance is informally defined as the transaction identifier for a transaction
that contains that particular address. That is, transaction "A" is an appearance of address "B"
if address "B" is part of transaction "A" in an important way, e.g., as one of sender, recipient,
code address, etc.

For a given address, a transaction MUST be classified as an appearance if any of the following
conditions are met. Conditions are divided into different sections for clarity.

### Intra-transaction appearances
An address MAY appear in any of the following:

|Short description|Description|Access comment|
|-|-|-|
|Sender|Transaction "from" field|Transaction body|
|Target|Transaction "to" field|Transaction body|
|Calldata|Transaction "input" field|Transaction body. 32 byte aligned, modulo 32 bytes|
|Log origin|Log family (LOG0, LOG1, LOG2, LOG3, LOG4) address|Transaction receipt|
|Log topics|Log topic family (LOG1, LOG2, LOG3, LOG4) topic index 1, 2, 3 or 4|Transaction receipt|
|Log data|Log family (LOG0, LOG1, LOG2, LOG3, LOG4) data|Transaction receipt. 32 byte aligned, modulo 32 bytes|
|Opcode address argument|Opcode "address" parameter (including but not limited to CALL, CALLCODE, STATICCALL, DELEGATECALL, SELFDESTRUCT)|Accessible via call tracer "to" field|
|Internal return data|RETURN data defined by "offset" and "size" fields|Accessible via call tracer "output" field. 32 byte aligned, modulo 32 bytes|
|Create address|Create family opcode (CREATE or CREATE2) return "address" field|Accessible via call tracer "to" field|
|Internal calldata|Call family opcode (CALL, CALLCODE, STATICCALL or DELEGATECALL) argument data defined by opcode "argsOffset" and "argsSize" fields|Accessible via call tracer "input" field. 32 byte aligned, modulo 32 bytes|
|Internal return data|Call family opcode (CALL, CALLCODE, STATICCALL or DELEGATECALL) return data defined by opcode "retOffset" and "retSize" fields|Accessible via call tracer "output" field. 32 byte aligned, modulo 32 bytes|
|Internal create data|Create family opcode (CREATE or CREATE2) data defined by opcode "offset" and "size" fields|Accessible via call tracer "input" field. 32 byte aligned, modulo 32 bytes|

Note that the call tracer "to", "from", "input" and "output" fields are sufficient to capture all
the required data not present in the transaction body and receipts. See below for an algorithm
for finding appearances.

### Extra-transaction appearances
An address MAY appear in any of the following:

|Short description|Description|Comment|
|-|-|-|
|Block reward|An address in the "miner" field of a block header|-|
|Uncle reward|An address in the "miner" field of a block header within the block "uncles" field array|-|
|Withdrawal|An address in the "address" field of a block "withdrawals" field array object|-|

## Appearance components

An address appearance is defined as having the following components:
- Block number
  - MUST be included for any appearance.
- Transaction index
  - MUST be included for any intra-transaction appearance.
  - MUST be omitted (empty list) if an extra-transaction appearance occurs but an intra-transaction appearance does not occur.

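As an illustration of the components above, the following Go sketch models an appearance as a block number plus a possibly empty list of transaction indices. The type `Appearance`, its field names and the helper `newAppearance` are hypothetical; this specification does not define a wire format here, so the shape is an assumption.

```go
package main

import "fmt"

// Appearance is a hypothetical container for the components listed above.
type Appearance struct {
	BlockNumber uint64   // always present
	TxIndices   []uint64 // empty when the address only appears outside transactions
}

// newAppearance builds an Appearance for one address in one block.
// intraTxIndices lists the transactions the address appears in; extraTx
// reports an extra-transaction appearance (e.g., block reward, withdrawal).
// The second return value is false when the address does not appear at all.
func newAppearance(blockNumber uint64, intraTxIndices []uint64, extraTx bool) (Appearance, bool) {
	if len(intraTxIndices) == 0 && !extraTx {
		return Appearance{}, false
	}
	// Copy so the caller's slice is not shared; the result is an empty
	// (non-nil) list when only an extra-transaction appearance occurred.
	indices := make([]uint64, 0, len(intraTxIndices))
	indices = append(indices, intraTxIndices...)
	return Appearance{BlockNumber: blockNumber, TxIndices: indices}, true
}

func main() {
	// A block reward recipient that is not part of any transaction.
	miner, _ := newAppearance(17796114, nil, true)
	fmt.Println(miner.BlockNumber, len(miner.TxIndices)) // 17796114 0

	// An address seen in transactions 3 and 7 of the same block.
	user, _ := newAppearance(17796114, []uint64{3, 7}, false)
	fmt.Println(user.BlockNumber, user.TxIndices) // 17796114 [3 7]
}
```
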
## Algorithm

### Address detection

A 32 byte string may be inspected to determine if it meets criteria for an address as follows:

```go
// Source: UnchainedIndex Specification, trueblocks-core@v0.51.0, Go implementation
import "strings"

// potentialAddress reports whether a 32 byte value, given as 64 hex
// characters without a 0x prefix, looks like a left-padded address.
// The first check is a lexicographic string comparison, so addr is
// assumed to be lowercase hex.
func potentialAddress(addr string) bool {
	// Any 32 byte value smaller than this number (including precompiles)
	// is assumed to be a baddress. While there are technically a very
	// large number of addresses in this range, we choose to eliminate them
	// in an effort to keep the index small.
	//
	// While this may seem drastic, in that a lot of addresses are being
	// excluded, the number is actually quite small: less than two out of
	// every 10000000000000000000000000000000000000000000000 20-byte strings
	// are excluded, and almost every one of these is actually a number such
	// as an account balance or number of tokens transferred. It's worth it.
	small := "00000000000000000000000000000000000000ffffffffffffffffffffffffff"
	//        -------+-------+-------+-------+-------+-------+-------+-------+
	if addr <= small {
		return false
	}
	// Any 32 byte value with fewer than this many leading zeros is assumed
	// to be a baddress. (Most addresses are 20 bytes long and left-padded
	// with zeros.)
	// Note: we're processing these as strings, so 24 characters is 12 bytes.
	largePrefix := "000000000000000000000000"
	//              -------+-------+-------+
	if !strings.HasPrefix(addr, largePrefix) {
		return false
	}
	// A large number of what would normally be considered valid addresses
	// happen to end with eight zeros. We're not sure why, but we identify
	// these as baddresses as well in a final effort to lower the size of
	// the index. We've seen no obvious ill-effects from this choice.
	if strings.HasSuffix(addr, "00000000") {
		return false
	}
	return true
}
```

### Appearance detection

Appearances are detected by inspecting a block. Implementations serving
`eth_getAddressesInBlock` may benefit from a custom algorithm. For demonstration purposes,
the following procedure can be performed by use of existing JSON-RPC endpoints (availability depends
on the client used).

1. Call `eth_getBlockByNumber` with params `[block_number, true]` to include transactions.
   Extract addresses from the block header (e.g., miner, uncles, withdrawals).
2. Get the call tracer output via the command below. Extract addresses from fields as described in the
   appearances table in the prior section.
3. For each address found, record the transaction(s) it appeared in (if appropriate).

Command to obtain the call tracer with logs:
```command
$ curl -X POST -H "Content-Type: application/json" --data '{"jsonrpc": "2.0", "method": "debug_traceBlockByNumber", "params": [17796114, {"tracer": "callTracer", "tracerConfig":{"withLog": true}}], "id":1}' http://127.0.0.1:8545 | jq
```

The example below (a subset of the response from the above call) shows where addresses can be found:
in the "from", "to", "input" and "output" fields of each call frame, and in the "address", "topics"
and "data" fields of each log.
```json
{
  "result": {
    "from": "0x9853fda0b5e99eac2968dc59ad37cded61cb1bf5",
    "gas": "0xd3c7",
    "gasUsed": "0xcc7c",
    "to": "0x1a0ad011913a150f69f6a19df447a0cfd9551054",
    "input": "0xe9e05c420000000000000000000000009853fda0b5e99eac2968dc59ad37cded61cb1bf500000000000000000000000000000000000000000000000000038d7ea4c6800000000000000000000000000000000000000000000000000000000000000186a0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000a00000000000000000000000000000000000000000000000000000000000000000",
    "calls": [
      {
        "from": "0x1a0ad011913a150f69f6a19df447a0cfd9551054",
        "gas": "0xbd57",
        "gasUsed": "0x623a",
        "to": "0x43260ee547c3965bb2a0174763bb8fecc650ba4a",
        "input": "0xe9e05c420000000000000000000000009853fda0b5e99eac2968dc59ad37cded61cb1bf500000000000000000000000000000000000000000000000000038d7ea4c6800000000000000000000000000000000000000000000000000000000000000186a0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000a00000000000000000000000000000000000000000000000000000000000000000",
        "calls": [
          {
            "from": "0x1a0ad011913a150f69f6a19df447a0cfd9551054",
            "gas": "0x9257",
            "gasUsed": "0x1f0b",
            "to": "0xa3cab0126d5f504b071b81a3e8a2bbbf17930d86",
            "input": "0xcc731b02",
            "output": "0x0000000000000000000000000000000000000000000000000000000001312d00000000000000000000000000000000000000000000000000000000000000000a0000000000000000000000000000000000000000000000000000000000000008000000000000000000000000000000000000000000000000000000003b9aca0000000000000000000000000000000000000000000000000000000000000f424000000000000000000000000000000000ffffffffffffffffffffffffffffffff",
            "calls": [
              {
                "from": "0xa3cab0126d5f504b071b81a3e8a2bbbf17930d86",
                "gas": "0x7d0a",
                "gasUsed": "0xb7c",
                "to": "0x17fb7c8ce213f1a7691ee41ea880abf6ebc6fa95",
                "input": "0xcc731b02",
                "output": "0x0000000000000000000000000000000000000000000000000000000001312d00000000000000000000000000000000000000000000000000000000000000000a0000000000000000000000000000000000000000000000000000000000000008000000000000000000000000000000000000000000000000000000003b9aca0000000000000000000000000000000000000000000000000000000000000f424000000000000000000000000000000000ffffffffffffffffffffffffffffffff",
                "type": "DELEGATECALL"
              }
            ],
            "type": "STATICCALL"
          }
        ],
        "logs": [
          {
            "address": "0x1a0ad011913a150f69f6a19df447a0cfd9551054",
            "topics": [
              "0xb3813568d9991fc951961fcb4c784893574240a28925604d09fc577c55bb7c32",
              "0x0000000000000000000000009853fda0b5e99eac2968dc59ad37cded61cb1bf5",
              "0x0000000000000000000000009853fda0b5e99eac2968dc59ad37cded61cb1bf5",
              "0x0000000000000000000000000000000000000000000000000000000000000000"
            ],
            "data": "0x0000000000000000000000000000000000000000000000000000000000000020000000000000000000000000000000000000000000000000000000000000004900000000000000000000000000000000000000000000000000038d7ea4c6800000000000000000000000000000000000000000000000000000038d7ea4c6800000000000000186a0000000000000000000000000000000000000000000000000"
          }
        ],
        "type": "DELEGATECALL"
      }
    ],
    "value": "0x38d7ea4c68000",
    "type": "CALL"
  }
}
```
