This section will come back soon as v2 is getting ready!
Sandwiching refers to the action of forcing the earlier inclusion of a transaction (frontrun) before a transaction published earlier (victim), with another transaction after the victim transaction to realise a profit (backrun), while abusing the victim's slippage settings. We define a sandwich as "a set of transactions that include exactly one frontrun and exactly one backrun transaction, as well as at least one victim transaction", a sandwicher as "a party that sandwiches", and a colluder as "a validator that forwards transactions they receive to a sandwicher".
Some have suggested that users should simply issue transactions with lower slippage, but this is not always possible when trading token pairs with extremely high volatility. Being forced to issue transactions with low slippage may lead to higher transaction failure rates and missed opportunities, which is also suboptimal.
The reasons why sandwiching is harmful to the ecosystem have been detailed by another researcher and will not be repeated here, but they mainly boil down to breaking trust, transparency and fairness.
We believe that colluder identification should be a continuous effort since generating new keys to run a new validator is essentially free, and with a certain stake pool willing to sell stake to any validator regardless of operating history, one-off removals will prove ineffective. This repository aims to serve as a tool to continuously identify sandwiches and colluders such that relevant parties can remove stake from sandwichers as soon as possible.
Law of large numbers (LLN) - the average of the results obtained from a large number of independent random samples converges to the true value, if it exists [source]. In other words, an average validator running the average software should produce average numbers in the long run; the longer the run, the closer the validator's average is to the global average.
In this application, we consider an observation of "how many sandwiches are in the block" and "is there a sandwich in the block" a sample. For us to apply LLN here, we need to be reasonably sure that:
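To illustrate the convergence described above, the following sketch simulates a hypothetical validator whose blocks each contain a sandwich with a fixed probability. The probability `p`, the seed and the function name `running_proportion` are illustrative assumptions, not values taken from the dataset.

```python
import random


def running_proportion(p: float, n_blocks: int, seed: int = 0) -> list[float]:
    """Return the running proportion of sandwich-inclusive blocks.

    Each block is modelled as an independent Bernoulli trial with
    success probability p (a made-up value used only for illustration).
    """
    rng = random.Random(seed)
    hits = 0
    proportions = []
    for i in range(1, n_blocks + 1):
        hits += rng.random() < p  # True counts as 1
        proportions.append(hits / i)
    return proportions


props = running_proportion(p=0.05, n_blocks=100_000)
for n in (100, 10_000, 100_000):
    print(f"after {n:>7} blocks: {props[n - 1]:.4f} (true value 0.05)")
```

The printed proportions should drift towards 0.05 as the number of simulated blocks grows, which is the behaviour the analysis relies on.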
- The samples are independent;
- The average exists.
The average clearly exists - it should be very close to the observed cluster average, given the large number of slots we're aggregating over.
According to Anza's docs, the distribution of leader slots is random but stake-weighted. It is possible to influence the distribution (e.g. maximise the chances that a certain set of validators' slots follows another set's) by strategically creating stake accounts, and it would technically be beneficial to avoid having leader slots after validators known to be less performant, to avoid skipped slots and therefore missing out on block rewards. However, this has nothing to do with sandwiching, as validators are economically incentivised to keep the transactions that pay the most for themselves. This also applies to sandwichable transactions: if a validator knows that a transaction is sandwichable and is willing to exploit it, its only options are to exploit the transaction itself or forward it to a sandwicher. In other words, sandwicher colluders (RPCs and validators alike) normally won't forward sandwichable transactions to the next leader "just to mess with their numbers". As such, the leader slot distribution depends entirely on the cluster's actions and is considered random.
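As a rough illustration of what "random but stake-weighted" means, the sketch below draws leaders in proportion to stake and gives each draw a group of 4 consecutive slots. It is a toy model and not Solana's actual leader schedule algorithm; the validator names, stake values and the `toy_leader_schedule` function are hypothetical.

```python
import random
from collections import Counter

SLOTS_PER_LEADER_GROUP = 4  # consecutive slots assigned to one leader


def toy_leader_schedule(stakes: dict[str, float], n_groups: int, seed: int = 0) -> list[str]:
    """Return one leader name per slot, drawn with probability proportional to stake."""
    rng = random.Random(seed)
    names = list(stakes)
    weights = [stakes[name] for name in names]
    leaders = rng.choices(names, weights=weights, k=n_groups)
    # Repeat each drawn leader for every slot in its group.
    return [leader for leader in leaders for _ in range(SLOTS_PER_LEADER_GROUP)]


stakes = {"validator_a": 900.0, "validator_b": 100.0}  # made-up stake amounts
schedule = toy_leader_schedule(stakes, n_groups=1_000)
print(Counter(schedule))
```

In this toy model a validator's share of slots tracks its share of stake, while the order in which leader groups appear is left to chance.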
Another important factor to consider is the difference in transaction delivery across nodes. Some transaction senders may decide not to have their transactions sent directly from RPC nodes to certain validators due to various concerns, such as being sandwiched, but it's unlikely that any given transaction sender will blacklist the majority of validators to suppress their sandwiching numbers. If and when such facilities are used, they will most likely decrease the number of transactions reaching known sandwicher colluders, suppressing their numbers instead. There is little data on the usage of such facilities, but we expect their usage not to affect the independence of the sampling.
From our analysis above, we're confident that the law of large numbers can be applied to sandwicher colluder identification, as the average we're looking for exists and the samples (or at least groups of 4 samples, corresponding to a leader group) are independent. This means that if your sandwiching numbers deviate significantly from the cluster average, we're pretty sure (but not 100%, as with any statistics-based hypothesis) that you're engaged in something related to sandwiching.
A sandwich is defined by a set of transactions that satisfies all of the following:
- Has at least 1 transaction for frontrunning and backrunning, with at least 1 victim transaction between the frontrun and backrun;
- The inputs of the backrun must match the output of the frontrun, or be connected by transfers;
- The frontrun and the victim transactions trade in the same direction, and the backrun trades in the reverse direction;
- Output of backrun >= Input of frontrun and Output of frontrun >= Input of backrun (profitability constraint);
- All transactions use the same AMM;
- Each victim transaction's signer differs from the frontrun's and the backrun's;
- The wrapper program in the frontrun and the backrun is the same.
For each sandwich identified in blocks newly emitted by the cluster, we insert it into a database for report generation.
Note that we don't require the frontrun and the backrun to have the same signer as it's a valid strategy to use multiple wallets to evade detection by moving tokens across wallets. The "victim signer differs from attackers" and "wrapper program present" constraints were removed in v2 since pvp is fun, and we've identified sandwiches that invoke certain AMM programs directly, but the frontrun's output matches the backrun's input exactly, suggesting sandwiching intention.
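The criteria above can be sketched as a single predicate. This is a minimal illustration and not the repository's implementation: the `Swap` dataclass, its fields and `is_sandwich` are hypothetical, the "connected by transfers" case is not modelled, and the two constraints removed in v2 are only checked when `v1_constraints` is set.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Swap:
    """Hypothetical, simplified view of one swap transaction in a block."""
    index: int               # position of the transaction within the block
    signer: str
    amm: str                 # AMM the swap went through
    wrapper: Optional[str]   # wrapper program, if any
    input_mint: str
    output_mint: str
    input_amount: int
    output_amount: int


def is_sandwich(front: Swap, victims: list[Swap], back: Swap,
                v1_constraints: bool = False) -> bool:
    # At least one victim, all strictly between the frontrun and the backrun.
    if not victims or not all(front.index < v.index < back.index for v in victims):
        return False
    # All transactions use the same AMM.
    if any(tx.amm != front.amm for tx in [*victims, back]):
        return False
    # Frontrun and victims trade in the same direction.
    direction = (front.input_mint, front.output_mint)
    if any((v.input_mint, v.output_mint) != direction for v in victims):
        return False
    # The backrun trades in reverse, so its input is the frontrun's output.
    if (back.input_mint, back.output_mint) != (front.output_mint, front.input_mint):
        return False
    # Profitability constraint.
    if back.output_amount < front.input_amount or front.output_amount < back.input_amount:
        return False
    if v1_constraints:
        # Constraints that were dropped in v2.
        if any(v.signer in (front.signer, back.signer) for v in victims):
            return False
        if front.wrapper is None or front.wrapper != back.wrapper:
            return False
    return True


front = Swap(0, "attacker_1", "amm_x", "wrapper_w", "SOL", "TKN", 100, 1000)
victim = Swap(1, "user", "amm_x", None, "SOL", "TKN", 50, 450)
back = Swap(2, "attacker_2", "amm_x", "wrapper_w", "TKN", "SOL", 1000, 105)
print(is_sandwich(front, [victim], back))
```

Note that the frontrun and the backrun in the example use different signers, which the predicate deliberately allows.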
With the sandwich dataset, we're able to calculate the cluster-wide and per-validator proportion of sandwich-inclusive blocks and number of sandwiches per block. Our hypothesis is that colluders will exhibit above-cluster-average values on both metrics. Due to transaction landing delays, the report generation tool also "credits" sandwiches to earlier slots.
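A minimal sketch of how the two metrics could be computed from per-block sandwich counts follows. The input shape (one `(leader, sandwich count)` pair per block), the `CLUSTER` key and the `sandwich_metrics` function are assumptions made for illustration, and the crediting of sandwiches to earlier slots is not modelled.

```python
from collections import defaultdict

CLUSTER = "__cluster__"  # hypothetical key for the cluster-wide aggregate


def sandwich_metrics(blocks: list[tuple[str, int]]) -> dict[str, tuple[float, float]]:
    """Map each leader (and the cluster) to
    (proportion of sandwich-inclusive blocks, sandwiches per block)."""
    # Per key: [blocks seen, sandwich-inclusive blocks, total sandwiches]
    counts = defaultdict(lambda: [0, 0, 0])
    for leader, n_sandwiches in blocks:
        for key in (leader, CLUSTER):
            counts[key][0] += 1
            counts[key][1] += n_sandwiches > 0
            counts[key][2] += n_sandwiches
    return {key: (inclusive / n, total / n)
            for key, (n, inclusive, total) in counts.items()}


blocks = [("validator_a", 0), ("validator_a", 2), ("validator_b", 0), ("validator_b", 0)]
for key, (proportion, per_block) in sandwich_metrics(blocks).items():
    print(key, proportion, per_block)
```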
The hypotheses are as follows:
Null hypothesis: At least one metric is in line with the cluster average
Alternative hypothesis: Both metrics exceed the cluster average
For the proportion of sandwich-inclusive blocks metric, each block is treated as a Bernoulli trial, where success means a block is sandwich-inclusive and failure means otherwise. For each validator, the number of blocks emitted (N) and the number of sandwich-inclusive blocks (k) are used to calculate a 99.99% confidence interval of their true proportion of sandwich-inclusive blocks. A validator will be deemed to be above cluster average if the lower bound of the confidence interval is above the cluster average.
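The text does not say which interval method is used, so the sketch below assumes a Wilson score interval purely for illustration; the function names `wilson_interval` and `presence_flag` are likewise hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def wilson_interval(k: int, n: int, confidence: float = 0.9999) -> tuple[float, float]:
    """Wilson score interval for a Bernoulli proportion (assumed method).

    k: number of sandwich-inclusive blocks, n: number of blocks emitted.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    # Two-sided critical value, about 3.89 for 99.99% confidence.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    p_hat = k / n
    denom = 1 + z * z / n
    center = (p_hat + z * z / (2 * n)) / denom
    half = z * sqrt(p_hat * (1 - p_hat) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half


def presence_flag(k: int, n: int, cluster_proportion: float) -> bool:
    """Flag when the interval's lower bound is above the cluster average."""
    lower, _ = wilson_interval(k, n)
    return lower > cluster_proportion


print(presence_flag(k=200, n=1000, cluster_proportion=0.05))
print(presence_flag(k=60, n=1000, cluster_proportion=0.05))
```

With these made-up numbers, a validator with 20% sandwich-inclusive blocks is flagged against a 5% cluster average, while one at 6% is not, because its interval still contains the cluster average.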
For the sandwiches per block metric, the mean and standard deviation of the cluster-wide number of sandwiches per block are taken, and a 99.99% confidence interval of the expected number of sandwiches per block, should the validator be in line with the cluster-wide average, is calculated. A validator will be deemed to be above cluster average if the validator's metric is above the confidence interval's upper bound.
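One way to read this test is as a normal approximation of the validator's sample mean around the cluster mean, with standard error equal to the cluster standard deviation divided by the square root of the validator's block count. That reading is an assumption, as are the function names and the example numbers below.

```python
from math import sqrt
from statistics import NormalDist


def expected_mean_interval(cluster_mean: float, cluster_std: float, n_blocks: int,
                           confidence: float = 0.9999) -> tuple[float, float]:
    """Interval for a validator's sandwiches per block if it matched the cluster.

    Assumes the sample mean over n_blocks is approximately normal.
    """
    if n_blocks <= 0:
        raise ValueError("n_blocks must be positive")
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    half = z * cluster_std / sqrt(n_blocks)
    return cluster_mean - half, cluster_mean + half


def per_block_flag(validator_mean: float, cluster_mean: float,
                   cluster_std: float, n_blocks: int) -> bool:
    """Flag when the validator's metric is above the interval's upper bound."""
    _, upper = expected_mean_interval(cluster_mean, cluster_std, n_blocks)
    return validator_mean > upper


print(expected_mean_interval(cluster_mean=0.1, cluster_std=0.4, n_blocks=400))
print(per_block_flag(0.3, cluster_mean=0.1, cluster_std=0.4, n_blocks=400))
```

The interval narrows as a validator produces more blocks, so a small but persistent excess eventually exceeds the upper bound.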
Validators satisfying the alternative hypothesis, signaling collusion for an extended period, will be flagged.
For flagging on Hanabi Staking's dashboard, flagged validators with fewer than 50 blocks are excluded, as are reputable validators that exceed the thresholds only marginally.
There are two CSV files, report.csv and filtered_report.csv. The first file shows all validators' metrics while the second one shows the ones with abnormally high values. It's normal for your validator to show up in report.csv.
The CSV files contain 14 columns each and their meanings are as follows:
| Column(s) | Meaning |
|---|---|
| leader/vote | The validator's identity and vote account pubkeys |
| name | The validator's name according to onchain data |
| Sc | "Score", normalised weighted number of sandwiches |
| Sc_p | "Presence score", normalised number of blocks with sandwiches, which roughly means proportion of sandwich inclusive blocks |
| R-Sc/R-Sc_p | Unnormalised Sc and Sc_p |
| slots | Number of leader slots observed for the validator |
| Sc_p_{lb\|ub} | Bounds of the confidence interval of the validator's true proportion of sandwich inclusive blocks. Flagged if the lower bound is above the cluster mean |
| Sc_{lb\|ub} | Bounds of the confidence interval of which the validator is considered to have an "average" number of sandwiches per block. Flagged if Sc is above the upper bound |
| {Sc_p\|Sc}_flag | True if the validator is being flagged due to the respective metric, false otherwise |
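
As a sketch of how the report might be consumed, the snippet below keeps rows where both flags are set and enough slots were observed. It assumes the CSV header names match the column names in the table, that flags are serialised as `true`/`false`, and that `slots` can stand in for the block count; none of this is confirmed by the files themselves, and the sample rows are made up.

```python
import csv
import io


def flagged_rows(report_csv: str, min_slots: int = 50) -> list[dict[str, str]]:
    """Return rows flagged on both metrics with at least min_slots slots."""
    reader = csv.DictReader(io.StringIO(report_csv))
    return [
        row for row in reader
        if row["Sc_p_flag"].strip().lower() == "true"
        and row["Sc_flag"].strip().lower() == "true"
        and int(row["slots"]) >= min_slots
    ]


# Made-up sample with only the columns this sketch reads.
sample = """name,slots,Sc_p_flag,Sc_flag
validator_a,400,true,true
validator_b,400,true,false
validator_c,20,true,true
"""
for row in flagged_rows(sample):
    print(row["name"])
```

To run it against a real file, the contents of `report.csv` can be read into a string and passed in the same way.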
For dataset access, join the Hanabi Staking Discord and open a ticket.