This repository contains the analytical pipeline and data collection suite used to identify and categorize Ethereum market participants in Uniswap v3 liquidity pools. It was built for our MEV bot research project.
Wallet labeling from Etherscan and other systems often lags behind the MEV industry. This project introduces a novel methodology, based on the Negative Binomial Distribution (NBD) and NLP-style analytics, to profile ETH addresses.
The codebase is organized into a modular pipeline:
scripts/collector.py: A high-performance Uniswap pool monitoring system. It processes blocks sequentially to generate a transaction log, including raw event logs and transaction input data, similar to the proprietary trader login logs kept by a CEX.
scripts/wrapper.py: The primary execution engine that transforms raw logs into the processed feature set used for behavioral taxonomy.
scripts/wallet_forensics.py: The WalletForensics module. It uses NLP-inspired collocation analysis (bigrams/trigrams) to detect statistically non-random co-occurrence of addresses, identifying coordinated ETH wallet clusters and syndicates.
scripts/cex_library.py: Utilities for fetching tick-level price data from a CEX (e.g., Kraken). This enables the calculation of alpha_reaction_rate by aligning DEX swaps with external market signals.
scripts/align_profile.py: Enriches behavioral data with infrastructure-level signals, including bytecode_len (contract complexity) and eth_balance.
scripts/alchemy_lib.py: A dedicated wrapper for the Alchemy JSON-RPC API to handle high-concurrency on-chain data requests.
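To illustrate the collocation idea behind scripts/wallet_forensics.py, the sketch below scores adjacent address pairs by pointwise mutual information (PMI). It is a minimal example and not the module's actual implementation: the function name score_address_bigrams, the min_count threshold, and the choice of PMI as the association measure are all assumptions made for this illustration.

```python
import math
from collections import Counter


def score_address_bigrams(sequence, min_count=2):
    """Score adjacent address pairs by pointwise mutual information (PMI).

    `sequence` is a list of sender addresses in transaction order.
    Hypothetical helper for illustration; not taken from wallet_forensics.py.
    """
    unigrams = Counter(sequence)
    bigrams = Counter(zip(sequence, sequence[1:]))
    n_uni = len(sequence)
    n_bi = max(len(sequence) - 1, 1)

    scores = {}
    for (a, b), count in bigrams.items():
        if count < min_count:
            continue  # rare pairs give unstable PMI estimates
        p_ab = count / n_bi
        p_a = unigrams[a] / n_uni
        p_b = unigrams[b] / n_uni
        scores[(a, b)] = math.log2(p_ab / (p_a * p_b))
    # Highest PMI first: pairs that co-occur more often than chance predicts
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Toy sequence: 0xA is always followed by 0xB; other pairings are less regular
tx_senders = ["0xA", "0xB", "0xC", "0xA", "0xB", "0xD",
              "0xC", "0xA", "0xB", "0xD", "0xC", "0xD"]
ranked = score_address_bigrams(tx_senders)
```

In this toy data the pair (0xA, 0xB) ranks first because 0xB follows 0xA every time it appears. The same counting extends to trigrams by zipping three shifted copies of the sequence.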
This project is licensed under the MIT License.
The underlying dataset (Uniswap v3 ETH/USDC swap logs and processed data with per-address metrics), used in the associated research and generated by this open-source software, is archived on Zenodo (DOI 10.5281/zenodo.18674643) under the title "Ethereum Wallet Profiling Data: Raw Uniswap Transaction Logs and NBD-Processed Behavioral Features".
https://doi.org/10.5281/zenodo.18674616
https://doi.org/10.5281/zenodo.18674644
leaderboard/: MEV / DEX Wallet Leaderboard
These scripts are a lightweight open monitoring system for the Uniswap V3 USDC/ETH pool on Ethereum mainnet (one of the most actively traded decentralised venues in DeFi). Every swap, every wallet, every gas fee is recorded permanently on the public blockchain. We collect that data with scripts/collector.py, process it daily, and rank wallets by how much value they extracted from the pool.
leaderboard/leaderboard.py tracks net USDC extracted after gas, win rate, profit factor, and the percentage of CEX-informed trades, giving a picture of who has a systematic edge in this market and how they operate.
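The exact definitions of win rate and profit factor used by leaderboard.py are not stated here. The sketch below uses the conventional trading definitions as an assumption: win rate is the share of trades with positive PnL, and profit factor is gross profit divided by gross loss. The function name summarize_trades and the per-trade PnL input are hypothetical.

```python
def summarize_trades(pnl_usdc):
    """Return (win_rate, profit_factor) for a list of per-trade PnL values in USDC.

    Assumed definitions (not confirmed by leaderboard.py):
      win_rate      = winning trades / all trades
      profit_factor = gross profit / gross loss
    """
    if not pnl_usdc:
        return 0.0, 0.0
    gross_profit = sum(p for p in pnl_usdc if p > 0)
    gross_loss = -sum(p for p in pnl_usdc if p < 0)
    win_rate = sum(1 for p in pnl_usdc if p > 0) / len(pnl_usdc)
    if gross_loss > 0:
        profit_factor = gross_profit / gross_loss
    else:
        # No losing trades: the ratio is undefined, so report infinity
        # when there was any profit and zero otherwise
        profit_factor = float("inf") if gross_profit > 0 else 0.0
    return win_rate, profit_factor


win_rate, profit_factor = summarize_trades([120.0, -40.0, 60.0, -20.0])
```

For the four sample trades, half are winners and gross profit (180 USDC) is three times gross loss (60 USDC), so the wallet has a win rate of 0.5 and a profit factor of 3.0.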
This leaderboard tracks every wallet (Ethereum address) that traded in the Uniswap V3 USDC/ETH pool on Ethereum mainnet, ranked by how much USDC they extracted from the pool after paying gas costs. It updates once a day and shows both a 24-hour and a 7-day rolling window.
Every swap in a liquidity pool is a transfer of value between the trader and the pool's liquidity providers (LPs). When a wallet sells ETH into the pool, it receives USDC, taking USDC out of the pool's reserves. When it buys ETH, it pays USDC into the pool. Net USDC extracted is the difference: how much more USDC a wallet took out than it put in, over the measurement window. A high positive number means the wallet was a consistent net seller of ETH into the pool, profiting at the expense of LPs or less informed traders on the other side.
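The calculation described above can be sketched as a per-wallet sum. This is an illustrative example only: the record keys (wallet, usdc_delta, gas_usdc) and the function name net_usdc_extracted are hypothetical, and gas is assumed to be already converted to USDC.

```python
from collections import defaultdict


def net_usdc_extracted(swaps):
    """Sum USDC received minus USDC paid minus gas (in USDC) per wallet.

    Each swap is a dict with hypothetical keys:
      wallet      - trader address
      usdc_delta  - USDC flowing to the wallet (+) or into the pool (-)
      gas_usdc    - gas cost of the transaction, already converted to USDC
    """
    totals = defaultdict(float)
    for swap in swaps:
        totals[swap["wallet"]] += swap["usdc_delta"] - swap["gas_usdc"]
    # Rank wallets from most to least USDC extracted
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)


swaps = [
    {"wallet": "0xA", "usdc_delta": 3000.0, "gas_usdc": 12.0},   # sold ETH for USDC
    {"wallet": "0xA", "usdc_delta": -2950.0, "gas_usdc": 10.0},  # bought ETH with USDC
    {"wallet": "0xB", "usdc_delta": -500.0, "gas_usdc": 8.0},    # bought ETH with USDC
]
ranking = net_usdc_extracted(swaps)
```

Here 0xA nets 28 USDC after gas across its two swaps, while 0xB ends at -508 USDC, so 0xA ranks first. Restricting the input to swaps from the last 24 hours or 7 days would give the two rolling windows.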
The top-ranked addresses are almost exclusively sophisticated on-chain actors — MEV bots, CEX-DEX arbitrageurs, and sandwich bots. A wallet appearing consistently at the top of both daily and weekly rankings is running an automated strategy that identifies and captures price discrepancies between this pool and centralised exchanges like Kraken or Binance. The "Informed %" column shows what fraction of their trades aligned with concurrent price movement on Kraken — values above 65% strongly suggest the wallet is trading on CEX price feed signals before the DEX price catches up.
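One way to compute an "Informed %" figure is sketched below. The matching rule is an assumption, not the project's confirmed definition: a trade counts as aligned when the wallet buys ETH on the DEX while the Kraken price moved up over the surrounding window, or sells ETH while it moved down. The function name informed_pct and the (side, cex_return) input shape are hypothetical.

```python
def informed_pct(trades):
    """Percentage of trades whose direction matches the concurrent CEX move.

    Each trade is a (side, cex_return) tuple, where side is "buy" or "sell"
    (of ETH on the DEX) and cex_return is the CEX price change over a
    window around the swap. Both the input shape and the matching rule are
    assumptions for illustration.
    """
    # Trades with a flat CEX price carry no directional signal, so skip them
    scored = [(side, r) for side, r in trades if r != 0]
    if not scored:
        return 0.0
    aligned = sum(
        1 for side, r in scored
        if (side == "buy" and r > 0) or (side == "sell" and r < 0)
    )
    return 100.0 * aligned / len(scored)


trades = [("buy", 0.0012), ("sell", -0.0008), ("buy", -0.0003), ("sell", -0.0010)]
pct = informed_pct(trades)
```

Three of the four sample trades move with the CEX price, giving 75.0, which would sit above the 65% level discussed above.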
For liquidity providers, these wallets represent the primary source of adverse selection loss: the cost of having your liquidity traded against by someone with better information. For researchers and funds, the leaderboard is a live map of who has a systematic edge in this market, how consistent that edge is (profit factor, win rate), and how capital-efficient each strategy is (net USDC per trade). Wallets that rank highly week after week are running durable, automated strategies worth studying closely.