Track the coverage of human games from the Lichess open database on chessdb.cn (cdb). Here we look both at coverage of human opening theory, and statistics of the exit ply for the analysed games.
For the opening coverage we count the known and unknown positions arising in the monthly rated Lichess games between players with an Elo of at least 2200. Against an intermittently updated static dump of cdb we track both the percentage of unique positions within the first 10, 20 and 30 moves (i.e. plies 20, 40 and 60) that are unknown, and the percentage of known "visits", where each position encountered in the games is weighted by the number of times it was seen. The starting position is ignored, as are positions that cannot appear in cdb, i.e. positions with fewer than eight pieces and positions without a legal move.
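The difference between the two coverage measures can be illustrated with a minimal Python sketch. This is not the repo's code (which relies on compiled binaries); the names `piece_count`, `eligible`, `coverage`, `visits` and `known` are hypothetical, positions are assumed to be FEN strings, and the starting position is assumed to have been removed beforehand. The check for positions without a legal move is left out, since it needs a move generator.

```python
from collections import Counter


def piece_count(fen: str) -> int:
    """Count the pieces in the board field (first field) of a FEN string."""
    board = fen.split()[0]
    return sum(ch.isalpha() for ch in board)


def eligible(fen: str) -> bool:
    """Assumed filter: keep only positions with at least eight pieces.

    The "no legal move" exclusion is deliberately omitted in this sketch.
    """
    return piece_count(fen) >= 8


def coverage(visits: Counter, known: set) -> tuple:
    """Return (% of unique positions unknown, % of visits to known positions)."""
    counts = {fen: n for fen, n in visits.items() if eligible(fen)}
    unique_total = len(counts)
    visit_total = sum(counts.values())
    if unique_total == 0:
        return 0.0, 0.0
    unique_unknown = sum(1 for fen in counts if fen not in known)
    known_visits = sum(n for fen, n in counts.items() if fen in known)
    return (100.0 * unique_unknown / unique_total,
            100.0 * known_visits / visit_total)
```

For example, if one known position is seen nine times and one unknown position once, half of the unique positions are unknown, yet 90% of the visits land on a known position. This is why the two numbers can differ substantially.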
For the exit ply statistics, random samples of the monthly rated Lichess games at various time controls and Elo brackets are analysed. Plots of the exit ply distribution, for cdb and the dump, can be found below. In addition, the repo reports on the evolution of the progress indicator
where
Via a cron job the script `do_track.sh` regularly does the following:
- Check the Lichess open database for a new monthly release of rated standard chess games.
- Run the awk scripts `create_tc_Elo_buckets.awk` and `filter_clean_Elo.awk`. The former uses reservoir sampling to randomly sample (up to) 100K each for the Elo brackets 2200+, 1800-2200, 1400-1800 at blitz, rapid and classical time controls. The second script finds all the games between 2200+ Elo players. Both scripts only select from human games that terminate normally and have at least one (half)move.
- Run a compiled binary of `litrack2dump.cpp` to probe a local cdb dump for the exit plies of the (about) 900K randomly sampled games and store the FEN of the final position still in the dump, together with the remaining moves of the game, to a file.
- Run the python script `litrack2cdb.py` to query cdb for the final known position of these (about) 900K games, starting from the output of the previous step.
- Run the python script `litrack.py` to extract the exit ply statistics from the output produced in the previous two steps.
- Spawn the script `do_coverage.sh` to count the number of positions in the 2200+ Elo games (blitz, rapid and classical combined) at ply depths 20, 40 and 60 that cannot be found in the dump. The script uses compiled binaries from cdbdirect and a tailor-made fork of fastpopular.
- Run the python scripts `plotexitply.py` and `plotcoverage.py` to produce graphical representations of the data.
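The reservoir sampling mentioned in the steps above can be sketched in Python as follows. This is the textbook "Algorithm R", not a transcription of the awk script, whose exact implementation is not shown here; the function name `reservoir_sample` and its parameters are hypothetical. It draws a uniform random sample of at most `k` items from a stream of unknown length in a single pass, which is what makes the technique suitable for large monthly game files.

```python
import random
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")


def reservoir_sample(stream: Iterable[T], k: int,
                     seed: Optional[int] = None) -> List[T]:
    """Return up to k items chosen uniformly at random from stream."""
    rng = random.Random(seed)
    reservoir: List[T] = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Item i (0-based) replaces a random slot with probability k/(i+1).
            j = rng.randint(0, i)  # both ends inclusive
            if j < k:
                reservoir[j] = item
    return reservoir
```

Under the assumption that one such reservoir is kept per combination of time control and Elo bracket, a call like `reservoir_sample(games, 100_000)` would yield the "(up to) 100K" games per bucket described above.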
- disservin/chess-library : PGN parsing
- vondele/fastpopular : pgn.zst parsing and counting
- vondele/cdbdirect : probing local cdb dump
- robertnurnberg/cdblib : querying cdb API
- Choosing a random position from the first 20/40/60 plies of a random Elo 2200+ game on Lichess (at blitz, rapid or classical TC, and excluding the starting position), gives a 97%/61%/44% chance that it is already known to cdb.
- An Elo 2200+ game on Lichess at classical TC will exit from cdb on average at about ply 24.
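
How an average exit ply of this kind could be computed is sketched below in Python. The convention used is an assumption made for illustration: a game is taken to leave the database at the first position that is not known, and the exit ply is the number of half-moves played up to the last known position. The repo's own definition may differ in detail. The names `exit_ply`, `average_exit_ply` and `is_known` are hypothetical, and `is_known` stands in for a probe of cdb or of the local dump.

```python
from statistics import mean
from typing import Callable, Sequence


def exit_ply(positions: Sequence[str], is_known: Callable[[str], bool]) -> int:
    """Number of leading positions that are known.

    positions[i] is assumed to be the position after i + 1 half-moves,
    so the starting position itself is not part of the sequence.
    """
    ply = 0
    for fen in positions:
        if not is_known(fen):
            break
        ply += 1
    return ply


def average_exit_ply(games: Sequence[Sequence[str]],
                     is_known: Callable[[str], bool]) -> float:
    """Mean exit ply over a non-empty collection of games."""
    return mean(exit_ply(game, is_known) for game in games)
```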




















