Comparing the quality of backwards analysis strategies

Lichess aims to provide deterministic, strong, and fast chess game analysis. There is some tension between these goals.

Analysis is provided by volunteers, running the fishnet client (as version 2.7.0 running Stockfish 16) on a wide variety of hardware. Determinism ensures users can expect consistent quality, and bugs or manipulation can be reliably identified.

Trying to optimize for quality while keeping computing power fixed, it is known that backwards analysis is efficient. When analysing the game backwards, move by move, the chess engine's hash table (also known as transposition table) will often contain highly relevant information about what happened later in the game.

However, only single-threaded analysis is deterministic. So on one extreme side, if we want to fully utilize the hash table, no other threads can contribute to quickly finishing the particular game. On the other extreme side, fishnet 2.7.0 analyses individual positions in parallel, each with a pristine hash table, finishing games very quickly.

More balanced strategies might be worth considering. To get some sharing and some parallelism, the game could be split into (possibly overlapping) chunks of consecutive positions.

What are we missing out on today, and how do all of these approaches stack up in terms of quality?

Method

For forward play, the strength of players can be evaluated directly by looking at the outcome of their games, and we expect strong engines to provide high quality analysis. For a tweak that weakens the engine, like clearing the hash table after each move, an "equivalent" tweak is also expected to lower the quality of backwards analysis.

However, it is not immediately clear if the hash table is perhaps more or less important in backwards analysis, or what the equivalent of overlap in backwards analysis would even be. So we look for a more direct way to evaluate the quality of backwards analysis. Thanks MoistvonLipwig, for the hints.

Here, 1000 games are randomly selected from rated games played on Lichess in June 2023 for which engine analysis had been requested.

The games are then re-analysed with Stockfish 16 at very high node limits, 100 meganodes for each position, using 1 GiB hash in single-threaded backwards analysis.

analysis-1024-100000000-inf-0.pgn.zst

The first 10 plies of each game are ignored for the following analysis. As a result, for each non-opening move, we now have an expected game theoretical outcome \(E_i\) between \(0\) and \(1\), based on Stockfish's WDL model, and the recommended primary variation.

The node limits used for the strategies we want to evaluate are about two orders of magnitude lower. Hopefully, from that perspective it's reasonable to accept these evaluations as near-objective truth, as close to the true evaluation of each position (\(0\), \(\frac{1}{2}\), or \(1\)) and optimal play as we can expect to get.

To evaluate a strategy, it is used to analyse the games to obtain evaluations \(\hat{E_i}\) and primary variations.

We score strategies by:

Experiments

All of the following experiments are performed with Stockfish 16 in single-threaded backwards analysis, and are thus exactly reproducible. UCI_AnalyseMode is always on. We vary the following dimensions:

Node limits

Hash table size

Chunk size

Discussion

Unsurprisingly, the ability to predict evaluations is basically the same as the ability to predict moves. The latter metric appears to be slightly less noisy.

The primary threat to validity is the choice of those metrics. If we go along with it, we see that:

Raw data

PGNHashNodesChunk sizeOverlapMSEPV miss
analysis-1024-100000000-inf-0.pgn.zst1024 MiB100,000,00000.000000.00 %
analysis-1024-10000000-inf-0.pgn.zst1024 MiB10,000,00000.0006516.29 %
analysis-1024-2800000-inf-0.pgn.zst1024 MiB2,800,00000.0009219.13 %
analysis-1024-2500000-inf-0.pgn.zst1024 MiB2,500,00000.0009519.30 %
analysis-1024-2900000-inf-0.pgn.zst1024 MiB2,900,00000.0009319.31 %
analysis-1024-3000000-inf-0.pgn.zst1024 MiB3,000,00000.0009519.34 %
analysis-1024-2700000-inf-0.pgn.zst1024 MiB2,700,00000.0009819.34 %
analysis-1024-2400000-inf-0.pgn.zst1024 MiB2,400,00000.0009719.47 %
analysis-1024-2600000-inf-0.pgn.zst1024 MiB2,600,00000.0009619.48 %
analysis-1024-2300000-inf-0.pgn.zst1024 MiB2,300,00000.0009919.57 %
analysis-1024-2200000-inf-0.pgn.zst1024 MiB2,200,00000.0010219.57 %
analysis-1024-2100000-inf-0.pgn.zst1024 MiB2,100,00000.0010319.95 %
analysis-1024-2000000-inf-0.pgn.zst1024 MiB2,000,00000.0010319.96 %
analysis-1024-1900000-inf-0.pgn.zst1024 MiB1,900,00000.0010520.22 %
analysis-1024-1800000-inf-0.pgn.zst1024 MiB1,800,00000.0010820.23 %
analysis-1024-1700000-inf-0.pgn.zst1024 MiB1,700,00000.0010820.39 %
analysis-1024-1600000-inf-0.pgn.zst1024 MiB1,600,00000.0010820.50 %
analysis-256-1500000-inf-0.pgn.zst256 MiB1,500,00000.0010920.60 %
analysis-1024-1500000-inf-0.pgn.zst1024 MiB1,500,00000.0010820.69 %
analysis-64-1500000-inf-0.pgn.zst64 MiB1,500,00000.0011120.69 %
analysis-32-1500000-inf-0.pgn.zst32 MiB1,500,00000.0011120.71 %
analysis-128-1500000-inf-0.pgn.zst128 MiB1,500,00000.0011220.73 %
analysis-8-1500000-inf-0.pgn.zst8 MiB1,500,00000.0011020.73 %
analysis-512-1500000-inf-0.pgn.zst512 MiB1,500,00000.0011020.77 %
analysis-128-1500000-10-1.pgn.zst128 MiB1,500,0001010.0011320.83 %
analysis-1024-1400000-inf-0.pgn.zst1024 MiB1,400,00000.0011120.83 %
analysis-4-1500000-inf-0.pgn.zst4 MiB1,500,00000.0011420.90 %
analysis-16-1500000-inf-0.pgn.zst16 MiB1,500,00000.0011020.96 %
analysis-2-1500000-inf-0.pgn.zst2 MiB1,500,00000.0011221.02 %
analysis-128-1500000-8-1.pgn.zst128 MiB1,500,000810.0011121.02 %
analysis-128-1500000-7-1.pgn.zst128 MiB1,500,000710.0011221.03 %
analysis-128-1500000-9-1.pgn.zst128 MiB1,500,000910.0011021.04 %
analysis-1-1500000-inf-0.pgn.zst1 MiB1,500,00000.0011221.05 %
analysis-1024-1300000-inf-0.pgn.zst1024 MiB1,300,00000.0011421.07 %
analysis-128-1500000-10-0.pgn.zst128 MiB1,500,0001000.0011021.12 %
analysis-128-1500000-6-1.pgn.zst128 MiB1,500,000610.0011021.21 %
analysis-1024-1200000-inf-0.pgn.zst1024 MiB1,200,00000.0011621.21 %
analysis-128-1333333-8-1.pgn.zst128 MiB1,333,333810.0011621.23 %
analysis-128-1500000-4-1.pgn.zst128 MiB1,500,000410.0011221.30 %
analysis-128-1500000-5-1.pgn.zst128 MiB1,500,000510.0011121.35 %
analysis-128-1500000-8-0.pgn.zst128 MiB1,500,000800.0011221.38 %
analysis-128-1350000-9-1.pgn.zst128 MiB1,350,000910.0011521.38 %
analysis-128-1312500-7-1.pgn.zst128 MiB1,312,500710.0011321.42 %
analysis-128-1285714-6-1.pgn.zst128 MiB1,285,714610.0011421.46 %
analysis-128-1363636-10-1.pgn.zst128 MiB1,363,6361010.0011221.51 %
analysis-1024-1100000-inf-0.pgn.zst1024 MiB1,100,00000.0011821.51 %
analysis-128-1500000-9-0.pgn.zst128 MiB1,500,000900.0011421.55 %
analysis-128-1200000-4-1.pgn.zst128 MiB1,200,000410.0011721.64 %
analysis-128-1500000-3-1.pgn.zst128 MiB1,500,000310.0011021.65 %
analysis-128-1500000-2-1.pgn.zst128 MiB1,500,000210.0011321.65 %
analysis-128-1250000-5-1.pgn.zst128 MiB1,250,000510.0011721.71 %
analysis-128-1500000-6-0.pgn.zst128 MiB1,500,000600.0011321.74 %
analysis-1024-1000000-inf-0.pgn.zst1024 MiB1,000,00000.0012021.79 %
analysis-128-1500000-7-0.pgn.zst128 MiB1,500,000700.0010721.79 %
analysis-1024-900000-inf-0.pgn.zst1024 MiB900,00000.0011922.03 %
analysis-128-1500000-1-1.pgn.zst128 MiB1,500,000110.0011322.13 %
analysis-128-1500000-5-0.pgn.zst128 MiB1,500,000500.0011222.18 %
analysis-128-1125000-3-1.pgn.zst128 MiB1,125,000310.0011722.26 %
analysis-1024-800000-inf-0.pgn.zst1024 MiB800,00000.0012722.31 %
analysis-128-1500000-4-0.pgn.zst128 MiB1,500,000400.0011422.34 %
analysis-128-1000000-2-1.pgn.zst128 MiB1,000,000210.0012422.57 %
analysis-1024-700000-inf-0.pgn.zst1024 MiB700,00000.0013322.59 %
analysis-1024-600000-inf-0.pgn.zst1024 MiB600,00000.0014422.91 %
analysis-128-1500000-3-0.pgn.zst128 MiB1,500,000300.0011722.95 %
analysis-128-1500000-2-0.pgn.zst128 MiB1,500,000200.0012023.33 %
analysis-1024-500000-inf-0.pgn.zst1024 MiB500,00000.0014623.51 %
analysis-128-750000-1-1.pgn.zst128 MiB750,000110.0013523.53 %
analysis-1024-400000-inf-0.pgn.zst1024 MiB400,00000.0015923.90 %
analysis-1024-300000-inf-0.pgn.zst1024 MiB300,00000.0017224.69 %
analysis-64-1500000-1-0.pgn.zst64 MiB1,500,000100.0012325.56 %
analysis-128-1500000-1-0.pgn.zst128 MiB1,500,000100.0012525.57 %
analysis-32-1500000-1-0.pgn.zst32 MiB1,500,000100.0012625.58 %
analysis-4-1500000-1-0.pgn.zst4 MiB1,500,000100.0012525.59 %
analysis-256-1500000-1-0.pgn.zst256 MiB1,500,000100.0012325.60 %
analysis-512-1500000-1-0.pgn.zst512 MiB1,500,000100.0012325.62 %
analysis-16-1500000-1-0.pgn.zst16 MiB1,500,000100.0012225.63 %
analysis-1024-1500000-1-0.pgn.zst1024 MiB1,500,000100.0012425.68 %
analysis-8-1500000-1-0.pgn.zst8 MiB1,500,000100.0012225.72 %
analysis-2-1500000-1-0.pgn.zst2 MiB1,500,000100.0012325.75 %
analysis-1-1500000-1-0.pgn.zst1 MiB1,500,000100.0012525.87 %
analysis-1024-200000-inf-0.pgn.zst1024 MiB200,00000.0019225.95 %
analysis-1024-100000-inf-0.pgn.zst1024 MiB100,00000.0023627.75 %
analysis-1024-10000-inf-0.pgn.zst1024 MiB10,00000.0043034.64 %
analysis-1024-1000-inf-0.pgn.zst1024 MiB1,00000.0087043.37 %
analysis-1024-100-inf-0.pgn.zst1024 MiB10000.0171450.66 %
analysis-1024-10-inf-0.pgn.zst1024 MiB1000.0212951.35 %
analysis-1024-1-inf-0.pgn.zst1024 MiB100.0216051.59 %

niklasf, 31th July 2023.

Update: Quality gains due to chunking appear to be explained almost entirely by fixing simple inconsistencies where the best move improves the evaluation. Thanks _David_.

Update: fishnet v2.8.1 introduces chunking with size 5 and overlap 1.