Comparing the quality of backwards analysis strategies
Lichess aims to provide deterministic, strong, and fast chess game analysis. There is some tension between these goals.
Analysis is provided by volunteers running the fishnet client (at version 2.7.0, running Stockfish 16) on a wide variety of hardware. Determinism ensures users can expect consistent quality, and bugs or manipulation can be reliably identified.
When optimizing for quality at fixed computing power, backwards analysis is known to be efficient. When analysing the game backwards, move by move, the chess engine's hash table (also known as the transposition table) will often contain highly relevant information about what happened later in the game.
However, only single-threaded analysis is deterministic. At one extreme, if we want to fully utilize the hash table, no other threads can contribute to finishing a particular game quickly. At the other extreme, fishnet 2.7.0 analyses individual positions in parallel, each with a pristine hash table, finishing games very quickly.
More balanced strategies might be worth considering. To get some sharing and some parallelism, the game could be split into (possibly overlapping) chunks of consecutive positions.
What are we missing out on today, and how do all of these approaches stack up in terms of quality?
Method
For forward play, the strength of players can be evaluated directly by looking at the outcome of their games, and we expect strong engines to provide high quality analysis. For a tweak that weakens the engine, like clearing the hash table after each move, an "equivalent" tweak is also expected to lower the quality of backwards analysis.
However, it is not immediately clear whether the hash table is more or less important in backwards analysis, or what the equivalent of overlap in forward play would even be. So we look for a more direct way to evaluate the quality of backwards analysis. Thanks to MoistvonLipwig for the hints.
Here, 1000 games are randomly selected from rated games played on Lichess in June 2023 for which engine analysis had been requested.
The games are then re-analysed with Stockfish 16 at very high node limits, 100 meganodes for each position, using 1 GiB hash in single-threaded backwards analysis.
analysis-1024-100000000-inf-0.pgn.zst
The first 10 plies of each game are ignored in the following analysis. As a result, for each non-opening move, we now have an expected game-theoretical outcome \(E_i\) between \(0\) and \(1\), based on Stockfish's WDL model, and the recommended primary variation.
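As an illustration of how a win/draw/loss estimate can be collapsed into a single expected outcome, here is a minimal sketch. It assumes the usual convention of counting a draw as half a point and assumes WDL values given as per-mille counts (as Stockfish reports them with UCI_ShowWDL). The exact mapping used for this analysis is not spelled out here, so the function name and formula are illustrative only.

```python
def expected_score(win: int, draw: int, loss: int) -> float:
    """Collapse win/draw/loss counts into an expected outcome in [0, 1].

    Assumption: a draw counts as half a point. The counts are typically
    per-mille values (summing to 1000), but any positive total works.
    """
    total = win + draw + loss
    if total <= 0:
        raise ValueError("WDL counts must sum to a positive value")
    return (win + 0.5 * draw) / total
```

Note that engines report WDL from the side to move's perspective, so a real pipeline would also have to convert all values to one fixed perspective before comparing them.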
The node limits used for the strategies we want to evaluate are about two orders of magnitude lower. Hopefully, from that perspective it's reasonable to accept these evaluations as near-objective truth, as close to the true evaluation of each position (\(0\), \(\frac{1}{2}\), or \(1\)) and optimal play as we can expect to get.
To evaluate a strategy, we use it to analyse the same games, obtaining evaluations \(\hat{E_i}\) and primary variations.
We score strategies by:
- The mean squared error of the evaluations: $$ \textrm{MSE} = \frac{1}{n} \sum^{n}_{i=1}(E_i - \hat{E_i})^2 $$
- The rate of mispredicted primary moves.
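The two scores above can be sketched as follows. This is a minimal sketch, not the script used for the experiments: the function names are hypothetical, and it assumes that a "mispredicted primary move" means the first move of the strategy's primary variation differs from the first move of the reference one.

```python
from typing import Sequence


def mean_squared_error(reference: Sequence[float], predicted: Sequence[float]) -> float:
    """MSE between reference evaluations E_i and strategy evaluations."""
    if len(reference) != len(predicted) or not reference:
        raise ValueError("need two non-empty sequences of equal length")
    squared = ((e - e_hat) ** 2 for e, e_hat in zip(reference, predicted))
    return sum(squared) / len(reference)


def pv_miss_rate(reference_moves: Sequence[str], predicted_moves: Sequence[str]) -> float:
    """Fraction of positions where the predicted first PV move differs.

    Assumption: each entry is the first move of the primary variation
    for one position, e.g. in UCI notation such as "e2e4".
    """
    if len(reference_moves) != len(predicted_moves) or not reference_moves:
        raise ValueError("need two non-empty sequences of equal length")
    misses = sum(1 for ref, pred in zip(reference_moves, predicted_moves) if ref != pred)
    return misses / len(reference_moves)
```

Both functions expect the positions of all games flattened into one sequence, with the ignored opening plies already removed.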
Experiments
All of the following experiments are performed with Stockfish 16 in single-threaded backwards analysis, and are thus exactly reproducible. UCI_AnalyseMode is always on.
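To make the setup concrete, here is a rough sketch of the UCI commands a driver might send for plain backwards analysis of one game. The function name is hypothetical, and sending ucinewgame to start from a clean hash table is an assumption; the actual tooling may reset the engine differently.

```python
from typing import List


def backwards_uci_commands(moves: List[str], hash_mib: int, nodes: int) -> List[str]:
    """List the UCI commands for analysing a game from its end to its start.

    `moves` are the game's moves in UCI notation (e.g. "e2e4").
    Assumption: "ucinewgame" is used once to start with an empty hash table.
    """
    commands = [
        f"setoption name Hash value {hash_mib}",
        "setoption name UCI_AnalyseMode value true",
        "ucinewgame",
    ]
    # Walk from the final position back to the starting position, so that
    # the hash table already holds results from later in the game.
    for ply in range(len(moves), -1, -1):
        if ply == 0:
            commands.append("position startpos")
        else:
            commands.append("position startpos moves " + " ".join(moves[:ply]))
        commands.append(f"go nodes {nodes}")
    return commands
```

A real driver cannot simply send these commands in one batch: after each go it has to read the engine's output until the bestmove line before sending the next position.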
We vary the following dimensions:
- Hash table size. The size of the transposition table (setoption name Hash).
- Node limits. Analysis uses a fixed node limit per position (go nodes in the UCI protocol).
- Chunk size. For a chunk size of \( n \), the last chunk of \( n \) positions is analysed backwards, followed by clearing the hash table, followed by analysing the next chunk of \( n \) preceding positions, and so on.
- Overlap. For an overlap size of \( n \), the hash table is primed before analysing a chunk by first analysing the \( n \) positions that follow it. Overlap positions are analysed at the same node limit, so more total resources are spent. Additional experiments were performed with node limits adjusted to compensate.
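The chunk and overlap definitions can be sketched as a schedule of positions. This is a minimal sketch with hypothetical function names. It assumes that overlap positions are themselves analysed backwards and that the last chunk of a game, having no following positions, is not primed. The second function reflects a pattern inferred from the node limits listed in the raw data (for example 1,333,333 nodes for chunk size 8 with overlap 1), not a documented formula.

```python
from typing import List, Tuple


def chunk_schedule(num_positions: int, chunk_size: int, overlap: int) -> List[List[Tuple[int, bool]]]:
    """Return one list per chunk of (position_index, is_priming) steps.

    Positions are indexed 0..num_positions-1 in game order. The hash table
    is assumed to be cleared before each inner list is processed.
    """
    if chunk_size < 1 or overlap < 0:
        raise ValueError("chunk_size must be >= 1 and overlap >= 0")
    schedule = []
    end = num_positions  # exclusive upper bound of the current chunk
    while end > 0:
        start = max(0, end - chunk_size)
        prime_end = min(num_positions, end + overlap)
        # Priming: positions right after the chunk, analysed backwards.
        steps = [(i, True) for i in range(prime_end - 1, end - 1, -1)]
        # The chunk itself, analysed backwards.
        steps += [(i, False) for i in range(end - 1, start - 1, -1)]
        schedule.append(steps)
        end = start
    return schedule


def equal_resource_nodes(base_nodes: int, chunk_size: int, overlap: int) -> int:
    """Node limit that keeps total nodes roughly constant despite overlap.

    Assumption: inferred from the raw data, where the adjusted limits match
    base_nodes * chunk_size / (chunk_size + overlap), rounded down.
    """
    return base_nodes * chunk_size // (chunk_size + overlap)
```

For example, chunk_schedule(5, 2, 1) yields three chunks: positions 4 and 3; then position 3 as priming followed by 2 and 1; then position 1 as priming followed by 0. Sequential backwards analysis corresponds to any chunk size of at least the number of positions.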
Node limits
Hash table size
Chunk size
Discussion
Unsurprisingly, the ability to predict evaluations is basically the same as the ability to predict moves. The latter metric appears to be slightly less noisy.
The primary threat to validity is the choice of those metrics. If we accept them, we see that:
- fishnet 2.7.0 leaves a lot of quality on the table.
- Provided the hash table is used at all, the size is basically irrelevant at fishnet's relatively shallow node limit.
- At equal total resources, chunking can close most of the gap between fishnet 2.7.0 and sequential backwards analysis, even after paying for the overlap.
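To put a rough number on the last point, the following sketch computes which fraction of the PV miss gap a chunked strategy closes. The figures are copied from the 128 MiB rows of the raw data. Treating the chunk size 1, overlap 0 row at 1,500,000 nodes as representative of fishnet 2.7.0 is an assumption, and the function name is hypothetical.

```python
def gap_closed(baseline: float, candidate: float, best: float) -> float:
    """Fraction of the gap between baseline and best closed by candidate."""
    return (baseline - candidate) / (baseline - best)


# PV miss rates in percent, taken from the raw data (128 MiB hash).
PER_POSITION = 25.57       # chunk size 1, overlap 0, 1,500,000 nodes
SEQUENTIAL = 20.73         # chunk size inf, overlap 0, 1,500,000 nodes
CHUNK_8_OVERLAP_1 = 21.23  # chunk size 8, overlap 1, 1,333,333 nodes
CHUNK_5_OVERLAP_1 = 21.71  # chunk size 5, overlap 1, 1,250,000 nodes

closed_8 = gap_closed(PER_POSITION, CHUNK_8_OVERLAP_1, SEQUENTIAL)
closed_5 = gap_closed(PER_POSITION, CHUNK_5_OVERLAP_1, SEQUENTIAL)
```

By this measure, chunks of 8 close roughly 90% of the gap and chunks of 5 roughly 80%, with node limits scaled down to pay for the overlap. Differences of a few tenths of a percentage point between neighbouring rows are within the noise visible elsewhere in the table.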
Raw data
PGN | Hash | Nodes | Chunk size | Overlap | MSE | PV miss |
---|---|---|---|---|---|---|
analysis-1024-100000000-inf-0.pgn.zst | 1024 MiB | 100,000,000 | ∞ | 0 | 0.00000 | 0.00 % |
analysis-1024-10000000-inf-0.pgn.zst | 1024 MiB | 10,000,000 | ∞ | 0 | 0.00065 | 16.29 % |
analysis-1024-2800000-inf-0.pgn.zst | 1024 MiB | 2,800,000 | ∞ | 0 | 0.00092 | 19.13 % |
analysis-1024-2500000-inf-0.pgn.zst | 1024 MiB | 2,500,000 | ∞ | 0 | 0.00095 | 19.30 % |
analysis-1024-2900000-inf-0.pgn.zst | 1024 MiB | 2,900,000 | ∞ | 0 | 0.00093 | 19.31 % |
analysis-1024-3000000-inf-0.pgn.zst | 1024 MiB | 3,000,000 | ∞ | 0 | 0.00095 | 19.34 % |
analysis-1024-2700000-inf-0.pgn.zst | 1024 MiB | 2,700,000 | ∞ | 0 | 0.00098 | 19.34 % |
analysis-1024-2400000-inf-0.pgn.zst | 1024 MiB | 2,400,000 | ∞ | 0 | 0.00097 | 19.47 % |
analysis-1024-2600000-inf-0.pgn.zst | 1024 MiB | 2,600,000 | ∞ | 0 | 0.00096 | 19.48 % |
analysis-1024-2300000-inf-0.pgn.zst | 1024 MiB | 2,300,000 | ∞ | 0 | 0.00099 | 19.57 % |
analysis-1024-2200000-inf-0.pgn.zst | 1024 MiB | 2,200,000 | ∞ | 0 | 0.00102 | 19.57 % |
analysis-1024-2100000-inf-0.pgn.zst | 1024 MiB | 2,100,000 | ∞ | 0 | 0.00103 | 19.95 % |
analysis-1024-2000000-inf-0.pgn.zst | 1024 MiB | 2,000,000 | ∞ | 0 | 0.00103 | 19.96 % |
analysis-1024-1900000-inf-0.pgn.zst | 1024 MiB | 1,900,000 | ∞ | 0 | 0.00105 | 20.22 % |
analysis-1024-1800000-inf-0.pgn.zst | 1024 MiB | 1,800,000 | ∞ | 0 | 0.00108 | 20.23 % |
analysis-1024-1700000-inf-0.pgn.zst | 1024 MiB | 1,700,000 | ∞ | 0 | 0.00108 | 20.39 % |
analysis-1024-1600000-inf-0.pgn.zst | 1024 MiB | 1,600,000 | ∞ | 0 | 0.00108 | 20.50 % |
analysis-256-1500000-inf-0.pgn.zst | 256 MiB | 1,500,000 | ∞ | 0 | 0.00109 | 20.60 % |
analysis-1024-1500000-inf-0.pgn.zst | 1024 MiB | 1,500,000 | ∞ | 0 | 0.00108 | 20.69 % |
analysis-64-1500000-inf-0.pgn.zst | 64 MiB | 1,500,000 | ∞ | 0 | 0.00111 | 20.69 % |
analysis-32-1500000-inf-0.pgn.zst | 32 MiB | 1,500,000 | ∞ | 0 | 0.00111 | 20.71 % |
analysis-128-1500000-inf-0.pgn.zst | 128 MiB | 1,500,000 | ∞ | 0 | 0.00112 | 20.73 % |
analysis-8-1500000-inf-0.pgn.zst | 8 MiB | 1,500,000 | ∞ | 0 | 0.00110 | 20.73 % |
analysis-512-1500000-inf-0.pgn.zst | 512 MiB | 1,500,000 | ∞ | 0 | 0.00110 | 20.77 % |
analysis-128-1500000-10-1.pgn.zst | 128 MiB | 1,500,000 | 10 | 1 | 0.00113 | 20.83 % |
analysis-1024-1400000-inf-0.pgn.zst | 1024 MiB | 1,400,000 | ∞ | 0 | 0.00111 | 20.83 % |
analysis-4-1500000-inf-0.pgn.zst | 4 MiB | 1,500,000 | ∞ | 0 | 0.00114 | 20.90 % |
analysis-16-1500000-inf-0.pgn.zst | 16 MiB | 1,500,000 | ∞ | 0 | 0.00110 | 20.96 % |
analysis-2-1500000-inf-0.pgn.zst | 2 MiB | 1,500,000 | ∞ | 0 | 0.00112 | 21.02 % |
analysis-128-1500000-8-1.pgn.zst | 128 MiB | 1,500,000 | 8 | 1 | 0.00111 | 21.02 % |
analysis-128-1500000-7-1.pgn.zst | 128 MiB | 1,500,000 | 7 | 1 | 0.00112 | 21.03 % |
analysis-128-1500000-9-1.pgn.zst | 128 MiB | 1,500,000 | 9 | 1 | 0.00110 | 21.04 % |
analysis-1-1500000-inf-0.pgn.zst | 1 MiB | 1,500,000 | ∞ | 0 | 0.00112 | 21.05 % |
analysis-1024-1300000-inf-0.pgn.zst | 1024 MiB | 1,300,000 | ∞ | 0 | 0.00114 | 21.07 % |
analysis-128-1500000-10-0.pgn.zst | 128 MiB | 1,500,000 | 10 | 0 | 0.00110 | 21.12 % |
analysis-128-1500000-6-1.pgn.zst | 128 MiB | 1,500,000 | 6 | 1 | 0.00110 | 21.21 % |
analysis-1024-1200000-inf-0.pgn.zst | 1024 MiB | 1,200,000 | ∞ | 0 | 0.00116 | 21.21 % |
analysis-128-1333333-8-1.pgn.zst | 128 MiB | 1,333,333 | 8 | 1 | 0.00116 | 21.23 % |
analysis-128-1500000-4-1.pgn.zst | 128 MiB | 1,500,000 | 4 | 1 | 0.00112 | 21.30 % |
analysis-128-1500000-5-1.pgn.zst | 128 MiB | 1,500,000 | 5 | 1 | 0.00111 | 21.35 % |
analysis-128-1500000-8-0.pgn.zst | 128 MiB | 1,500,000 | 8 | 0 | 0.00112 | 21.38 % |
analysis-128-1350000-9-1.pgn.zst | 128 MiB | 1,350,000 | 9 | 1 | 0.00115 | 21.38 % |
analysis-128-1312500-7-1.pgn.zst | 128 MiB | 1,312,500 | 7 | 1 | 0.00113 | 21.42 % |
analysis-128-1285714-6-1.pgn.zst | 128 MiB | 1,285,714 | 6 | 1 | 0.00114 | 21.46 % |
analysis-128-1363636-10-1.pgn.zst | 128 MiB | 1,363,636 | 10 | 1 | 0.00112 | 21.51 % |
analysis-1024-1100000-inf-0.pgn.zst | 1024 MiB | 1,100,000 | ∞ | 0 | 0.00118 | 21.51 % |
analysis-128-1500000-9-0.pgn.zst | 128 MiB | 1,500,000 | 9 | 0 | 0.00114 | 21.55 % |
analysis-128-1200000-4-1.pgn.zst | 128 MiB | 1,200,000 | 4 | 1 | 0.00117 | 21.64 % |
analysis-128-1500000-3-1.pgn.zst | 128 MiB | 1,500,000 | 3 | 1 | 0.00110 | 21.65 % |
analysis-128-1500000-2-1.pgn.zst | 128 MiB | 1,500,000 | 2 | 1 | 0.00113 | 21.65 % |
analysis-128-1250000-5-1.pgn.zst | 128 MiB | 1,250,000 | 5 | 1 | 0.00117 | 21.71 % |
analysis-128-1500000-6-0.pgn.zst | 128 MiB | 1,500,000 | 6 | 0 | 0.00113 | 21.74 % |
analysis-1024-1000000-inf-0.pgn.zst | 1024 MiB | 1,000,000 | ∞ | 0 | 0.00120 | 21.79 % |
analysis-128-1500000-7-0.pgn.zst | 128 MiB | 1,500,000 | 7 | 0 | 0.00107 | 21.79 % |
analysis-1024-900000-inf-0.pgn.zst | 1024 MiB | 900,000 | ∞ | 0 | 0.00119 | 22.03 % |
analysis-128-1500000-1-1.pgn.zst | 128 MiB | 1,500,000 | 1 | 1 | 0.00113 | 22.13 % |
analysis-128-1500000-5-0.pgn.zst | 128 MiB | 1,500,000 | 5 | 0 | 0.00112 | 22.18 % |
analysis-128-1125000-3-1.pgn.zst | 128 MiB | 1,125,000 | 3 | 1 | 0.00117 | 22.26 % |
analysis-1024-800000-inf-0.pgn.zst | 1024 MiB | 800,000 | ∞ | 0 | 0.00127 | 22.31 % |
analysis-128-1500000-4-0.pgn.zst | 128 MiB | 1,500,000 | 4 | 0 | 0.00114 | 22.34 % |
analysis-128-1000000-2-1.pgn.zst | 128 MiB | 1,000,000 | 2 | 1 | 0.00124 | 22.57 % |
analysis-1024-700000-inf-0.pgn.zst | 1024 MiB | 700,000 | ∞ | 0 | 0.00133 | 22.59 % |
analysis-1024-600000-inf-0.pgn.zst | 1024 MiB | 600,000 | ∞ | 0 | 0.00144 | 22.91 % |
analysis-128-1500000-3-0.pgn.zst | 128 MiB | 1,500,000 | 3 | 0 | 0.00117 | 22.95 % |
analysis-128-1500000-2-0.pgn.zst | 128 MiB | 1,500,000 | 2 | 0 | 0.00120 | 23.33 % |
analysis-1024-500000-inf-0.pgn.zst | 1024 MiB | 500,000 | ∞ | 0 | 0.00146 | 23.51 % |
analysis-128-750000-1-1.pgn.zst | 128 MiB | 750,000 | 1 | 1 | 0.00135 | 23.53 % |
analysis-1024-400000-inf-0.pgn.zst | 1024 MiB | 400,000 | ∞ | 0 | 0.00159 | 23.90 % |
analysis-1024-300000-inf-0.pgn.zst | 1024 MiB | 300,000 | ∞ | 0 | 0.00172 | 24.69 % |
analysis-64-1500000-1-0.pgn.zst | 64 MiB | 1,500,000 | 1 | 0 | 0.00123 | 25.56 % |
analysis-128-1500000-1-0.pgn.zst | 128 MiB | 1,500,000 | 1 | 0 | 0.00125 | 25.57 % |
analysis-32-1500000-1-0.pgn.zst | 32 MiB | 1,500,000 | 1 | 0 | 0.00126 | 25.58 % |
analysis-4-1500000-1-0.pgn.zst | 4 MiB | 1,500,000 | 1 | 0 | 0.00125 | 25.59 % |
analysis-256-1500000-1-0.pgn.zst | 256 MiB | 1,500,000 | 1 | 0 | 0.00123 | 25.60 % |
analysis-512-1500000-1-0.pgn.zst | 512 MiB | 1,500,000 | 1 | 0 | 0.00123 | 25.62 % |
analysis-16-1500000-1-0.pgn.zst | 16 MiB | 1,500,000 | 1 | 0 | 0.00122 | 25.63 % |
analysis-1024-1500000-1-0.pgn.zst | 1024 MiB | 1,500,000 | 1 | 0 | 0.00124 | 25.68 % |
analysis-8-1500000-1-0.pgn.zst | 8 MiB | 1,500,000 | 1 | 0 | 0.00122 | 25.72 % |
analysis-2-1500000-1-0.pgn.zst | 2 MiB | 1,500,000 | 1 | 0 | 0.00123 | 25.75 % |
analysis-1-1500000-1-0.pgn.zst | 1 MiB | 1,500,000 | 1 | 0 | 0.00125 | 25.87 % |
analysis-1024-200000-inf-0.pgn.zst | 1024 MiB | 200,000 | ∞ | 0 | 0.00192 | 25.95 % |
analysis-1024-100000-inf-0.pgn.zst | 1024 MiB | 100,000 | ∞ | 0 | 0.00236 | 27.75 % |
analysis-1024-10000-inf-0.pgn.zst | 1024 MiB | 10,000 | ∞ | 0 | 0.00430 | 34.64 % |
analysis-1024-1000-inf-0.pgn.zst | 1024 MiB | 1,000 | ∞ | 0 | 0.00870 | 43.37 % |
analysis-1024-100-inf-0.pgn.zst | 1024 MiB | 100 | ∞ | 0 | 0.01714 | 50.66 % |
analysis-1024-10-inf-0.pgn.zst | 1024 MiB | 10 | ∞ | 0 | 0.02129 | 51.35 % |
analysis-1024-1-inf-0.pgn.zst | 1024 MiB | 1 | ∞ | 0 | 0.02160 | 51.59 % |
niklasf, 31st July 2023.
Update: Quality gains due to chunking appear to be explained almost entirely by fixing simple inconsistencies where the best move improves the evaluation. Thanks _David_.
Update: fishnet v2.8.1 introduces chunking with size 5 and overlap 1.