Comparing the quality of backwards analysis strategies

Lichess aims to provide deterministic, strong, and fast chess game analysis. There is some tension between these goals.

Analysis is provided by volunteers running the fishnet client (as of version 2.7.0, running Stockfish 16) on a wide variety of hardware. Determinism ensures that users can expect consistent quality and that bugs or manipulation can be reliably identified.

When optimizing for quality at fixed computing power, backwards analysis is known to be efficient. When the game is analysed backwards, move by move, the chess engine's hash table (also known as the transposition table) will often contain highly relevant information about what happened later in the game.

However, only single-threaded analysis is deterministic. At one extreme, if we want to fully utilize the hash table, no other threads can help finish a particular game more quickly. At the other extreme, fishnet 2.7.0 analyses individual positions in parallel, each with a pristine hash table, finishing games very quickly.

More balanced strategies might be worth considering. To get some sharing and some parallelism, the game could be split into (possibly overlapping) chunks of consecutive positions.

What are we missing out on today, and how do all of these approaches stack up in terms of quality?

Method

For forward play, the strength of players can be evaluated directly by looking at the outcome of their games, and we expect strong engines to provide high quality analysis. For a tweak that weakens the engine, like clearing the hash table after each move, an "equivalent" tweak is also expected to lower the quality of backwards analysis.

However, it is not immediately clear whether the hash table is more or less important in backwards analysis, or what the equivalent of overlap in backwards analysis would even be. So we look for a more direct way to evaluate the quality of backwards analysis. Thanks to MoistvonLipwig for the hints.

Here, 1000 games are randomly selected from rated games played on Lichess in June 2023 for which engine analysis had been requested.

The games are then re-analysed with Stockfish 16 at very high node limits, 100 meganodes for each position, using 1 GiB hash in single-threaded backwards analysis.

analysis-1024-100000000-inf-0.pgn.zst

The first 10 plies of each game are ignored for the following analysis. As a result, for each non-opening move, we now have an expected game theoretical outcome $E_i$ between $0$ and $1$, based on Stockfish's WDL model, and the recommended primary variation.
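As a rough illustration of how a win/draw/loss estimate can be collapsed into a single expected outcome between $0$ and $1$, here is a minimal sketch. It assumes the engine reports win, draw, and loss chances in permille (as Stockfish does when `UCI_ShowWDL` is enabled) and that a draw counts as half a point. The function name `expected_outcome` and this exact conversion are assumptions for illustration, not necessarily the computation used for this analysis.

```python
def expected_outcome(wins: int, draws: int, losses: int) -> float:
    """Collapse a win/draw/loss triple into an expected score in [0, 1].

    Assumption: a win scores 1, a draw 1/2, and a loss 0. The inputs are
    normalised by their sum, so permille values (summing to 1000) work as-is.
    """
    total = wins + draws + losses
    if total <= 0:
        raise ValueError("WDL values must sum to a positive number")
    return (wins + 0.5 * draws) / total
```

For example, a position judged as 250 ‰ win, 500 ‰ draw, and 250 ‰ loss maps to an expected outcome of $\frac{1}{2}$.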

The node limits used for the strategies we want to evaluate are about two orders of magnitude lower. Hopefully, from that perspective it's reasonable to accept these evaluations as near-objective truth, as close to the true evaluation of each position ($0$, $\frac{1}{2}$, or $1$) and optimal play as we can expect to get.

To evaluate a strategy, it is used to analyse the games to obtain evaluations $\hat{E_i}$ and primary variations.

We score strategies by:

The mean squared error of the evaluations: $$ \textrm{MSE} = \frac{1}{n} \sum^{n}_{i=1}(E_i - \hat{E_i})^2 $$
The rate of mispredicted primary moves.
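Both scores can be sketched in a few lines. This assumes the reference and the evaluated strategy each provide one evaluation and one first move of the primary variation per non-opening position, in the same order. The function names are hypothetical, and treating "primary move" as the first move of the primary variation is an assumption.

```python
from typing import Sequence


def mean_squared_error(reference: Sequence[float], estimate: Sequence[float]) -> float:
    """MSE between reference evaluations E_i and strategy evaluations."""
    if len(reference) != len(estimate) or not reference:
        raise ValueError("need two non-empty sequences of equal length")
    return sum((e - h) ** 2 for e, h in zip(reference, estimate)) / len(reference)


def pv_miss_rate(reference_moves: Sequence[str], estimate_moves: Sequence[str]) -> float:
    """Fraction of positions where the strategy's first PV move differs
    from the reference's first PV move."""
    if len(reference_moves) != len(estimate_moves) or not reference_moves:
        raise ValueError("need two non-empty sequences of equal length")
    misses = sum(1 for r, m in zip(reference_moves, estimate_moves) if r != m)
    return misses / len(reference_moves)
```

Under these definitions, the reference analysis compared against itself scores an MSE of 0 and a miss rate of 0 %.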

Experiments

All of the following experiments are performed with Stockfish 16 in single-threaded backwards analysis, and are thus exactly reproducible. UCI_AnalyseMode is always on. We vary the following dimensions:

Hash table size. The size of the transposition table (setoption name Hash).
Node limits. Analysis uses a fixed node limit per position (go nodes in the UCI protocol).
Chunk size. For a chunk size of $ n $, the last chunk of $ n $ positions is analysed backwards, followed by clearing the hash table, followed by analysing the next chunk of $ n $ preceding positions, and so on.
Overlap. For overlap size of $ n $, before analysing a chunk, the hash table is primed by analysing the following $ n $ positions first. Analysis of overlap positions is performed at the same node limit. This means more total resources are spent, so additional experiments with adjusted node limits were also performed.
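The chunk size and overlap dimensions can be sketched as a search schedule. This is a minimal sketch, not fishnet's implementation: `Step`, `backwards_schedule`, and `uci_commands` are hypothetical names, priming positions are assumed to be searched backwards like the rest of the chunk, and `ucinewgame` is assumed as the way to clear the hash table between chunks. A real client would also wait for `bestmove` after each `go` before sending the next command.

```python
from typing import List, NamedTuple


class Step(NamedTuple):
    position: int    # index of the position in the game (0 = starting position)
    new_chunk: bool  # True if the hash table is cleared before this search
    priming: bool    # True if this search only primes the hash table


def backwards_schedule(num_positions: int, chunk_size: int, overlap: int = 0) -> List[Step]:
    """Order in which positions are searched, from the end of the game backwards."""
    if num_positions < 0 or chunk_size < 1 or overlap < 0:
        raise ValueError("invalid schedule parameters")
    steps: List[Step] = []
    end = num_positions  # exclusive upper bound of the current chunk
    while end > 0:
        start = max(0, end - chunk_size)
        # The overlap consists of up to `overlap` positions following the chunk.
        prime_end = min(num_positions, end + overlap)
        for n, pos in enumerate(range(prime_end - 1, start - 1, -1)):
            steps.append(Step(pos, new_chunk=(n == 0), priming=(pos >= end)))
        end = start
    return steps


def uci_commands(moves: List[str], steps: List[Step], nodes: int) -> List[str]:
    """Render a schedule as UCI commands for a game given as a list of UCI moves."""
    commands: List[str] = []
    for step in steps:
        if step.new_chunk:
            commands.append("ucinewgame")  # assumed to reset the hash table
        played = " ".join(moves[:step.position])
        commands.append("position startpos" + (f" moves {played}" if played else ""))
        commands.append(f"go nodes {nodes}")
    return commands
```

With these definitions, a chunk size at least as large as the game gives sequential backwards analysis (the ∞ rows in the raw data), while chunk size 1 with overlap 0 searches every position with a pristine hash table, as fishnet 2.7.0 does.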

Node limits

Hash table size

Chunk size

Discussion

Unsurprisingly, the ability to predict evaluations is basically the same as the ability to predict moves. The latter metric appears to be slightly less noisy.

The primary threat to validity is the choice of those metrics. If we accept them, we see that:

fishnet 2.7.0 leaves a lot of quality on the table.
Provided the hash table is used at all, the size is basically irrelevant at fishnet's relatively shallow node limit.
At equal resources used, chunking, even with overlap, can close the gap between fishnet 2.7.0 and sequential backwards-analysis.
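To make "equal resources" concrete: with chunk size $ n $ and overlap $ k $, each chunk costs up to $ n + k $ searches instead of $ n $, so the per-position node limit has to shrink by a factor of $ \frac{n}{n + k} $ to keep the total roughly constant. The sketch below is inferred from the adjusted node limits that appear in the raw data file names (for example 1,333,333 nodes for chunk size 8 with overlap 1), not from a stated formula, and the function name is hypothetical.

```python
def adjusted_node_limit(base_nodes: int, chunk_size: int, overlap: int) -> int:
    """Per-position node limit such that a chunk of `chunk_size` positions plus
    `overlap` priming searches costs about as much as `chunk_size` searches
    at `base_nodes`. Rounds down to a whole number of nodes."""
    if base_nodes < 0 or chunk_size < 1 or overlap < 0:
        raise ValueError("invalid parameters")
    return base_nodes * chunk_size // (chunk_size + overlap)
```

Starting from 1,500,000 nodes, this gives 1,250,000 nodes for chunk size 5 with overlap 1, and 750,000 nodes for chunk size 1 with overlap 1, matching the corresponding rows in the raw data. The last chunk of a game has no following positions to prime with, so this scaling slightly overcompensates.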

Raw data

PGN	Hash	Nodes	Chunk size	Overlap	MSE	PV miss
`analysis-1024-100000000-inf-0.pgn.zst`	1024 MiB	100,000,000	∞	0	0.00000	0.00 %
`analysis-1024-10000000-inf-0.pgn.zst`	1024 MiB	10,000,000	∞	0	0.00065	16.29 %
`analysis-1024-2800000-inf-0.pgn.zst`	1024 MiB	2,800,000	∞	0	0.00092	19.13 %
`analysis-1024-2500000-inf-0.pgn.zst`	1024 MiB	2,500,000	∞	0	0.00095	19.30 %
`analysis-1024-2900000-inf-0.pgn.zst`	1024 MiB	2,900,000	∞	0	0.00093	19.31 %
`analysis-1024-3000000-inf-0.pgn.zst`	1024 MiB	3,000,000	∞	0	0.00095	19.34 %
`analysis-1024-2700000-inf-0.pgn.zst`	1024 MiB	2,700,000	∞	0	0.00098	19.34 %
`analysis-1024-2400000-inf-0.pgn.zst`	1024 MiB	2,400,000	∞	0	0.00097	19.47 %
`analysis-1024-2600000-inf-0.pgn.zst`	1024 MiB	2,600,000	∞	0	0.00096	19.48 %
`analysis-1024-2300000-inf-0.pgn.zst`	1024 MiB	2,300,000	∞	0	0.00099	19.57 %
`analysis-1024-2200000-inf-0.pgn.zst`	1024 MiB	2,200,000	∞	0	0.00102	19.57 %
`analysis-1024-2100000-inf-0.pgn.zst`	1024 MiB	2,100,000	∞	0	0.00103	19.95 %
`analysis-1024-2000000-inf-0.pgn.zst`	1024 MiB	2,000,000	∞	0	0.00103	19.96 %
`analysis-1024-1900000-inf-0.pgn.zst`	1024 MiB	1,900,000	∞	0	0.00105	20.22 %
`analysis-1024-1800000-inf-0.pgn.zst`	1024 MiB	1,800,000	∞	0	0.00108	20.23 %
`analysis-1024-1700000-inf-0.pgn.zst`	1024 MiB	1,700,000	∞	0	0.00108	20.39 %
`analysis-1024-1600000-inf-0.pgn.zst`	1024 MiB	1,600,000	∞	0	0.00108	20.50 %
`analysis-256-1500000-inf-0.pgn.zst`	256 MiB	1,500,000	∞	0	0.00109	20.60 %
`analysis-1024-1500000-inf-0.pgn.zst`	1024 MiB	1,500,000	∞	0	0.00108	20.69 %
`analysis-64-1500000-inf-0.pgn.zst`	64 MiB	1,500,000	∞	0	0.00111	20.69 %
`analysis-32-1500000-inf-0.pgn.zst`	32 MiB	1,500,000	∞	0	0.00111	20.71 %
`analysis-128-1500000-inf-0.pgn.zst`	128 MiB	1,500,000	∞	0	0.00112	20.73 %
`analysis-8-1500000-inf-0.pgn.zst`	8 MiB	1,500,000	∞	0	0.00110	20.73 %
`analysis-512-1500000-inf-0.pgn.zst`	512 MiB	1,500,000	∞	0	0.00110	20.77 %
`analysis-128-1500000-10-1.pgn.zst`	128 MiB	1,500,000	10	1	0.00113	20.83 %
`analysis-1024-1400000-inf-0.pgn.zst`	1024 MiB	1,400,000	∞	0	0.00111	20.83 %
`analysis-4-1500000-inf-0.pgn.zst`	4 MiB	1,500,000	∞	0	0.00114	20.90 %
`analysis-16-1500000-inf-0.pgn.zst`	16 MiB	1,500,000	∞	0	0.00110	20.96 %
`analysis-2-1500000-inf-0.pgn.zst`	2 MiB	1,500,000	∞	0	0.00112	21.02 %
`analysis-128-1500000-8-1.pgn.zst`	128 MiB	1,500,000	8	1	0.00111	21.02 %
`analysis-128-1500000-7-1.pgn.zst`	128 MiB	1,500,000	7	1	0.00112	21.03 %
`analysis-128-1500000-9-1.pgn.zst`	128 MiB	1,500,000	9	1	0.00110	21.04 %
`analysis-1-1500000-inf-0.pgn.zst`	1 MiB	1,500,000	∞	0	0.00112	21.05 %
`analysis-1024-1300000-inf-0.pgn.zst`	1024 MiB	1,300,000	∞	0	0.00114	21.07 %
`analysis-128-1500000-10-0.pgn.zst`	128 MiB	1,500,000	10	0	0.00110	21.12 %
`analysis-128-1500000-6-1.pgn.zst`	128 MiB	1,500,000	6	1	0.00110	21.21 %
`analysis-1024-1200000-inf-0.pgn.zst`	1024 MiB	1,200,000	∞	0	0.00116	21.21 %
`analysis-128-1333333-8-1.pgn.zst`	128 MiB	1,333,333	8	1	0.00116	21.23 %
`analysis-128-1500000-4-1.pgn.zst`	128 MiB	1,500,000	4	1	0.00112	21.30 %
`analysis-128-1500000-5-1.pgn.zst`	128 MiB	1,500,000	5	1	0.00111	21.35 %
`analysis-128-1500000-8-0.pgn.zst`	128 MiB	1,500,000	8	0	0.00112	21.38 %
`analysis-128-1350000-9-1.pgn.zst`	128 MiB	1,350,000	9	1	0.00115	21.38 %
`analysis-128-1312500-7-1.pgn.zst`	128 MiB	1,312,500	7	1	0.00113	21.42 %
`analysis-128-1285714-6-1.pgn.zst`	128 MiB	1,285,714	6	1	0.00114	21.46 %
`analysis-128-1363636-10-1.pgn.zst`	128 MiB	1,363,636	10	1	0.00112	21.51 %
`analysis-1024-1100000-inf-0.pgn.zst`	1024 MiB	1,100,000	∞	0	0.00118	21.51 %
`analysis-128-1500000-9-0.pgn.zst`	128 MiB	1,500,000	9	0	0.00114	21.55 %
`analysis-128-1200000-4-1.pgn.zst`	128 MiB	1,200,000	4	1	0.00117	21.64 %
`analysis-128-1500000-3-1.pgn.zst`	128 MiB	1,500,000	3	1	0.00110	21.65 %
`analysis-128-1500000-2-1.pgn.zst`	128 MiB	1,500,000	2	1	0.00113	21.65 %
`analysis-128-1250000-5-1.pgn.zst`	128 MiB	1,250,000	5	1	0.00117	21.71 %
`analysis-128-1500000-6-0.pgn.zst`	128 MiB	1,500,000	6	0	0.00113	21.74 %
`analysis-1024-1000000-inf-0.pgn.zst`	1024 MiB	1,000,000	∞	0	0.00120	21.79 %
`analysis-128-1500000-7-0.pgn.zst`	128 MiB	1,500,000	7	0	0.00107	21.79 %
`analysis-1024-900000-inf-0.pgn.zst`	1024 MiB	900,000	∞	0	0.00119	22.03 %
`analysis-128-1500000-1-1.pgn.zst`	128 MiB	1,500,000	1	1	0.00113	22.13 %
`analysis-128-1500000-5-0.pgn.zst`	128 MiB	1,500,000	5	0	0.00112	22.18 %
`analysis-128-1125000-3-1.pgn.zst`	128 MiB	1,125,000	3	1	0.00117	22.26 %
`analysis-1024-800000-inf-0.pgn.zst`	1024 MiB	800,000	∞	0	0.00127	22.31 %
`analysis-128-1500000-4-0.pgn.zst`	128 MiB	1,500,000	4	0	0.00114	22.34 %
`analysis-128-1000000-2-1.pgn.zst`	128 MiB	1,000,000	2	1	0.00124	22.57 %
`analysis-1024-700000-inf-0.pgn.zst`	1024 MiB	700,000	∞	0	0.00133	22.59 %
`analysis-1024-600000-inf-0.pgn.zst`	1024 MiB	600,000	∞	0	0.00144	22.91 %
`analysis-128-1500000-3-0.pgn.zst`	128 MiB	1,500,000	3	0	0.00117	22.95 %
`analysis-128-1500000-2-0.pgn.zst`	128 MiB	1,500,000	2	0	0.00120	23.33 %
`analysis-1024-500000-inf-0.pgn.zst`	1024 MiB	500,000	∞	0	0.00146	23.51 %
`analysis-128-750000-1-1.pgn.zst`	128 MiB	750,000	1	1	0.00135	23.53 %
`analysis-1024-400000-inf-0.pgn.zst`	1024 MiB	400,000	∞	0	0.00159	23.90 %
`analysis-1024-300000-inf-0.pgn.zst`	1024 MiB	300,000	∞	0	0.00172	24.69 %
`analysis-64-1500000-1-0.pgn.zst`	64 MiB	1,500,000	1	0	0.00123	25.56 %
`analysis-128-1500000-1-0.pgn.zst`	128 MiB	1,500,000	1	0	0.00125	25.57 %
`analysis-32-1500000-1-0.pgn.zst`	32 MiB	1,500,000	1	0	0.00126	25.58 %
`analysis-4-1500000-1-0.pgn.zst`	4 MiB	1,500,000	1	0	0.00125	25.59 %
`analysis-256-1500000-1-0.pgn.zst`	256 MiB	1,500,000	1	0	0.00123	25.60 %
`analysis-512-1500000-1-0.pgn.zst`	512 MiB	1,500,000	1	0	0.00123	25.62 %
`analysis-16-1500000-1-0.pgn.zst`	16 MiB	1,500,000	1	0	0.00122	25.63 %
`analysis-1024-1500000-1-0.pgn.zst`	1024 MiB	1,500,000	1	0	0.00124	25.68 %
`analysis-8-1500000-1-0.pgn.zst`	8 MiB	1,500,000	1	0	0.00122	25.72 %
`analysis-2-1500000-1-0.pgn.zst`	2 MiB	1,500,000	1	0	0.00123	25.75 %
`analysis-1-1500000-1-0.pgn.zst`	1 MiB	1,500,000	1	0	0.00125	25.87 %
`analysis-1024-200000-inf-0.pgn.zst`	1024 MiB	200,000	∞	0	0.00192	25.95 %
`analysis-1024-100000-inf-0.pgn.zst`	1024 MiB	100,000	∞	0	0.00236	27.75 %
`analysis-1024-10000-inf-0.pgn.zst`	1024 MiB	10,000	∞	0	0.00430	34.64 %
`analysis-1024-1000-inf-0.pgn.zst`	1024 MiB	1,000	∞	0	0.00870	43.37 %
`analysis-1024-100-inf-0.pgn.zst`	1024 MiB	100	∞	0	0.01714	50.66 %
`analysis-1024-10-inf-0.pgn.zst`	1024 MiB	10	∞	0	0.02129	51.35 %
`analysis-1024-1-inf-0.pgn.zst`	1024 MiB	1	∞	0	0.02160	51.59 %

niklasf, 31st July 2023.

Update: Quality gains due to chunking appear to be explained almost entirely by fixing simple inconsistencies where the best move improves the evaluation. Thanks _David_.

Update: fishnet v2.8.1 introduces chunking with size 5 and overlap 1.