Today, we'll explore a new approach that combines database statistics and engine evaluation to build an opening repertoire. I may have overcomplicated it, but I will let you be the judge.
The Idea
Last year I attempted to build a small program that could generate an opening PGN file annotated by ChatGPT. Overall, I concluded that AI's ability to interpret and analyze chess positions isn't good enough (yet) to continue with this idea.
However, the program's code backbone was built on the Lichess database and Stockfish, which gave me the idea to investigate whether I could improve on earlier attempts to autogenerate opening files. The previous attempts I have seen rely either on Lichess database win% data or on pure Stockfish analysis for move selection. But what if we use all the available data, with weights that distribute the uncertainty about whether a specific opening branch will lead to a winning position or not?
Think of playing a chess opening as climbing a massive tree, where each move you make is like choosing which branch to climb onto next. At the base of the tree, you've got your main lines—like the trunk with several main branches splitting off. The Stockfish engine acts as your safety inspector, telling you which branches are structurally sound from an objective standpoint. Meanwhile, the Lichess database is like having a map showing which paths others at your skill level typically take, complete with notes about how successful their climbs were, and who fell.
As you move higher up the tree, each branch represents a possible position you might only see once in a while, and each fork is a decision point. Some branches might look safe to the engine (Stockfish) but pose great practical difficulty, because many players at your level struggle to find the correct path toward the top of the tree from that specific point.
So the question is: how do we combine all this information in a useful and meaningful way?
The Data
When choosing an opening, many chess players rely on intuition, gravitating toward positions that resonate with them. In contrast, my approach takes the opposite route: constructing a data-driven selection engine that determines the optimal choice purely from the numbers.
The main components of my approach have been to use the following data sources:
Stockfish
Lichess players database win%, filtered by time control and rating range
Position Complexity Score (more on that below)
Stockfish simply gives an objective evaluation of the given opening position. The downside of relying solely on Stockfish to pick opening lines is that engine lines do not always make logical sense to humans.
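As a concrete illustration, here is a minimal sketch of how a position can be scored with Stockfish through the python-chess library (the binary path and search depth are my assumptions, not the generator's actual settings):

import chess
import chess.engine

# Path to the Stockfish binary is an assumption; point it at your install.
engine = chess.engine.SimpleEngine.popen_uci("stockfish")

board = chess.Board()
board.push_san("e4")  # evaluate the position after 1. e4

# A fixed depth keeps evaluations comparable between positions.
info = engine.analyse(board, chess.engine.Limit(depth=18))

# Score from White's perspective, converted from centipawns to pawns.
score_cp = info["score"].white().score(mate_score=10000)
print(f"Eval after 1. e4: {score_cp / 100:+.2f}")

engine.quit()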
The second data point I used for my model is the Lichess players database, which has around 3.6 million games. What is amazing is that you can filter by a specific rating range and time control. With this, we can find the most likely moves we will face in response to our opening moves, and how often each resulting position ends in a win.
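For reference, the Lichess opening explorer exposes exactly these filters through its public API. Below is a minimal sketch of a query for the position after 1. e4, restricted to 1600-1800 rapid and classical games; I am assuming the generator uses this endpoint or something similar:

import requests

EXPLORER_URL = "https://explorer.lichess.ovh/lichess"

params = {
    "variant": "standard",
    # Position after 1. e4
    "fen": "rnbqkbnr/pppppppp/8/8/4P3/8/PPPP1PPP/RNBQKBNR b KQkq - 0 1",
    "speeds": "rapid,classical",
    "ratings": "1600,1800",
}

data = requests.get(EXPLORER_URL, params=params, timeout=10).json()

# The most common Black replies, with White's win share for each.
for move in data["moves"][:5]:
    total = move["white"] + move["draws"] + move["black"]
    white_win = 100 * move["white"] / total
    print(f"{move['san']}: {total} games, White wins {white_win:.1f}%")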
The last element I have used is something I have called ‘Position Complexity Score’. Maybe someone has gotten this idea before me and called it something else, but that is not so important. The key idea is that after each move we make, there is a statistical chance our opponents will play a suboptimal move or even a blunder.
The Position Complexity Score measures this likelihood: it takes the most common responses to our move, evaluates each of them with Stockfish, and computes the centipawn loss of each response compared to the best engine move in the position.
Weighting each loss by how often the response is played and averaging gives a score that represents how often our opponents go wrong (a code sketch follows the sample output below). Here is an example where Black is to move.
Below you can see the analysis of the top engine move Ncxb3. The analysis shows that after this move, White players (rated 1600-1800 Lichess rapid/classical) play axb3 22.9% of the time. This increases Black's advantage by 1.63 pawns due to the fork on c2!
This results in a high complexity score for this position at this particular rating range. Each response's loss is then weighted by how common the move is and combined into the overall complexity score.
Analyzing position for user move:
FEN: r1bqkb1r/ppp2ppp/8/2npN3/3n4/1B1PQ3/PPP2PPP/RNB1K2R b KQkq - 2 8
Side to move (our side) is Black
Engine candidates (White-perspective eval):
1. Ncxb3 (eval: -5.13, normalized: 1.00)
2. Nf5 (eval: -1.54, normalized: 1.00)
3. Nce6 (eval: -1.01, normalized: 0.84)
Analyzing opponent responses (Black repertoire):
Critical response eval: -7.33 (White-persp)
Weighted average eval (White-persp): -5.70
Move analysis (by frequency):
Move    Played   Eval    Loss    Weighted
-----------------------------------------
Nc6+    64.1%    -5.12   +0.00   +0.00
axb3    22.9%    -7.33   +1.63   +0.37
Average weighted loss: +0.43
Complexity score: 0.43
Analyzing move Ncxb3:
Raw engine eval (White-perspective): -5.16
Normalized engine score for Black: 1.00
Bayesian DB win rate: 91.9% -> 1.00
Opponent response complexity: 0.43
Weighted contributions (total games=694):
Engine (91.0%): 1.00 -> 0.91
DB win (4.9%): 1.00 -> 0.05
Response(4.2%): 0.43 -> 0.02
=> Combined score: 0.98
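To make the complexity calculation concrete, here is a minimal sketch that reproduces the 0.43 from the log above. The frequencies and loss values are taken straight from the table; normalizing by the covered frequency is my reading of the numbers, not confirmed code:

# Opponent replies after ...Ncxb3: (move, database frequency, loss in pawns).
responses = [
    ("Nc6+", 0.641, 0.00),  # the best practical reply gives nothing away
    ("axb3", 0.229, 1.63),  # walks into the fork on c2
]

weighted_loss = sum(freq * loss for _, freq, loss in responses)  # 0.37
covered = sum(freq for _, freq, _ in responses)                  # 0.87

# Normalize by the frequency actually analyzed; rarer moves are ignored.
complexity = weighted_loss / covered
print(f"Complexity score: {complexity:.2f}")  # -> 0.43, matching the log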
You might notice that the engine, DB win, and complexity (response) scores do not get equal weight. That is because I included a formula that distributes the weights based on the database sample size: we place more confidence in the database and the complexity score when the sample is large, and more confidence in Stockfish when the sample is small.
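The exact formula is not shown in the output, but a simple shape with this behavior looks like the sketch below. The constant k and the 55/45 split between win rate and complexity are illustrative guesses that happen to land close to the 91.0/4.9/4.2 split in the log:

def combined_score(engine, db_win, complexity, total_games, k=7000):
    """Blend the three signals, trusting the database more as games grow."""
    db_share = total_games / (total_games + k)  # confidence in the database
    w_engine = 1.0 - db_share
    w_db = db_share * 0.55      # database win rate
    w_resp = db_share * 0.45    # opponent-response complexity
    return w_engine * engine + w_db * db_win + w_resp * complexity

# With the 694 games from the log, db_share is about 0.09, so the engine
# keeps roughly 91% of the weight and the combined score comes out at 0.98.
print(round(combined_score(1.00, 1.00, 0.43, 694), 2))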
Pruning and Refining the Repertoire
The repertoire is constructed and pruned with a scoring system and budget constraints that ensure only the most promising lines are retained. The pruning process relies on a hierarchical evaluation of potential moves at each decision point.
This includes the following (a rough code sketch follows the list):
Priority Scoring: Each position in the repertoire tree is assigned a priority score based on its database popularity (the total number of games) and its evaluation. Scaling is applied so that positions more likely to appear on the board are weighted more heavily, focusing effort on positions that are equal or slightly advantageous to the opponent.
Budget Management: A predefined line budget limits the total number of complete variations generated. As each line is completed, it counts against this budget. Once the budget is exhausted, the algorithm ceases further exploration, ensuring the repertoire remains concise and manageable.
Iterative Refinement: The process iteratively explores high-priority deviations. Positions with significant unexplored potential—identified during opponent response analysis—are added as new branches. However, if their calculated importance fails to meet the minimum priority score, they are discarded, focusing resources on the most critical areas of the repertoire.
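Here is a rough sketch of how these three mechanisms might fit together. The tree structure, the priority formula (reach probability times combined score), and the thresholds are my illustrative assumptions, not the generator's actual code:

import heapq
import itertools
from dataclasses import dataclass, field

@dataclass
class Node:
    line: str            # the moves leading to this position
    reach_prob: float    # how likely the position is to appear on the board
    score: float         # combined engine/DB/complexity score
    children: list = field(default_factory=list)

def build_repertoire(root, line_budget=22, min_priority=0.05):
    """Best-first expansion: keep the most promising lines, stop at budget."""
    tie = itertools.count()  # tie-breaker so the heap never compares Nodes
    frontier = [(-(root.reach_prob * root.score), next(tie), root)]
    lines = []
    while frontier and len(lines) < line_budget:
        _, _, node = heapq.heappop(frontier)
        if not node.children:             # a leaf is a completed variation
            lines.append(node.line)
            continue
        for child in node.children:
            priority = child.reach_prob * child.score
            if priority >= min_priority:  # discard unimportant deviations
                heapq.heappush(frontier, (-priority, next(tie), child))
    return lines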
How does the complexity score affect the move selection?
In this line, after 1. e4 e5 2. Bc4 Nf6 3. d3, the most common move is 3...Bc5. However, at the 1600-1800 level many White players fall into a trap after 3...c6 4. Nf3 Be7!
In 45% of the games, White plays 5. Nxe5?, which fails to 5...Qa5+!, winning the knight on e5. The top engine move is not 4...Be7 but 4...d5, after which the most common line of play in the 1600-1800 range is 5. exd5 cxd5 6. Bb5+ Bd7 7. Bxd7+ Nbxd7, and Black has not created any chances for the opponent to blunder. So, by playing 4...Be7 we significantly improve our practical winning chances by following the stats rather than relying solely on the engine.
Purists will of course tell you that it is better to learn the principles than to rely on tricks.
A 22-line Opening Repertoire Against 1. e4
OPENING_MOVES = "1. e4"               # the opponent's first move we respond to
REPERTOIRE_SIDE = 'Black'             # side the repertoire is built for
GAME_SPEEDS = ["classical", "rapid"]  # Lichess time controls to query
RATING_RANGES = [1600, 1800]          # rating band for the database filter
I have made a small test of the generator that you can review. It turns out that, according to the output, the Petroff is the most effective weapon against 1. e4 for the 1600-1800 rating range.
I have loaded it into Chessable and want to test it in some games, so I might share some next week. If you are waiting for the solutions to last week's puzzles, I promise they will be included in the next newsletter.
/Martin
Quite an interesting way to define an opening repertoire. I used Chessbook, which does leverage the Lichess DB for whatever rating range you'd like to see, and then adjusts depth based on the number of games in which a position might be seen. I wonder if they leverage a similar function to the one you have described here. Wonderful stuff, Martin.
It could be somehow useful in apps like Chessbook, but in real life, if I had to prune the moves myself as a 1500/1600/1700/1800 player, I would be wasting time I could spend learning proper positional play (solid main lines). As grandmasters say, "as an amateur you learn openings to learn the game, not to win." GMs are desperate to find an edge after the opening to avoid endless draws... we aren't.