Are Chess Improvers Causing a Lichess Tactic Rating Deflation?

Sep 14, 2023

Welcome to Say Chess! This newsletter goes out to +3,800 chess players. If you haven't joined yet, sign up now and get the ebook '100 Headachingly Hard Mate In Two Puzzles Composed By Sam Loyd' for free.

Also, consider becoming a paid subscriber if you enjoy receiving these emails.

While compiling the puzzles for the next Tactics Ladder book, I noticed something: the puzzle ratings online seem lower than in the Lichess puzzle data I downloaded in December 2021 (it is not clearly specified whether the data itself is older). So I got curious and decided to collect some data to check whether I was right.

Each puzzle on Lichess has a unique puzzle ID. You can see it in the link-structure when you solve puzzles:

lichess.org/training/S1k6e (the Puzzle ID)

This particular puzzle is one of the most solved puzzles on Lichess. It has been attempted over 220,000 times. In my data from 2021 it had a rating of 1515, but today it has dropped to 899! That is a 616-point drop.

Try puzzle S1k6e

Such a huge drop is of course an outlier; maybe this Boden’s mate has simply reached tactical meme status?

To investigate, I built two samples using puzzle IDs from my 2021 dataset: one with the most solved puzzles overall, and one with the most solved puzzles in each 100-point rating band from 1000 to 2400. Each sample contained more than 1,000 puzzles.
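The two samples described above could be drawn roughly as follows. This is a minimal sketch, not the script used for this post: the field names `puzzle_id`, `rating`, and `plays` are hypothetical stand-ins for whatever columns the downloaded dataset uses, and the band edges follow the 1000-2400 range mentioned above.

```python
from collections import defaultdict


def most_solved_overall(puzzles, n):
    """Return the n puzzles with the highest play count."""
    return sorted(puzzles, key=lambda p: p["plays"], reverse=True)[:n]


def most_solved_per_band(puzzles, n_per_band, low=1000, high=2400, width=100):
    """Return the n_per_band most-played puzzles for each rating band.

    Bands are half-open intervals [start, start + width) covering [low, high).
    Puzzles rated outside [low, high) are ignored.
    """
    bands = defaultdict(list)
    for p in puzzles:
        if low <= p["rating"] < high:
            band_start = low + (p["rating"] - low) // width * width
            bands[band_start].append(p)
    return {
        start: most_solved_overall(members, n_per_band)
        for start, members in sorted(bands.items())
    }
```

Sampling per band keeps the crowded 1400-1600 range from dominating the sample, at the cost of including less-played puzzles in the sparse bands.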

I then wrote a script that calls the Lichess API with the puzzle IDs from each sample and collects the current rating of each puzzle. The output looks like this:

PuzzleID,2021-rating,online-rating,rating change,game phase
0575T,1261,1284,23,endgame 
T3ef9,1258,1288,30,endgame 
0ABXA,1659,1714,55,middlegame 
04BH8,1424,1395,-29,middlegame 
01jKd,1749,1879,130,endgame 
0ARDw,1149,1192,43,opening 
04z7E,1184,1117,-67,opening 
0HdXY,1459,1486,27,endgame 
3urVs,2192,2245,53,endgame 
VWLmC,1962,1992,30,endgame 
Og1lv,2061,2077,16,endgame 
7IOmg,1845,1860,15,middlegame
...
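For anyone who wants to repeat the experiment, a script along these lines might look like the sketch below. It is not the original script. It assumes the puzzle endpoint `https://lichess.org/api/puzzle/{id}` and a JSON response containing `puzzle.rating`; check the current Lichess API documentation before relying on either. The function names and the `delay` parameter are my own.

```python
import csv
import json
import time
import urllib.request

# Assumed endpoint; verify against the Lichess API documentation.
API_URL = "https://lichess.org/api/puzzle/{puzzle_id}"


def fetch_current_rating(puzzle_id):
    """Fetch a puzzle's current rating.

    Assumes the response is JSON shaped like {"puzzle": {"rating": ...}}.
    """
    url = API_URL.format(puzzle_id=puzzle_id)
    with urllib.request.urlopen(url, timeout=10) as resp:
        payload = json.load(resp)
    return payload["puzzle"]["rating"]


def compare_ratings(old_rows, fetch=fetch_current_rating, delay=1.0):
    """Yield (id, old rating, current rating, change, phase) per puzzle.

    old_rows holds (puzzle_id, old_rating, phase) tuples from the 2021 data.
    delay is a pause in seconds between requests, to be polite to the server.
    """
    for puzzle_id, old_rating, phase in old_rows:
        current = fetch(puzzle_id)
        yield puzzle_id, old_rating, current, current - old_rating, phase
        time.sleep(delay)


def write_csv(rows, out):
    """Write the comparison rows in the same column layout as shown above."""
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(
        ["PuzzleID", "2021-rating", "online-rating", "rating change", "game phase"]
    )
    writer.writerows(rows)
```

Calling `write_csv(compare_ratings(rows), sys.stdout)` (after `import sys`) would print the table. The `fetch` argument can be swapped for a stub, which makes it possible to try the script without hitting the network.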

Here is the plot of the most solved puzzles.

As you can see, most of them are in the 1400-1600 rating range. We can also see that most of the dots lie below 0 (that is, the rating has dropped), and the trendline slopes downward as the puzzles get more difficult. After seeing how concentrated this plot was, I decided to make another dataset where the puzzles were divided into bands.

If we do that we get this result.
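A trendline like the one in these plots can be approximated with an ordinary least-squares fit of rating change against the 2021 rating. The sketch below assumes that kind of fit; the plotting tool may well have used something different. A negative slope means that harder puzzles lost more points.

```python
def fit_trendline(old_ratings, changes):
    """Fit change = slope * old_rating + intercept by ordinary least squares."""
    n = len(old_ratings)
    mean_x = sum(old_ratings) / n
    mean_y = sum(changes) / n
    # Sum of squared deviations of x, and of cross-deviations of x and y.
    sxx = sum((x - mean_x) ** 2 for x in old_ratings)
    sxy = sum(
        (x - mean_x) * (y - mean_y) for x, y in zip(old_ratings, changes)
    )
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept
```

Multiplying the slope by 100 gives the expected extra rating change for every 100 points of puzzle difficulty.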

Again, the tendency is that puzzles have dropped more points the further we move up in puzzle rating. I then wondered whether the game phase of the puzzle affected the drop, so I subdivided the puzzles into opening, middlegame, and endgame puzzles.

It seems like the opening puzzles have seen the largest drops in rating, but beyond that I’m unsure we can conclude much from this plot.

Finally, let us look at the average rating change by band:

This shows us that the lower rating bands actually saw a slight increase, while the largest drops were concentrated around the 1500-1600 rating range. So if you had a 1550 puzzle rating in December 2021 and did nothing until today, we could expect your puzzle rating to drop to around 1425.
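The per-band averages can be computed along these lines. This is a sketch under my own assumptions: each puzzle is assigned to a band by its 2021 rating, and the `width` parameter (hypothetical) lets you switch between 100-point and 200-point bands.

```python
from collections import defaultdict


def mean_change_by_band(rows, width=100):
    """Return {band_start: mean rating change} for (old_rating, change) rows.

    Each puzzle is placed in the band [band_start, band_start + width)
    according to its old (2021) rating.
    """
    totals = defaultdict(lambda: [0, 0])  # band_start -> [sum of changes, count]
    for old_rating, change in rows:
        band_start = old_rating // width * width
        totals[band_start][0] += change
        totals[band_start][1] += 1
    return {
        start: total / count
        for start, (total, count) in sorted(totals.items())
    }
```

For the first three rows of the sample output earlier in this post, `mean_change_by_band([(1261, 23), (1258, 30), (1659, 55)])` puts the first two puzzles in the 1200 band and the third in the 1600 band.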

What Can We Learn From the Changes in Puzzle Ratings?

1000-1200 Band: The ratings in this range have increased. While this could be interpreted in various ways, one possibility is that many new players are still grappling with foundational tactics.
Decline in Higher Bands: There's a notable decrease in puzzle ratings from the 1400-1600 to 2200-2400 bands. This could suggest a variety of things, including the possibility that more seasoned players are getting better at solving these particular puzzles.
Mild Decline in 1200-1400: There's a slight drop in this range. While the reasons for this aren't clear-cut, it could indicate players just past the beginner stage are gradually improving.

Points of Consideration

External Influences: Multiple factors can skew the data:
- Lichess might have altered its puzzle system.
- A shift in the main platform used by the chess community (e.g., from chess.com to Lichess).
- Instances of cheating, which might be more prevalent in tactics since some don't view it as seriously as cheating in actual games. This could particularly depress ratings for harder puzzles.
Growing Player Base: The online chess community has expanded, spurred by "The Queen's Gambit" and the pandemic. These new entrants might be affecting puzzle ratings, though the exact impact is uncertain.

In light of the above, while the data provides some interesting patterns and tendencies, it's important to approach interpretations with some caution.

The data can hint at a potential rise of online chess improvers that might have contributed to pushing the ratings down, but we don’t know for sure.

One clear takeaway is that you should not obsess over your rating. While a player might improve their puzzle-solving skills, broader factors influencing the rating system can still lead to a drop in their actual rating. In essence, the rating system is a magic box and you do not control most of the variables.

I welcome your perspective on these results and observations.

Thank you for reading Say Chess. This post is public so feel free to share it.

/Martin

John

A lot of people use the puzzle dashboard. Boden's mate is on there. If all you're doing is Boden's mates, after the first 15 or 20 you will knock out the next 80 correctly even if you're only 1200 rated. The pattern is easy to see and solve even after a short amount of practice.

Any of the mates on there, I am sure, got really depressed ratings just due to the fact that you can learn the pattern and do hundreds for practice, and your puzzle rating has nothing to do with it.

The puzzle rating of anyone on lichess means nothing due to that dashboard. If I am only doing endgame puzzles I might run a rating of 1500, mate puzzles I can easily get up to 2400. Normal random stuff and I'm somewhere between 2000 and 2200.

The fact that there are so many areas and types of puzzles, as well as the 5 strength settings, makes the puzzle rating of anyone on there mean nothing next to anyone else's.

1 reply by Martin B. Justesen

kiwiPete

Sep 14, 2023 (edited)

I wonder if using lichess puzzles is the best idea for your book? Certainly I don't have trust that lichess puzzle ratings are particularly useful.

If someone were writing a puzzle book for me, here's what I'd like them to include & do:

Type 1 puzzles, things missed by my rating range:

- extract puzzles from the games of players of my rating, say within +/- 100 points, at whatever time control we're interested in.

- the puzzles that we're interested in here are where the key move was NOT found in the game.

- should include defensive puzzles as well as attacking puzzles

- should exclude positions where one or both players were below some time threshold because I don't care about tactics missed in a silly time scramble.

- filter out the puzzles that are unrealistically difficult. Can be done by human curation, or possibly algorithmically (or maybe a combination of the two).

- manually examine the puzzles and group by theme.

Type 2 puzzles, difficult things found by the next rating range up:

- extract puzzles from the games of players rated (my rating+100) to (my rating+200), at whatever time control we're interested in.

- the puzzles that we're interested in here are where the key move WAS found in the game.

- again, it should include defensive puzzles as well as attacking puzzles

- this time, we only want the most difficult puzzles! These are the aspirational puzzles. The things you need to be able to find to step up to the next level.

- again, manually examine the puzzles and group by theme.

Probably the curation and grouping of the puzzles needs to be done by someone at or above the target rating range.

One issue with online puzzles, at least those on chess.com, that often isn't mentioned is that the positions only have one solution. That seems to be an artificial constraint that is, at best, unnecessary. In real games, there are plenty of situations where we struggle but there is more than one strong move available. Ideally we could train on these types of positions too.

Hopefully this hasn't been too long or off topic. In general I've long thought there is a ton of untapped potential for improving by solving better puzzle sets. I'd really like to see more innovation in this space.
