21 Comments

A lot of people use the puzzle dashboard. Boden's mate is on there. If all you're doing is Boden's mates, after the first 15 or 20 you will knock out the next 80 correctly even if you're only 1200 rated. The pattern is easy to see and solve even after a short amount of practice.

I am sure all of the mates on there got really depressed ratings, just due to the fact that you can learn the pattern and do hundreds for practice, and your puzzle rating has nothing to do with it.

The puzzle rating of anyone on lichess means nothing due to that dashboard. If I am only doing endgame puzzles I might run a rating of 1500; with mate puzzles I can easily get up to 2400. With normal random stuff I'm somewhere between 2000 and 2200.

The fact that there are so many areas and types of puzzles, as well as the 5 strength settings, makes the puzzle rating of anyone on there mean nothing next to anyone else's.

That is an excellent observation, and as you say it probably has a quite significant effect. I wonder when lichess made the puzzle dashboard.

I wonder if using lichess puzzles is the best idea for your book? Certainly I don't have trust that lichess puzzle ratings are particularly useful.

If someone were writing a puzzle book for me, here's what I'd like them to include & do:

Type 1 puzzles, things missed by my rating range:

- extract puzzles from the games of players of my rating, say within +/- 100 points, at whatever time control we're interested in.

- the puzzles that we're interested in here are where the key move was NOT found in the game.

- should include defensive puzzles as well as attacking puzzles

- should exclude positions where one or both players were below some time threshold because I don't care about tactics missed in a silly time scramble.

- filter out the puzzles that are unrealistically difficult. Can be done by human curation, or possibly algorithmically (or maybe a combination of the two).

- manually examine the puzzles and group by theme.

Type 2 puzzles, difficult things found by the next rating range up:

- extract puzzles from the games of players rated (my rating+100) to (my rating+200), at whatever time control we're interested in.

- the puzzles that we're interested in here are where the key move WAS found in the game.

- again, it should include defensive puzzles as well as attacking puzzles

- this time, we only want the most difficult puzzles! These are the aspirational puzzles. The things you need to be able to find to step up to the next level.

- again, manually examine the puzzles and group by theme.

Probably the curation and grouping of the puzzles needs to be done by someone at or above the target rating range.
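
The two filters above could be sketched roughly like this, assuming the candidate positions have already been extracted from games with an engine. The `Candidate` fields, the `difficulty` score and all thresholds are hypothetical placeholders, not anything lichess or chess.com provides:

```python
import math
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Candidate:
    """One tactical moment taken from a game (all fields are hypothetical)."""
    player_rating: int     # rating of the player to move
    key_move_found: bool   # was the key move actually played in the game?
    clock_seconds: float   # lower of the two players' clocks at that moment
    difficulty: float      # any difficulty estimate, 0..1, higher = harder
    theme: str             # label assigned during manual curation


def type1_puzzles(cands, my_rating, band=100, min_clock=30.0, max_difficulty=0.8):
    """Type 1: tactics MISSED by players within +/- band of my rating."""
    return [
        c for c in cands
        if abs(c.player_rating - my_rating) <= band
        and not c.key_move_found
        and c.clock_seconds >= min_clock      # skip time scrambles
        and c.difficulty <= max_difficulty    # skip unrealistically hard ones
    ]


def type2_puzzles(cands, my_rating, keep_hardest=0.25):
    """Type 2: the hardest tactics FOUND by players rated +100 to +200 above me."""
    found = [
        c for c in cands
        if my_rating + 100 <= c.player_rating <= my_rating + 200
        and c.key_move_found
    ]
    found.sort(key=lambda c: c.difficulty, reverse=True)
    return found[:math.ceil(len(found) * keep_hardest)]


def group_by_theme(puzzles):
    """Group the selected puzzles by their curated theme label."""
    groups = defaultdict(list)
    for p in puzzles:
        groups[p.theme].append(p)
    return dict(groups)
```

The theme labels and the final difficulty cut would still need a human at or above the target rating; the code only narrows down what that person has to look at.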

One issue with online puzzles, at least those on chess.com, that often isn't mentioned is that the positions only have one solution. That seems to be an artificial constraint that is, at best, unnecessary. In real games, there are plenty of situations where we struggle but there is more than one strong move available. Ideally we could train on these types of positions too.

Hopefully this hasn't been too long or off topic. In general I've long thought there is a ton of untapped potential for improving by solving better puzzle sets. I'd really like to see more innovation in this space.

I think that as long as I use the same database download for the whole series, going through the books should feel like a ladder of continually increasing difficulty.

But I still like your idea, and I'm actually continuing to work with the missed move concept. I will probably write an update about it in a couple of weeks (if it goes well).

It's also possible the ratings simply hadn't had enough time to settle when you pulled your dataset in 2021. Lichess only launched the new puzzle generator in December of 2020. It's too bad we don't have the ratings from 1 year ago, but maybe repeat this in one year and see if they are continuing to shift.

https://lichess.org/blog/X-S6gRUAAGjNX4ki/new-puzzles-are-here

That's a good thought, and a possibility for sure! I will save the results so I can do a new check.

Please note that there may be no "reversion to the mean" going on at all; the pattern may simply be caused by a well-known statistical fallacy.

Suppose each puzzle has a true rating (rtrue_i for puzzle i), but the observed rating deviates from the true rating by a random measurement error that has mean zero and is independently distributed over time: robs_(i,t) = rtrue_i + epsilon_(i,t). So the observed rating for puzzle i at time t is the true rating plus a random noise term.

For a randomly selected puzzle the expectation of the measurement error is zero.

But this is no longer true if we select puzzles based on their observed rating:

For a puzzle observed below the average of all puzzles (robs_(i,t1) < mean(rtrue)), the expected value of epsilon_(i,t1) is negative (a puzzle can have a low observed rating because its true rating is low, or because we just got a negative measurement error by accident). For large observed ratings the opposite is true.

(Note: you may ask, isn't the expected value of epsilon supposed to be zero? That is correct only for unconditional observations. When you select very large or very small observed ratings it is no longer unconditional: you're selecting the puzzle based on the outcome, so you're taking a conditional expectation of epsilon, and it is no longer zero.)

The next time you observe the puzzle you get a new measurement error:

robs_(i,t2) = rtrue_i + epsilon_(i,t2)

Because the measurement errors are independent over time (at least in the long term), and because now you're taking an unconditional observation (you're no longer selecting the observation based on its outcome), the expectation of epsilon_(i,t2) is zero.

It follows that E(robs_(i,t1)) < E(robs_(i,t2)) for puzzles selected with robs_(i,t1) < mean of robs, and

E(robs_(i,t1)) > E(robs_(i,t2)) for puzzles selected with robs_(i,t1) > mean of robs.

This is true even though the true ratings of the puzzles have not changed at all.

This is known as Galton's fallacy.

To get proof that this is what is going on, I suggest you turn your analysis around: instead of measuring where the ratings of puzzles are going, look at where they are coming from.

So select puzzles in August 2023, and then look at where they were a year earlier.

You will find that the low-rated ones came down from higher ratings, and the highly rated ones rose from lower ratings.

Having said that, given the distribution of your results it is possible that there is a general upward drift in ratings as well, but without more detailed analysis it is difficult to tell.
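
The selection effect described in this comment can be checked with a small simulation. This is only a sketch with made-up numbers: true ratings are drawn around 1500 and never change, each observation adds independent noise, and the cut-offs of 1100 and 1900 are arbitrary. None of it uses real lichess data.

```python
import random
import statistics


def simulate(n_puzzles=20000, true_sd=300.0, noise_sd=100.0, seed=0):
    """Return two noisy observations of the same, unchanged true ratings."""
    rng = random.Random(seed)
    true = [rng.gauss(1500.0, true_sd) for _ in range(n_puzzles)]
    obs_t1 = [r + rng.gauss(0.0, noise_sd) for r in true]
    obs_t2 = [r + rng.gauss(0.0, noise_sd) for r in true]
    return obs_t1, obs_t2


def mean_change(selected_on, other, low_cut=1100.0, high_cut=1900.0):
    """Average of (other - selected_on) for puzzles that look low or high
    in the observation they were selected on."""
    low = [o - s for s, o in zip(selected_on, other) if s < low_cut]
    high = [o - s for s, o in zip(selected_on, other) if s > high_cut]
    return statistics.mean(low), statistics.mean(high)


obs_t1, obs_t2 = simulate()

# Forward: select on the first observation, then look at the second one.
fwd_low, fwd_high = mean_change(obs_t1, obs_t2)
# Reverse: select on the second observation, then look back at the first one.
rev_low, rev_high = mean_change(obs_t2, obs_t1)

print(f"forward: low-rated move {fwd_low:+.1f}, high-rated move {fwd_high:+.1f}")
print(f"reverse: low-rated were {rev_low:+.1f} away, high-rated were {rev_high:+.1f} away")
```

In both directions the low group appears to sit closer to the middle in the other observation and the high group does the same, even though no true rating moved. A real upward drift would show up as an asymmetry between the forward and reverse results.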

Thank you for your insightful comment. With regard to bias, I would say that I did not pick the puzzles for being the highest or lowest rated, but for being the most solved. Do you think the results would change if I did the same analysis in reverse order?

Overall I agree that more reference points of the movement of the ratings would be helpful.

I could also have done a completely random selection of puzzles based on the puzzle IDs.

I have already checked (solved) the puzzle you mentioned (https://lichess.org/training/S1k6e). It has a puzzle rating of 1265 now and has been played 233,860 times. The rating of the puzzle jumped up 366 points in just several hours today. Strange. 😜

That’s interesting! I wonder if a bug has been discovered 🤔

Do you think that your posting this puzzle brought it to the attention of a whole new group of people, many of whom won't have seen this puzzle or anything similar before? They then have a go at it, get it wrong, and the rating quickly gets higher.

I think it also might have to do with the group of puzzle solvers, and how lichess will repeat puzzles to a person over time. I don't do puzzles on lichess often, but when I do, sometimes they use spaced repetition on me. If I'm made to solve a puzzle a few times I'll remember it, and my future solve will push the rating of that puzzle down slightly. I'm not sure that I'd be a better solver for puzzles in general, but for that particular puzzle I'd get it. This would tend to deflate puzzles that are shown to lots of players, because those puzzles would be solved more often.

If the puzzle solving group stayed the same, I might expect that the puzzle solvers are a little over-rated compared to the puzzles because the puzzles they see would be repeated slightly and skew the puzzle-rating down and the solver up.
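
A toy simulation of that deflation idea, as a sketch only. It uses a plain Elo update as a stand-in and is not lichess's actual rating code. The 95% solve rate on a remembered puzzle, the K-factor and the fixed 1500 solver are all assumptions picked for illustration:

```python
import random


def expected_score(r_a, r_b):
    """Standard Elo expected score of A against B."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))


def settled_puzzle_rating(repeat_fraction, n_attempts=10000, k=8.0, seed=1):
    """Average puzzle rating over the last 3000 attempts.

    repeat_fraction is the share of attempts made by someone who has
    already seen the puzzle and mostly remembers the solution.
    """
    rng = random.Random(seed)
    true_difficulty = 1500.0   # how hard the puzzle really is on first sight
    solver_rating = 1500.0     # every solver is assumed to be rated 1500
    puzzle_rating = 1500.0
    history = []
    for _ in range(n_attempts):
        if rng.random() < repeat_fraction:
            p_solve = 0.95     # assumed success rate on a remembered puzzle
        else:
            p_solve = expected_score(solver_rating, true_difficulty)
        solved = rng.random() < p_solve
        # The puzzle "wins" (score 1) whenever the solver fails.
        puzzle_score = 0.0 if solved else 1.0
        puzzle_expected = expected_score(puzzle_rating, solver_rating)
        puzzle_rating += k * (puzzle_score - puzzle_expected)
        history.append(puzzle_rating)
    tail = history[-3000:]
    return sum(tail) / len(tail)


no_repeats = settled_puzzle_rating(0.0)
some_repeats = settled_puzzle_rating(0.3)
print(f"no repeats: {no_repeats:.0f}, 30% repeats: {some_repeats:.0f}")
```

With repeats mixed in, the puzzle settles at a lower rating even though its first-sight difficulty is unchanged. The solver's rating is held fixed here to keep the sketch short, so it does not show the matching inflation on the solver side.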

Maybe I'm just not good at remembering puzzles, but I can't remember having seen a puzzle twice.

Themes give repeats the most.

Plus, the top of the dashboard now has endgames as well. You can pick endgames by piece type. Your puzzle rating based on those has almost nothing to do with tactics for a lot of them. It's just straight up winning an endgame by brute calculation.

Might be interesting to look at themed puzzles and see if they are more deflated 🧐

I think Jan 2021. The openings section is much more recent.

And Mike made some good points on how Lichess gives you the same puzzles over and over. My account lists that I have done over 20,000. That is not even close to the real number. I did over 100 backrank mates one afternoon and it only counted around 20 toward how many I have done on there. It does not count puzzles you have done before, and it seems to give a lot of the same puzzles over and over for some reason.

Are you solving themes when you get the same puzzles, or just the mix?

If you're looking for some more data on the puzzles on lichess, I made a bunch of lichess blog posts last year about different tactics patterns inside backrank mate puzzles. Next to each puzzle link I put the rating of the puzzle on the day of the post.

So you can look at that rating and do the puzzle to see where it is today. They go from 1700 to 2500 on those posts.

Lichess name - johnnyzangerous

The first plot with the rating bands was not clear to me. Only after I read on did I realize that what you did was to group the ratings into "bands" of 100 rating points (1000-1100 (or 1000-1099?), 1100-1200, etc.). Maybe you could clarify that part and rebuild the chart's X-axis labels to reflect that? Or did I misunderstand it?

The dataset uses a 100-point spread, which probably should also be shown on the diagram if it were done optimally. (Part of this exercise was also a way for me to learn how to use matplotlib.)
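
One way the 100-point bands could be made explicit on the x-axis, as a sketch. It assumes half-open bands labelled like 1000-1099 (the thread does not settle which convention the original chart used), integer ratings, and function names that are made up for illustration:

```python
from collections import Counter


def band_label(rating, width=100):
    """Label of the band a rating falls into, e.g. 1099 -> '1000-1099'."""
    low = (rating // width) * width
    return f"{low}-{low + width - 1}"


def count_by_band(ratings, width=100):
    """Return (label, count) pairs sorted by band, skipping empty bands."""
    counts = Counter((r // width) * width for r in ratings)
    return [(f"{low}-{low + width - 1}", counts[low]) for low in sorted(counts)]


def plot_bands(ratings, width=100):
    """Bar chart with one bar per rating band and the band written on the axis."""
    # Imported here so the two helpers above also work without matplotlib.
    import matplotlib.pyplot as plt

    labels, counts = zip(*count_by_band(ratings, width))
    fig, ax = plt.subplots()
    ax.bar(labels, counts)
    ax.set_xlabel(f"Puzzle rating band ({width} points wide)")
    ax.set_ylabel("Number of puzzles")
    ax.tick_params(axis="x", labelrotation=45)
    fig.tight_layout()
    return fig
```

Calling `plot_bands([1000, 1099, 1100, 1250])` would give three bars labelled 1000-1099, 1100-1199 and 1200-1299, so a reader can see directly that each bar covers 100 rating points.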
