
Statistic approach to Player Skill and Beatmap Difficulty

Topic Starter
Full Tablet
PM me if you want someone to be included in the next updates of this.


Based on scores on ranked beatmaps, the results here simultaneously estimate the difficulty of beatmaps (difficulty of achieving certain scores) and player skill (ability to score high on beatmaps).

A beatmap is rated high when players who are rated high in skill score low on it, whereas players are rated high when they score high on beatmaps that are rated high. The method is completely statistical and doesn't look into the content of beatmaps (except for the number of objects, which is used for the expected amount of variance).

Terms:
Tom Stars: Star Rating of a beatmap (based on the algorithm mainly designed by Tom94)

"X" Diff: The estimated difficulty of achieving at least X*1000 of score in a map with 1000 retries. They are measured in a scale that resembles star rating, with the 900K Difficulty giving the closest values to Tom Stars.

Score Count: Amount of scores retrieved for the beatmap or player in the online leaderboards.

Average Play Skill: The average difficulty of all the scores in the records set by the player. Not a very meaningful measure, since it might consider scores made when the player was at a lower skill level.

Peak Play Skill: Similar to Average Skill, but the best scores are weighted much higher compared to the rest. It's very sensitive to outliers, so it is not a very robust indicator of skill.

Accuracy Performance: Indicator of skill that works similarly to pp: setting a sub-par score doesn't lower the value, and the best score has a weight of 100%, the 2nd one a weight of 95%, etc. The scale is the same as the one used for XK Difficulty, so having a lot of scores of a certain difficulty makes the Accuracy Performance converge to that difficulty value.

Technical Performance: Similar to Accuracy Performance, but it doesn't reward you for scoring past 900,000 (for example, setting a score of 960,000 awards the same performance as setting a score of 900,000). This estimates the ability of the player to set good scores in difficult maps, rather than rewarding players for having very good accuracy in easier maps.

xK ppv2: The pp achieved from all ranked xK maps (maps of the listed keymode) the player has played, not counting the bonus pp from setting a lot of scores.
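
The weighting described under Accuracy Performance can be sketched roughly as follows. This is only an illustration, not the actual calculation: the function name `weighted_performance` is made up here, and the normalization by the sum of the infinite geometric series is an assumption, chosen because it reproduces the stated behaviour (many scores of one difficulty converge to that difficulty, and adding a sub-par score never lowers the value).

```python
def weighted_performance(difficulties, decay=0.95):
    """Hypothetical sketch of a pp-like weighted performance value.

    difficulties: difficulty values of the player's plays, in any order.
    decay: weight ratio between consecutive plays (100%, 95%, 90.25%, ...).
    """
    # Best play first, so it receives the full weight.
    ranked = sorted(difficulties, reverse=True)
    weighted_sum = sum(d * decay ** i for i, d in enumerate(ranked))
    # ASSUMPTION: normalize by the infinite series 1 + decay + decay^2 + ...
    # so that many plays of difficulty d converge to d.
    total_weight = 1.0 / (1.0 - decay)
    return weighted_sum / total_weight


# A player with very many plays of difficulty 3.0 ends up close to 3.0.
print(weighted_performance([3.0] * 500))
```

With this normalization a player needs a large number of plays at a given difficulty before the value approaches it, which matches the "converge" wording above.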

Newest Version (2020/02/02), 7K Only
https://drive.google.com/file/d/1vmWpPannfXiR3xTYoypbplV8xsciNPtB/view?usp=sharing

Old Version (2019/02/04), 4K Only
https://docs.google.com/spreadsheets/d/1njYWZSQjV6D8EHrCnpnzRbQycH0BG7C-DWy2--T8Zjw/edit?usp=sharing

Old Version (2018/02/16), 7K/9K
https://docs.google.com/spreadsheets/d/16ik3TElUYhzTkm6U6QdA_J0owiQJJ_Wx1yjmYNCJ9jk/edit?usp=sharing

Different keymodes have different scaling, so they aren't meant to be compared directly with one another.

What are your opinions of the results?

This is not meant to replace the current beatmap difficulty algorithm used for pp, since it has the limitations of any purely statistical approach. It might be used to calibrate beatmap difficulty algorithms based on beatmap analysis, though.

Edit: 2020/02/02: Updated results for 7K.
Shoegazer
I've always wanted to see a walkure algorithm or at least any form of algorithm based on leaderboards and a player's ability. The more I look at the spreadsheet however, the more I realise that the scores everyone has gotten are inconsistent to an extent (or some of the top players don't play at all, making the leaderboards skewed) and leaderboards don't quite show how difficult a map is in actuality because people of different skill levels will play different sets of maps and not all of them.

My biggest qualm is probably the fact that there's such a huge set of maps in the 2.4-3 range. I'm assuming that it's because a good number of the scores in that list are SSs, and player skill wouldn't be captured very well because only those SS scores would be captured in the first place. I'm not sure how much more accurate a top 100-150 would be, though.

The maps ranging from AiAe to Bangin' Burst have odd numbers to me, is there any reason for those maps to be rated that much higher than say, Kamui? A good number of players are overrated in terms of skill level as well, but that's probably because of the 5 maps above.

Anyway, it's a nice idea, but it's probably not that meaningful of an approach mainly due to the fact that the top players only play some maps and avoid a good number, which makes a good number of maps underrated/overrated. Candy Galy, Sakura Mirage, Mastication Numerique and Brynhildr in the Darkness are definitely some examples - but for different statistical reasons (many players (mainly noticeably skilled ones) play CG/SM, not many played Mastication and Brynhildr).
_Kemo
um this is interesting but somehow there are inconsistent results, since some unpopular easy maps also have really bad scores on the leader board just as real hard maps do.

lol that's why I'm soooo overrated in 5k and 8k lol
abraker
Impressive! You actually beat me to it XD. But as Kemo and Shoegazer said, your way of approach may not be 100% reliable.

There are 3 ways I can think of to calculate beatmap difficulty:
  - Do what you did and base it on the scores achieved. This only works if many people play the beatmap to the best of their ability, so low popularity may render this option useless.
  - The current system: base it on the highest note density. We all know how wrong this is.
  - Calculate the difficulty from beatmap composition. While this is the hardest of the 3 to do, it is also the most accurate. Composition would include patterns, the density of the patterns, extremes in BPM and SV, and keymode.
I am planning to inspect the beatmap patterns and come up with a difficulty index sometime in the (maybe far) future. But yeah, interesting.

stuff
4k: 2.85400073592293
7k: 3.55930806616186 <--- Unless the keymode is part of the calculation, I will not believe this. I struggle to get an A in 4* 7k, while I can do up to 5* 4k
8k: 2.66764821363234
Topic Starter
Full Tablet
For 6K, I used the same algorithm, but instead of only using top 50 scores, I used all the scores of 6K players that have at least 1 top 50 score (a total of 3443 scores, instead of 1200).

https://www.dropbox.com/s/1byfyyvo64b6d ... .xlsx?dl=0

Do you think the results are more accurate?

Doing the same with other keymodes would take me a while.
abraker
How come Sasaki Sayaka's [6K Normal] stat diff is the same as Sasaki Sayaka's [6K Beginner] stat diff?
Topic Starter
Full Tablet

abraker wrote:

How come Sasaki Sayaka's [6K Normal] stat diff is the same as Sasaki Sayaka's [6K Beginner] stat diff?
Beginner is rated 1.19984264020897, while Normal is rated 1.20305233104608 (very slight difference).
ovnz
I didn't know that star ratings were this precise holy shit
Bobbias
It's standard practice to use extremely precise numbers for all calculations and only round to whatever significant digits you want at the very end, to ensure no rounding errors enter the calculation.
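
The two ratings quoted earlier in the thread make a handy illustration. A minimal sketch (the variable names are just for this example): rounded to two decimals the maps look identical, while the full-precision values still differ.

```python
# Ratings quoted earlier in the thread for the two 6K difficulties.
beginner = 1.19984264020897
normal = 1.20305233104608

# Rounded for display, the two ratings look the same.
print(f"{beginner:.2f} {normal:.2f}")  # prints "1.20 1.20"

# At full precision, Normal is still slightly harder than Beginner.
difference = normal - beginner
print(difference > 0)  # prints "True"
```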
[Crz]Player
9k when
Topic Starter
Full Tablet

ATTan wrote:

9k when
Added results for 9K in the first table (though they aren't very meaningful, since there is only 1 ranked mapset, and it is very easy).
Topic Starter
Full Tablet
Here are results for 4K using more scores (79840 instead of ~30000 taken from top 50 scores).
https://www.dropbox.com/s/1byfyyvo64b6d ... .xlsx?dl=0
This list used all the scores of randomly selected players (biased towards people with high amount of pp), instead of top 50 scores.

I plan to change the algorithm to consider plays with DT/HT/EZ mods as different maps instead of taking the score with the score penalty/bonus applied (the main problem currently is that I don't know exactly how much bonus DT gives, and it might not be possible to determine without per-object data).
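
As a rough sketch of what treating modded plays as separate maps could look like on the data side: the record layout, the function names `map_key` and `group_scores`, and the IDs in the usage example are all hypothetical, not taken from the actual implementation.

```python
from collections import defaultdict

# Mods named in the post above as ones to treat as separate maps.
DIFFICULTY_MODS = ("DT", "HT", "EZ")


def map_key(beatmap_id, mods):
    """Key that treats each DT/HT/EZ combination as its own map."""
    relevant = tuple(sorted(m for m in mods if m in DIFFICULTY_MODS))
    return (beatmap_id, relevant)


def group_scores(scores):
    """Group (beatmap_id, mods, player, score) records by map_key."""
    grouped = defaultdict(list)
    for beatmap_id, mods, player, score in scores:
        grouped[map_key(beatmap_id, mods)].append((player, score))
    return dict(grouped)


# Made-up records: the same beatmap played with and without DT.
example = [
    (1001, [], "player_a", 950000),
    (1001, ["DT"], "player_b", 870000),
    (1001, ["DT"], "player_c", 910000),
]
print(group_scores(example))
```

Each key then gets its own difficulty estimate, so no score bonus or penalty has to be assumed for the mod.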

I need to find a way to retrieve scores more quickly (the current way of taking scores outside of top 50 scores or performance with the osu! API is very inefficient and slow; it took me several days to obtain a list of 4K scores).
abraker

Full Tablet wrote:

I need to find a way to retrieve scores more quickly (the current way of taking scores outside of top 50 scores or performance with the osu! API is very inefficient and slow; it took me several days to obtain a list of 4K scores).
If you find a way, PM me. Osu!API++ is in need of that too.
Topic Starter
Full Tablet
I made some changes to the algorithm, inspired by this post: p/4383854

The algorithm fits the data (scores obtained by players) to logistic curves, where the parameters to fit are Player Skill, Beatmap Difficulty for 900K score, and Steepness of the difficulty curve for beatmaps.

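The predicted score for a play is a logistic function of P, B, and S (the formula itself was posted as an image and is not preserved here), where P is the player skill, B is the beatmap difficulty (for 900K score), and S is the steepness parameter of the difficulty curve.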

For example, 2 different maps that have the same difficulty at 900K, but different steepness:

[Figure: difficulty curves of the two maps; image not preserved]

The orange curve represents the difficulty curve of a map with high steepness, while the blue one has lower steepness.

The regression minimizes the sum of the squared errors of the predicted scores compared to the data.
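
To make the idea concrete, here is a minimal sketch of a least-squares fit against a logistic curve. The exact curve used for the results is not reproduced in this thread, so the form in `predicted_score` is an assumption, picked only so that a player with P equal to B is predicted to score 900,000. The sketch also simplifies the problem: it fits a single player's skill by grid search while the map parameters B and S are held fixed, whereas the real regression fits all parameters together.

```python
import math

MAX_SCORE = 1_000_000


def predicted_score(P, B, S):
    """ASSUMED logistic form: P == B gives exactly 90% of the maximum score."""
    return MAX_SCORE / (1.0 + math.exp(-S * (P - B)) / 9.0)


def sum_squared_error(P, plays):
    """plays: list of (B, S, observed_score) tuples for one player."""
    return sum((predicted_score(P, B, S) - score) ** 2 for B, S, score in plays)


def fit_player_skill(plays, lo=0.0, hi=10.0, steps=1000):
    """Grid search for the skill P that minimizes the squared error."""
    candidates = [lo + (hi - lo) * i / steps for i in range(steps + 1)]
    return min(candidates, key=lambda P: sum_squared_error(P, plays))


# Made-up plays generated from a "true" skill of 3.0 on two maps.
plays = [
    (2.5, 2.0, predicted_score(3.0, 2.5, 2.0)),
    (3.5, 1.0, predicted_score(3.0, 3.5, 1.0)),
]
print(fit_player_skill(plays))
```

A grid search is used here only because it is easy to follow; any general-purpose least-squares optimizer could take its place.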

Here are results for ranked 6K maps: https://www.dropbox.com/s/vyoi1r86m9r8t ... .xlsx?dl=0

Take difficulty results for beatmaps with few scores with a grain of salt (especially ones with only 1 score to base the calculation on; those use a default steepness parameter instead of a calculated one).

For the player rankings, there is also a "Performance" value. This value is calculated from the difficulty associated with each of the player's plays, with a score penalty based on map length (since fluke plays are more likely on shorter maps), and reduced weighting for beatmaps whose difficulty was estimated from few scores (since those estimates are more likely to be inaccurate). The "Player Skill" is the value used in the beatmap difficulty estimation, and is more indicative of the average performance of the player across the plays they have.

For running the algorithm for other keycounts, I would need to select players to base the calculations on (I can't use a very large number, since the algorithm is expensive in RAM and CPU use). Ideally, the players should have a large number of plays and a consistent performance (not many scores set below their current level of play, as happens with a player who has improved a lot over time but hasn't improved their old scores). The players should also represent a wide range of skill levels. Once the beatmap difficulty values are calculated, adding more players to the ranking is relatively simple (but the score retrieval using the osu! API is still quite slow).
Clappy

Full Tablet wrote:

(I can't use a very large amount, since the algorithm is expensive in RAM and CPU use)
Get someone with an i7 5960X and 128 gigs of DDR4 to test it out for you
-Maus-
Your nick is my reaction
abraker

FullTablet wrote:

I can't use a very large amount, since the algorithm is expensive in RAM and CPU use
How much CPU time are we talking about here? Surely leaving the computer overnight would do the trick. As for RAM usage, I'm pretty sure there can be a way to avoid too much RAM usage by doing it in C++ non recursively.
Topic Starter
Full Tablet
Here are the current results for 7K maps and some (~400) 7K players:

https://www.dropbox.com/s/scz69rqs75g19 ... .xlsx?dl=0

Results for maps that have fewer than 20 plays in the data used are filtered out by default, since they are very likely to be inaccurate. Since the data used for the calculations contains no scores from players who struggle on the easiest maps, results for very easy maps are also likely to be inaccurate for low-level players.

The "Ranking" column in the beatmap list is based on the difficulty of achieving 900k score in the map.

The scores analyzed are several weeks old (they don't take into account scores set by players recently); it takes several days to refresh the players' scores to their current values.

The algorithm used for calculating the values is subject to change.

What do you think of the current results?
Tristan97
Sweet! I made the list of top players at rank 283!
Topic Starter
Full Tablet
Ran the same algorithm for 4K maps and 4K players (listing the players that appear in the top 100 map leaderboards the most, plus a few manual additions)

Here are results:
https://www.dropbox.com/s/scz69rqs75g19 ... .xlsx?dl=0

(Results for 7K in the document are based on the previous calculations, which use only scores that are several months old)