Statistic approach to Player Skill and Beatmap Difficulty

Full Tablet

Joined September 2011

Topic Starter

Full Tablet 2015-05-17T17:51:52+00:00

PM me if you want someone to be included in the next update of this.

Based on scores on ranked beatmaps, the results here simultaneously estimate the difficulty of beatmaps (difficulty of achieving certain scores) and player skill (ability to score high on beatmaps).

A beatmap is rated high when players who are rated high in skill set low scores on it, whereas players are rated high when they set high scores on beatmaps that are rated high. The method is completely statistical and doesn't look at the content of beatmaps (except for the number of objects, which is used to estimate the expected amount of variance).
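
The circular definition above (map difficulty depends on player skill and vice versa) can be resolved by alternating the two updates until the ratings stop changing. The sketch below is not the method used for the spreadsheet. It assumes a simple additive model, perf ≈ skill - difficulty, where perf is some transformed score. Weighting each score by the map's object count stands in for "expected amount of variance". The names `fit_ratings`, `perf` and `objects` are hypothetical.

```python
from collections import defaultdict

def fit_ratings(scores, objects, iters=200, tol=1e-9):
    """Jointly estimate player skill and beatmap difficulty (illustrative sketch).

    scores:  list of (player, beatmap, perf) tuples; perf is a transformed
             score where higher is better (hypothetical input format).
    objects: dict beatmap -> object count, used as a weight on the assumption
             that scores on maps with more objects are less noisy.
    Assumed model: perf ~= skill[player] - diff[beatmap].
    """
    skill = defaultdict(float)  # every player starts at 0
    diff = {}
    for _ in range(iters):
        # Difficulty step: weighted mean of (skill - perf) over each map's scores,
        # so a map rates high when strong players score low on it.
        num, den = defaultdict(float), defaultdict(float)
        for p, b, y in scores:
            num[b] += objects[b] * (skill[p] - y)
            den[b] += objects[b]
        diff = {b: num[b] / den[b] for b in num}
        # Skill step: weighted mean of (perf + diff) over each player's scores,
        # so a player rates high when scoring well on maps rated high.
        num, den = defaultdict(float), defaultdict(float)
        for p, b, y in scores:
            num[p] += objects[b] * (y + diff[b])
            den[p] += objects[b]
        new_skill = {p: num[p] / den[p] for p in num}
        # Ratings are only defined up to a common shift: pin mean difficulty to 0.
        shift = sum(diff.values()) / len(diff)
        diff = {b: d - shift for b, d in diff.items()}
        new_skill = {p: s - shift for p, s in new_skill.items()}
        change = max(abs(s - skill[p]) for p, s in new_skill.items())
        skill = defaultdict(float, new_skill)
        if change < tol:
            break
    return dict(skill), diff
```

Only differences between ratings carry meaning here, which is why the mean difficulty is pinned. Players who never share maps with anyone else cannot be placed on the same scale as everyone else.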

Terms:
Tom Stars: Star Rating of a beatmap (based on the algorithm mainly designed by Tom94)

"X" Diff: The estimated difficulty of achieving a score of at least X*1000 on a map within 1000 retries. It is measured on a scale that resembles star rating, with the 900K Difficulty giving the values closest to Tom Stars.

Score Count: Number of scores retrieved for the beatmap or player from the online leaderboards.

Average Play Skill: The average difficulty of all the scores set by the player in the records. Not a very meaningful measure, since it may include scores set when the player was at a lower skill level.

Peak Play Skill: Similar to Average Play Skill, but the best scores are weighted much more heavily than the rest. It's very sensitive to outliers, so it is not a very robust indicator of skill.

Accuracy Performance: An indicator of skill that works similarly to pp (setting a sub-par score doesn't lower the value; the best score has a weight of 100%, the 2nd best a weight of 95%, etc.). The scale is the same as the one for XK Difficulty, so having a lot of scores of a certain difficulty makes the Accuracy Performance converge to that difficulty value.

Technical Performance: Similar to Accuracy Performance, but it doesn't reward you for scores above 900,000 (for example, a score of 960,000 awards the same performance as a score of 900,000). This estimates the player's ability to set good scores on difficult maps, rather than rewarding players for very good accuracy on easier maps.

xK ppv2: The pp achieved from all ranked maps of the corresponding keymode (xK) that the player has played, not counting the bonus pp from setting a lot of scores.
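
The "within 1000 retries" part of the "X" Diff definition can be read as follows. Suppose one attempt reaches the target score with probability p. Then the chance of reaching it at least once in n independent attempts is 1 - (1 - p)^n. The independence of attempts and the logistic per-attempt model `p_single_logistic` are assumptions for illustration. They are not the spreadsheet's formula.

```python
import math

def p_at_least_once(p_single, retries=1000):
    """Chance of reaching a target score at least once in `retries`
    independent attempts, given a per-attempt success probability."""
    return 1.0 - (1.0 - p_single) ** retries

def per_attempt_needed(target=0.5, retries=1000):
    """Per-attempt probability that yields a `target` chance over all retries."""
    return 1.0 - (1.0 - target) ** (1.0 / retries)

def p_single_logistic(skill, difficulty):
    """Hypothetical per-attempt model: logistic in (skill - difficulty)."""
    return 1.0 / (1.0 + math.exp(difficulty - skill))
```

For example, `per_attempt_needed()` comes out at about 0.0007. A score that a player reaches only about once in 1,400 to 1,500 attempts therefore still has an even chance of appearing within 1000 retries.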
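
The player measures above can be sketched as different ways of aggregating per-score difficulties. The post does not say how they are computed, so several choices here are assumptions:

- Peak Play Skill is modelled as a power mean with a hypothetical exponent of 8.
- The pp-style sum for the two Performance values is multiplied by (1 - 0.95). This makes many scores of difficulty D converge to D, as described. osu!'s own pp total is the unscaled sum.
- Each map's XK Diff values are passed in as a hypothetical `diff_table`.

```python
import bisect

def power_mean(values, exponent):
    """Generalized mean of positive values: exponent=1 gives the arithmetic
    mean; larger exponents move the result toward max(values)."""
    if not values or any(v <= 0 for v in values):
        raise ValueError("need a non-empty list of positive values")
    return (sum(v ** exponent for v in values) / len(values)) ** (1.0 / exponent)

def average_play_skill(difficulties):
    return power_mean(difficulties, 1)

def peak_play_skill(difficulties, exponent=8):  # exponent is hypothetical
    return power_mean(difficulties, exponent)

def weighted_performance(difficulties, decay=0.95):
    """pp-style aggregate: the k-th best value (k from 0) is weighted by decay**k.
    The (1 - decay) factor is an assumption that puts the result on the
    XK Diff scale. With non-negative values, adding a score never lowers it."""
    ranked = sorted(difficulties, reverse=True)
    return (1.0 - decay) * sum(d * decay ** k for k, d in enumerate(ranked))

def score_difficulty(score, diff_table, cap=None):
    """Difficulty credited for one score: the XK Diff of the highest threshold
    the (optionally capped) score reaches, or 0.0 if it reaches none.
    diff_table maps threshold score -> XK Diff (hypothetical format)."""
    if cap is not None:
        score = min(score, cap)
    thresholds = sorted(diff_table)
    i = bisect.bisect_right(thresholds, score)
    return diff_table[thresholds[i - 1]] if i > 0 else 0.0

def accuracy_performance(plays):
    """plays: list of (score, diff_table) pairs for one player."""
    return weighted_performance([score_difficulty(s, t) for s, t in plays])

def technical_performance(plays):
    """Same as accuracy_performance, but every score is capped at 900,000."""
    return weighted_performance([score_difficulty(s, t, cap=900_000) for s, t in plays])
```

With a table such as `{800_000: 2.1, 900_000: 2.9, 950_000: 3.4}`, a 960,000 score is credited 3.4 for Accuracy Performance but only 2.9 for Technical Performance, the same as a 900,000 score.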

Newest Version (2020/02/02), 7K Only
https://drive.google.com/file/d/1vmWpPannfXiR3xTYoypbplV8xsciNPtB/view?usp=sharing

Old Version (2019/02/04), 4K Only
https://docs.google.com/spreadsheets/d/1njYWZSQjV6D8EHrCnpnzRbQycH0BG7C-DWy2--T8Zjw/edit?usp=sharing

Old Version (2018/02/16), 7K/9K
https://docs.google.com/spreadsheets/d/16ik3TElUYhzTkm6U6QdA_J0owiQJJ_Wx1yjmYNCJ9jk/edit?usp=sharing

Different keymodes have different scaling, so they aren't meant to be compared directly with one another.

What are your opinions of the results?

This is not meant to replace the current beatmap difficulty algorithm used for pp, since it has the limitations inherent to purely statistical approaches. It might be used to calibrate beatmap difficulty algorithms based on beatmap analysis, though.

Edit: 2020/02/02: Updated results for 7K.

Last edited by Full Tablet 2020-02-02T08:00:54+00:00, edited 16 times in total.

Shoegazer

osu!mania Paragon

528 posts

Joined April 2013

Shoegazer 2015-05-18T01:51:00+00:00

I've always wanted to see a walkure algorithm, or at least any form of algorithm based on leaderboards and a player's ability. The more I look at the spreadsheet, however, the more I realise that the scores everyone has gotten are inconsistent to an extent (or some of the top players don't play at all, which skews the leaderboards), and leaderboards don't quite show how difficult a map actually is, because people of different skill levels play different sets of maps rather than all of them.

My biggest qualm is probably the fact that there's such a huge set of maps in the 2.4-3 range. I'm assuming that's because a good number of the scores in that list are SSs, and player skill wouldn't be captured very well since only those SS scores would be captured in the first place. I'm not sure how much more accurate a top 100-150 would be, though.

The maps ranging from AiAe to Bangin' Burst have odd numbers to me; is there any reason for those maps to be rated that much higher than, say, Kamui? A good number of players are overrated in terms of skill level as well, but that's probably because of the 5 maps above.

Anyway, it's a nice idea, but it's probably not that meaningful an approach, mainly because the top players only play some maps and avoid a good number of others, which makes many maps underrated or overrated. Candy Galy, Sakura Mirage, Mastication Numerique and Brynhildr in the Darkness are definitely some examples, but for different statistical reasons (many players, mainly noticeably skilled ones, play CG/SM, while not many have played Mastication and Brynhildr).

Last edited by Shoegazer 2015-05-18T04:23:57+00:00, edited 1 time in total.

_Kemo

22 posts

Joined May 2013

_Kemo 2015-05-18T02:56:07+00:00

um this is interesting, but somehow there are inconsistent results, since some unpopular easy maps also have really bad scores on the leaderboard, just as really hard maps do.

lol that's why I'm soooo overrated in 5k and 8k lol

abraker

Global Moderator

8,327 posts

Joined July 2014

abraker 2015-05-19T17:09:46+00:00

Impressive! You actually beat me to it XD. But as Kemo and Shoegazer said, your approach may not be 100% reliable.

There are 3 ways I can think of to calculate beatmap difficulty:

- Do what you did and base it upon the score achieved. This will work only if many people play the beatmap to the best of their ability, so a lack of popularity may render this option useless.

- The current system. Base it upon the highest note density. We all know how wrong this is.

- Calculate the difficulty by beatmap composition. While this is the hardest of the 3 to do, it is also the most accurate. Composition would include patterns, the density of the patterns, extremes in BPM and SV, and keymode.
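
For reference, the "highest note density" in the second option can be measured with a sliding window over note times. This is a sketch of that quantity only, assuming note start times in milliseconds and a 1000 ms window. It is not osu!mania's actual star rating algorithm, which is more involved.

```python
from collections import deque

def peak_note_density(note_times_ms, window_ms=1000):
    """Largest number of notes whose start times fall in a window
    (t - window_ms, t] ending at some note t, returned as notes per second.
    Notes in a chord (same time) each count separately."""
    window = deque()
    best = 0
    for t in sorted(note_times_ms):
        window.append(t)
        # Drop notes that are window_ms or more before the current note.
        while t - window[0] >= window_ms:
            window.popleft()
        best = max(best, len(window))
    return best * 1000.0 / window_ms
```

This measure ignores keymode, patterns, BPM changes and SV, which are exactly what the third option would add.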

I am planning to inspect the beatmap patterns and come up with a difficulty index sometime in the (maybe far) future. But yeah, interesting.

stuff

4k: 2.85400073592293
7k: 3.55930806616186 <--- Unless the keymode is part of the calculation, I will not believe this. I struggle to get an A in 4* 7k, while I can do up to 5* 4k
8k: 2.66764821363234

