
Survey on osu!mania SR Research Results (& Supporter Event)

Topic Starter
Bubbler
Hello. I'm Bubbler, a Korean osu!mania player currently ranked at around #30.

As a university graduation project, I have been researching how to improve the star rating (SR) system in osu!mania.
I'm happy to share the results of this project.

The idea is to completely ignore each beatmap's internal structure (notes, OD, HP, and so on) and instead assign SR values based on the scores players have achieved on each map.
A little more precisely: if player A scored X on one map and Y on another, and X is lower than Y, the first map is more likely to be assigned a higher SR than the second.
Please note that the difficulty of passing is ignored in the calculation due to limitations of the osu! API.
Also, the research is far from complete; there is still a lot of work to do before you (possibly) see this system in-game.
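The ordering principle above can be sketched as pairwise comparisons: whenever the same player scores lower on map A than on map B, that is one vote that A is harder. Here is a toy illustration with made-up scores (my own sketch, not the actual statistical method):

```python
from collections import defaultdict
from itertools import combinations

# scores[player][map] = best score (toy data, not real plays)
scores = {
    "p1": {"mapA": 920_000, "mapB": 985_000},
    "p2": {"mapA": 940_000, "mapB": 990_000},
    "p3": {"mapA": 970_000, "mapB": 960_000},
}

harder_votes = defaultdict(int)
for per_player in scores.values():
    for a, b in combinations(per_player, 2):
        if per_player[a] < per_player[b]:
            harder_votes[a] += 1   # a looks harder for this player
        elif per_player[b] < per_player[a]:
            harder_votes[b] += 1

print(dict(harder_votes))  # {'mapA': 2, 'mapB': 1}: mapA looks harder
```

The real method would of course aggregate far more carefully than simple vote counting, but the score-ordering input is the same.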

The dataset consists of scores achieved by the top 1000 players on 208 maps with in-game SR above 4, collected up to April 1, 2017.
DoubleTime scores and scores below 500k were removed, so only NoMod scores and their variations (HD/FL, SD/PF) are included.
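The filtering step can be sketched like this (the field names "score" and "mods" are my own placeholders, not the osu! API's):

```python
ALLOWED_MODS = {"HD", "FL", "SD", "PF"}  # NoMod variations kept in the dataset

def keep(play):
    # Drop DoubleTime (or any other mod outside the allowed set)
    # and every score below 500k.
    return play["score"] >= 500_000 and set(play["mods"]) <= ALLOWED_MODS

plays = [
    {"score": 980_000, "mods": []},       # NoMod      -> kept
    {"score": 990_000, "mods": ["DT"]},   # DoubleTime -> dropped
    {"score": 400_000, "mods": ["HD"]},   # below 500k -> dropped
]
filtered = [p for p in plays if keep(p)]
print(len(filtered))  # 1
```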

The link below leads to a spreadsheet showing the new SR values for the selected beatmaps.
You can freely sort the table by any column using the filter button.
https://docs.google.com/spreadsheets/d/ ... E2sq_jlhxA

Osu! Difficulty = Current SR values based on osu!web
Estimated Difficulty = New SR values computed using my proposed method
Difficulty Change = Estimated Difficulty - Osu! Difficulty

Here is the survey for the research; anyone with a valid osu!mania ranking can submit a response.
https://goo.gl/forms/7I3XiKgvxHfDktOX2
Also, two random players with quality responses will each earn one month of supporter!

The survey will be open until June 7 midnight KST (June 7 15:00 GMT).
Neuro-
cool project, submitted a response.
Aqo
Do note that difficulty ratings based on player results rather than on actual note placement have the following flaws, which grow as the number of charts and players grows:

1. More popular songs get more plays and people play them more. As a result, they get inflated, and a player-play-based system will assume those songs are easier than what they really are, simply because people "tried" harder on them.

2. Newer songs have much less accurate data than older songs.

3. Depending on the metagame, most players may choose to focus on one single type of difficulty, and as a result you get very flawed information about all other types of difficulties. For example, >90% of the players might only want to play one difficulty out of the 5-6 difficulties most mapsets have. Maybe they only want to play stuff around 5*-6*. As a result, anything that is <5* or >6* will get very flawed information.

While player-based statistics are interesting, they're only worth anything if everybody plays every single difficulty of every single map and tries their hardest on every single play. This never actually happens.

It would be more interesting if you took the effort to make a more accurate algorithm based on the actual note placement in the chart data. However, here you will run into problems too:

Different players have different skills. The easiest example is BMS vs O2Jam. Let's say we have one chart with very fast bracket long-stream patterns on high OD, and another chart with lots of long note patterns at medium speed. If 90% of the community is one type of player, they can say one chart is harder than the other - but which one they say depends only on the number of players of each type. There is no "right" answer.
If you still want to give difficulty ratings for each chart, you'll need to rate different styles of patterns differently: chords, delay, jacks, LNs, etc.

Different parts within the same chart can have different difficulty too. You can have a chart where 100% of the chart has difficulty "5", and you can have a chart where 90% of the chart is difficulty "1" but 10% of the chart is difficulty "7".
Now if you take players who play on skill "2", they can all get 90% on this chart, and they will say it's "easy" and "easier" than a chart where it's 100% difficulty "3". But if you take players on skill 5.5, they will say the "5" chart is muuuch harder than that "3" chart, and that the 90%-10% 1-7 chart is waaay harder than the "5" one. So really, how do you give just one difficulty number to a whole chart, when most charts have inconsistent difficulty spread? You'll need to rate each section of the chart separately.
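This crossover can be reproduced with a toy model. Assuming (purely for illustration, my own assumption) that a player hits each section with a logistic probability depending on skill minus section difficulty:

```python
import math

# Purely illustrative model: a player hits a section with logistic
# probability depending on (skill - section difficulty).
def hit_rate(skill, section_difficulty):
    return 1.0 / (1.0 + math.exp(section_difficulty - skill))

def chart_accuracy(skill, sections):
    return sum(hit_rate(skill, d) for d in sections) / len(sections)

flat3   = [3] * 10       # uniform difficulty "3"
spiky17 = [1] * 9 + [7]  # 90% difficulty "1", 10% difficulty "7"

# A skill-2 player does better on the spiky 1/7 chart than on the flat-3 chart:
print(chart_accuracy(2, spiky17) > chart_accuracy(2, flat3))      # True
# A skill-5.5 player finds the spiky chart harder than the flat-3 chart:
print(chart_accuracy(5.5, spiky17) < chart_accuracy(5.5, flat3))  # True
```

So even in a simple model, the ranking of the two charts flips depending on who is playing, which is exactly why one number per chart cannot satisfy everyone.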

The bottom line here: whatever kind of difficulty rating you go for, remember that it's just statistics for fun. It will be inaccurate for every single player, and even the same player can change his opinion about how accurate your ratings are as he plays and improves. If you're doing research, you have to mention this aspect and explain what kind of player your ratings are meant for, because by themselves they're kinda arbitrary.
Full Tablet
This project is very similar to this one: t/329678 (it currently has 4Key scores from February-March, and it has been calculating for several months, so the latest results haven't been published. The results have been delayed because I decided to change some minor aspects of the algorithm mid-calculation).

As for the issues mentioned about a statistical approach to the calculation:

Aqo wrote:

1. More popular songs get more plays and people play them more. As a result, they get inflated, and a player-play-based system will assume those songs are easier than what they really are, simply because people "tried" harder on them.
This problem can be alleviated by changing how each score is weighted. In practice, most of a player's plays represent a skill level lower than what the player can actually do; so, by weighting plays that exceed expectations more heavily than plays below expectations, beatmaps do not get misrated just because players didn't try hard enough (as long as the sample size is big enough). A drawback of this approach is that the results converge more slowly when adding samples, compared to weighting all scores equally, so maps need more plays to produce meaningful results; as a consequence, beatmaps with few plays have to be weighted very lightly when estimating player skill.
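One hypothetical way to implement such asymmetric weighting (my own sketch, not Full Tablet's actual algorithm; the weight values are arbitrary):

```python
# up_weight/down_weight values are arbitrary assumptions for illustration.
def play_weight(score, expected, up_weight=4.0, down_weight=1.0):
    return up_weight if score > expected else down_weight

def weighted_skill(plays, expected):
    # Weighted average of scores, counting above-expectation plays more.
    total = weight_sum = 0.0
    for score in plays:
        w = play_weight(score, expected)
        total += w * score
        weight_sum += w
    return total / weight_sum

plays = [700_000, 720_000, 710_000, 950_000]  # three unfocused tries, one real one
print(weighted_skill(plays, expected=800_000) > sum(plays) / len(plays))  # True
```

The single genuine attempt pulls the weighted estimate above the plain mean, so lazy retries drag the estimate down less.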


Aqo wrote:

2. Newer songs have much less accurate data than older songs.
Newer songs tend to have fewer plays than older songs, so their results aren't as accurate. Since their results can't be considered accurate, they are weighted very lightly when estimating player skill.

Aqo wrote:

3. Depending on the metagame, most players may choose to focus on one single type of difficulty, and as a result you get very flawed information about all other types of difficulties. For example, >90% of the players might only want to play one difficulty out of the 5-6 difficulties most mapsets have. Maybe they only want to play stuff around 5*-6*. As a result, anything that is <5* or >6* will get very flawed information.
Some players only play specific kinds of maps. Those players can only give information about the relative difficulty of those kinds of maps. All a statistical analysis can do is hope there are enough players who play different kinds of maps, so that maps can be compared reliably.

Aqo wrote:

Different players have different skills. The easiest example is BMS vs O2Jam. Let's say we have one chart with very fast bracket long-stream patterns on high OD, and another chart with lots of long note patterns at medium speed. If 90% of the community is one type of player, they can say one chart is harder than the other - but which one they say depends only on the number of players of each type. There is no "right" answer.
If you still want to give difficulty ratings for each chart, you'll need to rate different styles of patterns differently: chords, delay, jacks, LNs, etc.
This is one of the main flaws of only looking at scores on different maps: there is no way of telling which patterns each map has. In the current system, a player who specializes in O2Jam-style maps will tend to have plays on those maps rated as their best scores. As said in the previous point, how BMS-style maps are rated compared to O2Jam-style maps depends mostly on the players who play both kinds, and the results can easily be influenced by how popular each style of mapping is.

Aqo wrote:

Different parts within the same chart can have different difficulty too. You can have a chart where 100% of the chart has difficulty "5", and you can have a chart where 90% of the chart is difficulty "1" but 10% of the chart is difficulty "7".
Now if you take players who play on skill "2", they can all get 90% on this chart, and they will say it's "easy" and "easier" than a chart where it's 100% difficulty "3". But if you take players on skill 5.5, they will say the "5" chart is muuuch harder than that "3" chart, and that the 90%-10% 1-7 chart is waaay harder than the "5" one. So really, how do you give just one difficulty number to a whole chart, when most charts have inconsistent difficulty spread? You'll need to rate each section of the chart separately.
This is addressed by calculating a difficulty curve for each map (how hard it is to reach a certain score), instead of giving a single number for the beatmap's difficulty.
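A difficulty curve could be represented as a score-to-difficulty lookup, for example (the numbers are invented for illustration, not taken from the actual calculation):

```python
# (score threshold, difficulty to reach it) pairs for one beatmap;
# the values are invented for illustration.
curve = [(500_000, 2.1), (700_000, 3.0), (900_000, 4.4), (980_000, 6.2)]

def difficulty_to_reach(target_score):
    # Difficulty of the highest listed threshold not exceeding the target.
    best = curve[0][1]
    for threshold, difficulty in curve:
        if target_score >= threshold:
            best = difficulty
    return best

print(difficulty_to_reach(750_000))  # 3.0 under this toy curve
```

Under such a curve, a spiky 1/7 chart would simply show low difficulty for modest score targets and very high difficulty near full score, instead of one misleading average.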
Tripletth
Topic Starter
Bubbler
@Aqo & @Full Tablet
I agree with most of your points. I'm planning to improve the algorithm by weighting scores based on the date the map was ranked (per map) and the date each score was achieved (per player/map). (Is there a way to measure a map's popularity through the osu! API?) I'm also planning to use the map structures later, so that I can write code that calculates SR directly from each map and pass it to the osu! developers, but that seems to require more advanced machine learning techniques (given my current approach), so it will take more time.
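One possible recency weighting for the date-based idea (my own sketch, not Bubbler's actual plan; the half-life is an arbitrary assumption):

```python
def recency_weight(age_days, half_life_days=180.0):
    # A score loses half its weight every half_life_days.
    return 0.5 ** (age_days / half_life_days)

print(recency_weight(0))    # 1.0
print(recency_weight(180))  # 0.5
print(recency_weight(360))  # 0.25
```

An exponential decay like this keeps every score in the dataset while letting stale scores fade smoothly instead of being cut off at a hard date.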

A difficulty curve is also a good idea, but for now I'm aiming for a method that is the least disruptive to the current osu! ranking system, so I'm not considering it right now.

For the supporter event, the winners (1 month each) are Elementaires and Yyorshire.
Please note that anyone can still submit responses at any time.