The reason why "never retry" is (probably) bad advice

Yolshka

Joined April 2014

Yolshka 2017-12-15T11:06:33+00:00

Cxu had another answer as well, which I can't find, but he was basically saying that it doesn't really matter whether you retry or not, because you end up playing the same amount anyway, which I believe to be true. If you add up a couple of TV-size songs, it's a marathon.

Or something like that, maybe I'm butchering it. Can't fully remember, but maybe you'll find it. I agree with whatever that was.

But it kinda sounds obvious that by retrying you'll get pp faster, since that's just how it works. I mean, if you shitmiss and don't retry, then play an AR8 sliderfuck map, and only FC the map you were trying 4 days later, it would seem like you're improving slower, even though in theory you probably would have been able to FC it given sufficient retries.

Also you might want to take into account that both rohulk and jesse are mentally retarded, based on my professional analysis.

So I guess "do whatever you want" is a solid statement.

But I don't think anyone thinks it's really bad to retry other than the edgelord princesses, but then you have people who say don't play NF, don't touch mods, don't play short maps, don't play anything more recent than 2013, and so forth.
Which is probably also bad advice.

Akanagi

492 posts

Joined February 2017

Akanagi 2017-12-15T11:14:47+00:00

You can't generalize stuff like that.

Obviously people who retry a lot on pp maps are going to "improve" (their pp) a lot faster than people who almost never do so and F2 all the time.

Celine

498 posts

Joined November 2013

Celine 2017-12-15T11:14:51+00:00

oh you're the guy who made that pointless elitism rant thread, no wonder

Railey2

512 posts

Joined February 2015

Topic Starter

Railey2 2017-12-15T11:28:44+00:00

Rayne wrote:
You can't generalize stuff like that.

Obviously people who retry a lot on pp maps are going to "improve" (their pp) a lot faster than people who almost never do so and F2 all the time.

Farming boosts you short-term, that's obvious. The question is whether it hurts you long-term or not. We can check that by looking at players who have already been playing for a long time (generally the higher-ranked players).

The top 3k have already been playing for a long time, so even if they had an advantage through farming early on, they surely would have lost that by now and instead started hurting themselves through their bad habits?

Not really. As I said before, it doesn't show up. If it exists, it's probably a really weak effect.

Yurumo

74 posts

Joined April 2017

Yurumo 2017-12-15T11:40:35+00:00

Good god the detail is strong with this one.
Make a tl;dr pls.

ManuelOsuPlayer

1,565 posts

Joined July 2014

ManuelOsuPlayer 2017-12-15T12:43:59+00:00

Hitcount doesn't mean anything.
Players who play longer maps have more hitcount. And that doesn't mean they retry less.

In my personal case I can't retry without getting worse.

Worst osu player effort/reward.

Sightreading Pass:
9/08/2018 The Big Black
10/08/2018 Lia - The Never Ending Love (DJ Shimamura's XTC Remix)
10/08/2018 Team.NEKOKAN - Airman ga Taosenai - Blown by the wind
13/08/2018 Camellia - Insecticide - [Arles AR10]
13/08/2018 GYZE - HONESTY
20/08/2018 Seiryu - AO-Infinity

Railey2

512 posts

Joined February 2015

Topic Starter

Railey2 2017-12-15T13:03:20+00:00

Pawnables wrote:
Good god the detail is strong with this one.
Make a tl;dr pls.

gladly

It has been claimed by many that retrying a lot hurts your improvement.

After carefully looking at some data, I concluded that this claim is unfounded. The data I used covers the top 10k players. I plotted (pp/hitcount) against (hitcount/playcount) to approximate a rate of improvement and a retry rate, then checked whether improvement increases or decreases the more people retry.
Turns out it does neither. More details in the full version.
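
For anyone who wants to try this comparison themselves, here is a minimal Python sketch of the two ratios and their linear correlation. The field names (`pp`, `hitcount`, `playcount`) and the three sample players are made up for illustration; they are not the data used in this thread.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def retry_vs_improvement(players):
    """Correlate hits per play (retry-rate proxy) with pp per hit (improvement proxy)."""
    hits_per_play = [p["hitcount"] / p["playcount"] for p in players]
    pp_per_hit = [p["pp"] / p["hitcount"] for p in players]
    return pearson_r(hits_per_play, pp_per_hit)

# Hypothetical players, invented for this example only.
SAMPLE_PLAYERS = [
    {"pp": 6000, "hitcount": 9_000_000, "playcount": 50_000},
    {"pp": 5500, "hitcount": 12_000_000, "playcount": 40_000},
    {"pp": 7000, "hitcount": 8_000_000, "playcount": 45_000},
]

r = retry_vs_improvement(SAMPLE_PLAYERS)
print(f"r = {r:.3f}")
```

With real data you would load one record per player instead of `SAMPLE_PLAYERS`; three points are far too few to say anything.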

ManuelOsuPlayer wrote:
Hitcount doesn't mean anything.
Players who play longer maps have more hitcount. And that doesn't mean they retry less.

In my personal case I can't retry without getting worse.

(hitcount/playcount) describes hits per play. It doesn't perfectly align with retry-rate because it is influenced by the player's map choices, I'm aware of that.
That doesn't mean that it's a useless variable, though. As I said before, while not being a perfect measure for it, it still approximates retry-rate. As you can see, Rohulk has the highest value. Players like Rafis have low values.
It's not perfect, but it's also not useless, there is clearly a positive correlation between hits per play and retry-rate.

Another nice thing about statistics: if you have inaccuracies (like your values always being off by a certain margin in both directions), then that will not matter for the final analysis as long as you have enough data to work with. According to probability theory, the inaccuracies will simply cancel out.

It's only problematic if your inaccuracies point in only one direction systematically, which shouldn't really be the case here. Not with the (hitcount/playcount)-variable, at least. The other one is far more difficult.
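
A small simulation can illustrate the difference between the two kinds of inaccuracy. This is only a sketch under made-up assumptions: a true slope of -0.5, hits per play spread uniformly between 150 and 300, and noise added to the vertical variable only. Noise on the horizontal variable is a separate matter (it tends to pull a fitted slope toward zero) and is not covered here.

```python
import random

def ols_slope(xs, ys):
    """Least-squares slope of y on x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx

TRUE_SLOPE = -0.5  # assumed for the simulation, not taken from the thread

def fitted_slope(n, bias_per_unit_x, seed=0):
    """Fit a line to data with symmetric noise plus an optional one-directional bias."""
    rng = random.Random(seed)
    xs = [rng.uniform(150, 300) for _ in range(n)]
    ys = [
        TRUE_SLOPE * x            # the real relationship
        + rng.gauss(0, 50)        # symmetric error, off in both directions
        + bias_per_unit_x * x     # systematic error that grows with x
        for x in xs
    ]
    return ols_slope(xs, ys)

print("symmetric noise only:", round(fitted_slope(10_000, 0.0), 3))
print("with systematic bias:", round(fitted_slope(10_000, 0.2), 3))
```

With symmetric noise only, the fitted slope should land close to the assumed -0.5; with the systematic term it shifts by about the size of the bias, no matter how many players are included.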

ManuelOsuPlayer

1,565 posts

Joined July 2014

ManuelOsuPlayer 2017-12-15T13:56:08+00:00

"Retry less" isn't bad advice.
I think players who retry more usually push themselves more than players who don't retry. You usually don't retry when you don't care much, when the map is easy enough that you play it perfectly fine, or when it's hard enough that you don't think an FC is possible for you. So you don't push yourself.
Meanwhile, when retrying 200 times, you're looking for a good score each time. So you try to improve more than a non-retry player.

In my opinion it's 100% a mental thing. Some players play worse and worse when retrying, because they lose energy and focus the more they try in that moment, which makes them play worse and pick up bad habits.
Meanwhile, other players with a different mentality get closer with each try. One more try means more chances to reach the goal, new knowledge, more improvement.

Maybe the non-retry mentality puts all the motivation and energy into the first attempts, while retry players get more energy and focus the more they keep trying the map until they get it.

2 years ago I was a retry player, chasing ranks like crazy. I was improving my plays. I had like 30-50 maps and only played those over and over until I got the FC. After a long, long time trying, one day I suddenly started improving. Sadly my PC broke and I couldn't play anymore.

After a month's break I came back, having gone from a 50k player to a 200k player in skill. Somehow I understood I had not been enjoying the game, so I stopped retrying and have been improving ever since.
A few days back I started retrying again, with the goal of improving and getting higher ranks each day. I quickly started getting FCs really consistently within 1-7 retries on some 5* maps.
After a week of doing it I was totally unable to play 4* maps that I could previously FC consistently on the first try. My acc dropped, my combo dropped, my stamina dropped, and I became totally unable to FC maps that I had FC'd easily days before.

I also started having a problem I never had before: if I get a miss, I'm unable to hit any circle after it and instantly die on the map.

Seems like a big mental wall due to retrying over and over.

There is something there: you can retry and get way better than when you don't. But at the same time, if you retry you can lose skills.

Maybe on some days you can retry all you want and on others you should not retry. Or you can retry all you want but only on X maps, and if you do it on other maps, you lose skills.

No idea.

The only thing I'm 100% sure about is that by retrying you lose stamina. If you want stamina, don't retry. I can't get anything more out of my experience as a player who almost never retries.

pandaBee

2,004 posts

Joined December 2014

pandaBee 2017-12-15T17:37:59+00:00

Hi OP, I fixed your graphic for you. You can repost it now.

P.S.

You do realize that the people we give the "never retry" advice are a different subset of players than the top 10k? Do you really think the top 10k come to G&R for advice? More like 100k and up.

Using data from the elite playerbase to make conjectures about the bottom of the barrel dregs of circle clicking society? Peppy pls.

KupcaH

2,829 posts

Joined January 2016

KupcaH 2017-12-15T18:41:36+00:00

O, another cool post.

*grabs popcorn*

pandaBee wrote:
You do realize that the people we give the "never retry" advice are a different subset of players than the top 10k? Do you really think the top 10k come to G&R for advice? More like 100k and up.

how do I get better?

Last edited by KupcaH 2017-12-15T18:55:08+00:00, edited 1 time in total.

Some links on how to get better at clicking circles ~~accurately~~ in this ~~rhythm~~ game called osu!

>How to progress smartly through the basic levels of gameplay
>Streaming guides/tips 1,2,3,4
>Bubbleman's osu!Gameplay playlist
>Tonehh osu! channel

N0thingSpecial

2,355 posts

Joined November 2015

N0thingSpecial 2017-12-15T18:50:22+00:00

Just sayin

What if there are so few people who actually retry less that you're just plotting a graph of people who retry just as much as each other? What if the meta is mostly comprised of short maps and there's no significant difference between people who are farming and a person who is just F2ing maps and also happens to land on short maps?

Your data still has too many other variables affecting it, like inactive players and non-meta players. Just the fact that no one's improvement graph has a linear progression shows that your data could vary based on when you collected it.

I have nothing intelligent to contribute to this discussion, retry less holds in my mind cause of the “play from start to finish” part

The sun is a deadly laser

Railey2

512 posts

Joined February 2015

Topic Starter

Railey2 2017-12-15T19:19:27+00:00

N0thingSpecial wrote:
Your data still has too many other variables affecting it, like inactive players and non-meta players. Just the fact that no one's improvement graph has a linear progression shows that your data could vary based on when you collected it.

There's certainly a lot going on with the data, which is also why we have such a low correlation, but most things shouldn't mess with the trend line itself. For example, there is no reason to believe that non-meta players, inactive players and so on have a higher or lower hits-per-play ratio, so these types would probably be distributed evenly over the x-axis and therefore cancel out. They're going to be the same as the outliers that I removed in the "Filtered" chart. They don't actually have an impact on the result. As I said, if the inaccuracies don't distort the trend line down or up in a systematic manner, there's no reason to be concerned about them.
We're not trying to have an accurate assessment of effect strength or anything, so we don't have to be that rigid about things.

N0thingSpecial wrote:
What if there are so few people who actually retry less that you're just plotting a graph of people who retry just as much as each other?

We're looking at quite a good range for "hits per play", from ~150 to ~300. I doubt that they're just all the same. People who retry less are going to sit closer to 300, people who retry more are going to sit closer to 150. Occasionally you'll get the guy who plays a lot of marathons but apart from that actually retries quite a lot and then still ends up with a higher number. But again, the correlation should still be there.

If you're farming, your hits per play are probably somewhere around 175. Unless someone hits F2 and lands on haitai all day, he's not gonna beat that figure.

pandaBee wrote:
Using data from the elite playerbase to make conjectures about the bottom of the barrel dregs of circle clicking society? Peppy pls.

sure, why not? "Play more" works for new players just like it works for pro players. Rohulk is preaching "never retry" to everyone indiscriminately, and religiously follows his own advice. "Challenge yourself and push your limits" has always been a cornerstone of improvement no matter where you go or what you do, it doesn't just apply to osu. Do you see any good reason why extrapolating the results wouldn't work?

pandaBee

2,004 posts

Joined December 2014

pandaBee 2017-12-15T19:45:53+00:00

Railey2 wrote:
sure, why not? "Play more" works for new players just like it works for pro players. Rohulk is preaching "never retry" to everyone indiscriminately, and religiously follows his own advice. "Challenge yourself and push your limits" has always been a cornerstone of improvement no matter where you go or what you do, it doesn't just apply to osu. Do you see any good reason why extrapolating the results wouldn't work?

There are plenty. A few off the top of my head:

players in the top 10k are for the most part well established with the various skillsets of Osu. Beginners and to an extent intermediates are not.

Beginners tend to fall into the trap of focusing on pp farming to inflate their egos and small PPs (am i good guiz? look how talented i am, look how much pp i have xdd) which leads to lopsided skillsets, bad habits, frustration, etc. High ranked players won't have a lot of these same issues since their skillsets are already well established (other than maybe the frustration).

High ranked players are well acquainted with many different styles of beatmaps and have accumulated a large volume of plays on a wide spread of maps. Beginners have not. Most beginners have a paltry amount of beatmaps and would benefit from expanding their pool.

That being said, playing more will almost always let you improve regardless of whether you're retrying or not. So of course you can improve while retrying, but there are good ways and bad ways, better ways and worse ways to go about doing things.

Endaris

5,282 posts

Joined June 2010

Endaris 2017-12-15T19:55:46+00:00

Of course I support the idea of retrying.
That is why it is part of the core of my gameplay guide which you can check out here.
And always remember: Retry smart, play hard!

pandaBee

2,004 posts

Joined December 2014

pandaBee 2017-12-15T19:57:21+00:00

Endaris wrote:
Of course I support the idea of retrying.
That is why it is part of the core of my gameplay guide which you can check out here.
And always remember: Retry smart, play hard!

Wow what a shameless plug :^)

Personally speaking, I usually play a map 2-3 times before moving on. When I'm practicing, that is.

abraker

Global Moderator

8,295 posts

Joined July 2014

abraker 2017-12-16T02:53:46+00:00

my 2 cents: both have their benefits and drawbacks.

Retrying a lot works in the short term, fails in the long term. When you retry the same map over and over, you will squeeze the pp you want from it, but ignore skills other maps may require. It's more of a brute-force method if anything. You will be able to do certain maps pretty well, and others not too well.

Variety fails in the short term, but works in the long term. You are going to do horribly on every map at first, but slowly improve on all of them as time goes on. You won't see sudden big pp gains from this, and so it may look less rewarding. However, you will have the skill set needed to handle most things thrown at you, though not to a precise degree.

std skin 2021: link | mania skin 2021: (vanilla ver ~ hidden ver)
osu!Skills - Compare your skills in a slightly different way
OT!neus - osu off-topic subforum's very own discord server

chainpullz

2,334 posts

Joined June 2013

chainpullz 2017-12-16T03:31:39+00:00

You did a decent job of discussing some of the limitations of your metrics only to handwave them all away saying "I don't believe the numbers lie."

The core issue being that pp is a very bad indicator of skill for the purpose of this argument. The argument isn't that never retrying is the best way to improve at farming pp. It's that excessive retrying leads to overfitting. People just have a tendency to parrot the overly exaggerated trivially incorrect version of the argument.

Your results primarily indicate that retrying/focusing on short pp rewarding maps will increase the rate that you improve on pp farm maps. No shit sherlock, that much should be obvious without any data mining required.

On the flip side, there is no way to measure a player's ability to play every sort of map (a skill typically beneficial in tournaments) and not just a narrow subset of them that provides the highest pp returns on investment.

I'm not saying you're wrong, just that your data is not sufficient to support the strong conclusion you are trying to make.

I'm slow like a turtle. Plz no bully, I hide in my rock hard shell.

Modding Queue

NightNarumi

254 posts

Joined May 2014

NightNarumi 2017-12-16T11:42:36+00:00

It’s a bit unfortunate that we don’t have better variables to get a more detailed and precise analysis.
It was a good read though, thx :p

B1rd

2,974 posts

Joined December 2013

B1rd 2017-12-16T12:41:15+00:00

I don't suppose OP is well versed in the scientific method. If he was, he would know that the data doesn't come close to supporting the hypothesis that retrying doesn't hinder improvement, and he should just get rid of any implications that it does before his intellectual credibility goes down the drain.

Simon12

1,952 posts

Joined March 2018

Simon12 2019-11-02T12:34:33+00:00

Props to all the effort put into this, but I still think not retrying is a better way to improve at the game. Remember this: PP does not measure all types of skills in osu!

Molly Sandera

1,047 posts

Joined January 2020

Molly Sandera 2020-04-07T10:35:12+00:00

My two cents on this matter: I think the ability to retry should be used smartly. For example, retrying a map 100 times just to get a proper FC is not a smart play. Retrying a map 5 times to get a certain pattern down would be a better use. But perhaps the best use case is when you know you should have done better: the first time you don't do very well, for example for environmental reasons, and then your next play is indeed better than your last. That would be a good use case for a retry. But the majority of your plays should be on new maps that are a little above your comfort zone.

I'mma try my best on my favorite game~

Full Tablet

2,542 posts

Joined September 2011

Full Tablet 2020-04-09T23:28:07+00:00

I repeated the analysis with data extracted today.

One problem with defining the rate of improvement as pp/hitcount is that, as seen in the data, the expected amount of pp given a certain hitcount is not linear, but rather roughly a monotonically increasing quasi-convex function. This means that the pp/hitcount value tends to be higher when the player is higher ranked (which contradicts the hypothesis previously made that low-ranked players tend to show higher rates of pp/hitcount). The solution given by Railey2 is filtering to narrowed ranges of ranking (which works if the range is small enough to make the relationship between pp and hitcount roughly linear).

Instead, I define rate of improvement as the expected hitcount given the amount of pp of the player, divided by the hitcount of the player: E(hitcount|pp)/hitcount. This has the advantage of being a dimensionless value that is invariant under non-linear scaling transforms of the pp measurements, so the formula for rate of improvement can be applied to all ranges of pp rankings even if the relationship is not linear.

A problem is that it is dubious to consider obtaining pp the same as improving at the game. I'd rather call the variable "pp farming efficiency" instead of "rate of improvement".

For finding the expected hitcount given the amount of pp, I did an isotonic regression of the data, which is a non-parametric fit that only assumes that the relationship between the variables is non-decreasing.

Using that fitted curve to calculate rates of improvement, we obtain this graph:

With a linear fit with r = -0.116515. This shows there is a very slight negative linear correlation between the average length of each play of the player and the efficiency with which the player obtains pp.
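
A rough Python sketch of the E(hitcount|pp)/hitcount definition, using the pool-adjacent-violators algorithm as one standard way to compute an isotonic (non-decreasing) least-squares fit. Whether the analysis above used this exact procedure is not stated; the player records and field names are invented for illustration, and ties in pp are ignored to keep it short.

```python
def isotonic_fit(ys):
    """Pool Adjacent Violators: non-decreasing least-squares fit to ys,
    which must already be ordered by the x variable."""
    blocks = []  # each block is [sum of values, number of values]
    for y in ys:
        blocks.append([y, 1])
        # Merge backwards while the previous block's mean exceeds the last one's.
        while (len(blocks) > 1
               and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]):
            total, count = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
    fitted = []
    for total, count in blocks:
        fitted.extend([total / count] * count)
    return fitted

def farming_efficiency(players):
    """Return (name, E(hitcount|pp) / hitcount) pairs, ordered by pp."""
    ordered = sorted(players, key=lambda p: p["pp"])
    expected = isotonic_fit([p["hitcount"] for p in ordered])
    return [(p["name"], e / p["hitcount"]) for p, e in zip(ordered, expected)]

# Hypothetical players, invented for this example only.
SAMPLE_PLAYERS = [
    {"name": "A", "pp": 1000, "hitcount": 100},
    {"name": "B", "pp": 2000, "hitcount": 300},
    {"name": "C", "pp": 3000, "hitcount": 200},
    {"name": "D", "pp": 4000, "hitcount": 400},
]

for name, efficiency in farming_efficiency(SAMPLE_PLAYERS):
    print(name, round(efficiency, 3))
```

A value above 1 means the player needed fewer hits than expected for their pp; a value below 1 means they needed more.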

Last edited by Full Tablet 2020-04-09T23:42:58+00:00, edited 2 times in total.

Almost

2,154 posts

Joined May 2012

Almost 2020-04-10T01:20:37+00:00

Full Tablet wrote:
With a linear fit with r = -0.116515. This shows there is a very slight negative linear correlation between the average length of each play of the player and the efficiency with which the player obtains pp.

No offence but you must be crazy to call that a correlation.

abraker

Global Moderator

8,295 posts

Joined July 2014

abraker 2020-04-10T04:34:33+00:00

Technically it's a correlation, albeit an awful one that certainly doesn't show any significant relation between the two things. To conclude something meaningful from it would be quite an extrapolation.

Last edited by abraker 2020-04-10T04:36:06+00:00, edited 2 times in total.

Full Tablet

2,542 posts

Joined September 2011

Full Tablet 2020-04-10T04:42:04+00:00

Almost wrote:
Full Tablet wrote:
With a linear fit with r = -0.116515. This shows there is a very slight negative linear correlation between the average length of each play of the player and the efficiency with which the player obtains pp.

No offence but you must be crazy to call that a correlation.

Considering the large sample size (n=10000) and r = -0.116515, then the variables are significantly correlated (i.e. the null hypothesis of the variables having no correlation whatsoever is rejected) at a significance level of 1%.
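
The significance claim can be checked with the usual t-statistic for a correlation coefficient, t = r * sqrt((n - 2) / (1 - r^2)). The sketch below uses a normal approximation for the p-value, which is an assumption that is reasonable at n = 10000 but would not be for small samples.

```python
import math

def t_statistic(r, n):
    """t-statistic for testing whether a Pearson correlation differs from zero."""
    return r * math.sqrt((n - 2) / (1 - r * r))

def two_sided_p_value(r, n):
    """Two-sided p-value using a normal approximation (only sensible for large n)."""
    z = abs(t_statistic(r, n))
    return math.erfc(z / math.sqrt(2))

N = 10_000
for r in (-0.116515, 0.0054):
    p = two_sided_p_value(r, N)
    print(f"r = {r:+.6f}  t = {t_statistic(r, N):+.2f}  "
          f"p = {p:.3g}  significant at 1%: {p < 0.01}")
```

Note that this only tests whether the correlation is distinguishable from zero; it says nothing about how strong or useful the relationship is.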

Example of variables not significantly correlated

With r=0.0054, which is not a significant correlation (which is not surprising, since the "rate of improvement" was defined in a way that makes it uncorrelated with pp)

Last edited by Full Tablet 2020-04-10T06:10:45+00:00, edited 2 times in total.

Almost

2,154 posts

Joined May 2012

Almost 2020-04-10T12:18:47+00:00

Full Tablet wrote:
Almost wrote:
Full Tablet wrote:
With a linear fit with r = -0.116515. This shows there is a very slight negative linear correlation between the average length of each play of the player and the efficiency with which the player obtains pp.

No offence but you must be crazy to call that a correlation.

Considering the large sample size (n=10000) and r = -0.116515, then the variables are significantly correlated (i.e. the null hypothesis of the variables having no correlation whatsoever is rejected) at a significance level of 1%.

Example of variables not significantly correlated

With r=0.0054, which is not a significant correlation (which is not surprising, since the "rate of improvement" was defined in a way that makes it uncorrelated with pp)

The correlation is so poor that if you picked a random player with a certain average play length, you would literally have no idea what their rate of improvement would be because there's so much noise in the data. It really doesn't matter how much data you have, this data doesn't show anything meaningful.

Full Tablet

2,542 posts

Joined September 2011

Full Tablet 2020-04-10T18:12:50+00:00

Almost wrote:
The correlation is so poor that if you picked a random player with a certain average play length, you would literally have no idea what their rate of improvement would be because there's so much noise in the data. It really doesn't matter how much data you have, this data doesn't show anything meaningful.

https://en.wikipedia.org/wiki/Law_of_large_numbers
https://en.wikipedia.org/wiki/Insensitivity_to_sample_size
https://en.wikipedia.org/wiki/Correlation_does_not_imply_causation

Given the data, we can conclude with confidence that there is indeed a negative correlation between the variables in the population (high-ranked osu! players), and that the correlation between them is not coincidental, even though, as you said, for each specific player we can't predict accurately their improvement rate given the average length of their plays.
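
Both sides of this exchange can be read off the same number. Squaring r gives the share of variance in one variable that a linear fit on the other accounts for, and sqrt(1 - r^2) gives how much of the original spread is left around the fitted line. A short sketch, using only the r value reported above:

```python
import math

def variance_explained(r):
    """Share of variance accounted for by a linear fit (r squared)."""
    return r * r

def residual_spread_ratio(r):
    """Residual standard deviation as a fraction of the original standard deviation."""
    return math.sqrt(1 - r * r)

R = -0.116515  # correlation reported in the thread
print(f"variance explained: {variance_explained(R):.2%}")
print(f"spread left around the line: {residual_spread_ratio(R):.2%}")
```

So the trend can be real at the population level while knowing a single player's average play length narrows the guess about their efficiency by well under one percent.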

Now, for concluding there is a causation (having low average play length increases the efficiency of obtaining pp), there are still other things to consider and to prove:

Prove that the relationship cannot be explained by a third factor that both decreases the average play length and increases the improvement rate (or vice versa).
Prove that being efficient at obtaining pp doesn't somehow decrease the average play length. (This proof is sufficient, but not necessary, since there is also the possibility of bidirectional causation).
In the case of generalizing to a bigger population (for example, all osu! players, instead of just high-ranked osu! players), one would need to prove there isn't something inherently different between the two groups (high-ranked and lower-ranked) that affects the correlation or causation. For this, it would be useful to repeat the analysis but with a random sample of all players instead of only taking the top 10k.
In the case of trying to prove that retrying mid-play is beneficial for obtaining pp, one would need to account for differences between lengths in different maps played by different players. As far as we know, it is also possible that players who play longer maps tend to get pp less efficiently, regardless of whether they retry or not.

Last edited by Full Tablet 2020-04-10T18:13:20+00:00, edited 1 time in total.

Almost

2,154 posts

Joined May 2012

Almost 2020-04-10T20:20:57+00:00

Full Tablet wrote:
Almost wrote:
The correlation is so poor that if you picked a random player with a certain average play length, you would literally have no idea what their rate of improvement would be because there's so much noise in the data. It really doesn't matter how much data you have, this data doesn't show anything meaningful.

https://en.wikipedia.org/wiki/Law_of_large_numbers
https://en.wikipedia.org/wiki/Insensitivity_to_sample_size
https://en.wikipedia.org/wiki/Correlation_does_not_imply_causation

Given the data, we can conclude with confidence that there is indeed a negative correlation between the variables in the population (high-ranked osu! players), and that the correlation between them is not coincidental, even though, as you said, for each specific player we can't predict accurately their improvement rate given the average length of their plays.

Now, for concluding there is a causation (having low average play length increases the efficiency of obtaining pp), there are still other things to consider and to prove:
Prove that the relationship cannot be explained by a third factor that both decreases the average play length and increases the improvement rate (or vice versa).
Prove that being efficient at obtaining pp doesn't somehow decrease the average play length. (This proof is sufficient, but not necessary, since there is also the possibility of bidirectional causation).
In the case of generalizing to a bigger population (for example, all osu! players, instead of just high-ranked osu! players), one would need to prove there isn't something inherently different between the two groups (high-ranked and lower-ranked) that affects the correlation or causation. For this, it would be useful to repeat the analysis but with a random sample of all players instead of only taking the top 10k.
In the case of trying to prove that retrying mid-play is beneficial for obtaining pp, one would need to account for differences between lengths in different maps played by different players. As far as we know, it is also possible that players who play longer maps tend to get pp less efficiently, regardless of whether they retry or not.

Just to state it out here, I have not accused you of implying causation. I am simply pointing out that there really is no correlation here at all. To put it simply, you are extrapolating noise! You can't just look at the r number and then say there is or is not correlation. If you eyeball the data, it's plainly obvious that there's no correlation. The fact you need a computer to draw out a line for you is evidence of this.

Antiforte

140 posts

Joined July 2014

Antiforte 2020-04-10T20:40:05+00:00

I think we can all agree that there is no conclusion about skill to draw from pp, a flawed and constantly exploited skill metric (if it even counts as one).

Last edited by Antiforte 2020-04-10T20:40:39+00:00, edited 1 time in total.

What I'm pushing for:
● Score submission queueing improvement
● Players recommending maps in multiplayer
● Nothing much else. :c

Full Tablet

2,542 posts

Joined September 2011

Full Tablet 2020-04-10T22:58:54+00:00

Almost wrote:
Just to state it out here, I have not accused you of implying causation. I am simply pointing out that there really is no correlation here at all. To put it simply, you are extrapolating noise! You can't just look at the r number and then say there is or is not correlation. If you eyeball the data, it's plainly obvious that there's no correlation. The fact you need a computer to draw out a line for you is evidence of this.

If the data were totally uncorrelated, then the probability of obtaining a value of r=-0.116515 or less, with a sample size of n=10000, would be very low (less than 0.001%). The null hypothesis of there being no correlation in the population is rejected. This doesn't mean that the correlation is strong (in fact, it is very weak), it just means that the correlation exists.

You are probably thinking about the possibility of there being measurement errors in the variables by noise. If the measurement noise in the samples is big enough and correlated, this can lead to finding spurious correlations that do not represent the true trends in the population.

Possible sources of measurement errors in the data are:

Cheaters, who aren't representative of the population we are interested in. Cheating leads to a fake higher improvement rate over not cheating, but it is reasonable to assume that it also leads to a higher average play length (due to not needing to retry beatmaps to get good scores). Thus, cheaters actually make the apparent negative correlation weaker than the real correlation.
People who selectively only submit plays that give them pp, giving them a fake low hitcount and playcount value, and thus a fake high improvement rate. Similar to the previous case, this also leads to higher average play length due to retries not being counted in the data.
People who play offline, or play unranked beatmaps. Similar to the previous case, this behavior also gives a fake high improvement rate (but not as much as the previous case), but there shouldn't be a correlation with average play length (or, maybe, there is a positive correlation, since playing unranked/offline might be correlated with not caring about obtaining pp, and that attitude is in turn correlated with retrying less or playing longer maps).
People who multi-account, or that have had scores deleted. This leads to a higher measured improvement rate, but shouldn't affect the average play length.
Plays that have been randomly lost due to connection problems. This causes uncorrelated and random noise in the measurements.
Other sources of noise I haven't thought of.

Considering the sources of noise I mentioned, it is actually reasonable to assume that the real correlation is stronger than the measured correlation, unless I am missing some important source of measurement errors, or I am assuming wrong about the correlations in the sources of noise.
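
The simplest of these cases, random noise that is unrelated to both variables (like randomly lost plays), can be simulated. The sketch below is built on made-up numbers: a true slope of -0.5 on standardized variables and measurement noise with standard deviation 2. It does not model the cheater or selective-submission cases, where the error is tied to both variables.

```python
import math
import random

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def true_and_measured_r(n=10_000, noise_sd=2.0, seed=1):
    """Compare the correlation before and after adding independent measurement noise."""
    rng = random.Random(seed)
    xs = [rng.gauss(0, 1) for _ in range(n)]
    true_ys = [-0.5 * x + rng.gauss(0, 1) for x in xs]        # underlying relationship
    noisy_ys = [y + rng.gauss(0, noise_sd) for y in true_ys]  # what gets recorded
    return pearson_r(xs, true_ys), pearson_r(xs, noisy_ys)

true_r, measured_r = true_and_measured_r()
print(f"true r = {true_r:.3f}, measured r = {measured_r:.3f}")
```

In this setup the measured correlation keeps its sign but sits closer to zero than the true one, which is the direction of distortion argued for above in the uncorrelated case.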

Last edited by Full Tablet 2020-04-10T23:15:24+00:00, edited 4 times in total.

Almost

2,154 posts

Joined May 2012

Almost 2020-04-11T00:11:23+00:00

Full Tablet wrote:
If the data were totally uncorrelated, then the probability of obtaining a value of r=-0.116515 or less, with a sample size of n=10000, would be very low (less than 0.001%). The null hypothesis of there being no correlation in the population is rejected. This doesn't mean that the correlation is strong (in fact, it is very weak), it just means that the correlation exists.

You are probably thinking about the possibility of measurement errors (noise) in the variables. If the measurement noise in the sample is large enough and correlated, it can produce spurious correlations that do not represent the true trends in the population.

Possible sources of measurement errors in the data are:
Cheaters, who aren't representative of the population we are interested in. Cheating leads to a fake higher improvement rate over not cheating, but it is reasonable to assume that it also leads to a higher average play length (due to not needing to retry beatmaps to get good scores). Thus, cheaters actually make the apparent negative correlation weaker than the real correlation.
People who selectively only submit plays that give them pp, giving them a fake low hitcount and playcount value, and thus a fake high improvement rate. Similar to the previous case, this also leads to higher average play length due to retries not being counted in the data.
People who play offline, or play unranked beatmaps. Similar to the previous case, this behavior also gives a fake high improvement rate (but not as much as the previous case), but there shouldn't be a correlation with average play length (or, maybe, there is a positive correlation, since playing unranked/offline might be correlated with not caring about obtaining pp, and that attitude is in turn correlated with retrying less or playing longer maps).
People who multi-account, or that have had scores deleted. This leads to a higher measured improvement rate, but shouldn't affect the average play length.
Plays that have been randomly lost due to connection problems. This causes uncorrelated and random noise in the measurements.
Other sources of noise I haven't thought of.

Considering the sources of noise I mentioned, it is actually reasonable to assume that the real correlation is stronger than the measured correlation, unless I am missing some important source of measurement error, or my assumptions about the correlations in the sources of noise are wrong.

The probability of such a correlation occurring is completely irrelevant. You could get 2 random data sets that happen to spuriously correlate and find they have a 0.001% chance of not being correlated statistically. Therefore, moot point.

I also find hypothesizing about what the 'true' correlation is to be rather pointless, as it's realistically impossible to gain this data. Yes, you could potentially find a real negative correlation but at the same time, you could find the opposite to be true. This analysis provides really nothing meaningful and nobody should really be wasting their time thinking too much about it.

Antiforte wrote:
I think we can all agree that there is no conclusion about skill to draw from pp, a flawed and constantly exploited skill metric (if it even counts as one).

There is no such thing as a perfect skill measuring system. It's impossible to satisfy everyone. Is pp flawed and exploited? Yes but any system you cook up will be the same. I still think pp is somewhat relevant since it guides the skill acquiring habits of the majority of the community.

Last edited by Almost 2020-04-11T00:14:35+00:00, edited 2 times in total.

Full Tablet

2,542 posts

Joined September 2011

Full Tablet 2020-04-11T03:33:42+00:00

Almost wrote:
The probability of such a correlation occurring is completely irrelevant. You could get 2 random data sets that happen to spuriously correlate and find they have a 0.001% chance of not being correlated statistically. Therefore, moot point.

I also find hypothesizing about what the 'true' correlation is to be rather pointless, as it's realistically impossible to gain this data. Yes, you could potentially find a real negative correlation but at the same time, you could find the opposite to be true. This analysis provides really nothing meaningful and nobody should really be wasting their time thinking too much about it.

https://en.wikipedia.org/wiki/Statistical_significance
https://en.wikipedia.org/wiki/Effect_size
https://en.wikipedia.org/wiki/Misuse_of_p-values

If you measure a negative correlation in a sample, and the probability of not measuring such a correlation when there is none in the population is 99.999%, the most reasonable conclusion is that there is indeed a negative correlation. If you only accepted empirical conclusions that have 100% certainty, you wouldn't be able to conclude anything; you wouldn't even conclude that gravity exists or that the Sun shines.

You are confusing statistical significance with effect size (the magnitude of the effect). An example of high statistical significance with low effect size: a statistical study finds, with high confidence, that obese people who eat a certain plant at least weekly lose 0.1±2.0 more kilograms of weight each month (compared to not eating the plant at least weekly).

While the results might appear to be worthless (after all, the average effect is so small, and the variance of the outcome is so high, it is not worth it to influence the variable hoping to change the outcome), it does tell us some important things.

First of all, it tells us that it is not reasonable to expect the opposite effect of what was seen (that increasing the average play length increases the efficiency of getting pp). Also, the low value of r tells us that, while on average we should expect a negative effect on pp farming efficiency when increasing the average play length, we can't predict the precise outcome with confidence.
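As a rough check of the significance claim, here is a minimal sketch that turns r = -0.116515 and n = 10000 (the figures quoted earlier in the thread) into an approximate p value. The function name `correlation_p_value` is made up for illustration, and the normal approximation to the t distribution is an assumption that is only reasonable for large samples.

```python
import math


def correlation_p_value(r: float, n: int) -> float:
    """Approximate two-sided p value for a Pearson r under "no correlation".

    Uses the usual t statistic t = r * sqrt((n - 2) / (1 - r^2)) and,
    as a simplifying assumption, treats t as standard normal
    (acceptable only when n is large).
    """
    t = r * math.sqrt((n - 2) / (1.0 - r * r))
    # Two-sided tail probability of a standard normal: P(|Z| >= |t|).
    return math.erfc(abs(t) / math.sqrt(2.0))


r, n = -0.116515, 10000
p = correlation_p_value(r, n)
print(f"approximate p value: {p:.3g}")   # far below 0.00001 (0.001%)
print(f"r^2 (effect size):   {r * r:.4f}")
```

The p value speaks to whether the correlation exists; r^2 speaks to how much of the variation it explains. The two numbers answer different questions, which is the distinction between significance and effect size.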

Last edited by Full Tablet 2020-04-11T04:13:35+00:00, edited 1 time in total.

Almost

2,154 posts

Joined May 2012

Almost 2020-04-11T12:26:17+00:00

Full Tablet wrote:
https://en.wikipedia.org/wiki/Statistical_significance
https://en.wikipedia.org/wiki/Effect_size
https://en.wikipedia.org/wiki/Misuse_of_p-values

If you measure a negative correlation in a sample, and the probability of not measuring such a correlation when there is none in the population is 99.999%, the most reasonable conclusion is that there is indeed a negative correlation. If you only accepted empirical conclusions that have 100% certainty, you wouldn't be able to conclude anything; you wouldn't even conclude that gravity exists or that the Sun shines.

You are confusing statistical significance with effect size (the magnitude of the effect). An example of high statistical significance with low effect size: a statistical study finds, with high confidence, that obese people who eat a certain plant at least weekly lose 0.1±2.0 more kilograms of weight each month (compared to not eating the plant at least weekly).

While the results might appear to be worthless (after all, the average effect is so small, and the variance of the outcome is so high, it is not worth it to influence the variable hoping to change the outcome), it does tell us some important things.

First of all, it tells us that it is not reasonable to expect the opposite effect of what was seen (that increasing the average play length increases the efficiency of getting pp). Also, the low value of r tells us that, while on average we should expect a negative effect on pp farming efficiency when increasing the average play length, we can't predict the precise outcome with confidence.

If anything, you're the one confusing the value of the p value. Firstly, you can use a simple p value correlation calculator (like this one) and find that you only really need the r value as well as the number of samples to calculate the p value. Therefore, what dictates the p value is simply the number of samples; with a r score further away from 0 requiring a higher number of samples to give a lower p value.

Again, you cannot simply just look at the numbers only as the numbers only tell you half the story. If you were to just look at the numbers alone, it's evident that a correlation exists. However, intuitively looking at the data itself, it is clearly evident that a real correlation does not exist for practical purposes.

Also, any conclusions you might draw from this data will be seriously flawed due to all the possible sources of measurement errors in the data that you mentioned earlier. This means that drawing any conclusions at all is quite dangerous.

Last edited by Almost 2020-04-11T13:08:53+00:00, edited 1 time in total.

Full Tablet

2,542 posts

Joined September 2011

Full Tablet 2020-04-11T15:16:23+00:00

Almost wrote:
If anything, you're the one confusing the value of the p value. Firstly, you can use a simple p value correlation calculator (like this one) and find that you only really need the r value as well as the number of samples to calculate the p value. Therefore, what dictates the p value is simply the number of samples; with a r score further away from 0 requiring a higher number of samples to give a lower p value.

Again, you cannot simply just look at the numbers only as the numbers only tell you half the story. If you were to just look at the numbers alone, it's evident that a correlation exists. However, intuitively looking at the data itself, it is clearly evident that a real correlation does not exist for practical purposes.

Also, any conclusions you might draw from this data will be seriously flawed due to all the possible sources of measurement errors in the data that you mentioned earlier. This means that drawing any conclusions at all is quite dangerous.

An r value further away from 0 requires fewer samples to obtain a given p value, not more.
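To make the direction of that relationship concrete, here is a small sketch that estimates the smallest sample size at which a given sample r becomes significant. It assumes the Fisher z-transformation approximation (not something used elsewhere in this thread), and `min_samples_for_significance` is a hypothetical helper name.

```python
import math
from statistics import NormalDist


def min_samples_for_significance(r: float, alpha: float = 0.05) -> int:
    """Approximate smallest n at which a sample correlation r is significant.

    Assumes the Fisher z approximation: atanh(r) * sqrt(n - 3) is roughly
    standard normal when the true correlation is zero (two-sided test).
    """
    z_crit = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return math.ceil((z_crit / math.atanh(abs(r))) ** 2 + 3)


for r in (0.05, 0.1, 0.3, 0.5):
    print(f"|r| = {r}: about {min_samples_for_significance(r)} samples needed")
```

The required n shrinks as |r| grows, so the p value depends on both r and n, not on the sample size alone.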

Looking at the scatter plot is more useful when the sample is smaller. Intuition generally fails when the amount of data is more than what one can visually process. In particular, by looking at the graph, we can correctly infer that the correlation is not high, but that doesn't imply that the conclusions you draw from the data aren't relevant. The correlation is not to be ignored if the underlying implications of such a weak correlation make sense to be reported to a research community. Any |r| > 0.1 on a large data set is always something to look into.

The attitude of rejecting conclusions because they aren't visually intuitive can and does lead people to prefer smaller sample sizes over bigger sample sizes for drawing conclusions from, which is clearly a mistake.

https://en.wikipedia.org/wiki/Correction_for_attenuation

As I said earlier, if we consider the sources of the error, it is actually more likely that the real correlation is higher than what was found in the analysis, than the other way around. It takes particular kinds of measurement errors to cause spurious correlations.
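The attenuation effect can be illustrated with a toy simulation. Everything here is an assumption for illustration only: the "true" slope of -0.3, Gaussian noise, and measurement errors that are independent of each other and of the true values. None of these numbers come from the osu! data.

```python
import math
import random


def pearson_r(xs, ys):
    """Plain Pearson correlation coefficient."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


rng = random.Random(0)  # fixed seed so the run is repeatable
n = 20000

# Hypothetical "true" values with a weak negative relationship.
true_x = [rng.gauss(0.0, 1.0) for _ in range(n)]
true_y = [-0.3 * x + rng.gauss(0.0, 1.0) for x in true_x]

# Independent (uncorrelated) measurement error added to both variables.
measured_x = [x + rng.gauss(0.0, 1.0) for x in true_x]
measured_y = [y + rng.gauss(0.0, 1.0) for y in true_y]

r_true = pearson_r(true_x, true_y)
r_measured = pearson_r(measured_x, measured_y)
print(f"r on true values:     {r_true:.3f}")
print(f"r on measured values: {r_measured:.3f}")
```

In this setup the measured correlation keeps its sign but is pulled toward 0. Independent measurement error alone weakens the observed correlation; it takes correlated errors to manufacture one.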

Last edited by Full Tablet 2020-04-11T15:36:25+00:00, edited 1 time in total.

Almost

2,154 posts

Joined May 2012

Almost 2020-04-11T15:48:10+00:00

Full Tablet wrote:
An r value further away from 0 requires fewer samples to obtain a given p value, not more.

Whoops, yes my bad here.

Full Tablet wrote:
Looking at the scatter plot is more useful when the sample is smaller. Intuition generally fails when the amount of data is more than what one can visually process. In particular, by looking at the graph, we can correctly infer that the correlation is not high, but that doesn't imply that the conclusions you draw from the data aren't relevant. The correlation is not to be ignored if the underlying implications of such a weak correlation make sense to be reported to a research community. Any |r| > 0.1 on a large data set is always something to look into.

https://en.wikipedia.org/wiki/Correction_for_attenuation

As I said earlier, if we consider the sources of the error, it is actually more likely that the real correlation is higher than what was found in the analysis, than the other way around. It takes particular kinds of measurement errors to cause spurious correlations.

Compare the plot you provided with these 3 plots.

Visually you can see it's more similar to the 3rd no correlation plot than to either of the positive or negative correlation plots.

Again, just because you can mathematically calculate an r value greater than or less than 0.1 with a significant p value doesn't mean it's 'real' correlation at all. Again, the p value is predicated more on the sample size than anything else so as long as you have enough samples you can get a significant p value no matter how erroneous everything else is. Anyway, a correlation of -0.11 is already a super weak correlation and paired up with the fact that the data has an enormous amount of variance as well as the poor quality of data allows us to conclude that nothing of meaning can really be drawn from this analysis.

Now to explain what we see further, you can see that the vast bulk of the play lengths of the players in the data set tend to be around the 150-250 range with far fewer players on the higher end compared to the lower end of the range. Also, the rate of improvement is also more clumped towards the lower end of the range (around 0.5-1.5) no matter what average play length you pick. The line of best fit in the plot you gave would clearly skew more towards the negative side of things due to the higher representation of the variance in the lower average play length side of the plot. You can't really extrapolate that this correlation would hold if you were to have more players in the data set on the higher end of the average play length range as the variance is so huge on all ends of the spectrum which would likely end up just leading to a no correlation result.

Last edited by Almost 2020-04-11T15:55:33+00:00, edited 2 times in total.

Full Tablet

2,542 posts

Joined September 2011

Full Tablet 2020-04-11T18:06:20+00:00

Almost wrote:
Compare the plot you provided with these 3 plots.

Visually you can see it's more similar to the 3rd no correlation plot than to either of the positive or negative correlation plots.

Yes, it is true that the data is closer to no correlation than perfect correlation.

Almost wrote:
Again, just because you can mathematically calculate an r value greater than or less than 0.1 with a significant p value doesn't mean it's 'real' correlation at all. Again, the p value is predicated more on the sample size than anything else so as long as you have enough samples you can get a significant p value no matter how erroneous everything else is. Anyway, a correlation of -0.11 is already a super weak correlation and paired up with the fact that the data has an enormous amount of variance as well as the poor quality of data allows us to conclude that nothing of meaning can really be drawn from this analysis.

If there is no correlation, it is almost impossible to obtain a value of r<=-0.11 with a large sample. When there is no correlation, as you increase the sample size, |r| tends to approach 0, so it is not true that as you increase the sample size you are more likely to find statistically significant correlations when there are actually none.
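This is easy to check by simulation. The sketch below draws pairs of independent Gaussian variables (an assumed stand-in for "no correlation") and records the largest |r| seen over 20 repetitions at each sample size.

```python
import math
import random


def pearson_r(xs, ys):
    """Plain Pearson correlation coefficient."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


rng = random.Random(1)  # fixed seed so the run is repeatable


def null_sample_r(n):
    """Sample r from two variables that are independent by construction."""
    xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
    ys = [rng.gauss(0.0, 1.0) for _ in range(n)]
    return pearson_r(xs, ys)


largest = {}
for n in (100, 1000, 10000):
    largest[n] = max(abs(null_sample_r(n)) for _ in range(20))
    print(f"n={n:>5}: largest |r| over 20 null samples = {largest[n]:.4f}")
```

With n = 100, values of |r| around 0.1 or more turn up by chance fairly easily; with n = 10000 the null values of |r| stay far below 0.11.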

Almost wrote:
Now to explain what we see further, you can see that the vast bulk of the play lengths of the players in the data set tend to be around the 150-250 range with far fewer players on the higher end compared to the lower end of the range. Also, the rate of improvement is also more clumped towards the lower end of the range (around 0.5-1.5) no matter what average play length you pick. [...] You can't really extrapolate that this correlation would hold if you were to have more players in the data set on the higher end of the average play length range as the variance is so huge on all ends of the spectrum which would likely end up just leading to a no correlation result.

Yes, due to not having much data outside the 100-250 play-length, we can't infer much about the effect of play length on improvement rate outside that play-length range (but still, we do have information about the effect inside the range).

If we sampled again with more data outside that range, and found that the correlation is weaker across all the range, then we would need to check for non-linear relationships between the variables. For example, it is possible that there is a bitter spot in play-length (for small play-lengths, increases in play-length tend to decrease improvement rate, and for big play-lengths, increases in play-length tend to increase improvement rate).

Almost wrote:
The line of best fit in the plot you gave would clearly skew more towards the negative side of things due to the higher representation of the variance in the lower average play length side of the plot.

I am not sure I understand what you are trying to say here.

Almost

2,154 posts

Joined May 2012

Almost 2020-04-11T19:49:24+00:00

Full Tablet wrote:
If there is no correlation, it is almost impossible to obtain a value of r<=-0.11 with a large sample. When there is no correlation, as you increase the sample size, |r| tends to approach 0, so it is not true that as you increase the sample size you are more likely to find statistically significant correlations when there are actually none.

As I said mathematically, yes there is a correlation but the issue is whether such a correlation is practically relevant. The amount of variance in the plot says no. Essentially, this correlation is purely fit to noise.

Full Tablet wrote:
Yes, due to not having much data outside the 100-250 play-length, we can't infer much about the effect of play length on improvement rate outside that play-length range (but still, we do have information about the effect inside the range).

And you're just going to infer that the correlation is the same?? Not that it matters though since there's just way too much variance in the data.

Full Tablet wrote:
If we sampled again with more data outside that range, and found that the correlation is weaker across all the range, then we would need to check for non-linear relationships between the variables. For example, it is possible that there is a bitter spot in play-length (for small play-lengths, increases in play-length tend to decrease improvement rate, and for big play-lengths, increases in play-length tend to increase improvement rate).

Again, there's really no point in pontificating about things outside the realm of possibility. It's realistically impossible to gather data like this that's actually very clean.

Full Tablet wrote:
I am not sure I understand what you are trying to say here.

Correlation coefficient calculation is based off the least squares method which basically finds the line of best fit for a given set of data points. Since the distribution of this data is skewed towards the lower improvement ranges for basically all play lengths and there also being lack of data for the higher play lengths, there is a lower representation of the higher rates of improvement in the high play lengths. Simply put, since the distribution of the data is like below, if you were to have more data for the higher play lengths, you'd likely just get a uniform plot without any real change in the rate of improvement vs average play length since the distribution is pretty linear no matter the play length.

Last edited by Almost 2020-04-11T20:03:15+00:00, edited 3 times in total.

Endaris

5,282 posts

Joined June 2010

Endaris 2020-04-11T20:08:13+00:00

Railey would rejoice if he knew that his drama bait thread is thriving once again.

abraker

Global Moderator

8,295 posts

Joined July 2014

abraker 2020-04-12T00:44:11+00:00

Full Tablet, can you post density scatter plots with contours for the results?

Last edited by abraker 2020-04-12T00:47:44+00:00, edited 4 times in total.

Full Tablet

2,542 posts

Joined September 2011

Full Tablet 2020-04-12T01:04:22+00:00

Almost wrote:
Full Tablet wrote:
If there is no correlation, it is almost impossible to obtain a value of r<=-0.11 with a large sample. When there is no correlation, as you increase the sample size, |r| tends to approach 0, so it is not true that as you increase the sample size you are more likely to find statistically significant correlations when there are actually none.

As I said mathematically, yes there is a correlation but the issue is whether such a correlation is practically relevant. The amount of variance in the plot says no. Essentially, this correlation is purely fit to noise.

You are just repeating your points here. As I said earlier, there are conclusions that are practically relevant even if the correlation found is weak. For example, we can conclude that the correlation is neither null nor positive (so it is not reasonable to increase your average play length expecting to get pp more efficiently, and "never retry" is likely bad advice if one seeks to gain pp efficiently). We can also conclude that the correlation is weak, so we can't confidently predict the outcome based on the variables (in other words, "Your mileage might vary").

You seem to be confusing measurement noise with inherent noise in the data; measurement noise does weaken some inferences you might make from the data (but not all inferences, such as the existence of correlation, except when the noise itself is correlated), inherent noise (noise we would obtain even if the measurements are perfect) does not make the inferences less reliable, it just makes predictions less reliable.

Almost wrote:
Full Tablet wrote:
Yes, due to not having much data outside the 100-250 play-length, we can't infer much about the effect of play length on improvement rate outside that play-length range (but still, we do have information about the effect inside the range).

And you're just going to infer that the correlation is the same?? Not that it matters though since there's just way too much variance in the data.

I am not inferring that the correlation is the same outside the points we have more data on. I am saying that we can't infer as confidently outside the range as we would do inside the range.

Almost wrote:
Full Tablet wrote:
If we sampled again with more data outside that range, and found that the correlation is weaker across all the range, then we would need to check for non-linear relationships between the variables. For example, it is possible that there is a bitter spot in play-length (for small play-lengths, increases in play-length tend to decrease improvement rate, and for big play-lengths, increases in play-length tend to increase improvement rate).

Again, there's really no point in pontificating about things outside the realm of possibility. It's realistically impossible to gather data like this that's actually very clean.

Acknowledging what we don't know is also important. If we aren't able to know about something, we should take into consideration all possibilities about it (which is not the same as assuming that one of the possibilities is true, which would be something obviously wrong to do).

Almost wrote:
Correlation coefficient calculation is based off the least squares method which basically finds the line of best fit for a given set of data points. Since the distribution of this data is skewed towards the lower improvement ranges for basically all play lengths and there also being lack of data for the higher play lengths, there is a lower representation of the higher rates of improvement in the high play lengths. Simply put, since the distribution of the data is like below, if you were to have more data for the higher play lengths, you'd likely just get a uniform plot without any real change in the rate of improvement vs average play length since the distribution is pretty linear no matter the play length.

Do you think that the sampling process was skewed towards selecting points that have low improvement ranges? The fact that the data obtained has a skewed distribution doesn't mean that the sample is biased, as far as we know, that skew seems to be a characteristic of the population.

Also, if the distribution of the independent variable in the data is skewed towards lower values, it is not true that you should expect getting a false negative slope in the linear regression of the data.

Example:
Suppose that in the population, there is an inherent randomness in the value obtained in the dependent variable for each point in the independent variable, with the randomness following a skewed distribution.

(The closer the point is to white, the more probable it is to obtain that value). The best possible prediction of y given x is a linear function y(x) = 0.01x, and the true correlation between the two variables is r=0.115216.

Now, we do a sample that is very skewed towards taking points with low values of x:

This sample has a linear regression of y(x) = 0.0000122334 + 0.0100465x, and r=0.115216, which is pretty close to the ideal parameters. If we repeat this skewed sampling several times we obtain similar results. There is no tendency to find slopes when we shouldn't with the skewed sampling.
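A comparable experiment can be sketched in a few lines. The distributions here are assumptions chosen for illustration, not the ones behind the plots above: x is drawn from an exponential distribution (so most points have low x), and the noise in y is a shifted exponential (skewed, zero mean). `fit_line` is a hypothetical helper doing ordinary least squares.

```python
import random


def fit_line(xs, ys):
    """Ordinary least-squares fit y = intercept + slope * x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


rng = random.Random(2)  # fixed seed so the run is repeatable
n = 20000
true_slope = 0.01

# x heavily skewed toward low values (assumed exponential distribution).
xs = [rng.expovariate(1.0) for _ in range(n)]
# y = 0.01 x plus skewed, zero-mean noise (shifted exponential, scale 0.1).
ys = [true_slope * x + 0.1 * (rng.expovariate(1.0) - 1.0) for x in xs]

slope, intercept = fit_line(xs, ys)
print(f"fitted slope:     {slope:.5f} (true slope {true_slope})")
print(f"fitted intercept: {intercept:.5f} (true intercept 0)")
```

Under these assumptions the fitted slope lands close to the true 0.01 even though both the x values and the noise are strongly skewed.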

Another example:
Suppose that the population follows this distribution for each x:

Which has an ideal prediction function y(x)=0.01x^2, and r^2=0.258148. Attempting a linear regression with a sample of this population would fail with r very close to 0.

Now let's take a very skewed sample:

A linear regression gives y(x)=0.33314 -6.47732*10^-6 x, with r=-0.0000643552 (so it fails).
A quadratic regression gives y(x)=0.00303735 -6.47732*10^-6 x+0.00990298 x^2, with r^2=0.258148. So even a sample this skewed reproduces the non-linearity of the population.
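The linear-versus-quadratic contrast can be reproduced with a toy population. This is only a sketch under assumed distributions (x uniform on [-1, 1], skewed zero-mean noise with standard deviation 0.005); it does not reproduce the skewed sampling from the plots, only the failure of a linear fit on a purely quadratic relationship.

```python
import math
import random


def pearson_r(xs, ys):
    """Plain Pearson correlation coefficient."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


rng = random.Random(3)  # fixed seed so the run is repeatable
n = 20000

# Assumed population: x symmetric around 0, y = 0.01 x^2 plus skewed noise.
xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
ys = [0.01 * x * x + 0.005 * (rng.expovariate(1.0) - 1.0) for x in xs]

r_linear = pearson_r(xs, ys)                       # y against x
r_quadratic = pearson_r([x * x for x in xs], ys)   # y against x^2

print(f"correlation of y with x:   {r_linear:.4f}")
print(f"correlation of y with x^2: {r_quadratic:.4f}")
```

The correlation with x comes out near 0 while the correlation with x^2 is clearly positive, so a near-zero linear r does not rule out a non-linear relationship.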

Abraker wrote:
Full Tablet, can you post density scatter plots with contours for the results?

Non-linear transform on the density values to show more contours

Here it is (using a Gaussian kernel).

I also included a high-degree (7) polynomial regression curve, which has r^2=0.0225744 (|r|=0.150248, better than the one in the linear model with r = -0.116515), which is significant at a significance level of 1%. This curve is better for predicting rate of improvement given the average hitcount in the white area of the graph (where we have the most information), but high-degree curves are worse at extrapolating compared to linear curves, and the low value of r^2 indicates that the predictions are still not reliable.

Doing an F-test to check whether or not the high-degree polynomial model fits the data significantly better than the linear model (null hypothesis: high-degree polynomial model does not provide a significantly better fit than the linear model), we conclude that the polynomial model is significantly better with a false-rejection probability less than 1%.
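For reference, the nested-model F statistic can be recomputed from the two r^2 values alone. This sketch assumes both fits used the same n = 10000 points, that the linear model has 2 coefficients and the degree-7 polynomial has 8, and it approximates the F tail with a chi-square tail (reasonable only because the denominator degrees of freedom are very large). The helper names are made up for illustration.

```python
import math


def nested_f_statistic(r2_small, r2_big, n, k_small, k_big):
    """F statistic for comparing two nested least-squares models.

    k_small and k_big are the numbers of fitted coefficients
    (intercept included) in the smaller and larger model.
    """
    numerator = (r2_big - r2_small) / (k_big - k_small)
    denominator = (1.0 - r2_big) / (n - k_big)
    return numerator / denominator


def chi2_sf_even_df(x, df):
    """Upper tail of a chi-square distribution, closed form for even df."""
    half = x / 2.0
    return math.exp(-half) * sum(half ** k / math.factorial(k) for k in range(df // 2))


n = 10000                       # assumed: same sample as the linear analysis
r2_linear = (-0.116515) ** 2    # linear model, 2 coefficients
r2_poly = 0.0225744             # degree-7 polynomial, 8 coefficients
extra_terms = 8 - 2

f_stat = nested_f_statistic(r2_linear, r2_poly, n, k_small=2, k_big=8)
# With about 10^4 denominator degrees of freedom, extra_terms * F is
# approximately chi-square distributed with extra_terms degrees of freedom.
p_approx = chi2_sf_even_df(extra_terms * f_stat, extra_terms)

print(f"F statistic:         {f_stat:.2f}")
print(f"approximate p value: {p_approx:.3g}")
```

Under these assumptions the F statistic is well above the 1% critical value, which agrees with the conclusion above. A better fit in this sense still leaves r^2 small.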

Last edited by Full Tablet 2020-04-12T06:50:51+00:00, edited 8 times in total.
