Acc pp

T3rza

Joined February 2017

Topic Starter

T3rza 2018-04-08T21:23:20+00:00

I had a lot of time on my hands so I decided to see if I could derive a system for calculating accuracy pp because the current system seems a bit arbitrary. (It almost seems like an equation you pulled out of your ass and modified through trial and error).

Accuracy is simply how accurately you hit the notes. This is one of the few things that can be calculated somewhat objectivity. This is essentially a statistical question, so consequently, statistical methods can apply really easily to this. A measurement of accuracy is similar to a measure of error. In fact, I would argue that they are negatively proportional. The more error there is, the less accuracy you have. This means that if we can find the error, we can find the accuracy. A good measurement of error is variance. Based on the misses, 50s, 100s, and 300s of circles (since sliders are automatic 300s and spinners don’t require hitting) and the OD, we can calculate the maximum variance of the hits from 0. We can compare the square root of this to the 100% accuracy at certain ODs to find an effective OD.
Variance (V) = [(#circles-#100s-#50s-#misses)*MS300^2 +#100s*MS100^2+#50s*MS50^2+#misses*MSmiss^2]/#circles
Key:
MS300 = maximum error (in milliseconds) from zero where you still hit a 300 (79.5-6*OD)
MS100 = maximum error (in milliseconds) from zero where you still hit a 100 (139.5-8*OD)
MS50 = maximum error (in milliseconds) from zero where you still hit a 50 (199.5-10*OD)
MSmiss = Value if you miss (in milliseconds) (259.5-12*OD)
For the milliseconds of the misses, I extended the already present pattern that separated 300s from 100s and 100s from 50s.

For ease of calculation and to simplify, I will assign some values:
p1 = #100s/#circles
p2 = #50s/#circles
p3 = #misses/#circles
V = (1-p1-p2-p3)*(MS300^2)+p1*(MS100^2)+p2*(MS50^2)+p3*(MSmiss^2)
This looks much easier to calculate.
Note that both of these use information already present in ppcalc program (number of 100s 50s and misses)

From here, we can calculate an effective OD.
sqrt(V) = 79.5-6*OD_effective
OD_effective = 13.25 - sqrt(V)/6

You can stop here and apply the effective OD into the acc pp equation and get rid of the (acc)^24 that weighs acc if you want.
1.52163^(OD_effective)*(n/1000)^0.3
I used Lagrange multipliers on random samples of maps and discovered that Tom’s OD to acc_pp algorithm (1.52163^OD) is not that far off. The exponential function was almost always the most accurate (and the values I got were 1.5-1.6, very similar to Tom’s pp system).

However, I decided to take it a step further, b/c why not.
It is much harder to acc a 5 minute map than a 1 minute map; because of this,a length bonus is in order. I belief standard error is an adequate metric of this. The more circles that are present, the harder it is to acc (this is where consistency comes in) and the higher the OD, the harder it is to acc said circles. The standard error would be the square root of the variance (it’s not a standard deviation because the data isn’t a continuous distribution) over the square root of amount of circles. The length bonus would be somewhat proportional to the inverse of the standard error. Additionally, this is ideal because as n grows larger, the amount of bonus you get per circle should decrease. The difference between 100 and 200 circles is a lot more impressive than the difference between 900 and 1000 circles.
I did this in a simple way. I replaced the (n/1000)^0.3 with (Standard Error)^(-0.6).
This will look like (sqrt(n)/sqrt(V))^0.6 = (sqrt(n)/(79.5-6*OD_effective))^0.6 = (n^0.3)/(79.5-6*OD_effective)^0.6

To make it somewhat similar, I did a regression to the original model and this is the model I got:
8.9293*1.3135^(OD_effective) * (sqrt(n)/(79.5-6*OD_effective))^0.6

This is more of a proof of concept than anything, hope you like it.
I might make a more intricate solution in the future (if I'm bored enough lol).

abraker

Global Moderator

8,295 posts

Joined July 2014

abraker 2018-04-08T23:53:58+00:00

Looks like a text dump and is hard to read. Consider using LaTeX. Make formulas in google docs and take screenshots to post on here using shareX

Tr3sLeches wrote:
You can stop here and apply the effective OD into the acc pp equation and get rid of the (acc)^24 that weighs acc if you want.
1.52163^(OD_effective)*(n/1000)^0.3

You should explain what n is as it seems to come from nowhere

It's also worth explaining the behavior of the formula you come up visually because very little amount of people will bother to understand what a bunch of numbers and symbols do.

https://www.desmos.com/calculator/li88lywttn

std skin 2021: link | mania skin 2021: (vanilla ver ~ hidden ver)
osu!Skills - Compare your skills in a slightly different way
OT!neus - osu off-topic subforum's very own discord server

T3rza

42 posts

Joined February 2017

Topic Starter

T3rza 2018-04-09T00:43:55+00:00

I actually did write this in desmos originally, I didn't think to put the link in the thread:https://www.desmos.com/calculator/yav4kwkclq

n is the amount of circles in the map. I actually typed all of it lol.
Hopefully the desmos makes it clearer.

Full Tablet

2,542 posts

Joined September 2011

Full Tablet 2018-04-09T04:28:12+00:00

Your function:

Takes the distribution of the judgments in a play (the proportions of 300s, 100s, 50s and misses), and the OD (which affects the timing windows), and calculates some kind of estimation of the "variance" of the hit errors in the play.
Applies a monotonic non-linear scaling function to that variance.
Then it multiplies the value obtained by a function that is monotonic to the standard error based on the estimation of variance.

The main problem I see is that the scaling functions balance the effect of map length and the distribution of judgments arbitrarily. It is hard to justify in your model why the functions are what they are.

I think it is better if we make acc pp be a monotonic function of the acc% of a play, since that way there are no cases where bigger acc% (the metric players use to determine accuracy in a play) gives smaller acc pp.

Another alternative is having a model of the distribution of acc% based on the variance of hit errors, the number of circles, and OD (hit error windows), then estimate the variance given acc%.

The distribution of hit errors in a play follows closely a Normal Distribution (we will take it as a distribution with mean error 0, since standard deviation plays a much bigger role in accuracy for most players).

That means that the probability of getting each judgment is (Erf is the error function https://en.wikipedia.org/wiki/Error_function, "s" is the standard deviation of the errors):

So, in a play, the distribution of the judgments follows a multinomial distribution with total n equal to the number of circles, and the probability of the result of each trial follows the previous probabilities.

With large amount of circles, the previous distribution approximates well to a multinormal distribution with the same mean vector and same covariance matrix. (This approximation is done since with the next calculations, using a multinomial distribution is unwieldy).

So, the acc% of a given play follows a normal distribution with mean and standard deviation equal to:

With this, we can find the value of "s" (standard deviation of the hit errors) in which less than acc% is obtained with p probability. For the estimation of worth of a play, it is better to use a pessimistic estimation (p higher than 50%), so players don't get rewarded by getting high acc% values that are likely to be just flukes.

For p=99%, the value of "s" that fits certain acc% is the positive value that solves the equation:

(This equation doesn't have an easy closed-form solution, unfortunately, but can be solved numerically easily)

That way, the value of "s" found gives a measure of how accurate a play is (the smaller it is, the more accurate is the play). That value then could be scaled arbitrarily to give acc pp.

Examples of values of "s" given different parameters:

Last edited by Full Tablet 2018-04-09T05:03:11+00:00, edited 3 times in total.

T3rza

42 posts

Joined February 2017

Topic Starter

T3rza 2018-04-09T08:15:27+00:00

I had an entire reply typed out but my computer deleted it so I'll just sum it up.

The main problem I see is that the scaling functions balance the effect of map length and the distribution of judgments arbitrarily. It is hard to justify in your model why the functions are what they are.

You are right about that. The scaling function is a little arbitrary (I chose it for the sole purpose of creating a regression to the original after all). I tried using standard error alone to balance the effect of map length and OD. I generated a PDE from it and solved it, but it weighed length bonus way to heavily.
I could make it work if I could figure out a way to relate length bonus and effective OD.

While ideally, hits are normally distributed, most times they are not. This results in you having 4 equations for the probabilities and one unknown. If you solve each equation, you will get a different s.
Even if you find the standard error using the method below that, it will be difficult to implement your method of finding the standard error into non-math programs like Matlab and the like. Not only that, it will also be pretty computationally expensive. I think you need to find an closed-form approximation of the standard deviation (which is probably difficult to do). That will make it much easier on a computer.

Overall, this is very cool! I am not taking any math courses this semester so I need to wrack my brain like this every once in a while.

abraker

Global Moderator

8,295 posts

Joined July 2014

abraker 2018-04-09T09:11:48+00:00

By the way, the error function can be approximated using:

where n = 1.1283781
The worst case would be at x = 1.225, with a 0.0353 difference

worst case was determined by taking the difference between the actual function and the modeled function in demos
n was found by running that difference through the derivative and picking values where d/dx at x = 0 is 0, indicating that the slopes of actual and model at x=0 are close to same

std skin 2021: link | mania skin 2021: (vanilla ver ~ hidden ver)
osu!Skills - Compare your skills in a slightly different way
OT!neus - osu off-topic subforum's very own discord server

T3rza

42 posts

Joined February 2017

Topic Starter

T3rza 2018-04-09T23:22:39+00:00

By the way, the error function can be approximated using:

Image

where n = 1.1283781
The worst case would be at x = 1.225, with a 0.0353 difference

worst case was determined by taking the difference between the actual function and the modeled function in demos
n was found by running that difference through the derivative and picking values where d/dx at x = 0 is 0, indicating that the slopes of actual and model at x=0 are close to same

Lol that doesn’t make it much easier. 0.0353 is a lot of error. At best, with this, you could expand the MS functions in the approximation to make an exponential polynomial and see what happens.

Full Tablet

2,542 posts

Joined September 2011

Full Tablet 2018-04-10T03:34:29+00:00

Last edited by Full Tablet 2018-04-10T04:01:54+00:00, edited 2 times in total.

Full Tablet

2,542 posts

Joined September 2011

Full Tablet 2018-04-15T00:04:47+00:00

Changing the variable to find from s -> 1/a

Using this approximation for the error function:

And using the fact the function to find a root for is concave in an interval that contains the root from 0 to b (where b is a value difficult to calculate, but that is not important in this case). One can use a simplified secant method (a very simple iterative algorithm) to find the root, starting with 2 points below the root. (This method is actually faster than other more advanced methods that use derivatives, from what I have tested).

With that method, it takes in average 63 microseconds to find a root in my laptop, so performance is not an issue.

Tr3sLeches wrote:
While ideally, hits are normally distributed, most times they are not. This results in you having 4 equations for the probabilities and one unknown. If you solve each equation, you will get a different s.

Since the error in a hit in a play is taken as a random variable, from a sample, there is no way to prove with 100% certainty if it follows a certain distribution (and also, you can't prove with certainty that it doesn't follow a certain distribution as long as there is no point that is outside the distribution´s support).

The normal distribution is chosen since it is a simple distribution that tends to resemble the distribution of hit errors in a play. One can improve the model by finding a distribution that matches a large sample better than the normal distribution (comparing, for example, the maximum likelihood of obtaining the given sample given each distribution). The formula you initially proposed assumes that each hit in the play is just in the border of it's judgment, as opposed to using other distribution.

Of course, we wouldn't need to estimate the distribution of hit errors in a play with a model, if the error of each hit was available for calculation. In that case we could just calculate the mean squared error of the hits directly.

T3rza

42 posts

Joined February 2017

Topic Starter

T3rza 2018-04-15T21:23:14+00:00

This allows for 4 50s to have almost the same worth as 5 100s when the variance and the standard deviation are greater in the former case.

You are right though, I did not post the vindication for the equation I chose in my original post.
My vindication for the equation I chose is as follows:
There are three important observations I made while figuring this out:
1. Error is the inverse of accuracy. The less the error, the greater the accuracy and vice versa.
Hence, when 2 errors are the same, their accuracy is also the same.
2. When calculating the maximum variance possible, if acc is 100%, the square root of the variance will always equal MS300.
Because of this, given the number of circles, 300s, 100s, 50, misses, you can calculate at what OD would you have the same error but have 100% acc. For example Cookiezi’s 99.83% HDHR (OD 10) Freedom Dive has the same error as if the map was SSed with 9.9592 OD. We should argue that when solely looking at OD, the pp from the SS and the 99.83 should be equal since there errors are equal (this is only 1 portion).
3. An exponential function of OD fit the pp curve resulted from star rating really well (with correlation coefficient (r^2) .99+ in all samples).
That means in order to balance acc pp with the rest of the system, it would be ideal to use an exponential of OD.

For the length, since error is the inverse of accuracy, less error results in an increase in accuracy. Also, the lesser the standard deviation, the lesser he error and the greater the accuracy. Because of these two points, I thought that standard error was a good metric to use.

With these in mind, I constructed a partial differential equation to solve for the accuracy pp in terms of OD and number of circles. The result was an arbitrary function of standard error multiplied by an exponential of OD. I just used the function Tom already had for length to model my arbitrary standard error function out of. I used an inverse power of the standard error. To put it together, the pp function was an exponential of OD multiplied by an inverse power of standard error (the power I chose was 0.6 because that was what was already present in Tom’s model). To find the constants, I just did a regression to the original pp model. These numbers are subject to change slightly if necessary.

I think it is good because it is easy to calculate, easy to set up and implement and punishes directly according to how far off you hit. A 50 will increase your standard deviation much more than a 100 would.
Also, you can replace the effective OD with 13.25 - s/6 where s is the square root of the variance. Vice versa, you can replace s with 79.5 - 6*OD_effective.

Your model:
If it takes 63 microseconds for 1 error function, how long will it take to solve an equation with multiple error functions and square roots of them (You would have to use Newton’s method or something at this point). (This might not be a big issue now that I’m thinking about it, implementation into osu code is the real problem).

For p=99%, the value of "s" that fits certain acc% is the positive value that solves the equation:
Image

Isn’t this the top end confidence interval? If s could be calculated in a timely manner, I think this could work. The remaining challenge would be writing a program to solve that equation in code (if only MATLAB could be used...)

Full Tablet

2,542 posts

Joined September 2011

Full Tablet 2018-04-16T00:00:32+00:00

In both methods, the accuracy skill of the player is characterized by the mean of the squared errors of the hits of a player (when the mean error is 0, this corresponds to the variance, so this will be called "variance" from now on).

Your method calculates the worst possible variance that is possible given a certain judgment count (except for misses, which are treated in a special way), then it compares it to the lowest OD that can be possibly SS'd with that variance. The problem is that the estimation of the variance doesn't depend on the beatmap's length, so that is factored into the equation in some arbitrary way, your differential equation might allow it being arbitrary, but the decision has an effect on the balance, so it must be justified in some way.

My method calculates s (square root of the variance), by finding at what s there is a 1% chance of getting the acc% of the play or more (assuming that the hit errors follow a normal distribution, misses are seen as "hitting anywhere outside the window for a 50 or better").

The fact that the chance is 1% (as opposed to 50%, which would make s be the standard deviation that makes the acc% of the play be the median acc%), makes it give a pessimistic estimation, and it is what makes the equation length-dependent (the more circles there are in the map, the less likely that a sample standard deviation is very far away from the population standard deviation).

This is analogous to this: Suppose you play a game where you need to throw your dice and get high numbers, for that, you use the most loaded die (the one that gives the highest numbers in average) you can find. You have a good die that you have thrown thousands of times, giving you a good average (let's say, it gives a value of 5.5 in average). Then, you find someone who is trying to sell you a very expensive die, claiming it is better than what you currently have, and to prove it, he throws it 3 times, giving 6 each time. Because of the cost of overestimating that die's performance are much higher than the costs of underestimating it, you estimate in a pessimistic way, so 3 throws is not enough to prove with certainty than what he has is actually better; if he throws it thousands of times more, and it still shows a better performance, then you can be more certain of the decision.

The reason I chose to fit the estimation of variance to acc% instead of using the judgment counts (which would give better estimates than acc%), is that this way a higher acc% always means a higher acc pp, and players look at the acc% to measure their accuracy.

You could follow a similar logic than the one I used for the previous formula, but fitting to the judgment counts instead of acc%, resulting in equations that are more expensive to solve.

I see 3 options:

Calculating based on judgment counts. This gives a medium quality estimate of the variance of hits, it's expensive to calculate. A higher acc% won't always mean a higher acc pp.
Calculating based on acc%. This gives a lower quality estimate, it's not very expensive to calculate. A higher acc% will always mean a higher acc pp.
Calculating based on hit error for each hit directly. This gives the best estimate and it's the fastest to calculate, but the data is not available for all plays and a higher acc% won't always mean a higher acc pp.

About the 63 microseconds to calculate, I was referring to using the secant method for solving the equation for a=1/s (the change of variable makes it faster to calculate), with double floating point precision (which can involve hundreds of error function evaluations), not the time for a single error function evaluation. The time was tested from writing the secant method from scratch in Mathematica (not using the prebuilt method), and then compiling it (Mathematica internally converts the code to C, using libraries for math functions and tensor manipulation, and then compiles it).

From what I have tested the secant method is faster than Newton's method for solving the equation in this case. Writing the secant method algorithm is easy, and the function evaluations do not use anything more advanced than the exponential function and the square root (which are standard functions).

T3rza

42 posts

Joined February 2017

Topic Starter

T3rza 2018-05-02T02:33:55+00:00

I have a break in my finals week, so I'm going to respond here. I decided to build your method to see what happened. I figured out a formula to use to find s using your model. I used the probability data we already had as input data. However, when I was testing it out I noticed somethings.
~~1. The input data has to include values from 300s, 100s and 50s or the algorithm does weird things (like increasing s when you replace misses with 100s).~~The estimate is weird no matter what.
2. This does not take number objects into consideration very well. In statistics, 400 to 1000 objects doesn't make too much of a difference. However in osu, it does (or at least it should). In this model, after about 300 objects or so, object count does not even matter.
3. I just tested this out some more and this model does weird stuff period (like giving 94% less error than 99%).
4. Since the normal curves are so different, this estimate is skewed pretty much no matter what you do.
5. The scale for misses is absurd.
6. Based on the probabilities we know, when I tested it out. It took 8 secant estimates in order to get something workable.
If we could solve the equation instead of approximate it, this would turn out somewhat well. With estimates, this model is all over the place.

I don't think I explained it clearly before but I will now. Reason 6 is the reason I suggested using Newton's method. In most cases, the guess for a at P300 is going to be MUCH closer to the actual a than the guesses at P100 or P50. Using the secant method throws off the estimate from close point and you have to redo the method a lot to get back there. For Newton's method, you could just start at the close point and just get closer from there.

Last edited by T3rza 2018-05-02T02:49:58+00:00, edited 1 time in total.

T3rza

42 posts

Joined February 2017

Topic Starter

T3rza 2018-07-23T14:39:59+00:00

I was looking through my thread posts and saw that I left this algorithm unfinished. I've decided that I am going to tackle this again (I'm going to post it in a new thread though b/c this one is old).

new thread: https://osu.ppy.sh/community/forums/topics/778750

I wonder if I can close this thread...

Last edited by T3rza 2018-07-23T15:22:19+00:00, edited 2 times in total.

Sign In To Proceed

Don't have an account?

Acc pp

Tr3sLeches wrote:

Tr3sLeches wrote:

New reply