I've been talking to other people about scorev2 and its issues. The three main issues people have raised are:
- Rainbows are weighted too little.
- HR on v2 is ridiculously difficult to get rainbows on (on Stable at least, on Cutting Edge it seems as easy as accumulating rainbows on nomod - but more on that later).
- LNs generate a lot of misses if they have really tricky releases. They work fine for other types of LNs.
All of those problems can definitely be amended.
Rainbow Accuracy
Shoegazer wrote:
"320s are very much underweighted because the only component of the scoring system that takes into account 320 accuracy is the combo component, which only has a 20% prominence. Add on to the fact that the difference between a 300 and 320 is so small and that the absolute difference between juan and Hudo's 320 count isn't that significant, it would make sense that 320s are really underweighted at the moment.
You could mitigate this by including 300gs into accuracy, but from what I've experimented it might create too much emphasis on MAX accuracy with charts that players have issues getting 96%+ on (and as a result would not be an accurate assessment of skill). Alternatively, you can avoid including MAXes in the accuracy component and just increase the importance of MAXes to like 360 to increase the emphasis of it by a noticeable but not overpowering amount in the combo component, but that requires a bit more experimentation."
I initially wanted to increase the rainbow judgement weightage without embedding rainbows into accuracy, but no matter how much I changed it, the difference was very minor (~600-1,200 points) and a 200 would almost always be too powerful compared to a rainbow 300. So I scrapped that idea and decided that embedding rainbows into accuracy with a reasonable weightage, and maybe making the curve more lenient, would be the best approach.
I've been experimenting with weightages and discussing with people how much a 200 should be worth compared to a 300. I initially thought that 310 would be fine (so a 200 would be worth 11 normal 300s), but when it came to matches like this, if accuracy was the only factor, Argentina would win by 21,000 points. I do think that Argentina should win and it's a step in the right direction, but 21,000 seems extremely overwhelming since it undermines the fact that Poland had noticeably fewer 200s overall. I tried it with harder charts too and it seems to favour rainbow accuracy a little too much for my liking - especially since on harder charts (ones players struggle with), good rainbow accuracy is usually caused by variance rather than a higher skill level. 200s and worse judgements should determine performance there.
I tried 307 afterwards, but it still gave a bit too much emphasis for my liking: about 12,500 points for that Argentina/Poland match. I went down to 305, and the difference is about 6,800. I think that's ultimately the most reasonable weightage, and others I've talked to seem to agree with the notion that a 200 is worth about 21 normal 300s. Ignoring the bad judgements (since those values are pretty much set in stone at this point), this is probably (part of) the ideal solution. This does mean that only full rainbow scores are SSs, but I don't see that as a problem as frames of reference can be shifted.
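To make the arithmetic behind those weightages explicit, here's a rough sketch. It assumes rainbows are embedded into accuracy so that a full-rainbow play is 100%, which means a normal 300 loses (weight - 300) points and a 200 loses (weight - 200). The function name is made up for illustration; this is not the actual scorev2 code.

```python
def normal_300s_per_200(rainbow_weight: int) -> float:
    """How many normal 300s cost as much accuracy as a single 200.

    Assumption: accuracy = sum(judgement values) / (rainbow_weight * notes),
    so each judgement's cost is its distance from the rainbow weight.
    """
    loss_per_300 = rainbow_weight - 300
    loss_per_200 = rainbow_weight - 200
    return loss_per_200 / loss_per_300


# The three weightages tried in the post.
for weight in (310, 307, 305):
    print(weight, round(normal_300s_per_200(weight), 1))
```

310 gives 11 and 305 gives 21, matching the figures above, while 307 lands at roughly 15.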
Getting rid of the difference between a rainbow and a normal 300 in the combo scoring component is probably ideal too, since that distinction belongs in the accuracy component, not the combo component. If rainbows are included in accuracy, the combo component does not need a rainbow component.
I also wanted to soften the exponential curve a tiny bit when including rainbows, mainly because at a certain point extremely good accuracy is caused more by variance than by a very high skill level - unless the performance is repeated consistently, which is not measurable with just one match and one attempt. The exponent I had in mind was Accuracy^(2 + 2 * Accuracy), which is essentially Accuracy^4 - so 1 power down.
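For anyone who wants to see how close the suggested curve is to a plain fourth power, here's a small sketch that evaluates both. It only covers the curve itself, with accuracy as a fraction between 0 and 1; how the result is scaled into the final score is left out, and the function names are just for illustration.

```python
def proposed_curve(accuracy: float) -> float:
    """Accuracy^(2 + 2 * Accuracy), the softened curve suggested above."""
    return accuracy ** (2 + 2 * accuracy)


def fourth_power(accuracy: float) -> float:
    """Plain Accuracy^4, for comparison."""
    return accuracy ** 4


# Print both curves side by side for a few accuracy values.
for acc in (1.00, 0.99, 0.97, 0.95, 0.90):
    print(f"{acc:.2f}  proposed={proposed_curve(acc):.4f}  acc^4={fourth_power(acc):.4f}")
```

Both curves hit 1 at 100% accuracy. Below that, the suggested curve is always a bit more lenient than Accuracy^4, because its exponent drops under 4 as accuracy falls.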
On a similar note, I wanted to respond to this:
Drojoke wrote:
"To give a slightly over the top example: scoring a 200 on every note in a song is going to give you an accuracy of 66.67% and a score of about ~335k. The score is really low because you barely get any bonus score. The same play would net you a somewhat respectable 733.3k in the current proposal. Add FL to this, and you get a score of 777.3k. In practice, it's going to be less extreme than this, but it's definitely present."
A 66.67% score nets you about 306K (181K for accuracy, 125K for combo). Adding FL increases it to 324K, so you might've miscalculated. In any case, I do agree that bad judgements (non-200/300 judgements) should be penalised more, but I don't think that's necessarily what they have in mind at the moment, since those values are carved in stone. MAX judgements are not.
tl;dr: Embed rainbows into accuracy with a weightage of 305 instead of 320, change the accuracy curve to Accuracy^(2 + Accuracy * 2), remove the differentiation between rainbows and normal 300s in the combo component (both of them should have a HitValue of 30).
HardRock
Accumulating rainbows on HR is really strict on most charts already, but since the rainbow window is stricter in scorev2 at ODs beyond 8, it gets even stricter and probably way too difficult. On anything above OD7.5, HR boosts the OD to 10 - which means a rainbow window of +-13ms. Add the general effect of HR (which makes windows 40% tighter) and it gets knocked down to 9ms, truncated. Considering that any hit within +-5ms is down to computer performance variance, having only 4ms of "controlled" timing is very, very low - especially since the difference between a 6ms controlled window (the HRv1 window) and a 4ms controlled window is huge. While it is true that certain modes have windows this tight on HR (and maybe DT), 85% of charts used in MWC are OD8, whereas it's much less common in other game modes to have something this tight (in Standard it takes some absurd DTHR with a decent OD, and in Taiko some absurdly high OD with DTHR, and I'd argue that timing in both modes is easier than in Mania).
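Here's the window arithmetic from that paragraph as a quick sketch. It assumes "40% tighter" means dividing the window by 1.4 (that's what reproduces the 9ms figure from 13ms) and takes the +-5ms variance figure as given. The names are made up for illustration and this isn't the game's actual code.

```python
HR_TIGHTENING = 1.4  # assumption: "40% tighter" = divide the window by 1.4
VARIANCE_MS = 5      # hits within +-5ms attributed to computer variance (from the post)


def hr_window(base_window_ms: float) -> int:
    """Rainbow window under HR, truncated to whole milliseconds."""
    return int(base_window_ms / HR_TIGHTENING)


def controlled_window(window_ms: int) -> int:
    """Portion of the window left once hardware variance is subtracted."""
    return max(window_ms - VARIANCE_MS, 0)


v2_hr = hr_window(13)                    # OD10 rainbow window of +-13ms
print(v2_hr, controlled_window(v2_hr))   # 9 4
print(11, controlled_window(11))         # 11 6 (the HRv1 figure from the post)
```

So dropping from the 11ms HRv1 window to 9ms cuts the controllable part by a third, from 6ms to 4ms.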
There's also the fact that with the rainbow weightage aspect included, wins on FreeMod are very variance-based rather than performance-based.
My main suggestion is to keep the 40% tighter window (except for early miss judgements) on HR, but not increase OD with HR. 11ms (the standard for OD8-OD9), while I still think it's noticeably harsh, is the norm, and the 40% seems to fit the other judgement windows quite well anyway. In fact, v1 already does this: this is an OD5 chart on HR. While it looks like OD7+HR, it's actually aligned with OD5 with 40% tighter windows. Ignore the additional 0.5ms, as it's some byproduct of woc's janky coding.
But that's one thing. The other thing I noticed is that the difference in rainbow difficulty between scorev2 NoMod and scorev2 HR on Cutting Edge is insignificant. I've tested this with juan, and his performance on NoMod and HR is similar, variance included. Pictures included:
A 25x300 difference for something that should be a 6ms gap (15ms - 9ms) is absurd, and is most definitely not caused by variance in performance.
I told juan to play on the Stable build as well, and he noticed a massive difference in accuracy with scorev2 HR. He can usually get a 6:1 to 7:1 ratio on scorev2 HR in CE (consistently comparable to his nomod scores), but he can barely break 2.5:1 on Stable. He didn't mention any performance issues either. Here are pics of his scores on the Stable build:
Note that these two charts are also comparable in difficulty; both would've been in Group Stages last year (and Sakura Mirage was in last year).
I know you mentioned a HR rainbow fix earlier in the thread, but I'm not sure if it did what it was supposed to do. Seems like it created issues rather than fixed a problem. underjoy's ratio on HR before the fix made sense to me given the relatively low OD and all, my main concern was with the miss count (which I'll talk about in the next section).
tl;dr: Don't increase OD when HR is switched on but keep the stricter timing windows. Accumulating rainbows on OD10+HR is way too strict because the window a player can control is really, really small, and it affects an overwhelming majority of charts in MWC. There might be a problem in Cutting Edge where HR currently is as easy as NoMod.
LNs
LNs in v1 seem to be bugged, which makes them much easier than they should be. While I understand how the LN mechanics work, they don't seem to work that way for LNs where you hit the head perfectly but never let go. For some reason, no matter when you let go of the note, as long as you hit the head perfectly, you will get a 200. If you hit the head a bit earlier or later, you get a 100 instead. Here is a video demonstration of this. This is probably (partially) why LNs in v1 are so easy compared to ones in v2 - particularly ones with very tricky releases. Players subconsciously don't let go of LNs properly and they don't get punished for it. In v2, the punishment becomes noticeable: players get a miss if they don't let go.
Getting rid of this bug is definitely a good start, but since scorev2 was implemented on such short notice (85% of participants probably haven't used scorev2 yet, though you can argue that it's their own fault) and MWC is being used as testing grounds, you'd want to make LN releases more lenient than they currently are for easier transitioning - as players are getting a lot of misses already, even on NoMod. I think an LN leniency of 1.8x would be fine, but this is a bit of an arbitrary figure. I don't really know the effects of this because I don't play LN charts that much, and you're better off asking players like juankristal or _underjoy instead.
tl;dr: Increase LN leniency to 1.8x for easier transitioning, as LNs in v1 contain a bug that makes LN releases much much easier than they should've been in the first place.
I think that should be all, feel free to ask any questions if you're uncertain about a couple of things that I've pointed out/suggested.