
[Discussion/Proposal] Improve the Beatmap Nominator Application System

Topic Starter
melleganol
Why another change to the BN app system?

The new system that lets people see each evaluator's individual perspective was a great achievement. However, it would be prudent to keep improving the system based on how it has worked over time. When reading their feedback, applicants quite commonly find contradictory comments between evaluators, or comments that are not very constructive. Evaluators tend to write all over the place and lose the purpose of the feedback, jumping straight to the consensus and their overall impression of the applicant.

Problems with the current system

(1) Inconsistent Feedback

  1. user 1: (post) blanket statement that bursts starting on red ticks play better is highly subjective and doesnt make sense as the most intuitive rhythm depends on the structure of the song
  2. user 2: (post) yes damn finally someone bringing up playability of overmaps
Regardless of who is right, the applicant gets nothing at all from seeing such completely opposed opinions.

(2) Ambiguous Feedback

I will not give examples, since each point would need context. However, it is common for applicants to receive subjective comments and to think they are a big deal. The problem is not receiving subjective comments; the problem is evaluators not taking into account how the applicant will read them. If this already happened with regular evaluations, it is unreasonable to expect it won't happen with completely inexperienced candidates.

(3) Scattered Feedback

Some evaluators simply have no structure at all and lump everything together, making it unnecessarily difficult to read. On top of that, some only cover one part of the feedback and skip another. It is painful to read, and learning anything from it becomes complicated depending on the case.

A possible solution

(3) Evaluators usually follow a structure where they review maps 1-2-3 and that's it. Polishing that structure would lead to something easier to understand, for example:

- - map 1 - -
- difficulty -
judging posts
missed issues

I don't mean that all evaluators must write in one particular way and that it is the only correct one, but keep in mind that it is difficult for applicants to read scattered feedback. A structure like this makes it easier for the applicant to see how their posts were received and what they missed.

(2) Fixing the second point could be fairly easy if evaluators marked their position before writing whatever they want. Visualizing the position of an opinion gives direction to everything written after it. Otherwise, when a lot of things are mixed together, the applicant can only take it all seriously or not at all, depending on whether the other evaluators mentioned the same thing; if they all mention it, it must be obvious and severe (although it doesn't actually work that way).

In more detail, for the points within the structure (an illustrative example follows this list):

  1. judging posts: It would be a good idea to use (+) ∙ (+/-) ∙ (-) marks to improve the display and prevent any misunderstanding. The evaluators can continue to write whatever they want, but their position on the matter is clear from the outset.

    (+) positive (evaluator agrees)
    (+/-) neutral (e.g. evaluator agrees with the post, but part of it falters)
    (-) negative (evaluator disagrees)
  2. missed issues: Including marks like (!) ∙ (!!!) in this section can help a lot to visualize the severity of the issues that the applicant did not mention. Without them, the applicant may think something minor is a big deal, or vice versa.

    (!) = missed issue but not severe (e.g. evaluator thinks something can be improved)
    (!!!) = severe missed issue (e.g. complete lack of contrast)
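To make this concrete, here is a made-up snippet of how an evaluation could look with this structure and these marks (the map name, difficulty, timestamps, and comments are invented purely for illustration, not taken from any real evaluation):

    - - map 1 (Example Song) - -
    - Insane -
    judging posts
    (+) the spacing concern at 00:15:123 was well argued and led to a clear improvement
    (+/-) the rhythm suggestion at 00:32:456 is reasonable, but part of the playability reasoning falters
    (-) the post at 01:04:789 overrides the mapper's intent
    missed issues
    (!) the hitsounding in the kiai could be cleaner
    (!!!) complete lack of contrast between the verse and the chorus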
(1) Taking this into account and returning to the example from the first point (setting aside the playability/representation discussion): user 2 would give it a (+/-) at best, since it's a timeline post and it didn't really have a big impact on the map; on the contrary, user 1 would give it a (-) because the post overrides the mapper's intent. Even if discrepancies between evaluators cannot be avoided entirely, the applicant can easily deduce from the symbols whether their post was received positively or negatively.

Conclusion

Visual Example

The readability of feedback is important so that the applicant can truly use it to improve. It would also make it easier to see how fair the consensus is to the applicant, so that the applicant is not completely demotivated by thinking their consensus was pure rng. If evaluators included the positive/negative marks for the applicant's posts and the severity marks for unmentioned things, it would be much more beneficial to applicants.
clayton
I imagine just telling evaluators to be more clear about their feedback or giving basic writing tips on-page would go a long way toward fixing what was pointed out here, but the more formal suggestions sound fine to me too

melleganol wrote:

  1. judging posts: It would be a good idea to go with (+) ∙ (+/) ∙ (/-) ∙ (-) marks to improve the display and prevent any misunderstanding. So the evaluators can continue to write whatever they want, but their position on the matter is clear from the outset.
I wouldn't understand what these slash things mean if they weren't explained to me beforehand.
Yumenexa
Agree with clayton
Drum-Hitnormal
agree feedback should be clear and let the applicant know how severe it is, just not sure what's the meaning of the /

i think it's important to mention the things that are done well, to encourage the applicant to keep doing those things, and to avoid being too subjective on what counts as an issue; look at objective issues only
Topic Starter
melleganol

clayton wrote:

I wouldn't understand what these slash things mean if they weren't explained to me beforehand.

Yumenexa wrote:

Agree with clayton

Drum-Hitnormal wrote:

just not sure what's the meaning of the /
added clarification
RandomeLoL
I do have to point out that a couple of the suggestions are a regression of the systems we've been changing over the years, especially the more recent changes.

I personally agree with points 1 and 2. Sometimes there's incongruent feedback coming from different evaluators. It's important to realize that these differences are settled during the group stage. And while that's clear to us, I agree that unless it's specified in the general feedback, it's hard for the evaluated user to know which opinion is the right one.

However, I don't see much value being added by the proposed solutions. Other than standardizing the way everything is written and making it quite strict, it doesn't mean the writing itself will necessarily be more or less confusing. Which is precisely how it worked before.

Personally, I don't think this is so much a systemic issue (plus feedback writing varies a lot between modes, so it's not something that can be generalized either). If anything, as evaluators we should probably be more mindful of what we write and understand that it has to be understood by someone else, not just us. But I do not think the solution is to constrain the way feedback is written across the board. What works for one may not work for another.

I would also encourage replying to an evaluation with a message asking for clarification on anything that may not have been understood or properly explained. Making use of that new tool should help get answers when the writing itself doesn't provide them.

So yeah, tl;dr: I do agree with clayton. In mania at least we used to have templates for feedback, but ever since we started relying on our individual comments to do most of the talking, we've just tried to make sure the general feedback cleared up any incongruent messages found in them.
Serizawa Haruki
I do agree this could be improved, but I think the way feedback is presented is far from the only issue with BN apps.

Just to name a few things:
  1. Even after the recent changes, there are still arbitrary expectations that aren't written anywhere. For example, on 2 BN apps I saw evaluators mention that the map chosen for nomination was a "safe pick" because it was made by a famous mapper and therefore doesn't showcase the applicant's ability to identify what's suitable for ranked. But what's wrong with picking a popular mapper's map? Those maps also need modding and improvement sometimes, and if anything, it shows the candidate would pick high-quality maps, which is a good thing. Another example was about an app having 3 TV size maps, which was considered insufficient for some reason. If such maps are really not wanted on BN apps, why not communicate this beforehand to avoid it?
  2. Some comments are way too subjective and not relevant/helpful at all, for example "this part of the map is really boring" and similar stuff. Only things that are actual issues should be brought up and not preferences.
  3. There really should be the possibility to appeal, which is not the case based on the response here: https://bn.mappersguild.com/message?eval=6613776f12d9dee58c2d1d3e
For a more in-depth analysis, read from this post onwards: community/forums/posts/9486889
Topic Starter
melleganol

RandomeLoL wrote:

However, I don't see much value being added by the proposed solutions. Other than standardizing the way everything is written and making it quite strict, it doesn't mean the writing itself will necessarily be more or less confusing. Which is precisely how it worked before.
It shouldn't be that hard for evaluators to add the marks, and while the focus of this post is readability, I do have in mind that it can help with other problems too, like:

Drum-Hitnormal wrote:

i think it's important to mention the things that are done well, to encourage the applicant to keep doing those things, and to avoid being too subjective on what counts as an issue; look at objective issues only

Serizawa Haruki wrote:

Some comments are way too subjective and not relevant/helpful at all, for example "this part of the map is really boring" and similar stuff. Only things that are actual issues should be brought up and not preferences.
A SEV rating could also solve this. Evaluators should mark these comments so that the applicant can visualize the obviousness/severity, but this also makes the evaluators more aware of what they are writing.
RandomeLoL
Again, we are going full circle with that one. We removed SEV for a lot of reasons, and I really doubt that giving an arbitrary number for the severity of an issue is going to make it more understandable to the end user. Especially if different evaluators were to assign different numbers.

People didn't like getting graded, and would end up worrying more about the number itself than the point that was trying to be made. I personally do not think this is a good idea for applications specifically.
Topic Starter
melleganol

RandomeLoL wrote:

Again, we are going full circle with that one. We removed SEV for a lot of reasons, and I really doubt that giving an arbitrary number for the severity of an issue is going to make it more understandable to the end user. Especially if different evaluators were to assign different numbers.

People didn't like getting graded, and would end up worrying more about the number itself than the point that was trying to be made. I personally do not think this is a good idea for applications specifically.
BNs didn't like getting graded (some BNs still think it was good, though), but for inexperienced candidates it can be nice to get feedback in a clearer form than evaluators saying they don't like something and expressing it as if it were a big deal. As you will understand, I did not want to use too many examples, as that could be inflammatory, but these problems are real.
Serizawa Haruki

Serizawa Haruki wrote:

Some comments are way too subjective and not relevant/helpful at all, for example "this part of the map is really boring" and similar stuff. Only things that are actual issues should be brought up and not preferences.

melleganol wrote:

A SEV rating could also solve this. Evaluators should mark these comments so that the applicant can visualize the obviousness/severity, but this also makes the evaluators more aware of what they are writing.
No, SEV rating has nothing to do with it. My point is that such comments should not be made at all as they have no place in an evaluation. Again, the way feedback is delivered/structured can be perfect but if the feedback itself isn't good, it won't matter. I don't disagree with your main idea but there are more things to consider.
RandomeLoL
I agree with that. It really doesn't matter what kind of numbering system you use or what grades you're given. What will matter most is the writing itself and what the applicant does with the feedback they're given. If our feedback is unclear, erratic, or inconsistent, then no rating will help make it easier to understand.
Topic Starter
melleganol

Serizawa Haruki wrote:

No, SEV rating has nothing to do with it. My point is that such comments should not be made at all as they have no place in an evaluation. Again, the way feedback is delivered/structured can be perfect but if the feedback itself isn't good, it won't matter. I don't disagree with your main idea but there are more things to consider.
Yes, I agree. Especially your first point about the arbitrariness of the “safe pick” expectation. One change at a time, I guess.

RandomeLoL wrote:

I agree with that. It really doesn't matter what kind of numbering system you use or what grades you're given. What will matter most is the writing itself and what the applicant does with the feedback they're given. If our feedback is unclear, erratic, or inconsistent, then no rating will help make it easier to understand.
I think that would be more difficult to achieve, since evaluators currently don't do it. It could be something other than the SEV, but something that works like the current (std) BN eval “subjective feedback” system. There is no way evaluators can mark exactly how subjective their comments are; however, visualizing it with marks helps show the evaluator's perspective and how the applicant should react to the comments. Numbers don't have to mean something very specific, as long as they communicate a certain message. Surely there have been many messages that were not read the way the evaluators thought they would be.


EDIT:

Apparently the worst part is the SEV rating usage, so I changed that part and added a different type of mark that may work in a similar but less specific way. Fixed other things as well.