
The BN Evaluators Trial Review and Discussion

Topic Starter
Noffy
Hello! Please note that the following is for just the osu! game mode.

osu!taiko is currently running a similar trial, which may get its own post in the future, but discussion in this thread should be kept to just osu!standard.

Overview

Previously, the NAT handled BN applications and evaluations themselves: they led discussion and were the final decision makers. BNs could do mock evaluations on applications so the NAT had a better idea of who would make good NAT candidates in the future, but their votes had no influence on final decisions.

When the NAT did evaluations solely on their own, each evaluation was randomly assigned to 3 NAT members on the BN website. They evaluated individually at first to avoid confirmation bias. When an evaluation card reached 3 NAT evaluations, it moved to the group discussion stage, where the NAT discussed their decisions and wrote feedback. A Discord bot notified them of assignments and upcoming due dates. The NAT could choose to add a random number of BN evaluators, but BN evaluations were not required for cards to move to the group stage. BNs also could not see evaluation cards in the group stage.

How this changed during the trial

In mid May we began a trial of a slightly different system, in which the osu! standard NAT were joined by Beatmap Nominators in evaluating current BNs and BN applications. BNs were given full access to evaluate both applications and current BNs alongside the NAT: they were more involved in the decision-making process, participated in the following discussion hosted in NAT channels, and wrote the feedback sent to the users evaluated. Recent BN applicants may have noticed this in the form of purple names on their BN application feedback.

Evaluations were assigned 2 BNs and 2 NAT members, and required 4 evaluations to move to the group stage.


Who participated

Trial participants were selected from BNs who volunteered their interest.

The first wave of the trial was a smaller group consisting of Uberzolik, Mafumafu, Andrea, Nana Abe, Riana, Cheri, Sparhten, Firika, fieryrage and Bibbity Bill (10 users). It ran from mid May to mid June.

The second wave of the trial was a bigger pool of users and over a longer span of time, running from mid June to mid August. This group consisted of VINXIS, StarCastler, Cris-, Petal, UberFazz, realy0_, AJT, rosario wknd, -Keitaro, Elayue, Kudosu, NexusQI, pimp, NeKroMan4ik, Mirash, Mimari, Stixy and Morrighan (18 users).


Trial Feedback

After each wave ended, we surveyed the participating BNs for their thoughts on the trial; at the end of the trial we additionally surveyed the NAT members. The purpose of the trial is to explore how the main osu! game mode's BN system could be handled in the future, which this thread is meant to help discuss further.

Below is a summary of the survey response takeaways.

BN Survey Summary
Would you continue doing evaluations if you were given the option?
  1. Responses were overwhelmingly "yes", with only a handful saying no or not sure.
What are your thoughts on the trial and how was your experience with it?
  1. Difficult to communicate, especially in cases where opinions were split and they had to come to a conclusion. In larger groups this decision making lacked a clear process.
  2. Fun and gives new insight into the process, is a cool way to contribute.
  3. Feels better than previous mock BN evaluations, opinions mattering made it more interesting and motivating.
Do you think we should keep running this after the trial? Is there anything you would change?
  1. It is hard to come to final decisions; NAT guidance would be needed.
  2. Weigh the results so it is not solely BN decisions deciding who is accepted, rejected, passing current evals, etc.
  3. BNs already have enough to do with main BN work, and evaluations would only be a distraction from their main priority. This would lead to either their BN activity or evaluation participation dropping drastically.
  4. BN self management may not work, but it can be a good replacement for previous mock evals for finding new NAT members.
Do you think any BN should be given the option to do evaluations? If not, where do you draw the line?
  1. This depends on where the system goes
  2. NO, have set restrictions such as a good BN record for x length of time.
  3. Remove those who perform poorly from doing BN evaluations in the future.

NAT Survey Summary
Most points from the BN summary were in the NAT responses as well, so this summary will focus on points specific to the NAT survey.

What are your thoughts on the trial and how was your experience with it?
  1. Opinions were fairly split between satisfaction with how it exceeded expectations, feeling it's not realistically any different, and worry that it will create more issues to deal with amid changing standards.
Do you think we should keep running this after the trial? Is there anything you would change?
  1. Cycle frequently
  2. Good for BNs to showcase other skills related to NAT work.
  3. Fewer users at a time, such as 5-7
Do you think any BN should be given the option to do evaluations? If not, where do you draw the line?
  1. Not all BNs, require them to be BN for ~6 months.
  2. Have a good record without warnings for behavior, quality issues, etc.
  3. No recent issues in their evaluations.
  4. Otherwise, give those interested a shot. Don't actively look for participants.

Document on additional Trial NAT Observations as of wave 1 written by yaspo

Observed problems

We saw several issues throughout the trial cycle, such as:

  1. BNs using their new abilities to leak evaluation and application results before they were ready or finished.
  2. One BN participating in the trial being placed on probation mid way through, and having to remove them from the trial because of this.
  3. Structuring this so that BNs do not see their own current BN evaluations in real time. That visibility can lead to panic if someone knows for sure their eval is going poorly, and several BNs mentioned general awkwardness from it throughout the trial.
  4. Transitioning between waves: wave 2 participants had difficulty finishing leftover wave 1 work, since they had not individually evaluated those cards themselves.
  5. In both waves, a few users carried the majority of the work while others faded out of active participation.
    1. In wave 1 this was partly because all evals were completed so quickly that participants with less free time simply did not get the chance to contribute.
    2. In wave 2, we implemented a system where unassigned users could not do evaluations until close to the due date, to give assigned users a fair chance at their assigned evaluations. However, this change instead led to many evaluations becoming overdue, because there was no active system to track when a card opened up to unassigned evaluators.
  6. Fluctuating standards between evaluators. The NAT, as a smaller group, work closely together and have broadly similar standards with differing insights, but in the larger group the BNs' standards for evaluations varied much more widely. Some BNs were much stricter than NAT, while others were far more lax. In a longer term, this has risks of BN application results being more inconsistent and RNG based.
Where we go next

Feedback for the trial was generally positive, though many BNs in wave 2 felt they would prefer to do it only occasionally or over shorter terms.

Most people surveyed felt there should be restrictions on who can participate, such as having a good record and having been a full BN for the previous 3-6 months (the length varied per response). Others felt participating in this format was fun, but not necessarily a good idea for BNs to be on equal standing with the NAT, since bias and differing standards are harder to track.

Originally, we wanted to explore the osu! game mode's BNs fully evaluating themselves, with a few key NAT members helping to provide guidance and serve as tiebreakers. However, looking at the survey results, we may push in a different direction, and would like to discuss this further before making any final changes or decisions.


Discussion

Currently we know this for sure:
  1. If implemented permanently, it would run on a cycle-based system to help keep motivation from dying out and negatively affecting turnaround time. Wave 2 was definitely too long, so somewhere around a month would be ideal.
  2. We will also implement some form of limits on who can participate, but the exact details for that have not yet been decided.
This leaves figuring out the details for the new implementation of what we trialled, and figuring out solutions to the problems faced during the trial if they are determined to be solvable.

How do you think the future system should look based off the trial information and survey results?
Naxess
to where BNs do not see their own current BN evaluations
think picking participants that don't have evals upcoming in that period would fix this, that's how the recent taiko bn evalers cycle was done anyway
Basensorex
>In a longer term, this has risks of BN application results being more inconsistent and RNG based.

null point considering it wasnt much different under the usual system, especially a few months ago
ikin5050
If you choose to implement this in month-long cycles, the problem of people having to pick up half-finished work from the previous cycle needs to be addressed.

Could it instead be done with a month-long cycle of evaluating, followed by agreeing to stand by for a few extra weeks in case discussions need to be had about users who were evaluated?
clayton
cool stuff, it looks promising that most of the observed problems are things that can be worked out system/dev-side and u kinda already know what to fix. also promising that each wave collected helpful feedback and you're iterating on these ideas quickly. I don't have much to add to what was already said so I'll be keeping my eyes out for a report after wave 3 :^)

Naxess wrote:

to where BNs do not see their own current BN evaluations
think picking participants that don't have evals upcoming in that period would fix this, that's how the recent taiko bn evalers cycle was done anyway
is it difficult to hide them on the website? or does "see" mean like get word of discussion somewhere?
-White
> In a longer term, this has risks of BN application results being more inconsistent and RNG based.

I'm very concerned about this one, especially since NAT standards alone were never consistent. I'd like to see some system implemented to increase consistency if possible.
Topic Starter
Noffy

clayton wrote:

cool stuff, it looks promising that most of the observed problems are things that can be worked out system/dev-side and u kinda already know what to fix. also promising that each wave collected helpful feedback and you're iterating on these ideas quickly. I don't have much to add to what was already said so I'll be keeping my eyes out for a report after wave 3 :^)

Naxess wrote:

to where BNs do not see their own current BN evaluations
think picking participants that don't have evals upcoming in that period would fix this, that's how the recent taiko bn evalers cycle was done anyway
is it difficult to hide them on the website? or does "see" mean like get word of discussion somewhere?
in group stage they're discussed in a central discord channel, which can't really be hidden unless the people being evaluated just aren't in the wave. moving it all to the website decreases visibility for participants (new evals and cards moving to group stage are also announced on the discord for visibility) and is a lot of dev work to recreate something that already exists.

-White wrote:

> In a longer term, this has risks of BN application results being more inconsistent and RNG based.

I'm very concerned about this one, especially since NAT standards alone were never consistent. I'd like to see some system implemented to increase consistency if possible.
yeah this needs documented guidelines to be set up to go off of, for stuff which is currently learned by doing. see "telephone" on yaspo's document too.

ikin5050 wrote:

If you choose to implement this in cycles of a month length the problem of people having to pick up half finished work from the previous cycle needs to be addressed.

Could be done instead with a month cycle of evaluating and then agreeing to standby for a few extra weeks in case discussions need to be had about users who were evaluated?
That's a good idea, will keep note of that. Either that or working on how scheduling is set up so that nothing is due in the last week to ensure all due/overdue cards are finished before a wave ends.
-White

Noffy wrote:

working on how scheduling is set up so that nothing is due in the last week to ensure all due/overdue cards are finished before a wave ends.
I think this is nice. I don't think the cycles have to be black/white, there can be a smooth transition between them, where at times there might be 2 different eval teams active simultaneously, but only one would be taking the new requests. Could just put one or two NAT in charge of each team so that they're independently managed or something.
VINXIS
My main issue was and still is mostly that there's no direct line of communication with the applicant/evaluated, which also indirectly results in waiting times for re-appliers stretching to months, and in not being able to communicate/work with people we would be bringing into the group and would inevitably work with on nominating sets at some point

I don't think this improves/worsens the process for applicants/the evaluated either way, aside from sometimes faster application results (which also probably won't be consistent), but otherwise I think expanding it from just NAT to BNs seemed like a decent idea since the start and seemed fine in its functionality during the trial from the evaluators' side
-White
^ Kinda curious what sort of reason evaluators would have to communicate with the applicant, and how that would speed up results?
VINXIS
When you apply and fail the application, you are given a wall of text under a Feedback section on the website, where evaluators attempt to summarize what you should improve based on the notes they wrote, and then the applicant is told to fix these and reapply later.

In comparison, having more direct contact with the applicant, where for example you have the discussion/conversation with the applicant at the time as well (which currently happens via evaluators writing notes, and after all 4+ evaluators have finished, choosing whether to pass/neutral/fail the person), plus some form of "follow-ups" for anyone who "doesn't pass at the time", seems to me far easier for conveying more valuable information to applicants (and to ourselves), resolving issues faster, and streamlining the process.

Mainly I just think the discussions that are happening in the current system are missing the applicant currently, and are more valuable than the feedback wall that is sent in comparison, and if it is more valuable, would be easier/faster to fix what those issues are
-White
Yes I actually fully agree with vinxis, can we actually do that? I think all of that would improve the applicant experience so much holy shit
Naxess

VINXIS wrote:

I just think the discussions that are happening in the current system are missing the applicant currently
would be careful about including the applicant too early on; current feedback wall is basically eval notes cleaned up from unnecessary/poorly-worded/poor advice, and having the applicant around when discussing what are and aren't issues and how to best convey that would probably get really confusing
UberFazz

Noffy wrote:

How do you think the future system should look based off the trial information and survey results?
I like the idea of allowing certain BNs to evaluate applicants (though I disagree with doing it in waves; more on that later), but this raises the question: What becomes of the NAT if the majority of their current work is handed over to BNs?

Disclaimer: This assumes BNs will have the same exact evaluating power as the NAT, similar to the trial.

----

If BNs are to take over evaluations, it'll leave the NAT with only managerial tasks, such as making announcements or implementing changes (like this post). Moderation is included as well, yet I'm unsure of how much moderation NAT members actually do.

As far as I'm aware, these kinds of activities are only limited to a select few NAT members, and it'll make the rest of the NAT nothing more than BNs with fancy titles.

So, what should we do?

Here are some of my ideas.

  1. Leave the NAT as is and allow BNs to eval.
Probably the least appealing option (to me) would be to just roll with it and leave the current NAT as is. This could work, but it really puts into question the reason for keeping someone in the NAT. If a member's only contributions are evaluations, it seems inappropriate to place them in a different boat than the evaluating BNs. Granted, I can't know if any NAT members are in this position (as they could be contributing behind the scenes), but it's definitely possible.

*Edit: Additionally, I believe it's very unlikely for this to work long-term, similar to why QAH did not work out. With no incentive or recognition, very few will be interested, and those who are interested will burn out very quickly if all they're doing is essentially working for the NAT with none of the benefits of the NAT.

  2. Re-organize and/or re-purpose the NAT and allow BNs to eval.
Similar to the first option, except it involves separating NAT members who exclusively or almost exclusively do evals from members who contribute in other ways that require them to be in the NAT. This would likely mean moving certain members from the NAT to the BN while still allowing them to actively participate in evals.

This would also involve reworking the way new NAT members would be added. Currently, a big portion of being an NAT member involves being exceptional at evaluating applicants, with skill being shown off through mock evals. If the main responsibility of the NAT is no longer evals, how would NAT members be chosen? Would there be a big enough difference between BNs who are outspoken, GMT members whose main responsibility is related to mapping/modding, and the NAT?

The NAT responsibilities listed in this wiki article that aren't evaluations do not need to be done by the NAT specifically. The GMT can handle moderation, especially with the new waves of mapping/modding GMT, and like I've stated earlier, I don't think many NAT are very interested in this aspect anyway. Structural changes can be done by anyone by making relevant posts on the forums or GitHub. A quick look at the listed responsibilities shows that nearly half (5/11) of the (standard) NAT do not seem to be interested in anything besides evaluations (but they can still contribute in other means if they so choose).

This leads me to my final two options. To be perfectly honest, I have no idea how realistic they are, but this is just brainstorming anyway.

  3. Continue letting BNs evaluate applicants in a system similar to the trial, but use it as a method of identifying good NAT candidates instead of replacing the NAT in general.
To me, this trial seemed like a much better method of seeing how well certain BNs would perform as NAT instead of outright replacing the NAT. This would make for rare waves, and they would only really be used when the NAT is in need of new members, similar to the current systems. Seems pretty simple, but there are certainly some issues with this.

Would this really solve the problem of not having enough manpower to effectively evaluate applicants when the NAT is hesitant to accept new members? Would it be sufficient to keep up with the constantly growing community when it's not self-sufficient like the other proposals? Unfortunately, there's no way to know.

The main issue I've always seen mentioned regards moderation. NATs have mod powers, and you generally need to be especially trusted/known to be in a position with access to site-wide moderation. If we want the NAT to keep their moderation abilities intact, there's no real solution other than being forced to select only NAT members who can also be trusted with this kind of power. However, as I've said earlier, I don't believe moderation is important to the NAT.

This leads me to my final proposal.

  4. Use the new system as a way of identifying new NAT members, but strip the NAT of their access to site-wide moderation.
This seems like an ideal solution to me, but I'm unsure of how realistic it is. As far as I'm aware, many behind-the-scenes aspects of the community treat the NAT as moderators, so this would force some restructuring in those areas.

However, I'm not proposing for the current NAT to have their permissions revoked. Instead, my idea involves moving the current NAT to the "mapping/modding" category of the GMT, while also keeping them in the NAT. This would effectively make the current NAT the same as before, but would allow for much more liberty in future NAT selection.

This, in theory, resolves the previously mentioned issues regarding the addition of new NAT members. It is also partly why I'm against eval waves, as mentioned earlier. (I personally dislike the idea of only having temporary access to evals, since it would ruin the possibility of a consistent workflow. I disagree that the absence of waves would cause burnout or similar, since there should be enough people to manage the given work at any point in time.)

This would draw a clear line between normal BNs who just nominate/disqualify maps, BNs who also evaluate applicants and current BNs (NAT), BNs who would also like to moderate and/or partake in managerial tasks (BN + GMT), and BNs who evaluate applicants and would like to moderate and/or partake in managerial tasks (NAT + GMT).

Roles would be much more defined than they would be without this implementation. Without it, you'd have no way to know whether a given BN was responsible for evaluations, for example.

*Edit: Thought about it some more and spotted a possible issue with this solution: How would these new NAT members be evaluated? How could they be properly tracked to make sure they aren't messing up?

Evaluations should be done for NAT members as well. How exactly, I'm not totally sure, but it would be preferable if outside voices could be heard to prevent an echo chamber. This could include BNs giving input on NAT performance, or we could return to a system with NAT leaders to address this concern.

Also yeah, the 3-6mo in BN idea is nice too.

----

So those are my thoughts on the matter. I hope at least something can be taken away from this, and if not, it could serve as a nice little thought experiment. Thank you to all the NAT members that are constantly trying to improve the modding scene :)
-White

Naxess wrote:

would be careful about including the applicant too early on; current feedback wall is basically eval notes cleaned up from unnecessary/poorly-worded/poor advice, and having the applicant around when discussing what are and aren't issues and how to best convey that would probably get really confusing
I agree that the applicant shouldn't be included too early, but on the other hand, not including them at all (the current system) can (as it did for me) result in a feedback summary lacking any actual actionable steps to improve, and being so "safe" that it only confuses the applicant even more due to how vague and non-specific it is. Things like "wording could be improved" don't actually improve anyone's wording, but having a discussion with the BNs about how exactly they'd have preferred it worded does.
Naxess
yeah we sorta assumed people would contact their evaluators if they had questions or anything was unclear, but people generally seem to avoid that, so swapping the roles and having evaluators contact the applicant to discuss could definitely help with that

we had a similar issue in mentorship where mentees ignored by their mentor would live with it rather than contacting the organizers about it, so we flipped that system and had organizers regularly check up on mentees instead
-White
Well, it's also a pain when you can't contact one individual for feedback. For my app I had to contact each evaluator (4 of them) just to get their specific feedback. After the 2nd I thought "this isn't worth my time" and stopped. It's just a huge inconvenience for everyone to have it so separated, and a huge burden on the applicant to get information that any one NAT member has complete access to...
momoyo
I couldn't agree more with UberFazz's idea, and I was actually thinking of something like that, but slightly different.

My main idea was the same, with the only difference of not mixing the 2 roles (GMT+NAT) but making a new role, or bringing the QAT back, since NAT and GMT are 2 different roles when it comes to doing stuff in this game from what I know. The point is to move current NATs to QAT and revoke the NAT permissions while the QAT keep theirs, so new NATs can focus on doing evals for Beatmap Nominators while the others focus on other things (such as the Mappers' Guild site, other affairs, etc.).
I know making a new role would require dev time, but this is still something I believe should be considered.
Akito
Personally I wasn't involved with the project, but I've talked to BNs and heard complaints about evals, so I'm just going to post some of the concerns I observed.

1. Varying standards among evaluators, including some people being much more lenient than the NAT. This can obviously lead to conflicting opinions and lower overall quality in evaluations. I've heard people on multiple occasions complain that "x BN will pass anyone/their standards are so low/do they even look at the mods".

2. The NAT are dead (I don't mean to undermine the work of the NAT in any way, but this was also a common complaint). The post says 2 NAT and 2 BN members are assigned to each eval, but this doesn't mean they are the final evaluators. If the evaluation goes overdue, anyone can take over, so it was very common for evaluations to be completed solely by BNG members with zero input from the NAT. Many people, including some of the trial evaluators themselves, have expressed that getting into the BNG right now is extremely easy because of this combined with the first issue.

This could also potentially lead to circlejerking if BN evaluators become a thing, since it only takes a few of your friends saying yes on your application.

3. A small group of people carrying all of the work (already addressed in the original post, but yeah). After asking some people from the last trial group, it seems like very few (maybe 2-4) were able to communicate well, maintain high standards, and keep high activity simultaneously. I also heard countless people say things like "I don't even know why I signed up", which shows not many people are actually interested in being evaluators. This probably means the number of suitable evaluator candidates in the entire current BNG is only a small handful, and their commitment to doing evaluations is also uncertain.

Not really sure what the best move would be but making a group within the BNG for evaluators seems like a pretty unattractive option given these things.
AJT
pretty much agree with everything Akito said, leading to my view that I don't think this would be sustainable. I just think it's more structurally sound and less prone to exploitation to have the people evaluating you be at a level "above" you, unless the people chosen to evaluate are heavily scrutinised and ensured to be very high-quality BNs in all aspects with fair judgement/reasoning. Otherwise it just feels kinda off getting evaluated by people who also have to be evaluated, especially when standards vary so much.
Also, I don't know how active the NAT were during the first wave, but I feel like having these cycles just encourages them to make themselves scarce, which leads to 4xBN evals. I don't really think that's ideal, but it also can't be avoided if the NAT aren't doing their assignments in time. If there were either no BN evaluators or very few, the chances of consistent NAT input would be a lot higher

also, like Uber, I don't think a cycle-based system would work very well - a lot of people already lost interest during their first waves, and I feel like there's only so long before even those who are still interested become disinterested - and the people who *do* remain interested probably wouldn't enjoy having to stop every other month or so. I feel like a lot of people signed up simply because you just had to click a reaction and thought "hmm, may as well try it out", but I don't really think most people would be interested in doing it long term. (I can't speak for everyone though)