[Rule clarification] Standardization of metadata

Topic Starter
Serizawa Haruki
Ok so there are still a few confusing things I found about the metadata rules and guidelines. I think these should be clarified to avoid confusion and pointless discussions.

First of all, it should be stated somewhere that the standardization rules such as using feat. instead of ft. apply to both the unicode and the romanized field. This might seem obvious for some people, but nevertheless it's better to explain it clearly for everyone.
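To illustrate what "apply to both fields" would mean in practice, here is a minimal sketch. The function names are hypothetical and only the ft. to feat. case mentioned above is handled; it is not an existing tool or the official wording of the rule.

```python
import re

# Assumption: only the "ft." -> "feat." case from this post is covered.
# \b keeps words that merely end in "ft." (e.g. "soft.") untouched.
FT_PATTERN = re.compile(r"\bft\.", re.IGNORECASE)


def standardise_feat(field: str) -> str:
    """Replace the non-standard 'ft.' marker with 'feat.' in one field."""
    return FT_PATTERN.sub("feat.", field)


def standardise_metadata(unicode_field: str, romanised_field: str) -> tuple[str, str]:
    """Apply the same marker standardisation to both metadata fields."""
    return standardise_feat(unicode_field), standardise_feat(romanised_field)
```

The point is simply that the same function runs on the unicode field and on the romanized field, so neither can keep a non-standard marker.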


Special unicode characters must be filtered to their nearest standard equivalent or removed from the Romanised fields within a .osu file. ★ ☆ ⚝ ✩ ✪ ✫ ✬ ✭ 🟉 🟊 ✮ ✯ ✰ and the likes are substituted to an asterisk. Other special characters are to be romanised or dropped on case-by-case basis.
It wouldn't hurt to add other common symbols that can be romanized instead of only listing a dozen stars, for example when an interpunct ・ is used to divide 2 words or groups (not first name and surname, since in that case it would be omitted).
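As a rough sketch of the star part of the quoted rule (the only substitution it spells out), the mapping could look like the following. The names are made up for illustration, and every other symbol is deliberately left alone because the rule treats those case by case.

```python
# The star characters listed in the quoted rule; all map to an asterisk.
STAR_CHARS = "★☆⚝✩✪✫✬✭🟉🟊✮✯✰"
STAR_TABLE = {ord(ch): "*" for ch in STAR_CHARS}


def romanise_stars(text: str) -> str:
    """Substitute the listed star symbols with '*'; leave everything else as-is."""
    return text.translate(STAR_TABLE)
```

An ambiguous symbol like the interpunct passes through unchanged here, which is exactly the gap described above: the rule gives no mapping for it.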



If a mapset track is composed of two or more songs, list the song titles clearly with a dividing symbol inbetween or use a title descriptive of its contents. If the title becomes too long as a result, a descriptive title must be used instead.
I think it would make more sense to replace "dividing symbol" by "slash" because that's what most maps use. It should also be stated that a slash needs a leading and trailing whitespace in this case for better readability whereas usually it wouldn't be necessary to add spaces.



If a symbol is used to group parts of a title, a whitespace must be used before and after the group, but not directly before or after the symbols within the groups.
This rule seems to contradict the definition of a whitespace in the glossary (A visual spacing between characters, not always a literal space. Full-width characters do not require whitespaces.) because the definition says full-width characters don't need a whitespace while the rule says it's necessary. This depends on the context and seems quite misleading, especially for full-width symbols such as brackets, slashes etc.
It also contradicts "Commas, vs., &, feat., CV:, etc. must include a trailing whitespace. If the marker is preceded by a word, a leading whitespace is also required, unless the marker is a comma." What if a parenthesis is followed by a comma? For example: Character1 (CV: Person1), Character2 (CV: Person2). Is a space required between them or not, and how does this work in the case of a unicode comma 、?



Single symbols should be romanised so that they have leading and trailing whitespaces, unless the symbol itself is not commonly requiring whitespaces in English. This may be ignored if the artist purposefully uses special characters that ignore their common usages.
It would be nice to clarify which symbols don't require whitespaces in English. Stuff like periods and commas are clear, but what about *, ~, +, -, / etc. ?



Also, can we finally standardize the (Short Ver.) marker as well lol
dorsalplum
I approve of this!
Okoratu
idk how to answer this coherently so i'll just answer by counting quotes:
general point: i agree, we should do that.

first quote:
I agree with the sentiment, but you need more examples that you want listed so that we can work out a wording, the star thing is pretty redundant and can be cut down to 1 star as an example and list more, if you have them

2nd quote: i disagree, that limits what you can use - it doesnt matter what you use as long as you use the same symbol throughout which is what this was about

3rd quote: i dont think that's quite right of a conclusion to draw there but i can see where you're coming from - you point out the problems and how it's confusing but not how you think it should be addressed

4th quote: @Noffy, @Lanturn pls help figure this out, that seems like a meaningful addition
Topic Starter
Serizawa Haruki

Okoratu wrote:

idk how to answer this coherently so i'll just answer by counting quotes:

general point: i agree, we should do that.



first quote:

I agree with the sentiment, but you need more examples that you want listed so that we can work out a wording, the star thing is pretty redundant and can be cut down to 1 star as an example and list more, if you have them

† is one I can think of right now but I'm sure there are more. I used the interpunct as an example because it's an ambiguous case and those are the ones that need to be clarified. It's quite obvious that ~ is romanized to ~ for instance, so it's probably not necessary to add it but I'm indifferent to it.

2nd quote: i disagree, that limits what you can use - it doesnt matter what you use as long as you use the same symbol throughout which is what this was about

What do you mean by "as long as you use the same symbol throughout"? It's only used once in a map so in order to make sure it's used consistently across maps, changing the rule as I suggested would be necessary. Otherwise some maps use /, some & etc. I don't see why mappers need the freedom to choose whatever they want in this case, it should be standardized just like other markers as per the RC.

3rd quote: i dont think that's quite right of a conclusion to draw there but i can see where you're coming from - you point out the problems and how it's confusing but not how you think it should be addressed

I didn't point out concrete solutions because I don't know them either. I have the impression that there are unwritten rules regarding this which some people follow but not everyone knows about. For example, I've seen several recently ranked maps that use full-width characters for the CV format but the RC says to use Character (CV: Voice Actor) only, so technically full-width symbols would have to be replaced by normal symbols according to this rule, similar to how (TV Size) replaces any other TV Size markers. I asked a few BNs about it but got different answers as to what is correct. If full-width symbols are intended to be acceptable in this case, it should be clarified in the RC, otherwise the standardization of stuff like(CV:こいぬ)needs to be enforced from now on.
The other problems I pointed out in the OP are also things that are currently handled in a certain way, but not written down anywhere and many people are unsure about it. It's difficult to think of a solution right now because I think it should be respected what is usually done in such cases. However, these are mostly edge cases that are affected by it so help from people who are involved in metadata affairs would be helpful to sort this out.
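For reference, if the half-width reading of the rule were the one to enforce, the check could be sketched roughly like this. It assumes Character (CV: Voice Actor) is the only accepted form, which is precisely the open question here, and the function name is hypothetical.

```python
import re

# Matches a CV group written with half-width or full-width brackets and colon,
# including any whitespace directly before the opening bracket.
CV_PATTERN = re.compile(r"\s*[(（]\s*CV\s*[:：]\s*(.+?)\s*[)）]")


def standardise_cv(text: str) -> str:
    """Rewrite every CV group as ' (CV: Name)' with half-width symbols."""
    return CV_PATTERN.sub(lambda m: f" (CV: {m.group(1)})", text).strip()
```

Under that assumption （CV:こいぬ） directly after a name becomes (CV: こいぬ) with a leading space, while a title that already uses the half-width form is returned unchanged.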


4th quote: @Noffy, @Lanturn pls help figure this out, that seems like a meaningful addition
What about (Short Ver.)? Fwiw it should be treated in the same way as (TV Size).
Okoratu
What do you mean by "as long as you use the same symbol throughout"?

you can compile more than 2 songs into 1 set; how you divide what part of the title is what is up to you

---
short Ver was less common so we spared us the headache of figuring it out looooooool
Topic Starter
Serizawa Haruki

Okoratu wrote:

What do you mean by "as long as you use the same symbol throughout"?



you can compile more than 2 songs into 1 set; how you divide what part of the title is what is up to you



---

short Ver was less common so we spared us the headache of figuring it out looooooool
Well, there's nothing to figure out honestly. It's just about standardizing the marker itself regarding the syntax, not about adding it to maps that don't have it in the first place (although that could be done as well)
Topic Starter
Serizawa Haruki
Okay ignore the Short Ver. part since that is being implemented in the new RC draft

The rest should still be addressed though
Bibbity Bill
yea i agree, there should definitely be something written about having standardisation rules apply to both the unicode and romanisation fields because currently having stuff like(CV:種田梨沙)、and (CV: 種田梨沙), both being valid just leads to confusion on what is more acceptable/correct.


about your first point, yea more symbols would be nice but i doubt you'll be able to cover all the ones that artists tend to use so to write out big lists for how to standardize specific symbols seems like it would just cause a lot of clutter and be time consuming so I think the current rule does a good enough job at it. but i do think it should be reduced to one star though since having 10 of them seems pointless when it says "and the likes" at the end of it

second one i don't agree with, any symbol should be allowed to be used as long as it's obvious it's 2 tracks.

third and fourth point i think could use better wording and have written examples for both of them, but don't got any idea currently on how you can improve it.
Topic Starter
Serizawa Haruki

Bibbity Bill wrote:

yea i agree, there should definitely be something written about having standardisation rules apply to both the unicode and romanisation fields because currently having stuff like(CV:種田梨沙)、and (CV: 種田梨沙), both being valid just leads to confusion on what is more acceptable/correct.


about your first point, yea more symbols would be nice but i doubt you'll be able to cover all the ones that artists tend to use so to write out big lists for how to standardize specific symbols seems like it would just cause a lot of clutter and be time consuming so I think the current rule does a good enough job at it. but i do think it should be reduced to one star though since having 10 of them seems pointless when it says "and the likes" at the end of it

I agree about removing all those unnecessary stars and just leaving one as an example. After rethinking it, romanizing most symbols is very straightforward (→ becomes -> etc.), so in the end it's not really necessary for those. However, we could add exceptions for symbols which should be omitted such as ❤ since hearts are not uncommon. Plus, the ambiguous case about the interpunct still stands.

second one i don't agree with, any symbol should be allowed to be used as long as it's obvious it's 2 tracks.

Although I still don't see a reason as to why multiple symbols should be used for the same purpose, I'm fine with allowing different symbols as long as the thing about leading and trailing whitespaces is added (even if slashes usually don't require spaces in English which is a bit odd but w/e)

third and fourth point i think could use better wording and have written examples for both of them, but don't got any idea currently on how you can improve it.

Me neither, which is why I'd like someone to clarify these points xD
pishifat
removed the excess stars, added something about standardisation applying to romanized/unicode fields, clarified how whitespaces are contradictory to that standardisation stuff, and short ver is standardised already

oko explained why the songname separator is handled how it is and for whitespaces after symbols, it's best to assume anything that isn't immediately obvious requires a space afterwards

https://github.com/ppy/osu-wiki/pull/2300
pishifat
merged