posted

Okoratu wrote:

i think no one ever questioned that romanisation should be based on the language lol

shall i change the current wording to mandarin while we figure the rest out?
Yes please
posted
ok done
can you debate the actual points instead of semantics then?
i dont have any power to nuke any posts here (thankfully for some retarded debates on the earlier pages) but please keep it short and concise

-> can someone sum up briefly where we stand currently on Cantonese? If we want to make people understand how it's pronounced / read then we'd need something different from the mandarin method anyways

anything that is reliable and produces readable text for people that are not cantonese should be good fwiw
posted

Monstrata wrote:

Have a question regarding covers. I remember mentioning this to Eph but nothing was really concluded so maybe we can get some opinions here?

Currently metadata rules seem to suggest that if a song is covered by someone, that the entire metadata field should be taken from the cover's source. Example: https://osu.ppy.sh/s/658919 vs https://osu.ppy.sh/s/637445

If an artist is clearly covering or replicating another song, I think we should be taking metadata from the original song in cases where metadata is somehow different. After all, the melodies, rhythms, lyrics, etc... are all pretty much the same. It's the same song. Just sung by someone else. So shouldn't the only thing to change be the Artist/Romanized Artist field?

I'm actually not sure when this change happened because I remember at one point the practice was to just change the Artist and keep Title/Source the same as original song.
uh not too sure id agree on covers that just do same voice but if it's a drastically different song id say you should be following the artist
posted

Natsu wrote:

It would be really annoying without the label, for example when you map the TV Size and the Full version the mapsets merge together. Also if you're looking for a normal length song you are going to get a bunch of TV sizes or vice versa... and being honest people rarely care about tags, so dropping it into tags isn't going to work.
strongly agree :):)
posted
To complete the Cantonese part, nold_1702 and I re-worded the proposal about Chinese and Cantonese a bit.
We think Chinese / Standard Chinese / written vernacular Chinese all refer to the same thing, and Mandarin is kind of a tone of it rather than a language. So we just went back to using "Chinese".

Glossary
Character-by-character Romanisation: each Chinese character must be romanised as a capitalised word and separated with a space.
Rules
Songs with Chinese metadata must be romanised in accordance with the Character-by-character method by using Hanyu Pinyin system in Romanised fields when there is no Romanisation or translation information listed by a reputable source. The same applies to the Source field if a romanised Source is preferred by the mapper. As they are non-unicode fields, all diacritical tone marks must be omitted. Songs with Cantonese metadata must be romanised by using Jyutping system.
posted
As a speaker of the two languages, I struggle to understand why you keep emphasising that Mandarin and Cantonese are just different tones. They are separate languages that both use Chinese characters, not just different tones. But whatever, it is not something that we have to discuss here. (For some of you who may be interested, here is a fun and short youtube video explaining the differences and similarities between the two: https://youtu.be/s2km_z4-1T8 )


Anyway, I modified the grammar and changed some words to more precise ones:

Glossary
Character-by-character Romanisation: each Chinese character must be Romanised as a capitalised word and separated with a space.
Rules
Songs with metadata in Chinese must be Romanised in accordance with the Character-by-character method by using Hanyu Pinyin system in Romanised fields if there is no Romanisation or translation information listed in a credible source. The same applies to the Source field if a romanised Source is preferred by the mapper. As they are non-unicode fields, all diacritical tone marks must be omitted. Songs with Cantonese metadata must be Romanised by using Jyutping system.
posted
Removed the part where it talked about unavailability of preferred romanisation: "If the artist provides a preferred way to romanise their title or name, that is to be followed unless it conflicts with other points of this criteria." handles that.

Reverted Glossary

Decluttered the rule into the following statements:
  1. Songs with Chinese metadata are to be handled with respect to the tones and dialects of Chinese they belong to. In any case, all diacritical tone marks must be omitted:
    1. Mandarin metadata must be romanised using the character-by-character method.
    2. Cantonese metadata must be romanised using the Jyutping system.
    3. If the song falls into neither category, this choice is left up to the mapper's discretion


i hope this is more clear and captures the spirit of what you wanted to say while being more straightforward to digest

ToDo:
- Spacing of special characters retarded loopholes fixing
- !where Korea
- common markers rules
- CrystilionZ point needs to be applied but idk how

If the artist provides a preferred way to romanise their title or name, that is to be followed unless it conflicts with other points of this criteria.
nope, this refers to special characters and title formatting rulings above you cant stuff that into one thing
how is a translation not officially referring to the song in multiple ways?
posted
Hmmm you missed the Hanyu pinyin thing in the Mandarin part and the character-by-character part in the Cantonese part.

So both of them use the character by character method.
Mandarin uses the Hanyu Pinyin system to romanise Chinese characters
Whereas Cantonese uses the Jyutping system to romanise Chinese characters

So:

Mandarin metadata must be romanised using the Hanyu Pinyin system.

Cantonese metadata must be romanised using the Jyutping system.

In both cases, the character-by-character method is to be adopted.
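
To make the output format concrete, here is a minimal Python sketch of the character-by-character method: one capitalised word per character, separated by spaces, with diacritical tone marks dropped. It assumes the reading of each character is already known (it does not look up Hanyu Pinyin or Jyutping itself), and `strip_tone_marks` and `romanise_char_by_char` are hypothetical helper names, not part of any osu! tool.

```python
import unicodedata


def strip_tone_marks(syllable: str) -> str:
    """Drop combining diacritics, e.g. 'hǎo' -> 'hao'."""
    decomposed = unicodedata.normalize("NFD", syllable)
    # Assumption: every combining mark is removed, so 'ü' also becomes 'u'.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def romanise_char_by_char(syllables: list[str]) -> str:
    """Join one reading per character as capitalised, space-separated words."""
    return " ".join(strip_tone_marks(s).capitalize() for s in syllables)


print(romanise_char_by_char(["nǐ", "hǎo"]))  # Ni Hao
```

Jyutping readings would go through the same capitalise-and-join step. How its tone numbers should be treated is not covered in this thread, so the sketch leaves them alone.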
posted
fixed!
posted
should one also append (TV Size) to the end of songs which are tv size but don't indicate it in the title? as it stands now, the proposal seems to point to no. what's up with that?
posted
Seeing how discussions have died, I want to post some ideas I was planning on bringing up later since the time limit was close (half a month ago). This also has a few rule changes and guidelines. Some may not even need to be guidelines, but I wanted to spark discussion on them anyways and decide whether or not they are worth adding.

Regarding Full Width Special Characters:
When it comes down to adding spaces for special characters, there is one more issue that I think should be addressed. Some languages, like Japanese and Chinese, don't use spaces when reading or writing. Japanese is one of the most common languages here in osu!, and its special characters are normally written in full-width. The full-width comma (、), colon (:), brackets (()), as well as some others, wouldn't need a space. The current rule doesn't really mention these full-width characters.

For example:
チト(CV:水瀬いのり)、ユーリ(CV:久保ユリカ) (Official)
チト (CV: 水瀬いのり)、 ユーリ (CV: 久保ユリカ) (Proposal)
チト (CV:水瀬いのり)、ユーリ (CV:久保ユリカ) (Full-width without spaces; follows the proposal otherwise, including the parenthesis guideline. The parentheses are half-width, so they would naturally have a leading whitespace.)
Chito (CV: Minase Inori), Yuuri (CV: Kubo Yurika) (Romanized Proposal)

http://www.bjd.com.cn/ A Chinese newspaper site. All special characters are written in full-width and it doesn't utilize spacing.

The tl;dr is that certain special characters in full-width don't need to utilize spaces since they are somewhat naturally included in them. This is not the case with all characters and should be used accordingly.

-----------------------------------------------

Regarding half-width & full-width usages of characters in the Unicode & source fields:
(Brought up to me by S o h)
Special characters should retain their original full-width/half-width characters in the Unicode fields. An exception to this is when they are used for additional complementary info like the CV section or mix descriptors. Improper usages can result in errors while searching. https://osu.ppy.sh/ss/10623085
Example using "カラフル。(Extended edit)"
The period cannot be substituted for its counterpart. "カラフル.(Extended edit)" is not acceptable.
The parentheses may be either half or full-width. "カラフル。(Extended edit)" is acceptable.

Original width usages should still be prioritized in the unicode field when possible.
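
As a rough illustration of the distinction, the Python sketch below treats two Unicode titles as the same only if they differ in parenthesis width and nothing else. It assumes parentheses are the only interchangeable pair, as in the example above; `same_title_ignoring_paren_width` is a hypothetical helper and says nothing about how osu! search actually compares titles.

```python
# Full-width parentheses, written as escapes so they cannot be confused
# with their half-width look-alikes.
FULL_TO_HALF_PARENS = str.maketrans({"\uff08": "(", "\uff09": ")"})


def same_title_ignoring_paren_width(a: str, b: str) -> bool:
    """True if two Unicode titles differ at most in parenthesis width."""
    return a.translate(FULL_TO_HALF_PARENS) == b.translate(FULL_TO_HALF_PARENS)


original = "カラフル。\uff08Extended edit\uff09"
# Half-width parentheses, same ideographic full stop: treated as equal.
print(same_title_ignoring_paren_width(original, "カラフル。(Extended edit)"))
# Half-width period instead of "。": treated as a different title.
print(same_title_ignoring_paren_width(original, "カラフル.(Extended edit)"))
```

The period is deliberately left out of the mapping, so swapping "。" for "." still counts as a mismatch.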


------------------------------

Regarding Special Characters and Spacing:
(I posted this earlier, but I might as well add it here)
ジョジョ~その血の運命~ Archetype MIX Ver.
JoJo ~Sono Chi no Sadame~ Archetype MIX Ver.

When a symbol is alone and doesn't have spacing, the romanization should have a whitespace before and after. (Ex. if the title was "ジョジョ~その" we'd use "JoJo ~ Sono" when romanizing.)

When a symbol comes in pairs (like mentioned above), use a space before the first symbol and after the last symbol (not needed if the symbol is the last character). (Ex. if the title was "ジョジョ~その血の運命~" we would use "JoJo ~Sono Chi no Sadame~".)

This can be excluded if the song has a good enough reason not to use it.
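
The two spacing cases can be sketched in Python as below. This is only an illustration under simple assumptions: the title is already romanised, the symbol appears once (lone) or twice (a pair), and anything else is left untouched. `space_symbol` is a hypothetical helper name.

```python
import re


def space_symbol(text: str, symbol: str = "~") -> str:
    """Apply the lone/paired spacing convention to one symbol."""
    count = text.count(symbol)
    if count == 1:
        # Lone symbol: exactly one space on each side.
        pattern = rf"\s*{re.escape(symbol)}\s*"
        return re.sub(pattern, lambda m: f" {symbol} ", text).strip()
    if count == 2:
        # Pair: space before the first and after the last symbol,
        # no space between the symbols and the text they enclose.
        first, last = text.index(symbol), text.rindex(symbol)
        before = text[:first].rstrip()
        inner = text[first + 1:last].strip()
        after = text[last + 1:].lstrip()
        parts = [before, f"{symbol}{inner}{symbol}", after]
        return " ".join(p for p in parts if p)
    return text  # other counts are out of scope for this sketch


print(space_symbol("JoJo~Sono"))                 # JoJo ~ Sono
print(space_symbol("JoJo~Sono Chi no Sadame~"))  # JoJo ~Sono Chi no Sadame~
```

A pair at the very end of the title gets no trailing space, which matches the "not needed if the symbol is the last character" note.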

----------------------------------------

Standardizing the Romanised Artist Field Order:
Another topic I want to bring up is one from a few years ago. Since we're trying to 'standardize' metadata, I feel like pushing this old thread: Romanized Artist Preferences, as it would actually work well with the current proposals.
Right now we basically have to search high and low to find an obscure reference for a preferred romanization when a much simpler method that most database and wiki sites use is a simple standardization of "Family Given" or "Given Family" and such. In the end, our artist fields end up messy to the point that you can't tell which order is which anymore.

Fycho also brought up the point that artists sometimes have an official translated or English name, so we'd have to figure out if those would get more priority or not. Ex. 周杰伦 is Jay Chou in English, but Zhou Jie Lun when romanized.

Right now this is my current proposal:

When romanizing the artist field, it must be written out in the order the Unicode field would be read. The sole exception to this is if the artist has an official translation and is widely known by this name. (Please English this better. The idea is simply that we type any order out on how it would be read.)

The second line would be in cases like Girls' Generation where 소녀시대 is romanized as Sonyeo Sidae (I believe). We'd still use Girls' Generation in this case. This also includes the Chinese example mentioned earlier.

Pros:
- Consistent metadata with their Unicode counterparts and we no longer have to check for preferred romanization order anymore.
- It standardizes the romanized artist field for every language, not just Eastern.

Cons:
- It will conflict with some artists' preferred romanization (Kurosaki Maon will be used instead of Maon Kurosaki and such. A lot of famous video game composers are more recognized by Given - Family as well.)

If we're going to standardize things here in osu!, we might as well tackle this since it's also fairly inconsistent at times. Hi Shimotsuki Haruka Shimotsuki.

-----------------------------------------------------

Regarding TV Size:

Even if we were to open this to say, a community vote, (and I might be jumping the gun here) I'm sure the majority would rather include the length markers, so I'll try to keep it simple.

(TV Size) is used for cuts that are used in the show. (Anime/TV Show OP/ED, Insert Songs if shortened, etc)
(Short / Extended Ver.) for everything else. (Game Size is rarely used anyways now I think about it.)
Manually cut songs that closely resemble a (TV Size) on an applicable song would use (TV Size), otherwise, they should use (Short Ver.) or (Extended Ver.)

That's about as simple as I can make it I guess so it's as standardized as possible. The biggest downside to this is that it's difficult to tell Cuts and Official releases apart, but this makes it so we don't have to be direct when it comes to the versions, and it still does mention the length appropriately. The alternative is to use whatever the original release was before the cut, but then it contradicts the point of having a marker to reference the maps length on sight.

The main goal here is to make the labels work more as identifiers and less as official titles, where that makes sense.

--------------------------

Regarding songs that have multiple sources:

When a song has appeared in multiple media, it may use the source that the mapset is themed around (Backgrounds, Storyboards, Videos, etc.) as long as the song itself appeared in it. These should use the direct source instead of the franchise source if applied.
Examples:
https://osu.ppy.sh/s/446547 may use Grand Theft Auto Vice City as the map is themed around it and the song appears in-game.
https://www.youtube.com/watch?v=UrJcQ2nZips may not use Naruto as a source as the song doesn’t appear in any Naruto media, even if the map itself is themed around Naruto. These can be placed in the tags.

------------------------------------

Regarding Original Releases without a source:
This will have to be mostly case by case, but if a song has had a noticeable gap between its original release and then eventually ends up on another media, (take that GTA song mentioned above) the source field isn't required and can be moved to the tags instead.
This may not have to be so much of a time-gap as well. We could try focusing more on if the first source released has any major significance.

----------------------------------------------------

Repeated words in romanization:
When a song uses repeated words in the title (one in unicode, and the other as a basic romanization), the romanized field should omit the repeated word.
Examples:
AIRI-愛離- would normally be AIRI -Airi- as a romanization. This proposal would have the romanized field just be AIRI. The Unicode would still be AIRI-愛離- as it originally is.

A more severe example of this would be:
Normal: (Unicode) 花簪 HANAKANZASHI -> (Romanized) HANAKANZASHI HANAKANZASHI
Proposed: (U)花簪 HANAKANZASHI -> (R)HANAKANZASHI
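
A minimal Python sketch of this proposal follows. It assumes the title has already been split into segments and each segment paired with its romanisation, which is the hard part and is not shown. A romanised segment is dropped only when it repeats a segment that was already written in Latin letters. `romanise_without_repeats` is a hypothetical helper name.

```python
def _key(text: str) -> str:
    """Compare words ignoring case and decoration such as '-' or '~'."""
    return "".join(ch for ch in text if ch.isalnum()).casefold()


def romanise_without_repeats(segments: list[tuple[str, str]]) -> str:
    """segments: (original, romanised) pairs in title order."""
    # Segments that were already Latin are unchanged by romanisation.
    latin_keys = {_key(orig) for orig, rom in segments if orig == rom}
    kept = [
        rom
        for orig, rom in segments
        if orig == rom or _key(rom) not in latin_keys
    ]
    return " ".join(kept)


print(romanise_without_repeats([("AIRI", "AIRI"), ("-愛離-", "-Airi-")]))
print(romanise_without_repeats([("花簪", "Hanakanzashi"),
                                ("HANAKANZASHI", "HANAKANZASHI")]))
```

Because only romanised segments can be dropped, a title that genuinely repeats a Latin word twice is left alone, and the Unicode field is never touched.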

--------------------------------------------------------

Using LOGOS to determine stylization choices:
Sometimes, for a non-roman language, there is little to no info on how to romanize the artist's name. In the case where a logo is only found on a website or a CD cover and writes the song in all capitals, we should be using standard capitalization methods ( https://capitalizemytitle.com/ as we generally would in any standard title or name).
Artist preference in any other case must still be followed over this.

In other words, this will hopefully prevent ITO KASHITARO cases from happening again. This is more of a case-by-case guideline, but the idea of romanizing based on what may possibly be just a font has led to some unfavorable romanizations in the past.

----------------------------------

Regarding covers and use of original metadata over the covers
Brought up originally by Monstrata. Sometimes a cover by another singer may be listed with slightly incorrect metadata compared to the original. We should probably use common sense when approaching this and judge them case by case. If the cover itself has very minor errors, then the original title would be recommended. If the cover feels like more of a remix or has been altered in some major way, the cover title would be recommended.


Umm yeah. Sorry I've been kinda absent on this proposal. I'm gonna try to be a bit more active so we can get this pushed forward as it was due 2 weeks ago. Hopefully, we can get this finalized by the end of the month (My goal now)

Anyways. Happy reading. Smack me if anything seems unreasonable. I mostly just want to spark a bit more discussion before we push this forward, and I wanted to attempt to merge a few more ideas I was originally planning on bringing up after this proposal went through.
posted
Sorry for disturbing, but I would like to confirm whether the character-by-character method is also applied to the Romanization of Chinese artists' names (if s/he hasn't provided an official Romanization). If so, I suggest mentioning it in the proposal as it differs from native users' daily practice, and may result in confusion if not specifically mentioned in ranking criteria. Like this: "Songs with metadata in Chinese, including both Artist and Title, must be Romanised in accordance with the Character-by-character method..."
posted

pw384 wrote:

Sorry for disturbing, but I would like to confirm whether the character-by-character method is also applied to the Romanization of Chinese artists' names (if s/he hasn't provided an official Romanization). If so, I suggest mentioning it in the proposal as it differs from native users' daily practice, and may result in confusion if not specifically mentioned in ranking criteria. Like this: "Songs with metadata in Chinese, including both Artist and Title, must be Romanised in accordance with the Character-by-character method..."
Can you explain how it differs from "native users' daily practice"? Metadata = artist + title. Every language in osu! is (and always has been) using the same system for artists and titles (unless preferred Romanisation for one exists)
posted

Wafu wrote:

pw384 wrote:

Sorry for disturbing, but I would like to confirm whether the character-by-character method is also applied to the Romanization of Chinese artists' names (if s/he hasn't provided an official Romanization). If so, I suggest mentioning it in the proposal as it differs from native users' daily practice, and may result in confusion if not specifically mentioned in ranking criteria. Like this: "Songs with metadata in Chinese, including both Artist and Title, must be Romanised in accordance with the Character-by-character method..."
Can you explain how it differs from "native users' daily practice"? Metadata = artist + title. Every language in osu! is (and always has been) using the same system for artists and titles (unless preferred Romanisation for one exists)
Officially, native speakers never romanize their name via the character-by-character method under any circumstance (e.g. 曹雪芹 is always romanized as Cao Xueqin in daily practice, instead of Cao Xue Qin or Cao Xue qin). So I am not sure whether the character-by-character method is applied to the Artist name, since it contradicts our habits. If it is, a specific clarification in the wording would be better in my opinion.
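
The difference between the two conventions can be sketched in Python. This is only an illustration: it assumes the readings are already known with tone marks removed, and that the family name comes first and is one character long unless `family_len` says otherwise (compound surnames exist). Both function names are hypothetical.

```python
def char_by_char(syllables: list[str]) -> str:
    """One capitalised word per character, as in the proposal."""
    return " ".join(s.capitalize() for s in syllables)


def name_style(syllables: list[str], family_len: int = 1) -> str:
    """Family name and given name as two words, as in daily practice."""
    family = "".join(syllables[:family_len]).capitalize()
    given = "".join(syllables[family_len:]).capitalize()
    return f"{family} {given}".strip()


reading = ["cao", "xue", "qin"]  # 曹雪芹, tone marks already removed
print(char_by_char(reading))  # Cao Xue Qin
print(name_style(reading))    # Cao Xueqin
```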
posted
@Chinese clarify pls im confused


Can someone sit down with me trying to digest Lanturn's post into rulings
posted
@chinese in a nutshell Chinese names are usually romanised like "family given (or given family idk)" with only one space separating given name and family name. osu uses character-by-character cuz of the difficulty to separate words but there is no difficulty separating given name from family name and vice versa so Chinese names shouldn't be romanised character-by-character and instead should be romanised normally, with one space separating given name and family name.

Lanturn just brings up stuff that he thinks is worth considering. I'll try to simplify it here I guess

1. full width special chars already have built-in space so they don't need whitespace before or after those chars. Proposal should add this statement about full width chars.
2. Full width chars are handled differently. example: "。" is the full width counterpart of the period "." and they are not interchangeable in the unicode field, but half-width brackets "(" and their full-width counterparts "(" are. << should be fixed
3. specify stuff regarding spacing when there are special characters involved. (romanisation)
4. screw artist's romanisation preferences. All names should be romanised like how they are read in their original languages. e.g. Japanese names will be romanised with Family-given order only regardless of artist's preference.
5. (TV size) and other designators.
6. what should we do when a song is featured in a lot of media (like featured in a lot of games/movies/animes). Lanturn proposed that source should be designated according to what the map is themed around (sb/bg etc.)
7. if source doesn't have major significance it can be moved to tags instead.
8. ignore repeated words when romanising stuff.
9. Logos aren't reliable when it comes to capitalisation. should use a standard method instead.
10. Covers often get metadata wrong. Compare data with the original release and use common sense when dealing with covers.
posted
Hi~

Noffy, Lanturn and I sat down and got this worked out as a draft implementing all the above points.

The draft as a whole is available over there: https://gist.github.com/Okorin/c551fd42 ... f51ffb2736

if nothing else is brought up i'll PR this in a week ok

thank
posted
a minor stuff, For the consistency, romanize => romanise
posted

Fycho wrote:

a minor stuff, For the consistency, romanize => romanise
Pretty sure both ways of typing it out are fine (same with romanization = romanisation) though surely it'd be nice to use it consistently if that's what you meant with this ¯\_(ツ)_/¯
posted

Okoratu wrote:

Hi~

Noffy, Lanturn and I sat down and got this worked out as a draft implementing all the above points.

The draft as a whole is available over there: https://gist.github.com/Okorin/c551fd4263e437e0ffcbd3f51ffb2736

if nothing else is brought up i'll PR this in a week ok

thank
Romanisation, Romanise, Romanised, etc. should always be capitalised.

"Lenticular brackets should be romanised to either quotation marks or square brackets depending on the context they are used in."
A bit confusing, what are the two contexts of use?

In "Russian" Romanisation: "ё should be romanised to ye, however, use yo or o to avoid usage of special characters."
Don't even mention it should be Romanised to ye if we're not doing that, it's confusing. Also, using yo or o ? It should only be yo, it is never pronounced o?