
[Proposal] Metadata section overhaul

Topic Starter
Okoayu
Hi~

This is the proposal as merged from both t/595864 and t/632681 with the help of Noffy. It tries to incorporate both proposals while respecting the discussions going on in either thread, unifying both into a standard that is simple to understand and follow (ideally like the rest of the Ranking Criteria).


We also tried to include t/687064, but that thread has stopped moving without any conclusions whatsoever (quite the contrary: the conclusion seems to be that the proposed way just doesn't work at all).

Debate on this will be open for two weeks, ending on 05.04.2018 (dd.mm.YYYY).

Thanks for reading, have fun discussing!
Shiguma
I believe that TV size cuts of songs should have the (TV Size) label on them, regardless of the official source. My reasoning for this is, when you search up a song, having the (TV Size) in the metadata won't affect searching for that song, while also making it very clear that whichever set you are looking at is the short version of a song. If we're bringing common sense into metadata, I don't see why we shouldn't do this.
Pachiru
"Guest mappers and storyboarders must be added to the tags of a beatmap set." People who makes the hitsounds should be added to the tags aswell or it's not mendatory?
Vacuous
If a track has more than 5 artists they must be substituted with Various Artists, similarly if a track is composed of 3 or more individual tracks, the title must be substituded to <Descriptor> Compilation.
What about songs with multiple parts? Would compiling these together warrant it being called "[song title] compilation"? Does that mean that the metadata for this is wrong?
Bunnrei
the last two glossary entries seem less like glossary entries and could be added in the rules/guidelines

The artists of a song must be tracable to existing people.
so something like "Gorillaz" (a band with fictional members run by two non-fictional people) would be unacceptable, or does that only apply to individual people?

Any form of vs. such as Vs., VS and the likes are to be written as vs. only.
does this apply even if the original metadata source uses the latter three

...and the likes are substituded to and asterisk
typo boi

everything else seems ok other than pachiru's point (could just be "People who made contributions within the map outside of modding must be added to tags" or smthng)
Topic Starter
Okoayu
@plus:
1 if you can trace the members of a band to real people then that is sufficient
2 always
3 oops

@Vacuous
as far as i understand what you said, this would be mislabeling the song, so yes

@Pachiru
open for debate on that one

@Shigu
fixed by putting asterisk in glossary and just making it default to star and rewording that :thonking:
Kurai
- The word 'Romanisation' should always be capitalised.

- Saying 'Songs with Russian metadata' is dumb as it excludes all other languages that use the Cyrillic script (there are some Ukrainian songs ranked for example).

- Cyrillic Romanisation should follow the BGN/PCGN system (except for the letter ё in Russian which should follow the GOST 2002(B) system). Read more here: http://up.kuraip.net/032209ex3724.pdf

- 'If the song title includes any denoting tag that it is a TV sized cut of a song, use a standard (TV Size) tag in its place in the end of the current title string.'
→ It was decided that TV Size tags should be removed from the titles and put in the tags.

I'll probably go more in depth later!
Topic Starter
Okoayu
→ It was decided that TV Size tags should be removed from the titles and put in the tags.

by? i mean im fine with either but just saying you decided this doesnt invalidate my suggestion of doing it? Especially when you can argue that the more obvious marker you put in there the easier it is for people to discern what version of a song they're getting without having to bother with reading the tags.

the rest is fair enough and was ported from kwan proposal for the most part so i'll adjust it tomorrow, this took way longer and i wanna do something different for today
Sieg
and here we go again with Cyrillic Romanisation
p/6037577
Kurai
@Oko: That was the consensus we reached in the Metadata server after a long discussion on the matter. Please check #guidelines_discussion and search for 'TV Size', you should be able to find the conversation. Invalidating something that has already been put into practice is a bit weird.
J1NX1337
Brackets within artist or title fields should be separated from the other text surrounding it, unless there is obvious reason to not do so.
-> unless there is an obvious reason not to do so.

Slight grammar correction.
Topic Starter
Okoayu
this is a public discussion.
discuss in public. i won't go through all the channels i can find to recollect what may or may not have happened / occurred / been agreed on in order to make sense of your argument.

present the argument, debate again, the point is up.

will apply whatever the russian metadata thing ends up as. also saying that i find Shiguma's point reasonable, actually
Pachiru
For tags, from draft: "This is to give credit where credit is due and helping others identify the main contributers of any given beatmap set."

Since we should give credit to the main contributors of the set, we need to include the person who did the hitsounding, which is an important part of the map, because without hitsounds the map is unrankable. If we give credit to someone who made something optional like a storyboard, we should add the hitsounder to the tags as well.

(also, don't forget to fix the typo on: "contributers" → "contributors")
Sieg
alright
Cyrillic Romanisation: Use BGN/PCGN system for Russian/Cyrillic. Е and е should be romanised as ye if it stands alone or after a, e, ё, и, о, у, ы, э, ю, я, й, ъ, ь. In other cases, it should be romanised as e. ё should be romanised to ye, however, use yo or o to avoid usage of special characters. Ignore any other rules in the file provided, these are either irrelevant or wouldn't help in the game. If an artist uses a preferred romanisation, follow it regardless of this rule. For most of the other characters, refer to the first page of this document.
I suggest removing "Cyrillic" and leaving the other languages to a case-by-case basis, since the current wording is ambiguous in trying to cover all Cyrillic languages with different phonetics and different romanisation scenarios, which is obviously the wrong approach. So, reword it to:

Russian Romanisation: Use BGN/PCGN system for Russian. Е and е should be romanised as ye if it stands alone or after a, e, ё, и, о, у, ы, э, ю, я, й, ъ, ь. In other cases, it should be romanised as e. ё should be romanised to ye, however, use yo or o to avoid usage of special characters. Ignore any other rules in the file provided, these are either irrelevant or wouldn't help in the game. If an artist uses a preferred romanisation, follow it regardless of this rule. For most of the other characters, refer to the first page of this document.
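For anyone who wants to check the Е/е part of this wording mechanically, here is a minimal Python sketch. It only decides between "ye" and "e" for each Е/е in a word and does not transliterate anything else. The function name `romanise_e` is made up for illustration, the trigger letters are taken from the rule text as Cyrillic letters, and reading "stands alone" as "starts a word" is an assumption on my part.

```python
# Letters after which Е/е becomes "ye", as listed in the rule text
# (read here as Cyrillic letters).
TRIGGERS = set("аеёиоуыэюяйъь")
CYRILLIC_E = "\u0435"  # lowercase Cyrillic е


def romanise_e(word: str) -> list[str]:
    """Return "ye" or "e" for each Е/е in word, in order of appearance.

    Assumption: "stands alone" is treated as "is the first letter of the word".
    """
    choices = []
    prev = ""
    for ch in word.lower():
        if ch == CYRILLIC_E:
            choices.append("ye" if prev == "" or prev in TRIGGERS else "e")
        prev = ch
    return choices


print(romanise_e("Елена"))
print(romanise_e("поезд"))
```

The artist-preference exception in the rule is not modelled here; that part would still need a human looking at official sources.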
Delis

Shiguma wrote:

I believe that TV size cuts of songs should have the (TV Size) label on them, regardless of the official source. My reasoning for this is, when you search up a song, having the (TV Size) in the metadata won't affect searching for that song, while also making it very clear that whichever set you are looking at is the short version of a song. If we're bringing common sense into metadata, I don't see why we shouldn't do this.
this. i've never had a ranked tv size map myself so i don't know how it would feel as a mapper, but having both tv size and full ver mixed under one title is quite frustrating as a player. i don't remember exactly when or why it happened, but we'd better bring it back, as a lot of people back then didn't agree with the current tv size rule (which makes us rely on whether official sources have a track with tv size in its name or not) being finalized.
Nevo

Pachiru wrote:

"Guest mappers and storyboarders must be added to the tags of a beatmap set." People who makes the hitsounds shouldn't be added to the tags aswell or it's not mendatory?
I definitely feel people who make hitsounds for a set should be required in tags, since they do play a large role in mapsets, also because a lot of hitsounders put forth plenty of time and effort into their work.
Nao Tomori
agree w/ nevo abt hitsounders in tags.

i think forcing tv size is a bit stupid since "cut ver." for example isn't forced if it's not official yet it serves the exact same purpose. it takes players about 3 seconds to see if a map is tv size or not, and in many cases dropping it makes the title look much cleaner anyway.
Pachiru

Nevo wrote:

Pachiru wrote:

"Guest mappers and storyboarders must be added to the tags of a beatmap set." People who makes the hitsounds shouldn't be added to the tags aswell or it's not mendatory?
I definitely feel people who make hitsounds for a set should be required in tags, since they do play a large role in mapsets, also because a lot of hitsounders put forth plenty of time and effort into their work.
Yes, I misspelled one word. I gave my opinion on this further down, and it matches yours :)
jeanbernard8865

Nevo wrote:

Pachiru wrote:

"Guest mappers and storyboarders must be added to the tags of a beatmap set." People who makes the hitsounds shouldn't be added to the tags aswell or it's not mendatory?
I definitely feel people who make hitsounds for a set should be required in tags, since they do play a large role in mapsets, also because a lot of hitsounders put forth plenty of time and effort into their work.
i think the rule should be extended to anyone whose contribution is visible in the mapset ( for example, someone who provided the mp3 for a map or a modder will not be in tags, but someone who keysounded a section in the top diff will )
ailv
If a track has more than 5 artists they must be substituted with Various Artists, similarly if a track is composed of 3 or more individual tracks, the title must be substituded to <Descriptor> Compilation.

What about albums? E.g. if I were to map https://soundcloud.com/owslaofficial/se ... nctuary-ep, the entirety of this EP mixed into one track, I can't see why I'd have to name it "Sanctuary EP Compilation" over just "Sanctuary EP".


Songs with German umlauts (ü, ö, ä and ß) must be romanised into two-letter combination (ue, oe, ae and ss).
This doesn't make sense in certain cases. For example, it wouldn't work for https://osu.ppy.sh/s/723626, as the "Ü" there is simply a stylistic choice; it's intended to be a U with some fancy shmancy stuff.

Commas, vs., &, feat./ft., CV: must always use a trailing whitespace. Unless it is a comma, leading whitespace is also required.
does this include "ft" or "feat" without a "."

If a song or artist are referred to in multiple ways on official sources provided by the artist, the mapper is free to choose any of the romanisations. The only exception to this is if the song already has a mapset in the Ranked Section, in which case the corresponding guideline applies to it.
How does this apply to https://osu.ppy.sh/p/beatmaplist?q=diao%20ye%20zong which has had maps ranked under "Diao Ye Zong" and "RD-Sounds".

Only use the Source field if the song comes from, is remixed from or specifically fan-made for a video game, movie, or series. Website names are not an acceptable Artist nor Source.
Does this include cases where the song was popularized by a specific game/movie/series? I believe that's how it's being handled rn as well.

Guest mappers and storyboarders must be added to the tags of a beatmap set. This is to give credit where credit is due and helping others identify the main contributers of any given beatmap set.
Does this need to be extended to previous usernames?

For songs belonging to doujin circles, the circle name must be used over the vocalist or composer, unless these contributors are not part of the circle. In these cases the priority falls on vocalist followed by composer for instrumental songs.
I'm not sure if I'm interpreting this wrong, but shouldn't it be more something like guest composer + circle, to give accurate credit in those cases?


Special unicode characters must be filtered to their nearest standard equivalent or removed from the Romanised Artist and Romanised Title fields within a .osu file. ★ ☆ ⚝ ✩ ✪ ✫ ✬ ✭ 🟉 🟊 ✮ ✯ ✰ and the likes are substituded to an asterisk. Corner Brackets have to be written as quotation marks instead. Other special characters are to be romanised or dropped on case-by-case basis.
I believe there should be some clarification on whether ' ' or " " is used for the romanization of 「 」 (see https://osu.ppy.sh/s/735097 and https://osu.ppy.sh/s/538136), since British and American conventions differ.
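To make the special-character points above concrete, here is a rough Python sketch of how such a filter could behave. Everything in it is an assumption layered on the quoted draft text: the function name `filter_romanised`, the `quote` parameter (left open because the ' ' vs. " " question is unresolved), the `german` switch (so a stylistic "Ü" is only expanded when the metadata really is German), and the handling of uppercase umlauts, which the draft does not mention.

```python
# Star characters listed in the quoted draft; all become an asterisk.
STARS = "★☆⚝✩✪✫✬✭🟉🟊✮✯✰"
# Two-letter replacements from the quoted draft (lowercase forms only).
GERMAN = {"ü": "ue", "ö": "oe", "ä": "ae", "ß": "ss"}


def filter_romanised(text: str, quote: str = '"', german: bool = False) -> str:
    """Filter a Romanised Artist/Title string (illustrative sketch only)."""
    out = []
    for ch in text:
        if ch in STARS:
            out.append("*")
        elif ch in "「」":
            # Which quotation mark to use is still an open question in this thread.
            out.append(quote)
        elif german and ch.lower() in GERMAN:
            rep = GERMAN[ch.lower()]
            # Assumption: keep the capitalisation of the original letter.
            out.append(rep.capitalize() if ch.isupper() else rep)
        else:
            out.append(ch)
    return "".join(out)


print(filter_romanised("Lucky☆Star"))
print(filter_romanised("Über", german=True))
```

With `german=False` the umlauts are left untouched, which is roughly what limiting the rule to German metadata would mean in practice. Other special characters would still be case-by-case.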
Sieg

ailv wrote:

Only use the Source field if the song comes from, is remixed from or specifically fan-made for a video game, movie, or series. Website names are not an acceptable Artist nor Source.
Does this include if the song was popularized by a specific game/movie/series? I beleive that's how it's being handled rn as well.
Also, since this is enforced rn, I think there should be some sort of indication that the source is a must even for remixes, covers, or whatever, if the original comes from a video game, movie, series, etc., because atm it sounds like it's your choice whether to put it in or not. Maybe someone can help with proper wording?
ailv
Oh, adding on actually: I think there should be some clarification of which sources are acceptable too. For example, https://osu.ppy.sh/s/729305 would allow both "東方Project" and "東方輝針城 ~ Double Dealing Character".

Something like, if your source is part of a large series, you may use either the specific game, or the series.
Topic Starter
Okoayu

Naotoshi wrote:

agre w/ nevo abt hitsounders in tags.

i think forcing tv size is a bit stupid since "cut ver." for example isn't forced if it's not official yet it serves the exact same purpose. it takes players about 3 seconds to see if a map is tv size or not, and in many cases dropping it makes the title look much cleaner anyway.
that's true, idk what to do about it for now because delis speaking for players has a point imo

Sieg wrote:

alright
Cyrillic Romanisation: Use BGN/PCGN system for Russian/Cyrillic. Е and е should be romanised as ye if it stands alone or after a, e, ё, и, о, у, ы, э, ю, я, й, ъ, ь. In other cases, it should be romanised as e. ё should be romanised to ye, however, use yo or o to avoid usage of special characters. Ignore any other rules in the file provided, these are either irrelevant or wouldn't help in the game. If an artist uses a preferred romanisation, follow it regardless of this rule. For most of the other characters, refer to the first page of this document.
I suggest to remove "Cyrillic" and leave it to case by case since current wording is ambiguous trying to cover all Cyrillic languages with different phonetics and different scenarios of romanisation, which is obviously wrong approach. So, reword it to

Russian Romanisation: Use BGN/PCGN system for Russian. Е and е should be romanised as ye if it stands alone or after a, e, ё, и, о, у, ы, э, ю, я, й, ъ, ь. In other cases, it should be romanised as e. ё should be romanised to ye, however, use yo or o to avoid usage of special characters. Ignore any other rules in the file provided, these are either irrelevant or wouldn't help in the game. If an artist uses a preferred romanisation, follow it regardless of this rule. For most of the other characters, refer to the first page of this document.

J1NX1337 wrote:

Brackets within artist or title fields should be separated from the other text surrounding it, unless there is obvious reason to not do so.
-> unless there is an obvious reason not to do so.
Slight grammar correction.
ok

Kurai wrote:

- Cyrillic Romanisation should follow the BGN/PCGN system (except for the letter ё in Russian which should follow the GOST 2002(B) system). Read more here: http://up.kuraip.net/032209ex3724.pdf
can some russians say something about http://up.kuraip.net/032209ex3724.pdf ? it seems to make sense and encompass all things said

Pachiru wrote:

For tags, from draft: "This is to give credit where credit is due and helping others identify the main contributers of any given beatmap set."

Since we should give feedback to the main contributors of the set, we need to put the person who made the hitsounding, which is an important part of the map, because without hitsounds the map is unrankable. If we give credit to someone who made something optional like Storyboard, we should add the hitsounds person to the tag aswell.

(also, don't forget to fix the typo on: "contributers" → "contributors")
sure. fixing this

ailv wrote:

If a track has more than 5 artists they must be substituted with Various Artists, similarly if a track is composed of 3 or more individual tracks, the title must be substituded to <Descriptor> Compilation.
What about albums? E.g If I were to map https://soundcloud.com/owslaofficial/se ... nctuary-ep the entirety of this ep mixed into one track, I can't see why I'd have to name it "Sanctuary EP Compilation" over just "Sanctuary EP".
suggest an alternative, this is a fair point


Songs with German umlauts (ü, ö, ä and ß) must be romanised into two-letter combination (ue, oe, ae and ss).
This doesn't make sense in certain cases, for example https://osu.ppy.sh/s/723626 wouldn't make sense as the "Ü" is simply a stylizing choice, it's intended to be a U with some fancy shmancy stuff.
limiting it to romanisation of german then because german words are affected

Commas, vs., &, feat./ft., CV: must always use a trailing whitespace. Unless it is a comma, leading whitespace is also required.
does this include "ft" or "feat" without a "."
now it does

If a song or artist are referred to in multiple ways on official sources provided by the artist, the mapper is free to choose any of the romanisations. The only exception to this is if the song already has a mapset in the Ranked Section, in which case the corresponding guideline applies to it.
How does this apply to https://osu.ppy.sh/p/beatmaplist?q=diao%20ye%20zong which has had maps ranked under "Diao Ye Zong" and "RD-Sounds".
if the song youre mapping has 2 ranked sets with both then you can go back to choosing, otherwise follow ranked unless that one was wrong

Only use the Source field if the song comes from, is remixed from or specifically fan-made for a video game, movie, or series. Website names are not an acceptable Artist nor Source.
Does this include if the song was popularized by a specific game/movie/series? I beleive that's how it's being handled rn as well.
example needed?

Guest mappers and storyboarders must be added to the tags of a beatmap set. This is to give credit where credit is due and helping others identify the main contributers of any given beatmap set.
Does this need to be extended to previous usernames?
does it say so? no.

For songs belonging to doujin circles, the circle name must be used over the vocalist or composer, unless these contributors are not part of the circle. In these cases the priority falls on vocalist followed by composer for instrumental songs.
I'm not sure if I'm interpreting this wrong, but shouldn't it be more something like guest composer + circle, to give accurate credit in those cases?
usually when that happens the song is a guest on an album anyways, but im not sure myself. more debate required.


Special unicode characters must be filtered to their nearest standard equivalent or removed from the Romanised Artist and Romanised Title fields within a .osu file. ★ ☆ ⚝ ✩ ✪ ✫ ✬ ✭ 🟉 🟊 ✮ ✯ ✰ and the likes are substituded to an asterisk. Corner Brackets have to be written as quotation marks instead. Other special characters are to be romanised or dropped on case-by-case basis.
I believe there should be some clarification between ' ' and " ", being used for romanization of 「 」, see https://osu.ppy.sh/s/735097, https://osu.ppy.sh/s/538136. Since British and American ways differ.
then clarify instead of saying that?
CrystilonZ
The same applies to the Source field if a romanised Source is preferred by the mapper.
I don't believe a romanised source is appropriate tbh. Stuff inside the source field should be in its original language, since that field does not limit usable characters to those found on normal English keyboards. If the mapper wants the English title inside that field, they can do so only if there is an official English title.

https://osu.ppy.sh/forum/p/6549402 << this isn't dead orz. Everyone just got incredibly busy lol. For the current state of that proposal there are some counterarguments for it but most are just pure fallacy or stuff that we've replied to already. I can summarise the thread and post it here if you want to. I'd love to see Chinese romanisation changed.
Topic Starter
Okoayu
updated draft, btw

2 months of no one doing anything is pretty dead
also the source is a unicode field as such it can hold anything we want it to -> the mapper should have the choice to decide which one is shown in the client
ailv

ailv wrote:

If a track has more than 5 artists they must be substituted with Various Artists, similarly if a track is composed of 3 or more individual tracks, the title must be substituded to <Descriptor> Compilation.
What about albums? E.g If I were to map https://soundcloud.com/owslaofficial/se ... nctuary-ep the entirety of this ep mixed into one track, I can't see why I'd have to name it "Sanctuary EP Compilation" over just "Sanctuary EP".
suggest an alternative, this is a fair point

Compilation should be used in cases where the songs are not already part of an organized set of songs.


Songs with German umlauts (ü, ö, ä and ß) must be romanised into two-letter combination (ue, oe, ae and ss).
This doesn't make sense in certain cases, for example https://osu.ppy.sh/s/723626 wouldn't make sense as the "Ü" is simply a stylizing choice, it's intended to be a U with some fancy shmancy stuff.
limiting it to romanisation of german then because german words are affected

Can we state that this applies specifically to songs with German title or artist fields then? Currently the wording implies that any song with those German umlaut characters would need to be romanized as such.

Guest mappers and storyboarders must be added to the tags of a beatmap set. This is to give credit where credit is due and helping others identify the main contributers of any given beatmap set.
Does this need to be extended to previous usernames?
does it say so? no.

I've seen discussion about this before, I think I've seen cases where the previous username was added to the online tags so a DQ wouldn't be needed. I think further discussion would be useful.

Special unicode characters must be filtered to their nearest standard equivalent or removed from the Romanised Artist and Romanised Title fields within a .osu file. ★ ☆ ⚝ ✩ ✪ ✫ ✬ ✭ 🟉 🟊 ✮ ✯ ✰ and the likes are substituded to an asterisk. Corner Brackets have to be written as quotation marks instead. Other special characters are to be romanised or dropped on case-by-case basis.
I believe there should be some clarification between ' ' and " ", being used for romanization of 「 」, see https://osu.ppy.sh/s/735097, https://osu.ppy.sh/s/538136. Since British and American ways differ.
then clarify instead of saying that?

I don't know which one is better; I'd propose that both are acceptable and up to choice.
Topic Starter
Okoayu
1 tried to clarify compilations

2 that is what "german metadata" encompasses

3 probably why these maps weren't dq'd

4 yea idk either
Noffy
ok time for a re-review with slightly fresher eyes

the thing part nine wrote:

Word-by-word Romanisation: Each character must be romanised into a single, capitalised, separated word. Refer to this thread for examples and supplementary information.
this isn't the same thread anymore and doesn't include that supplementary information section so having this doesn't make sense oops


the thing part thirty wrote:

Guest mappers, storyboarders, and hitsounders must be added to the tags of a beatmap set. This is to give credit where credit is due and helping others identify the main contributors of any given beatmap set.
-> + "Skinners should be added if they made the skin specifically for the mapset" (in contrast to someone just borrowing/mixing skin elements that're already out there) (this would be nice)


the thing part forty two wrote:

Commas, vs., &, any variations of feat./ft., CV: must always use a trailing whitespace. Unless it is a comma, leading whitespace is also required.
(CV: blah) vs. ( CV: blah) . the latter would look silly, so CV: shouldn't require leading whitespace either. Or uhhh... this doesn't apply to sides which have the inside of a bracket next to them? or something. since it'd also apply to like, (feat.) vs. ( feat. ) which isn't.. better really.. hmmm
I'm not sure how to fix the wording for this though
aaaaaaaa
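One way to pin the wording down is to write the check out and see which cases it accepts. Below is a small Python sketch of the whitespace rule with a bracket exemption, so that "(CV: blah)" and "(feat.)" pass without padding. The exemption itself is only my guess at a fix, not something the draft says, and `spacing_problems` is a made-up name. Known limitation: a word-like marker glued to a preceding letter (e.g. "Avs.") is not detected, since it can't be told apart from an ordinary word.

```python
import re

# Markers from the quoted rule; "feat"/"ft" are accepted with or without the dot.
MARKER = re.compile(r",|&|(?<!\w)(?:vs\.|CV:|(?:feat|ft)(?:\.|(?!\w)))", re.IGNORECASE)


def spacing_problems(text: str) -> list[str]:
    """List whitespace problems around separators (illustrative sketch only)."""
    problems = []
    for m in MARKER.finditer(text):
        before = text[m.start() - 1 : m.start()]  # "" at the start of the string
        after = text[m.end() : m.end() + 1]       # "" at the end of the string
        # Trailing whitespace is always required, except right before a closing bracket.
        if after and not after.isspace() and after not in ")]":
            problems.append(f"no trailing whitespace after {m.group()!r}")
        # Leading whitespace is required unless it is a comma or follows an opening bracket.
        if m.group() != "," and before and not before.isspace() and before not in "([":
            problems.append(f"no leading whitespace before {m.group()!r}")
    return problems


print(spacing_problems("Artist A feat. Artist B (CV: Someone)"))
print(spacing_problems("A&B"))
```

If the exemption is unwanted, dropping the two bracket checks gives the literal reading of the rule, under which "(CV: blah)" would be flagged.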

Okoratu wrote:

4 yea idk either

Corner Brackets have to be written as quotation marks instead.
->
Corner Brackets have to be written as quotation marks instead, with either the British or American methods unless there is an official source specifying a specific method.


?
may be a bit long for what it's a part of though.


thanks oko~
ailv
If the creator of the mapset has done major edits to the .mp3, they are free to name it appropriately to signal that this song is a special version. In this case the original songs must still be clearly indicated in order for players to be able to search for the original songs.
How exactly would the line for "major edits" be drawn? I think this specific part requires additional discussion. I personally would suggest that a major edit means the .mp3 has been edited to remove or add instruments? I'm not too sure here.

Corner Brackets have to be written as quotation marks instead.
Add examples of what a corner bracket is ("⸤ ⸥" and "「 」"), since there are multiple forms.

Other special characters are to be romanised or dropped on case-by-case basis.
Will these special characters be added to the RC? If not, I would suggest that the list be updated as "case-by-case" decisions come up.

Brackets within artist or title fields should be separated from the other text surrounding it, unless there is obvious reason not to do so. Reasoning like this would include syntactical use of brackets and the general typesetting of a song title or artist using them without whitespaces often and consistently across multiple platforms.
Can we clarify what the word "separated" refers to in this context? I think it makes more sense to explicitly state that separation should be done using whitespaces, unless there is an obvious reason to not do so. Otherwise title,[stuff] is technically separated by a comma.
Noffy

ailv wrote:

Corner Brackets have to be written as quotation marks instead.
Add an example of what a corner bracket is "⸤ ⸥" and " 「 」 " since there are multiple forms.
That's why it's in the glossary
ailv

Noffy wrote:

That's why it's in the glossary
oh shit im blind, add "⸤ ⸥" still though.
CrystilonZ

Okoratu wrote:

also the source is a unicode field as such it can hold anything we want it to -> the mapper should have the choice to decide which one is shown in the client
The point is that I don't consider the romanised source as being official. Why don't we just stick to the original language since it's clearly a better alternative? There's no need to romanise it to begin with
Topic Starter
Okoayu
oh there is: if you are english speaking and map japanese songs, the source shown in the top left ingame can't otherwise tell english-speaking people what the song is from without looking cryptic to them
CrystilonZ
Properly crediting the source should take priority here imo. There are measures to raise metadata standards so that stuff is credited properly. Replacing titles (even if they are in foreign scripts) with unofficial ones is not the best way to credit the source properly.
Sieg
How about official translations, crunchyroll for example?
Sieg
mhm, I can't see any changes about Cyrillic Romanisation in draft. You said it was updated?

Okoratu wrote:

Kurai wrote:

- Cyrillic Romanisation should follow the BGN/PCGN system (except for the letter ё in Russian which should follow the GOST 2002(B) system). Read more here: http://up.kuraip.net/032209ex3724.pdf
can some russians say something about http://up.kuraip.net/032209ex3724.pdf ? it seems to make sense and encompass all things said
That's fine for Russian, but I don't see a reason to apply the replacement rules from Russian to Ukrainian as the draft does rn. As for other Cyrillic-based languages: we don't write out Romanisation of Hieroglyphs/Chinese, so I don't see a reason why this is done for Cyrillic.

also

pdf wrote:

Hopefully, the BGN/PCGN systems have been built so that Cyrillic can be rendered by using only the basic letters and punctuation found on English language keyboards.
No, some Cyrillic-based languages use umlauts for Romanisation in BGN/PCGN.
CrystilonZ
just as third party stuff arent accepted as refs imo crunchyroll shouldn't be accepted as well i guess

for cases like games that are released in a lot of regions english names should be okay
e.g. both ポケモン超不思議のダンジョン and Pokémon Super Mystery Dungeon are fine
Sieg

CrystilonZ wrote:

just as third party stuff arent accepted as refs imo crunchyroll shouldn't be accepted as well i guess

for cases like games that are released in a lot of regions english names should be okay
ea. both ポケモン超不思議のダンジョン and Pokémon Super Mystery Dungeon are fine
Can you elaborate why you think this is acceptable for games but e.g. not for anime series?

Crunchyroll wrote:

officially-licensed content from leading Asian media producers directly to viewers translated professionally in multiple languages
CrystilonZ
Titles on crunchyroll are released by a third party company. Licensed or not crunchyroll is still a third party company. This same logic applies to itunes spotify etc. so just keep things to the same standards I guess.
Sieg
well.. localizations for games are usually also done by outsourced third party companies, and even publishers can differ between regions.
CrystilonZ
Bringing the Mandarin Romanisation up again in this thread. I'll try to summarise every argument made.

Gonna change the wordings a bit and add some stuff to make it more clear

Proposed Rules wrote:

  1. Languages in the Chinese language family must be Romanised accordingly. Do not Romanise Cantonese texts with the Hanyu Pinyin method of Romanisation.
  2. Songs with Mandarin titles and/or Mandarin artists must use the Hanyu Pinyin method of Romanisation when there is no Romanisation or translation information listed by an official source. The ü vowel should be Romanised into u and all diacritical tone marks should be omitted because of the technical limitations resulting from the limited amount of characters allowed in the Romanised title/artist fields.
  3. For capitalisation and word separation, refer to The Basic Rules of the Chinese Phonetic Alphabet Orthography (汉语拼音正词法基本规则/漢語拼音正詞法基本規則). In short, generally every word should be separated and capitalised. Surname and first name are separated using a space and are capitalised.
  4. Particles (助词) are written separately and should not be capitalised.
our arguments

CrystilonZ wrote:

  1. The one-character-one-word method is impractical. Similar to Japanese, one Chinese character does represent one single syllable. However, a word is not necessarily comprised of one syllable (like Japanese, Chinese is a polysyllabic language). For example 图书馆 (túshūguǎn) as a whole means library, and writing 'li bra ry' would defeat the purpose of Romanisation by not resembling the structure of languages using the Roman alphabet.
  2. Using v as the Romanisation of the vowel ü is nonsense. The purpose of Romanisation is to enable players to read titles / artist names written in scripts that are foreign to them. For anyone that does not know Mandarin and/or how pinyin works, Lv Guang (Lü Guang) is just begging to be read as Level Guang.
  3. The current Romanisation method is baseless and irrational considering the linguistic specificities of the Mandarin language. The current method is based on a discussion among only a small number of people.
The difference in pronunciation of u and ü is acknowledged but Romanising a vowel into something like v is most likely not the best idea.

Wafu wrote:

I think it is worth doing a little comparison of the current system and the system in the proposal to highlight the pros a bit more.
  1. Current system
    1. Titles are easy to read ✘ (most people will read every syllable as if it was one word)
    2. Titles are easy to remember ✘ (words are easier to remember than separate syllables, humans remember the words easier by their shape)
    3. Fits the rules of Latin script (Romanisation = writing words from other script to Latin/Roman script) ✘ (Latin script is alphabetical, therefore separating each syllable doesn't make sense and doesn't read well for majority of Latin script)
    4. Fits the rules of Chinese script ✘ (Impossible, if you want to make it "fit" to the Chinese script, you would have to replace each character with one logogram, Latin alphabet doesn't have logograms. Chinese is also not syllabary script, so separating each syllable again doesn't make sense.)
    5. Differentiates between different Romanisations and meanings of the same sequences of characters. ✘
    6. Includes tones in Romanised text ✘ (Impossible with characters which we are limited to. You could use "a1", "a2" (redundant) etc., but that would make the text incomprehensible, majority of people wouldn't even know how to pronounce it)
    7. Doesn't replace characters with others which have no evidence of being similar to the intended character. ✘ (ü is replaced with v, which doesn't seem to be supported by any logical argument)
    8. You can use a different Romanisation system for dialects where the current system wouldn't work at all ✘
    9. Isn't related to politics ✘ (Impossible, picking any Romanisation system is picking a side, every Romanisation system is related to politics)
  2. Proposal
    1. Titles are easy to read ✔
    2. Titles are easy to remember ✔
    3. Fits the rules of Latin script (Romanisation = writing words from other script to Latin/Roman script) ✔
    4. Fits the rules of Chinese script ✘ (Impossible, if you want to make it "fit" to the Chinese script, you would have to replace each character with one logogram, Latin alphabet doesn't have logograms.)
    5. Differentiates between different Romanisations and meanings of the same sequences of characters. ✔
    6. Includes tones in Romanised text ✘ (Impossible with characters which we are limited to. You could use "a1", "a2" (redundant) etc., but that would make the text incomprehensible, majority of people wouldn't even know how to pronounce it)
    7. Doesn't replace characters with others based on no evidence that they are similar to the intended character. ✔
    8. You can use a different Romanisation system for dialects where the current system wouldn't work at all ✔
    9. Isn't related to politics ✘ (Impossible, picking any Romanisation system is picking a side, every Romanisation system is related to politics)
So far, no issues have been brought up that aren't solved or that would have to be solved. We obviously accept your opinions, but they must be on topic and must concern an actual issue that the system has.

Counterarguments and our replies
Counterarguments are in bold
  1. Mandarin does not equal Chinese. There are various Chinese dialects with different Pin Yin systems. Sometimes the boundaries between Mandarin and other languages within the Chinese language family are vague. You need extra rules to clarify these boundaries.
    It's not our job to define language boundaries. The boundary is already defined by the language itself. That Mandarin does not equal Chinese has been acknowledged since the beginning.

    ===
  2. Mainland China, Taiwan and (maybe) other regions use different systems of romanisation. "Official" in the international standard means P.R.C. official. Yet the standard from mainland China is not fully used in Taiwan, let alone other regions. It is not like Japanese, which is used by only one country.
    This is not at all our concern. The Romanisation system we picked is to resemble the way our other Romanisation systems (like Modified Hepburn) read, to ensure more uniformity and readability for a regular player who knows only the Latin alphabet. We don't care what is official somewhere else, because that's not the community this is directed at.

    ===
  3. Spacing each word/idiom is hard to implement. Word-by-Word Romanisation is better because no problem will arise.
    If problems regarding word separation arise (for example, there are two possible ways to Romanise a particular phrase/word), research must be done to determine which way is preferable, like what we do with the Romanisation of Japanese. Using word-by-word Romanisation is not fixing the problem but rather making it worse. Romanising them syllable by syllable will yield the exact same Romanisation for both contexts/meanings. That doesn't only mean the problem is not fixed (the text still may not align with what the meaning is supposed to be), it also means introducing a second problem (now, the meaning is completely undetectable).

    ===
  4. Han Yu Pin Yin is also involved within political issues, which of course should be considered if you would like to establish a rule of it.
    This is just pure fallacy.
=======
p/6443024 if anyone wants the full version
Fycho

Proposed Rules wrote:

  1. Songs with Mandarin titles and/or Mandarin artists must use the Hanyu Pinyin method of Romanisation when there is no Romanisation or translation information listed by an official source. The ü vowel should be Romanised into u and all diacritical tone marks should be omitted because of the technical limitations resulting from the limited amount of characters allowed in the Romanised title/artist fields. There is already a vowel "u"; using "u" for "ü" just mixes them up, and currently there is no better choice than "v". Only in a very few cases might 'yu' be used for people's names, and that is arguable, but I don't consider it suitable to represent every "ü".
The below is copied from my post in the original thread; it gives some explanation and facts about Chinese:

Chinese is pictographic, which is different from a phonographic script like English. Romanising “我的未来式” as "Wo de Weilaishi" wouldn't read better or carry more meaning than "Wo De Wei Lai Shi". When we read "Weilaishi", we have to spend time splitting the word and translating it into characters in our minds, which is no different from "Wo De Wei Lai Shi"; neither is intuitive compared to "我的未来式", because the Latin letters are only used as markers. They don't carry meaning to be read, and don't expect Chinese speakers to read articles written in Pinyin or any other romanisation; that's impossible and unreadable. Therefore, using English as an example (e.g. simple okay => 'si mp le o k ay') is meaningless. Also, "Wo de Weilaishi" wouldn't make songs easier to search for than "Wo De Wei Lai Shi"; players would spend more time thinking about how to search for songs, because for most titles it is hard to find a way to divide them into words (although they use characters to search most of the time).


Below is what I disagree with: (my replies are in blue)

Wafu wrote:

  1. Current system
    1. Titles are easy to read ✘ (most people will read every syllable as if it was one word)
    2. Titles are easy to remember ✘ (words are easier to remember than separate syllables, humans remember the words easier by their shape)
    3. Fits the rules of Latin script (Romanisation = writing words from other script to Latin/Roman script) ✘ (Latin script is alphabetical, therefore separating each syllable doesn't make sense and doesn't read well for majority of Latin script)
    4. Fits the rules of Chinese script ✘ (Impossible, if you want to make it "fit" to the Chinese script, you would have to replace each character with one logogram, Latin alphabet doesn't have logograms. Chinese is also not syllabary script, so separating each syllable again doesn't make sense.)
    5. Differentiates between different Romanisations and meanings of the same sequences of characters. ✘
    6. Includes tones in Romanised text ✘ (Impossible with characters which we are limited to. You could use "a1", "a2" (redundant) etc., but that would make the text incomprehensible, majority of people wouldn't even know how to pronounce it)
    7. Doesn't replace characters with others which have no evidence of being similar to the intended character. ✘ (ü is replaced with v, which doesn't seem to be supported by any logical argument)
    8. You can use a different Romanisation system for dialects where the current system wouldn't work at all ✘
    9. Isn't related to politics ✘ (Impossible, picking any Romanisation system is picking a side, every Romanisation system is related to politics)
  2. Proposal
    1. Titles are easy to read ✔ sorry, but as a Chinese speaker I don't think they are easier to read than the current separated romanisation format
    2. Titles are easy to remember ✔ They are not easier to remember than the current separated romanisation from a Chinese speaker's side either
    3. Fits the rules of Latin script (Romanisation = writing words from other script to Latin/Roman script) ✔ Latin script doesn't fit Chinese; the Pinyin system is much better from my side too
    4. Fits the rules of Chinese script ✘ (Impossible, if you want to make it "fit" to the Chinese script, you would have to replace each character with one logogram, Latin alphabet doesn't have logograms.)
    5. Differentiates between different Romanisations and meanings of the same sequences of characters. ✔ explained above: for Chinese characters, Pinyin or any other romanisation system doesn't carry any meaning at all, it's just used as a marker for Mandarin
    6. Includes tones in Romanised text ✘ (Impossible with characters which we are limited to. You could use "a1", "a2" (redundant) etc., but that would make the text incomprehensible, majority of people wouldn't even know how to pronounce it)
    7. Doesn't replace characters with others based on no evidence that they are similar to the intended character. ✔ as pinyin or any other romanisation/Latin letters (they are just used as markers, as I said) don't stand for meaning, this feels unnecessary
    8. You can use a different Romanisation system for dialects where the current system wouldn't work at all ✔ Dialects don't have an official formal written form; even in HK, schools teach Mandarin grammar and write in standard Chinese grammar while only using Cantonese for pronunciation. There wouldn't be songs that use dialects for the song title and artist, so they don't need to be romanised in any case.
    9. Isn't related to politics ✘ (Impossible, picking any Romanisation system is picking a side, every Romanisation system is related to politics)
CrystilonZ
ü is a vowel in pinyin system therefore should be replaced with a vowel. If I am a typical english speaking person how am I going to pronounce stuff like Lv? If you believe there is a better alternative I really want to hear it. imo changing a vowel into a consonant just renders phrases cryptic.

I am aware that Chinese characters are logograms and languages that use the Latin alphabet are phonograms. However, Romanisation is transcribing languages in other scripts in Latin script and thus should follow grammatical rules of languages that use the Latin script: spaces separate words; not syllables. Mandarin is polysyllabic (to people that think that it's monosyllabic - f u); some words contain more than one syllable.
I don't believe players will spend more time searching for Mandarin songs once the new Romanisation rules have been applied. Chinese speakers probably use Chinese characters to search for stuff. However, for the majority of osu players who don't speak Chinese, the new rules make titles feel more familiar because they are grouped into words, like English, and they should spend less time searching for a particular map.
If you say both "Wo de Weilaishi" and "Wo De Wei Lai Shi" are pretty much the same for Chinese people as they are both counter-intuitive, why do you oppose the proposed rules then? They might not improve stuff for the Chinese, but for English-speaking players they will make the Romanised titles much more appealing.

Fycho wrote:

Below is what I disagree with: (my replies are in blue)

Wafu wrote:

Proposal
  1. Titles are easy to read ✔ sorry, but as a Chinese speaker I don't think they are easier to read than the current separated romanisation format. As stated above, they are easier for the majority of the player base and they are not harder to read for the Chinese. That means no cons, just pros.
  2. Titles are easy to remember ✔ They are not easier to remember than the current separated romanisation from a Chinese speaker's side either
  3. Fits the rules of Latin script (Romanisation = writing words from other script to Latin/Roman script) ✔ Latin script doesn't fit Chinese; the Pinyin system is much better from my side too
  4. Fits the rules of Chinese script ✘ (Impossible, if you want to make it "fit" to the Chinese script, you would have to replace each character with one logogram, Latin alphabet doesn't have logograms.)
  5. Differentiates between different Romanisations and meanings of the same sequences of characters. ✔ explained above: for Chinese characters, Pinyin or any other romanisation system doesn't carry any meaning at all, it's just used as a marker for Mandarin. I think the point here is about homophones and stuff. The same sequence of pronunciation might produce two (or more) different meanings. Separating Romanised titles into words should help with comprehensibility.
  6. Includes tones in Romanised text ✘ (Impossible with characters which we are limited to. You could use "a1", "a2" (redundant) etc., but that would make the text incomprehensible, majority of people wouldn't even know how to pronounce it)
  7. Doesn't replace characters with others based on no evidence that they are similar to the intended character. ✔ as pinyin or any other romanisation/Latin letters (they are just used as markers, as I said) don't stand for meaning, this feels unnecessary. Speaking about u and v here: v is just impossible to pronounce. I'm always open to a better alternative.
  8. You can use a different Romanisation system for dialects where the current system wouldn't work at all ✔ This is only about Mandarin, dialects are not included. (Cantonese, Wu Chinese (Shanghainese, Suzhou Hua, Wuxi, Hangzhou), Jiang–Huai Mandarin, Southern Fujian dialect, Hakka dialect, etc. don't need to be discussed for now.) This is speaking about the new rule: languages in the Chinese language family must be Romanised accordingly. This opens room for other Chinese languages to use suitable Romanisation systems, instead of restricting all Chinese languages to one bad system which may or may not fit the language.
  9. Isn't related to politics ✘ (Impossible, picking any Romanisation system is picking a side, every Romanisation system is related to politics)
Fycho
Romanising "ü" to "u" completely messes up things.
吕 => Lü
鲁 => Lu
雨 => Yu
Currently there is no better choice than "v", and "v" is what the major keyboard input methods use.
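To make the collision being argued about concrete, here is a small Python sketch using only the three syllables listed above. The helper names (`flatten_u_umlaut`, `romanise_all`, `collisions`) are hypothetical and only exist for this illustration; it shows what each replacement does to these examples and does not settle which option reads better.

```python
# The three example syllables from the post, keyed by character.
SYLLABLES = {"吕": "Lü", "鲁": "Lu", "雨": "Yu"}


def flatten_u_umlaut(syllable: str, replacement: str) -> str:
    """Replace ü/Ü with the given ASCII letter (hypothetical helper)."""
    return syllable.replace("ü", replacement).replace("Ü", replacement.upper())


def romanise_all(replacement: str) -> dict:
    """Apply the replacement to every example syllable."""
    return {char: flatten_u_umlaut(s, replacement) for char, s in SYLLABLES.items()}


def collisions(mapping: dict) -> dict:
    """Group characters that end up with the same romanised form."""
    groups = {}
    for char, romanised in mapping.items():
        groups.setdefault(romanised, []).append(char)
    return {form: chars for form, chars in groups.items() if len(chars) > 1}


print(collisions(romanise_all("u")))  # {'Lu': ['吕', '鲁']}
print(collisions(romanise_all("v")))  # {}
```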

CrystilonZ wrote:

I think the point here is about homophones and stuff. Same sequence of pronunciation might produce two (or more) different meanings. Separating Romanised titles into words should help with the comprehensibility.
Romanisation provides no meaning; both "Wei Lai Shi" and "Weilaishi" are just markers for "未来式" (logograms), and in meaning there is no difference between "Wei Lai Shi" and "Weilaishi". "Weilaishi" doesn't help comprehensibility either for people who speak Chinese or for people who don't. Chinese people need to spend time converting it to characters; for non-Chinese people, it's just a marker/pronunciation of "未来式" and carries no meaning. They could use "Wei Lai Shi" to search; if they don't speak Chinese, how can they know "Weilaishi" is a whole word?

Since the romanization of Chinese (including dialects) is much more complex than that of other languages, I think it's better to have good knowledge/research of it before revising old rules and making a proposal.
Hollow Wings
OK, what a mess.

just a warning: my post will be long.
read it in as much detail as you can to learn about the Chinese language and its romanisation, if you wanna get involved in this.

A. Important things about Chinese Romanisation

I. "ISO 7098:2015".
first of all, learn as much as you can about ISO 7098:2015.
ISO 7098:2015 explains the principles of the Romanization of Modern Chinese Putonghua (Mandarin Chinese), the official language of the People's Republic of China as defined in the Directives for the Promotion of Putonghua, promulgated on 1956-02-06 by the State Council of China. This International Standard can be applied in documentation of bibliographies, catalogues, indices, toponymic lists, etc.
all contents of this document are important, and you may already know some of them. there are two parts i wanna specially mention for you:
1. In the automatic romanisation process, there are two ways to romanise Chinese:
a. semi-automatic romanisation from Chinese words, separated by following proper rules.
b. automatic romanisation from Chinese characters one by one.
2. At present, most countries other than the PRC can't fully accept romanising Chinese characters into separated words based on how the characters combine, because finding and handling Chinese word boundaries is complex work, and the grammar of Chinese sentences can blur them even further.
after much deliberation, they decided to romanise Chinese characters one by one.


↑ this is my opening, just mark it and go on.


II. How special Chinese is as a kind of language.

according to the way characters make up words, languages can be divided into alphabetic languages and ideographic languages, which use alphabets and ideograms respectively as their characters.

a. alphabetic languages are simple; most of you can easily grasp the concept. also, most languages that exist now are alphabetic languages, each built on an alphabet of its own. as far as i know:
  1. Cyrillic alphabet (eg. Russian)
  2. Hebrew characters (eg. Hebrew)
  3. Arabic alphabet (eg. Arabic)
  4. Armenian character (eg. Armenian)
  5. Georgian character (eg. Georgian)
  6. Old Geez abjad (eg. Old Geez) ←already dead
  7. Devanagari script (eg. Sanskrit)
  8. Tamil alphabet (eg. Tamil)
  9. Kana script (eg. Japanese)
  10. Hangul script (eg. Korean)
  11. Thai script (eg. Thai)
  12. Tibetan script (eg. Tibetan)
  13. Mongolian script (eg. Mongolian)
... and tons of other alphabetic languages which may not be widely used or just dead.
b. in an ideographic language, every single character was born from some concrete thing or matter, which is very different from an alphabetic language.
as far as i know, the ideographic languages are:
  1. Egyptian hieroglyphs (eg. Ancient Egyptian) ←already dead
  2. Cuneiform script (eg. Ancient Sumerian) ←already dead
  3. Seal hieroglyphs (eg. Ancient Indian) ←already dead
  4. Maya hieroglyphs (eg. Ancient Mayan) ←already dead
  5. Chinese characters (eg. Chinese)
and NO MORE.
if you want to know why language systems are like that, that's a long story, and i won't start telling it here.
the reason i bring up the facts above is that i want you guys to know the Chinese language's specificity, which leads to how differently romanisation is done for alphabetic and ideographic languages.


III. “Transliteration” and “Transcription”
1. first, let's see what the 12 most important international transliteration standards aside from Chinese's are:
  1. ISO 9-1995: Information and Documentation: Transliteration of Cyrillic characters into Latin characters – Slavic and non-Slavic languages
  2. ISO 233-1984: Documentation: Transliteration of Arabic characters into Latin Characters
  3. ISO 233-2-1993: Information and Documentation: Transliteration of Arabic characters into Latin characters – Part 2: Arabic language – Simplified transliteration
  4. ISO 233-3-1999: Information and Documentation: Transliteration of Arabic characters into Latin characters – Part 3: Persian language – Simplified transliteration
  5. ISO 259-1984: Information and Documentation: Transliteration of Hebrew characters into Latin characters
  6. ISO 259-2-1994: Information and Documentation: Transliteration of Hebrew characters into Latin characters – Part 2: Simplified transliteration
  7. ISO 3602-1989: Documentation: Romanization of Japanese (kana script)
  8. ISO 9984-1996: Information and Documentation: Transliteration of Georgian character into Latin characters
  9. ISO 9985-1996: Information and Documentation: Transliteration of Armenian characters into Latin characters
  10. ISO 11940-1998: Information and Documentation: Transliteration of Thai
  11. ISO 15919-2001: Information and Documentation: Transliteration of Devanagari and related Indic characters into Latin characters
  12. ISO TR 11941-1996: Information and Documentation: Transliteration of Korean scripts into Latin characters
see? these cover nearly all of the alphabetic languages i've mentioned before.
and because of that, transliteration between their scripts and Latin characters can be done easily, no matter which character set is larger.
and again, because of that, retransliteration can be done easily as well. this is also the basic rule of transliteration.

for example: the Cyrillic phrase "окружающая среда" (meaning "environment") can be directly converted into "okruzhayushchaya sreda" with its proper transliteration rule:
о→o
к→k
р→r
у→u
ж→zh
а→a
ю→yu
щ→shch (it even uses four Latin characters to make sure there's no ambiguity)
а→a
я→ya
с→s
р→r
е→e
д→d
а→a
↑really simple, right? just do automatic transliteration and everything works out perfectly.
conversely, if you see "okruzhayushchaya sreda" in Latin characters, you can transliterate it back to "окружающая среда" with no trouble. the transliteration is reversible.
follow the rule and i can do this even if i know nothing about Cyrillic characters or Russian.
like if i see "обстановка" i can transliterate it into "obstanovka" directly, even though i have no idea what that word means.

this situation also applies perfectly to all transliteration work between Latin characters and other alphabetic languages.
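The round trip described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the table holds just the letters from the two example words (the entries for б, т, н and в are added here to cover "обстановка"), the function names are hypothetical, and the reverse direction uses greedy longest-match, which works for this subset but is not a claim about how any ISO standard handles ambiguous digraphs.

```python
# Letter table taken from the example above, plus б, т, н, в (assumed)
# so that the second example word can be handled too.
CYR_TO_LAT = {
    "о": "o", "к": "k", "р": "r", "у": "u", "ж": "zh", "а": "a",
    "ю": "yu", "щ": "shch", "я": "ya", "с": "s", "е": "e", "д": "d",
    "б": "b", "т": "t", "н": "n", "в": "v",
}
LAT_TO_CYR = {lat: cyr for cyr, lat in CYR_TO_LAT.items()}
MAX_KEY = max(len(key) for key in LAT_TO_CYR)


def to_latin(text: str) -> str:
    """Transliterate letter by letter; unknown characters pass through."""
    return "".join(CYR_TO_LAT.get(ch, ch) for ch in text)


def to_cyrillic(text: str) -> str:
    """Reverse the mapping, always trying the longest Latin chunk first."""
    out = []
    i = 0
    while i < len(text):
        for size in range(MAX_KEY, 0, -1):
            chunk = text[i:i + size]
            if chunk in LAT_TO_CYR:
                out.append(LAT_TO_CYR[chunk])
                i += len(chunk)
                break
        else:
            # Not in the table (e.g. a space): keep it unchanged.
            out.append(text[i])
            i += 1
    return "".join(out)


print(to_latin("окружающая среда"))           # okruzhayushchaya sreda
print(to_cyrillic("okruzhayushchaya sreda"))  # окружающая среда
```

No knowledge of Russian is needed at any step, which is the point of the comparison: the table alone is enough.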

2. now, let's see Chinese.
remember the word "environment"?
in Chinese, it's "环境".
now tell me, how can you transliterate it into Latin characters, even if you know the transliteration rule and Pinyin system very well?

the deep reason transliteration can be done easily between Latin characters and other alphabetic languages is that their character sets are really small.
there are 26 Latin characters in total.
and there are 38 Cyrillic letters in total.
it's easy to do the mathematical mapping between them (even using 4 letters like "shch" for "щ") and build an easy rule for a transliteration system.

how many Chinese characters are there?
- at least 80 thousand, and still as many as 8 thousand frequently used ones.
because it's a kind of, or i want to say, the only living ideographic language.

so what?
so, that affects romanisation very much.
building a romanisation rule for it is on a completely different level from what alphabetic languages need.
Transliteration won't do, we need to do "Transcription".
when we transcribe Chinese characters into Latin characters, we need to use the Pinyin system to help us.
there are 405 syllables, so yeah, we can finally do it, with rules similar to those of alphabetic languages:
it's easy to transcribe Chinese characters' pinyin into Latin characters.
like "环境" reads "Huan Jing" (i've decided to drop pinyin's phonetic symbols for now, before it becomes more complex.), so that's the exact Latin version of that Chinese word.

however, this is not reversible.
for example, if i see "Huan Jing" in the Latin characters of pinyin, i can't transcribe it back into Chinese.
i don't know if it is "环境" or "幻境" or "幻镜" or any of thousands of other possible meanings.
this happens with all Latin characters of pinyin, which means all of them can have various meanings.
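A tiny Python sketch of why the reverse direction fails, using only the three words mentioned above. The per-character table and the function names are assumptions made for this illustration (toneless readings, a hand-written word list); a real dictionary would return far more candidates.

```python
from collections import defaultdict

# Toneless pinyin for the four characters used in the examples above.
CHAR_TO_PINYIN = {"环": "Huan", "幻": "Huan", "境": "Jing", "镜": "Jing"}
WORDS = ["环境", "幻境", "幻镜"]


def to_pinyin(word: str) -> str:
    """Character-by-character transcription, one syllable per character."""
    return " ".join(CHAR_TO_PINYIN[ch] for ch in word)


def build_reverse_index(words: list) -> dict:
    """Map each pinyin string back to every word that produces it."""
    index = defaultdict(list)
    for word in words:
        index[to_pinyin(word)].append(word)
    return dict(index)


print(to_pinyin("环境"))           # Huan Jing
print(build_reverse_index(WORDS))  # {'Huan Jing': ['环境', '幻境', '幻镜']}
```

The forward mapping is a plain lookup, but the reverse index has three candidates under a single key, so there is no function going back.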

if it's a sentence, the situation will be worse.
for example: "Wu Huan Jing Ran Zhe Me Yuan", which has "Huan Jing" in it.
but its Chinese is: "五环竟然这么远", which means "the Fifth Ring Road is unexpectedly far from here"... which has 0 connection to "环境" (environment).

even we Chinese ourselves can't quickly understand what those words say if they are all written in the Latin characters of pinyin one by one.
and this just messes the whole system up.

lots of Chinese language specialists have already noticed this, and everything i've written above is an old conversation.
they already gave a solution: do romanisation from Chinese words separated by following proper rules.
like, if i meet "环境", i transcribe it into "huanjing".
though there are still lots of possible meanings, words can be clearly recognized in a line of Latin characters.
we can easily pick out the two or three Latin syllables of pinyin that combine into a Chinese word, which helps a lot in reading the sentence.

wonderful right?
not really...

basic Chinese grammar is simple, maybe just like English.
but, because of the Chinese language's own ideographic nature, every character, their combinations into words, and the interchanges/flexible uses among them can all change the meaning.
and what's more, that creates a lot of work in deciding "what is a proper Chinese word".

examples to show how hard the method of transcribing into separated words can be:
a. "他好说话" in Chinese; its pinyin written one by one is "Ta Hao Shuo Hua".
one version of meaning: "他 好说话" means "he is an easy-going person", and the separated version of the transcription is "Ta Haoshuohua".
another version of meaning: "他 好 说话" means "he is talkative", and the separated version of the transcription is "Ta Hao Shuohua".
b. "他谁都赢不了" in Chinese; its pinyin written one by one is "Ta Shui Dou Ying Bu Liao".
one version of meaning: "他 谁都赢不了" means "he can beat nobody", and another version of meaning: "他 谁 都赢不了" means "nobody can beat him". sadly, i don't even know how to transcribe that sentence into the correct Latin characters of pinyin without first studying The Basic Rules of the Chinese Phonetic Alphabet Orthography (汉语拼音正词法基本规则) or other rules like that in depth.
and by the way, not all of that rule is solid. the rules of pinyin often change to fit more special situations.

that shows how complex dealing with Chinese words is going to be:
it's already complex enough to get those sentences understood; it'll drive people crazy if you ask them to romanise them with words separated.
like if i saw "五环境内": i have to set aside my intuition about the obvious word "环境" (environment) and analyse the sentence; then i know it should be read as "五环 境内" (within the Fifth Ring Road precinct); then finally i output the result "Wuhuan Jingnei". this is already sick, even with 4 simple Chinese characters.
if i saw things like "阴晴圆缺", "七里香", "非常道", which have vague concepts across the various Chinese language systems (i'll mention this later in detail), i'd easily go mad if someone asked me to romanise them.
i'm sure most Chinese CAN'T do this very well, and i think foreigners will surely be worse at it.
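The examples above can be sketched in Python to show where the hard part sits. This is a hedged illustration: the character table is hand-written for these examples, the function names are hypothetical, and the segmentation is supplied by hand as spaces in the input. Joining syllables is trivial; deciding where the spaces go is the part no table can do.

```python
# Toneless readings for the characters in the examples above.
CHAR_TO_PINYIN = {
    "他": "ta", "好": "hao", "说": "shuo", "话": "hua",
    "五": "wu", "环": "huan", "境": "jing", "内": "nei",
}


def romanise_per_character(text: str) -> str:
    """One capitalised syllable per character, ignoring any spaces."""
    return " ".join(
        CHAR_TO_PINYIN[ch].capitalize() for ch in text if not ch.isspace()
    )


def romanise_segmented(segmented: str) -> str:
    """Join syllables inside each space-separated word, capitalise each word."""
    words = segmented.split()
    return " ".join(
        "".join(CHAR_TO_PINYIN[ch] for ch in word).capitalize() for word in words
    )


print(romanise_per_character("他好说话"))  # Ta Hao Shuo Hua
print(romanise_segmented("他 好说话"))     # Ta Haoshuohua
print(romanise_segmented("他 好 说话"))    # Ta Hao Shuohua
print(romanise_segmented("五环 境内"))     # Wuhuan Jingnei
```

The same four characters give one per-character result but two different word-separated results, depending entirely on a segmentation a human had to choose.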

although the method of transcribing from separated Chinese words helps people read Chinese sentences more easily, it's really, really tough work to do that transcription.
besides, there's no official standard for the transcription yet. that orthography's rules just help people do it, but won't do it automatically.

(what's more discouraging is that Chinese words sometimes don't have an exact meaning.
that would be even more complex, so i'll just stop here.)

all the facts above show that transcribing Chinese characters into Latin characters is really tiring and tough work. it costs lots of dedication and time, and requires a rich reserve of Chinese language knowledge, which not many people have.
it's all because Chinese is an ideographic language: you need to know the exact meaning of every morpheme by analysing the whole sentence before you separate the characters into exact words, if you really want separated Latin characters after transcription.



B. Relation to osu community nomination system

CrystilonZ wrote:

Other languages that use the Chinese script are irrelevant to this proposal.
We are only talking about Standard Mandarin here and Mandarin is not equivalent to Chinese.
We only use 'Chinese' in the draft for simplicity. The wording will be changed if this is implemented.
↑i don't know if CrystilonZ knows the whole Chinese language family clearly enough, so i'll add some basic background knowledge here.
ISO 639 code sets
Documentation for ISO 639 identifier: zho
Identifier: zho
Name: Chinese
Status: Active
Code sets: 639-2/T and 639-3
Equivalents: 639-1: zh
639-2/B: chi
Scope: Macrolanguage
Type: Living
Denotation: See corresponding entry in Ethnologue.
The individual languages within this macrolanguage are
  1. Gan Chinese [gan] → 赣语
  2. Hakka Chinese [hak] → 客家话
  3. Huizhou Chinese [czh] → 惠州话
  4. Jinyu Chinese [cjy] → 晋语
  5. Literary Chinese [lzh] → 文言文
  6. Mandarin Chinese [cmn] → 官话(普通话)
  7. Min Bei Chinese [mnp] → 闽北话
  8. Min Dong Chinese [cdo] → 闽东话
  9. Min Nan Chinese [nan] → 闽南话
  10. Min Zhong Chinese [czo] → 闽中话
  11. Pu-Xian Chinese [cpx] → 莆仙话
  12. Wu Chinese [wuu] → 吴语
  13. Xiang Chinese [hsn] → 湘语
  14. Yue Chinese [yue] → 粤语
ok, so, the things above are just for the electronic domain. there are still lots of other native languages in the PRC.
and i won't post the PRC's official native language list here, so as not to make things more complex.

since people like CrystilonZ may insist that Mandarin Chinese is the main target and that other Chinese systems have no business with it, let's start from the concept of a "macrolanguage":
it actually has the property of "the same standard pronunciation and style of writing".
and for Chinese as the macrolanguage, that standard is just Mandarin Chinese.


so the truth is, all Chinese language families DO have a common standard, with hundreds and thousands of connections to it. when you are talking about some other member of the Chinese family, it is always affected by the Mandarin system, which is the exact center of the whole topic.
if you wanna leave out every other Chinese language family, then you need to give another complete romanisation rule to solve the problems that may happen in the transcription process. otherwise, Mandarin Chinese's is automatically the official solution. in that case, we should be careful about its effect on other Chinese language families.

also, the so-called "Cantonese" is actually the concept of "languages spoken in Guangdong Province", which contains "Min Zhong Chinese", "Hakka Chinese" and "Yue Chinese". people just usually use it in its narrow sense, almost treating "Cantonese" as "Yue Chinese".
what's more, the native language spoken in Taiwan is a kind of Min Nan Chinese, in case some ignorant one jumps out.


with all those knowledges above, we can move on:

I. How to deal with Mandarin Chinese transcription of words that come from other Chinese language families but have already become a part of it?

1. Chinese archaism

it's a part of Literary Chinese, but has also become a part of Mandarin Chinese.
some of these words have even changed meaning, and they're hard to distinguish.
if Literary Chinese is regarded as another individual language aside from Mandarin Chinese, then when we meet words like "空穴来风", "闭门造车", "人尽可夫", etc., how do we deal with these?

2. multi-Chinese based songs

for example, there's a Chinese song called "好心分手", and one of its versions is sung in both Yue Chinese and Mandarin Chinese.
so the Yue Chinese romanized version is "Hou Sam Fan Sau/Housam Fansou" (actually this is Jyutping, a special kind of pinyin)
and the Mandarin Chinese romanized version is "Hao Xin Fen Shou/Haoxin Fenshou".
both of them are exactly correct as spoken, so how do we deal with these?

3. Chinese families with no supported romanisation rules
for example, there's a Chinese song called "外滩18号", which is sung in three kinds of Chinese: Mandarin Chinese, Wu Chinese and "Southwestern Hakka" (an official native Chinese language of the PRC).
so it can be romanized like:
Mandarin Chinese: "Wai Tan Shi Ba Hao/Waitan Shibahao"
Wu Chinese: "Nga Thae Tze Ba O/Ngathae Tzebao"
Southwestern Hakka Chinese: "Vai Tan Si Ba Hao/Vaitan Sibahao"
i'm not sure if those are correct aside from the Mandarin ones (i just typed them here after searching a dictionary of native romanisation), but each of them still has the chance to get a romanisation of its own part, right?
then how do we deal with these?


II. Even if we transcribe Mandarin Chinese into Latin characters by separated words, who is going to help the mappers who map a Chinese song?

it has some parts:
  1. is this a Mandarin Chinese song?
    - maybe you can tell from official settings or sites, not a big deal. but that won't work if you map some cult song.
  2. how to get the right romanized characters?
    - ask some Chinese staff/mapper/player? i doubt any of them have the time/ability to do it.
  3. how to make sure the things i got are correct?
    - kind of the same as the one above. if such a person exists and can do this job endlessly, he will be really welcome in this system.
you may think most Chinese words are not as complex as that, but if you want to build a reasonable system of rules, it should be strict.
and unless you are the person doing this kind of work, you can hardly imagine how hard it is.


C. Summary

I. Opinions

1. even international-level groups can't do much romanisation for Mandarin-Latin transcription by separated words.
it's feasible, that is true. but its efficiency is really, really low.
Chinese staff will be worn out to death if they really do this, because as i've explained, it's tough work with a tough process.

i can even predict that someone will look for the correct Mandarin romanisation for months, and still get DQed after finding out that the answer he got is still wrong. that may stop people from mapping Chinese songs, and personally i think that's really bad news.

2. Mandarin Chinese and Cantonese have standard romanisation rules, but other Chinese families do not. it's hard to complete one if you don't care about all of them, because every single one of them has a common standard pronunciation and style of writing: Mandarin Chinese.

given that, rebuilding the Mandarin Chinese romanisation system into a better and complete one would be really hard work, and it's for sure out of the osu! community's range.

3. the Chinese osu! community already argued about this several times a long time ago, and the result is still: keep the current state.

II. Conclusion

romanising Mandarin Chinese characters one by one is the best way SO FAR.
at least until some genius invents a dictionary for Mandarin-Chinese-characters-to-Latin-characters romanisation that makes it far more efficient than the current way.
and also, this is exactly what international groups do right now. (they only combine proper nouns like people's or place names, etc.)
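
to show what "one by one" means in practice, here is a minimal Python sketch. the `READINGS` table and the function name are hypothetical and only cover the "好心分手" example from above; a real table would need thousands of entries and some way to handle characters with more than one reading, which this sketch does not attempt.

```python
# Hypothetical single-reading table (assumption: one toneless reading per
# character). Only the four characters of the example title are covered.
READINGS = {"好": "hao", "心": "xin", "分": "fen", "手": "shou"}


def romanise_per_character(title, readings=READINGS):
    """Romanise each character on its own and capitalise every syllable."""
    syllables = []
    for char in title:
        if char not in readings:
            # Unknown characters are reported instead of being guessed.
            raise KeyError(f"no reading known for {char!r}")
        syllables.append(readings[char].capitalize())
    return " ".join(syllables)


print(romanise_per_character("好心分手"))  # Hao Xin Fen Shou
```

the point is that no decision about word boundaries is needed at all, which is why this way can be done by anyone with a character dictionary.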

--------------

simple extra p.s. here:
to CrystilonZ, and other people who know little about Chinese:

i think you have some wrong ideas about Chinese characters, because i've seen this written:

CrystilonZ wrote:

Similar to Japanese, one Chinese character does represent one single syllable. However, a word is not necessarily comprised of one syllable (like Japanese, Chinese is a polysyllabic language). For example 图书馆 (túshūguǎn) as a whole means library, and writing 'li bra ry' would defeat the purpose of Romanisation by not resembling the structure of languages using the Roman alphabet.
Chinese is far different from Japanese. the syllable thing you are talking about may just be the difference between Japanese Hiragana and Katakana, but it's not that true for the Kanji part.
(btw, you may already know that a part of the Japanese language system is just Chinese itself.)

and now, after reading all the things i wrote above, you may know Chinese is not only a polysyllabic language, but also the only living ideographic language.
"图书馆" reads "tú shū guǎn" and means "library", true.
However, "图书" reads "tú shū" and means "library book" or just "(picture) book", did you know that?
this is far different from English, where you can't separate a word in most cases: you CAN separate a Chinese word, because every single Chinese character can be a word.
eg.
图→graph, graphic, or lots of other meanings;
书→book, writing, letter, or lots of other meanings;
馆→shop, embassy, gallery, or any building that shows something it wants to.

so, the one-character-one-word method is a solid, reasonable method for Chinese romanisation.

with this knowledge, i hope you can rethink your idea about Chinese, which should help you understand the previous romanisation part.

--------------


hope all of these things help you know more about Chinese romanisation.

also, if you are confused about anything above, you are always welcome to ask.
Mafumafu
Regarding the Romanisation of Mandarin, I would like to post my comments here.

Firstly I would like to start with the following proposal:

Proposed Rules wrote:

The ü vowel should be Romanised into u and all diacritical tone marks should be omitted because of the technical limitations resulting from the limited amount of characters allowed in the Romanised title/artist fields.

CrystilonZ wrote:

speaking about u and v here. v is just impossible to pronounce. I'm always open for a better alternative.
Please understand first: if you want to change the current rule, namely to romanise ü as u, you have to prove FIRST that u is a better choice than v, instead of announcing that you are going to change it to “u” while asking us to provide a better choice. There are plenty of letters and characters that could be chosen, so why did you choose u? Just because they look similar after omitting what you call the “diacritical tone mark”? I don’t think that is a reliable reason for this change, as judging only by visual appearance is pretty unprofessional when talking about romanization. Additionally, Fycho has already mentioned the potential mess that might result from changing v to u, which indicates that this entry in the proposal does not only have pros. Therefore, you should not simply say “The ü vowel should be Romanised into u…” and justify the change only by explaining why “ü” cannot be implemented in the current system due to technical difficulties. You should explain with valid reasons why “u” is better than “v” (“u can be pronounced” is not a valid reason: there are many characters that can be pronounced, like a, e, i, o and some two-letter combinations like yu, which Fycho mentioned. All of them have pros and cons, so why did you give preference to u in this draft?), as well as how you are going to address potential problems if this “u” proposal is implemented.

Again, if you would like to change the current criteria, try to form solid reasons and show people why your proposal is better than the current one. Saying “I am going to change this into that, and if you don’t have better choices then this will be the new criteria” sounds pretty irrelevant and illogical, and comes across as a kind of manipulation of the criteria for Romanisation of Mandarin.

I would like to proceed to the comparison between the current and proposed systems in the previous discussion:

Previous Discussion wrote:

  1. Current system
    1. Titles are easy to read ✘ (most people will read every syllable as if it were one word)
    2. Titles are easy to remember ✘ (words are easier to remember than separate syllables; humans remember words more easily by their shape)
  2. Proposal
    1. Titles are easy to read ✔
    2. Titles are easy to remember ✔
I don’t think titles are easier to read and remember with the proposal.
How do you expect speakers who don’t know how to pronounce “ü“, “v” and “u” to differentiate syllables and words under Romanisation of Mandarin?

For non-Mandarin speakers, there is no difference in readability between “Wo De Wei Lai Shi” and “Wo de Weilaishi” or any other combination like “Wode Weilai Shi”. They have no idea what is a syllable and what is a word. If you think words are easier to remember (you did not post any proof or research regarding this either), why can’t a player treat the syllables as words, given that the player has no idea whether what they are reading is a word or a syllable? There are fewer syllables than words in total, so they should be easier to read and memorize!

Additionally, about a post in previous discussion thread:

Previous Discussion wrote:

For my thoughts on the matter I don't understand how romanising 学不会 to Xue Bu Hui is more informative than Xue Buhui or Xuebuhui. Word separation can be ambiguous at times but whether you write Xue Buhui or Xuebuhui it's more informative than the Xue Bu Hui according to the current RC.
I think the current Romanisation method is more informative. At the very least, the two are equally informative from the point of view of a Latin-script reader. Let me raise an example:

Currently you have a title like this:
best pro po sal e ver
And since you admit that word separation can be ambiguous at times, we could actually get any of these under the proposed criteria:

Bestpro posale ver, or
Bestpro po salever, or even
Bestproposalever

Though they seem illogical in English, they are all possible when it comes to Mandarin.
Back to Xue Bu Hui: in Mandarin we don’t mark word boundaries. It is written as 学不会, in which 学 stands for Xue, 不 stands for Bu and 会 stands for Hui. No marks are used to define words like 学[不会] Xue Buhui or [学不会] Xuebuhui, because either has a unique and reasonable meaning. Thus readers can work out the multi-layered meaning by themselves, with flexibility across the different meanings. If you choose Xue Buhui or Xuebuhui, you are steering the player towards only one meaning and providing a narrower range of information.
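
To make the ambiguity concrete, here is a minimal Python sketch that lists every way of grouping a run of syllables into words. The function name `segmentations` is only illustrative and is not taken from any tool used for the Ranking Criteria. For n syllables there are 2^(n-1) possible groupings, and nothing in the Hanzi title itself says which one is intended.

```python
from itertools import product


def segmentations(syllables):
    """Return every way to join neighbouring syllables into words."""
    if not syllables:
        return []
    results = []
    # One True/False choice per gap: True joins the syllable to the previous word.
    for joins in product([False, True], repeat=len(syllables) - 1):
        words = [syllables[0]]
        for syllable, join in zip(syllables[1:], joins):
            if join:
                words[-1] += syllable.lower()
            else:
                words.append(syllable)
        results.append(" ".join(words))
    return results


for candidate in segmentations(["Xue", "Bu", "Hui"]):
    print(candidate)
# Xue Bu Hui
# Xue Buhui
# Xuebu Hui
# Xuebuhui
```

A word-based rule has to pick exactly one of these candidates for every title, while the per-character form is always the first one.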

Also, I don’t see any feasible clarification or methodology for “separating by words” in this draft. The linked references only provide a few examples, which is far from enough and far from complete.

Previous Discussion wrote:

  1. Fits the rules of Chinese script ✘ (Impossible, if you want to make it "fit" to the Chinese script, you would have to replace each character with one logogram, Latin alphabet doesn't have logograms.)
  2. Differentiates between different Romanisations and meanings of the same sequences of characters. ✔
  3. Doesn't replace characters with others based on no evidence that they are similar to the intended character. ✔
These statements are also problematic.
First, formal Chinese writing has no logograms either. And splitting syllables when romanising does fit the rules of Chinese script, as I’ve mentioned above: syllables in Chinese also carry meaning, or rather, syllables and words are not mutually exclusive. Also, the proposed method fails to differentiate between different romanisations. Nor can the proposed change resolve the third issue listed, as mentioned above.

So the table for the proposal should in fact be modified like this:
  1. Proposal
    1. Titles are easy to read ✘
    2. Titles are easy to remember ✘
    3. Fits the rules of Chinese script ✘
    4. Differentiates between different Romanisations and meanings of the same sequences of characters. ✘
    5. Includes tones in Romanised text ✘
    6. Doesn't replace characters with others based on no evidence that they are similar to the intended character. ✘
    7. Isn't related to politics ✘
    8. Easy for non-Mandarin speakers/players to search in the beatmap list. ✘
    9. Avoids misleading players about the meaning of the title. ✘
    10. Provides a practical or feasible method of romanization. ✘
CrystilonZ
holy molly what the heck
ok I'll try to answer stuff to my best capabilities.
I don't know why there are so many points brought up because basically there are only 2 changes.
1. Stuff is now grouped into words, not syllables.
2. v --> u (changed)
Other than that we just made the rules more clear and more standardised.

and that's pretty much it. Any other problems that you guys mention would still be there even if no changes were made to the RC. If you want to bring up other Chinese languages as well, please be informed that this new proposal addresses other Chinese languages better than the old one. The new proposed rules make sure texts in Hanzi script are not overgeneralised and get the appropriate Romanisation (e.g. you can use Jyutping or whatever for Cantonese stuff). As stated in the old thread, we are trying to propose a better Romanisation system, not a perfect one. Though this new proposal does not address all the problems in the world, I firmly believe that it's better than the old one.

Tbh I haven't seen any arguments supporting v as an alternative of ü except "v is used as the input for ü on most keyboards" which is not very sensible to be used as a reason here.
Here are my reasons for why u is better than v as a substitution of ü.
1. In the pinyin system ü is pronounced /y/, kinda like the German ü. German umlauts are romanised with two-letter equivalents (ue for ü). However, stuff like lüe exists in the pinyin system, and if it were romanised with the same two-letter equivalent the result would be luee, which is nonsensical.
2. u pronounced as ü already exists in the pinyin system. Stuff like xuan is pronounced with the ü vowel, not u. Though this is limited to j, x and q.
3. How the heck are nv or lv pronounced (do not pronounce nü or lü instead)? It's basically impossible. Can't even represent them with IPA stuff.

However, substituting ü with yu might also be a great alternative, because it has all the same pros that u has over v (stuff like yue exists as well) and additionally it doesn't turn both nü and nu into nu. So if no further problems regarding pronunciation and ambiguity arise, I'm going to revise the proposal.
  1. Songs with Mandarin titles and/or Mandarin artists must use the Hanyu Pinyin method of Romanisation when there is no Romanisation or translation information listed by an official source. The ü vowel should be substituted with yu and all diacritical tone marks should be omitted because of the technical limitations resulting from the limited amount of characters allowed in the Romanised title/artist fields.
With backing reasons as follows
  1. Replacing ü with yu exists in the pinyin system already. Yue (e.g. 月) is pronounced like üe.
  2. Nyu, lyu and the like can be pronounced by a normal English-speaking person, and the pronunciation is, though not ideal, quite close to the actual ü.
  3. Replacing ü with yu is seen in practical use among Chinese people as well.
  4. The substituted Romanised texts can be easily traced back to the original pinyin (with umlauts) and don't cause any ambiguity.
I hope this new substitution satisfies both parties.
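
Here's a minimal sketch of what the revised rule would mean mechanically, assuming the input is tone-marked Hanyu Pinyin text. The function names and the `substitute` parameter are made up for illustration; passing "v" or "u" instead of "yu" shows the other two options under discussion.

```python
import unicodedata

# Combining marks for the four pinyin tones: macron, acute, caron, grave.
TONE_MARKS = {"\u0304", "\u0301", "\u030c", "\u0300"}


def strip_tones(pinyin):
    """Drop tone marks but keep the diaeresis, so ǚ becomes ü and not u."""
    decomposed = unicodedata.normalize("NFD", pinyin)
    kept = "".join(ch for ch in decomposed if ch not in TONE_MARKS)
    return unicodedata.normalize("NFC", kept)


def romanise(pinyin, substitute="yu"):
    """Strip tones, then replace ü with the chosen ASCII substitute."""
    plain = strip_tones(pinyin)
    return plain.replace("ü", substitute).replace("Ü", substitute.capitalize())


print(romanise("nǚ"))         # nyu
print(romanise("lüè"))        # lyue
print(romanise("túshūguǎn"))  # tushuguan
# With plain "u", nǚ and nú end up identical, which is the ambiguity
# that the yu substitution avoids.
print(romanise("nǚ", "u") == romanise("nú", "u"))  # True
```

The output of the yu variant can be traced back to the pinyin with umlauts as long as yu never appears after a consonant for any other reason, which I believe is the case in Hanyu Pinyin.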


Moving on to the replying stuff. Feel free to correct me if I'm wrong.

Hollow Wings wrote:

1. In automatic romanizing working progress, there're two ways for Chinese Romanisation:
a. semi-automatic romanisation from Chinese words separated by following proper rules.
b. automatic romanisation from Chinese characters one by one.
2. During this period of time, most of other countries aside of PRC can't fully accept that romanizing Chinese characters into separated words according to combinations between Chinese characters, because the works of finding and dealing with the concept of Chinese words are complex, also the grammar of Chinese sentence can even blur it.
after thousand of thoughts, they decide to do the romanization work from Chinese characters one by one.
For point 1, I just don't see how "In automatic romanizing working progress" is related to our discussion.
2. Can you quote the exact words from the document, along with all the reasons as stated in the standard? I couldn't read it while working on the proposal because 115 Swiss francs is hella expensive.

Hollow Wings wrote:

  1. Egyptian hieroglyphs (eg. Ancient Egyptian) ←already dead
  2. Cuneiform script (eg. Ancient Sumerian) ←already dead
  3. Seal hieroglyphs (eg. Ancient Indian) ←already dead
  4. Maya hieroglyphs (eg. Ancient Mayan) ←already dead
  5. Chinese characters (eg. Chinese)
and NO MORE.
if you want to know why language system is like that, then that's a long story, i wont start telling them here.
the reason i pick up those truth above, is because i want you guys know the chinese language's specificity and leading to how different romanisation is done between alphabetic language and ideographic language.
Read more about ideograms here. These are logograms. Modern Chinese characters are logographic.

A number of lines after this are about pinyin being a method of transcription. No comments there; it has been acknowledged since the beginning that this is just the way to pronounce stuff. And the next few lines are about Mandarin having a lot of homophones.

Hollow Wings wrote:

however, this is not reversible.
This is not exactly true. If it were, Mandarin would have died a long while ago, because the only way to communicate would be carrying a crap ton of paper with you at all times and writing stuff down whenever you wanted to communicate.
In an English context it would be equivalent to you guys seeing or hearing /taɪm/ (IPA stuff. This reads time). Intuitively the first thing that comes into your heads would be time. Tick-tock clocky stuff. However, under different contexts:
"Can you buy me some /taɪm/. I'm going to use it to cook dinner." In this case /taɪm/ is the herb thyme.
"I don't have enough /taɪm/ to do my homework. It's due tomorrow." In this case it's "time"
"Two /taɪm/ two equal four." In this context it means multiply. 10/10 grammar.
As you can see, they are reversible with context. And when you guys speak to each other you're actively tracing back to the original Hanzi characters using their pronunciation. Therefore, saying that it is not reversible is not true. It's harder in Mandarin (410 syllables - crap tons of words. Do the maths) but the fact that there are people speaking Mandarin proves that it's possible.


Gonna stop here for now as it's getting really late. Further replies will be given by Wafu or me. Whoever that gets some free time will reply more to other stuff.
abraker
Any thoughts about mapping style or patterns the maps have being in tags?
Shima Rin

abraker wrote:

Any thoughts about mapping style or patterns the maps have being in tags?
Don't you see that you are in the wrong topic.
Fycho
The main arguments are listed below:
  1. Whether we romanise Chinese titles character by character (each character must be romanised into a single, capitalised, separated word), or generally separate and capitalise every word according to The Basic Rules of the Chinese Phonetic Alphabet Orthography.
  2. Whether to use "yu" or "u" for the romanisation of the vowel "ü".
  3. Whether we need to distinguish dialects from Mandarin in romanisation.
For the first point, I recommend everybody read ISO 7098:2015 before sharing opinions. The romanisation of Chinese is much more complex than that of other languages and requires a lot of professional knowledge about Chinese. The new proposal can't handle “a word or phrase with two or more meanings”. For example, "他谁都打不过" is used intentionally to carry two meanings, "Nobody can beat him" and "He can beat everybody": "Ta / Shui / Dou Da Bu Guo" and "Ta / Shui Dou Da Bu Guo". And it wouldn't be any easier to remember or read for Chinese or non-Chinese speakers. I won't go into detail, as someone else may want to give a more professional explanation.

For the second point, "v" is currently used a lot. "ü" is a one-letter vowel, and it works differently in pronunciation from two-letter vowels like "iu", "an", "ie", "üe", "ai", "ao", etc. We use "YU" for "ü" only in passports and other specific cases, because the passport requires the name in capital letters and "ü" doesn't have a capital form. Elsewhere, "v" is still used. As a one-letter vowel, "v" is the most common and familiar letter, it's officially supported, and it's what the majority of input methods use. I believe using "yu" for "ü" only makes it easier to read than "v" for non-Chinese speakers, but it's technically wrong and has no other benefits. The "u" of the syllable "yu" actually and technically is the vowel "ü", and after "j / q / x / y" we write "u" for "ü", but that doesn't mean "u" can completely stand for "ü", nor that "yu" can stand for "ü". "y" isn't a vowel in the Pinyin system at all: "y" is a consonant that has the same pronunciation as the vowel "i", while "iu" and "yu" are two completely different things. In the Pinyin system, vowels can't be made from a consonant. That means "yu" can by no means become a two-letter vowel, and using "yu" for romanisation would totally disobey the language system. "v" works best at the moment.
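
The spelling convention mentioned above can be sketched as follows. This is only an illustration under my reading of Hanyu Pinyin (after j, q, x and y a written "u" stands for the vowel "ü", while "ü" is written out after n and l); the helper name is hypothetical and the input is assumed to be a single syllable without tone marks.

```python
# Initials after which a written "u" is actually the vowel ü (assumption:
# standard Hanyu Pinyin spelling, toneless single syllables as input).
UMLAUT_INITIALS = ("j", "q", "x", "y")


def spelled_u_is_umlaut(syllable):
    """Return True if the "u" right after the initial is pronounced as ü."""
    s = syllable.lower()
    return len(s) >= 2 and s[0] in UMLAUT_INITIALS and s[1] == "u"


for syllable in ["xuan", "yue", "ju", "nu", "lu", "zhu"]:
    print(syllable, spelled_u_is_umlaut(syllable))
# xuan True
# yue True
# ju True
# nu False
# lu False
# zhu False
```

So the choice between "v", "u" and "yu" only matters for the few syllables where "ü" is written explicitly, such as nü, lü, nüe and lüe.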

For the third point: is it necessary to distinguish dialects from Mandarin in romanisation? As we all know, dialects differ in pronunciation, and some have different grammar. However, none of the dialects has an official written format, and all of them are related to Mandarin. Many words written in Chinese characters are shared by all the dialects. Take "好心分手": you can't know whether it's Mandarin, Wu Chinese or Cantonese unless someone pronounces it. Officially and technically we can't tell which it is, so it's just modern standard Chinese, and we romanise it in the standard way. Personally, I am a dialect speaker, and I speak Wu Chinese and Mandarin well. The major issue is that there is no official way to write the dialects. Unlike the Japanese dialects, where Japanese kana (Hiragana, Katakana) are phonograms just like Latin script, Chinese characters are ideographic. This means Chinese characters can't be used to represent the pronunciation of dialects, which in turn means there can't be any official written form of the dialects, and there won't be any song title written in dialect. There are no officially published ways to romanise the pronunciation of dialects either. Therefore it's unnecessary to distinguish dialects from Mandarin. By the way, if you want to bring up Cantonese (Yue Chinese), there isn't any official written form for Cantonese either. In HK and Macau the schools teach the standard Chinese written form; people personally like to type Yue characters in Cantonese, which is more of a cultural thing and is not officially taught in school. Enforcing something unofficial just leaves us with endless discussions. That's why there is no official way of romanisation so far: we have already argued a lot in the real world and haven't reached a conclusion.
How can we romanise an independent language that doesn't even have a written format? I believe this is beyond the scope of the osu! community, and it's unnecessary to figure it out at the moment.

The current way of romanisation is a relatively fair one, which covers all the cases and gives good results. It's not the best, but it's the most appropriate.
I've asked some Chinese-speaking QAT/GMT (Nardoxyribonucleic, spboxer3 and Zero__wind) for their opinions about the proposal, and all of them think it's not necessary to revise the current romanisation rules for Chinese.