[Proposal] Metadata section overhaul

Mafumafu

Naotoshi wrote:

because literally 0 western readers (who the romanization is aimed at) will read v as a vowel...................................................................................



I think it is still better than misleading players into pronouncing it wrong.
Youmu Chan

peppy wrote:

romanisation isn't for the people that speak the language. it is for people that can't who wish to (as accurately as possible) pronounce and process what they are reading.

using "v" should not even be considered, so please do not even consider it. if native people are offended, they can turn off roman display.

i believe "yu" is the only correct answer here.
https://soundcloud.com/gloriorbelli/in-paradisvm
Here is a song with a Latin title that uses v as a vowel, in accordance with the Oxford way of typing Latin. If someone maps this song some day, are you saying that we should change the name to In Paradisum instead? I don't think that makes sense, so why do you oppose "v" in the first place?
Akanagi

Emilia wrote:

i see no real reason for "v" to not be used because its already so established to chinese players.

peppy wrote:

romanisation isn't for the people that speak the language. it is for people that can't who wish to (as accurately as possible) pronounce and process what they are reading.

using "v" should not even be considered, so please do not even consider it. if native people are offended, they can turn off roman display.
Again, romanisation was never aimed at native speakers in the first place, so whether it is established and comfortable for Chinese players shouldn't really matter here.
CXu

peppy wrote:

romanisation isn't for the people that speak the language. it is for people that can't who wish to (as accurately as possible) pronounce and process what they are reading.

using "v" should not even be considered, so please do not even consider it. if native people are offended, they can turn off roman display.

i believe "yu" is the only correct answer here.
Eh, I think there's more than just pronunciation that's important here. There will be confusion if a non-native speaker tries to talk about a song with a native speaker and the choice of romanization is different from what native speakers are used to. It's kind of like if we were to transliterate "llamo" as "yamo" for Spanish songs, to more closely represent the actual sound.

We also already have cases where we're not as accurate as possible, such as the use of "x", or where "yi" really just sounds like i. Japanese romanization also differentiates between ei and ee, oo and ou, even though they sound the same.

In my opinion pronunciation should just be one priority of many when it comes to romanization. Other factors such as ambiguity should be taken into account, and the result should stay as close to the original language as possible.

As for using v specifically: letters in general tend to have different pronunciations in different languages already, so I feel like when someone sees the v and has trouble saying it, they'll realise it's probably pronounced differently.


@Rayne: Well, it doesn't need to, but wouldn't it be more convenient if native speakers and non-native speakers were on the same page when writing song titles to each other? In this case, yu isn't actually the right pronunciation anyway, so it doesn't really have any added benefit over v, other than teaching people how to pronounce it wrong. If you have to learn the pronunciation anyway, why not go with the one native speakers are using already, and that leads to less ambiguity? That's how I see it anyway.
Wafu

Youmu Chan wrote:

peppy wrote:

romanisation isn't for the people that speak the language. it is for people that can't who wish to (as accurately as possible) pronounce and process what they are reading.

using "v" should not even be considered, so please do not even consider it. if native people are offended, they can turn off roman display.

i believe "yu" is the only correct answer here.
https://soundcloud.com/gloriorbelli/in-paradisvm
Here is a song with a Latin title that uses v as a vowel, in accordance with the Oxford way of typing Latin. If someone maps this song some day, are you saying that we should change the name to In Paradisum instead? I don't think that makes sense, so why do you oppose "v" in the first place?
No, because it's the way of writing Old/Classical/... Latin which doesn't need Romanisation. If we want to establish a rule about Romanisation, we simply find which Romanisation system is the best, keep the Latin script characters as is and figure out how to replace the special characters that we can't use. And it has to be based on something that objectively makes sense to a regular player who knows nothing about Romanisation or that language.

"v" in pinyin has no basis other than what you press on the keyboard to write it. It could as well be "k", because you would technically be pressing "k" on a Dvorak keyboard layout. Either one makes no sense, because a regular user has no chance of knowing that in a pinyin input method, this would actually produce ü.

We should differentiate Latin and Latin script without mixing it much. In any case, even if we considered that "v" was (u was added to Latin in 16th century) pronounced "u" in Latin language, ü still doesn't sound like either "v" or "u".
Youmu Chan

Wafu wrote:

No, because it's the way of writing Old/Classical/... Latin which doesn't need Romanisation. If we want to establish a rule about Romanisation, we simply find which Romanisation system is the best, keep the Latin script characters as is and figure out how to replace the special characters that we can't use. And it has to be based on something that objectively makes sense to a regular player who knows nothing about Romanisation or that language.

"v" in pinyin has no basis other than what you press on the keyboard to write it. It could as well be "k", because you would technically be pressing "k" on a Dvorak keyboard layout. Either one makes no sense, because a regular user has no chance of knowing that in a pinyin input method, this would actually produce ü.

We should differentiate Latin and Latin script without mixing it much. In any case, even if we considered that "v" was (u was added to Latin in 16th century) pronounced "u" in Latin language, ü still doesn't sound like either "v" or "u".
To me this is saying

Original language uses Latin script while it only makes sense to native speaker and makes totally no sense on pronunciation to non-native speaker (Polish, Latin for example): OK
Original language doesn't use Latin, so Romanize it into something that only makes sense to native speaker and makes little sense on pronunciation to non-native speaker (Using v in Chinese Romanization): NOT OK
Original language doesn't use Latin, so Romanize it into something that makes no sense to native speaker and makes some sense (but not accurate) on pronunciation to non-native speaker (Using yu in Chinese Romanization): OK

I am now completely confused by the intention and philosophy behind the metadata system
Kagetsu
in my opinion, whatever thing chinese people use to input characters should be used, this way you avoid people having trouble figuring out how to romanize their own language (like when you change wo for o in japanese)

we should avoid middle grounds where neither native nor foreigners understand the romanization process fully
Simuzax
Do people realize that players will most likely mispronounce it anyways, even more if they dont have any previous knowledge of the language?

peppy wrote:

romanisation isn't for the people that speak the language. it is for people that can't who wish to (as accurately as possible) pronounce and process what they are reading.
That's completely true, the problem is that romanization isnt accurate, mandarin has multiple tones and meanings for words that sound exactly the same to a non-native, romanization is just supposed to help you refer to something to other non-natives, the way you pronounce something doesnt really matter as long as the other person understands what youre trying to say

Honestly, if ü isnt sonorously close to either yu, yi, v or u, just use u instead since it would auto-correct if you were to search it on google or whatever and then do the tags thing that kroytz suggested for easier search in-game and on the website

Also, why dont we just add like a button to the website so we can see what the original, non-romanized title is?
CrystilonZ
Hi
I highly suggest everyone here read both of these https://osu.ppy.sh/forum/p/6557643 https://osu.ppy.sh/forum/p/6557415
I'll elaborate and add some points here.

Fycho wrote:

yu:
English speakers would read it "yoo", which has different tone from "ü" My description of the vowel ü is like this: firstly shape your mouth like you're going to pronounce the word "you," but instead of a vowel that sounds like oo (in moo) do a ee sound (like in bee or he) instead. Do not change your mouth's shape
asked around a bit and Nyu is pronounced like nee-yooh (like a very exaggerated new). The first portion gets the sound right and the second portion get the mouth shape right. Just sharing information lol this is no relation with stuff down there.

My opinion about this is that Romanised texts should be kind of familiar-ish to any english speaking person and give a rough hint of pronouncing the actual sound. However, the pronunciation needs not be perfectly accurate all the time and pronunciation does vary a little bit depending on the speaker's mothertongue. Here are some examples:

河 ------> Hé (pinyin) ------> He (osu!)
The "He" here is not pronounced like English he (with an e sound like be or see). It's pronounced like the e in words like "her" or French "le".

筆 ------> bǐ (pinyin) ------> bi (osu!)
This is actually pronounced like a hybrid of bee and pee. B in the pinyin system is an unaspirated p (spit)

If you have zero knowledge about Chinese of course you're probably going to get a bunch of pronunciation messed up unfortunately(I did too lmfao ask fycho).
But that does not mean we should give people something they can't even pronounce. It's unsettling and will probably be really weird to many people, which should not be a feature of any Romanised text.

also this is mentioned in Fycho's post but it seems like it's skipped over by most.
The current method of substituting ü is based on the system used in Chinese passports to Romanise people's names. This method focuses on the pronunciation because customs needs to read people's name. (also Ü can't be printed for some reason).

Any questions can be directed at me or Fycho.

Linguists and native speakers don't be mad at me pls I know the pronunciation is not exact either and I suck at linguistic stuff sorry
Mentai

Fycho wrote:

If "v" can't be read by foreigners and causes misconceptions, then we probably need to rework the Japanese rule too, as ra / ri / ru / re / ro are actually pronounced as la / li / lu / le / lo in Japanese, which is kinda unfriendly towards Latin script users who don't know Japanese. English speakers will pronounce "ra" differently from how it's supposed to be pronounced in Japanese.
this is technically wrong. Japanese uses a pretty happy marriage of both the "r" and "l" consonants, using the full mouth formation for "r", but also pressing the tongue very slightly on the roof of the mouth, making a soft "l". since the full formation "r" is used, it correlates more with English "r" than "l", and Japanese people (varying by dialect, of course) will recognize this, even through the imperfect pronunciation of those specific characters by westerners.

regardless, that cannot be said about "v" in Chinese. i have almost no background in Chinese, but i can at least assure using a consonant sound that has nowhere near the same mouth formation/articulation of the sound it is actually trying to produce does not work well. there has to be an unfortunate compromise between perceived transliteration, and vocal articulation.

i don't know what the solution would be, as again, i don't have any Chinese background, but going with the options that are based on actual vowels would work better than "v" in general
Monstrata

Regraz wrote:

Naotoshi wrote:

because literally 0 western readers (who the romanization is aimed at) will read v as a vowel...................................................................................

I think it is still better than misleading players into pronouncing it wrong.

v also misleads players to pronounce the vowel wrong. Actually it misleads them a lot more than "yu" would for non-native speakers.

Lv = Lü for Chinese speakers

Lv = Uhlv for English speakers (think, Revolve) / or Luhvuh (think, Olive)

I would use phonetic transcription but then only people who can read IPA can comment so I went with pronunciation and example instead.
LwL
How about using "yu" for the romanized song title but mandating that the version with "v" goes in the tags? That way no one would struggle to search songs if they're searching with roman input (since both ways work), non-Chinese speakers would have a pronounceable title, and reading it might look slightly weird to Chinese speakers but still be clear as to what is meant (from what I gathered from this thread, I don't speak a single word of Chinese)?
Shad0w1and

LwL wrote:

How about using "yu" for the romanized song title but mandating that the version with "v" goes in the tags? That way no one would struggle to search songs if they're searching with roman input (since both ways work), non-Chinese speakers would have a pronounceable title, and reading it might look slightly weird to Chinese speakers but still be clear as to what is meant (from what I gathered from this thread, I don't speak a single word of Chinese)?
even if that's the case, the thing put in the actual title should still be lv, not lyu. face the problem: even after romanization, without knowing the language, almost no English speaker can pronounce names from other languages. Romanization is not a way to help English speakers pronounce it but to help them learn it. And why would you make an osu-only standard (which no one outside of osu will accept) that misleads Chinese learners trying to find the song?????
Let's say you find a ranked song titled "nv ren (woman) hua (flower)". You search google for "nv ren hua", ok, you get the result. Then you search for "nyu ren hua", nope, you find nothing. In this sense, I will accept Nu and Lu for Nv and Lv, because you can still search for them and get the result, but definitely not Nyu lmao.
And the problem still exists: Lu and Lv sound totally different. If you want to make the song title readable for Chinese speakers and Chinese learners, you have to go with Lv and Nv, not Lu and Nu. And because you can't even find the song with anything other than these two solutions, other options (like Nyu) should be automatically ignored.

Please, everyone in this thread, think about the consequences of your proposal. Most of them are not even viable, just because they will fuck everyone up and no one will be able to tell which song it is. If after the meta rule change no one can use osu meta to find songs, then why do we want that change????
SupaJuke
As a person who mainly uses English, even though I have the minimum knowledge to correctly pronounce pinyin including "ü", I think using "v" to represent "ü" is disastrous and should be avoided. Here is why:

  1. Any persons who do not know Chinese will eventually mispronounce this "COMPLETELY".
  2. Even though "yu" does also lead into mispronunciation, HOWEVER, it still at least makes, even if slightly, more sense for most non-Chinese speakers.


Also, just as Peppy has already said,

peppy wrote:

romanisation isn't for the people that speak the language. it is for people that can't who wish to (as accurately as possible) pronounce and process what they are reading.

...if native people are offended, they can turn off roman display.

I will not bring the fact that he clearly supports the "yu"'s side, but only what I want to point out here.

ROMANIZATION is NOT for native speakers. As a Thai, if for any reason I'm trying to search for a Thai song, I wouldn't bother using a single Latin character. Instead, I would use Thai characters. I believe that using characters in the desired language is certainly easier than trying to use Latin characters. Hence, this should also apply to native speakers (in this case, Chinese speakers) when trying to search for something WHICH IS AVAILABLE IN THEIR OWN LANGUAGE (again, Chinese for this debate).

If you were to ask me if I'm offended when a Thai song is misinterpreted due to romanization, I would answer "no" without hesitation, and I strongly hope ALL NATIVE SPEAKERS feel the same when they see their own languages being misinterpreted.

Still, if native speakers are feeling offended due to their language being mispronounced or whatsoever, they have the option to turn off romanization which has also already been mentioned by Peppy.

PS. The fact that I keep mentioning Chinese isn't because I hold my grudge against them, but due to Chinese being the main discussed topic here. Being honest, I feel like all native speakers should feel the same toward "romanization".
Mentai

SupaJuke wrote:

ROMANIZATION is NOT for native speakers.
essentially this
Shad0w1and
ROMANIZATION is NOT for English speakers to pronounce.

^^^^essentially this

you guys are saying to teach osu player wrong things about Chinese and will make no way for them to find the song by title. Realize it.
Mentai
Romanization (also spelled romanisation: see spelling differences), in linguistics, is the conversion of writing from a different writing system to the Roman (Latin) script, or a system for doing so. Methods of romanization include transliteration, for representing written text, and transcription, for representing the spoken word, and combinations of both. Transcription methods can be subdivided into phonemic transcription, which records the phonemes or units of semantic meaning in speech, and more strict phonetic transcription, which records speech sounds with precision.
it actually is for Latin alphabet speakers to pronounce, not Chinese, unfortunately;
Shad0w1and

Mentai wrote:

Romanization (also spelled romanisation: see spelling differences), in linguistics, is the conversion of writing from a different writing system to the Roman (Latin) script, or a system for doing so. Methods of romanization include transliteration, for representing written text, and transcription, for representing the spoken word, and combinations of both. Transcription methods can be subdivided into phonemic transcription, which records the phonemes or units of semantic meaning in speech, and more strict phonetic transcription, which records speech sounds with precision.

it actually is for Latin alphabet speakers to pronounce, not Chinese, unfortunately;
and if a player want to find the song by romanized title, ask google do they accept this proposal :)
SupaJuke

Shad0w1and wrote:

ROMANIZATION is NOT for English speakers to pronounce.

^^^^essentially this

you guys are saying to teach osu player wrong things about Chinese and will make them find no way to find the song. Realize it.


For this to eventually be solved, we have to discuss whether we want to write the metadata according to how it is "pronounced", or "used".

If you want it to be written according to how it is actually used by natives (Chinese), then I wouldn't argue that certainly "v" is better.

However, from my point of view, a non-native speaker, I prefer the metadata to be something readable for me. Even if it would be different from how native speakers actually use when typing/writing in roman.
Mentai

Shad0w1and wrote:

Mentai wrote:

Romanization (also spelled romanisation: see spelling differences), in linguistics, is the conversion of writing from a different writing system to the Roman (Latin) script, or a system for doing so. Methods of romanization include transliteration, for representing written text, and transcription, for representing the spoken word, and combinations of both. Transcription methods can be subdivided into phonemic transcription, which records the phonemes or units of semantic meaning in speech, and more strict phonetic transcription, which records speech sounds with precision.

it actually is for Latin alphabet speakers to pronounce, not Chinese, unfortunately;
and if a player want to find the song by romanized title, ask google do they accept this proposal :)
this is literally the issue at hand. the point is that we are valuing actual articulation higher than transliteration. there is a solution for both. having the metadata be the phonetic articulation for Latin alphabet speakers, but also requiring to have the exact Chinese transliteration in the tags. and this would work, but we don't know what to use other than "v." if we could find a happy middle ground for pronunciation between us (that being, non-native Chinese speakers and native Chinese speakers), then the problem would be resolved both ways

please be a bit open minded in respect to how vowels are treated in Latin-based languages
Shad0w1and

SupaJuke wrote:

Shad0w1and wrote:

ROMANIZATION is NOT for English speakers to pronounce.

^^^^essentially this

you guys are saying to teach osu player wrong things about Chinese and will make them find no way to find the song. Realize it.
For this to eventually be solved, we have to discuss whether we want to write the metadata according to how it is "pronounced", or "used".

If you want it to be written according to how it is actually used by natives (Chinese), then I wouldn't argue that certainly "v" is better.

However, from my point of view, a non-native speaker, I prefer the metadata to be something readable for me. Even if it would be different from how native speakers actually use when typing/writing in roman.
idc about native language or not, I am saying you cannot just make a standard which no one outside of osu will accept. that will just fuck up everyone who wants to look up the song on google. I personally am fine with Lv written as Lu because it is searchable. but Lyu is like a joke. no one will know which song it is.

at the end, I can't see other options than these two:
Title: Nv Ren Hua, Tags: Nu
Title: Nu Ren Hua, Tags: Nv
Other options are not viable, just because it won't help to identify the song in any database or search engine.
Mentai

Shad0w1and wrote:

at the end, I can't see other options than these two:
Title: Nv Ren Hua, Tags: Nu
Title: Nu Ren Hua, Tags: Nv
Other options are not viable, just because it won't help to identify the song in any database or search engine.
if this is actually viable, i think the second option would be a perfect compromise between articulation, and allowing for the full transliterated string to be found
Wafu

Shad0w1and wrote:

idc about native language or not, I am saying you cannot just make a standard which no one outside of osu will accept. that will just fuck up everyone who wants to look up the song on google. I personally am fine with Lv written as Lu because it is searchable. but Lyu is like a joke. no one will know which song it is.

at the end, I can't see other options than these two:
Title: Nv Ren Hua, Tags: Nu
Title: Nu Ren Hua, Tags: Nv
Other options are not viable, just because it won't help to identify the song in any database or search engine.
Actually, some songs that are not really famous, that had to be Romanised by osu! community alone, can't be found by using the Romanised name. Why don't you just copy the original title and search for that? It would actually be best if the original title was available on the website, but you can still find any song as the original title must be here.

And I'm not sure what is wrong with your Google results, because it seems to have found the songs properly using the "yu" version, at least on my end. But even then, don't forget that you still can add artist to the search. The probability that "yu" will be in all of the characters (of both artist and title) is very improbable and copying both artist and the title should find the song even if one character is missing. For example this map can be found by just typing "stefanie guang", not even her full name is required. I think you're making it look more problematic than it is.
CXu
du ju rilli vant piipell tu vrait hauv dei pronauns vørds in inglisj in anådder længguej?

I mean, probably not. We don't transcribe how we pronounce things from one Latin-alphabet language into English, and I'm sure native English speakers would have a headache reading my "norwegianization" of an English sentence. It's probably a similar feeling many Chinese people are having about this change, when they already have a system that uses Latin characters (thus making people able to write and communicate which song they mean) and that works fine except for the one pronunciation of v, which is already a common problem for any language other than English that uses the Latin alphabet.
peppy
The amount of time being thrown at one character when there's obviously a divide in decision which is *not* going to be resolved is insane.

What's the biggest site on the internet that shows romanisation? Wikipedia? Let's use what they use and call it a day.


It doesn't matter which we choose if we're going for conformity. People will get used to it.

Let's stop this and copy the most settled upon solution elsewhere on the internet.

I will be removing any posts that aren't references to other sites using romanisation in a specific way. Let's find the most popular method and use it.
Mafumafu
I guess most players in osu! are using keyboard to input and search so I just put link here about the most popular method via keyboard, not by writing on papers or others.

peppy wrote:

Let's find the most popular method and use it.
Okay then simple answer: v is the most popular amongst players.

Wikipedia wrote:

Since the letter "v" is unused in Mandarin pinyin, it is universally used as an alias for ü.
( https://en.wikipedia.org/wiki/Pinyin_input_method )

Also one more link:
https://eastasiastudent.net/china/manda ... yin-input/
peppy
I'm not talking about typing. You can search using the actual non-romanised characters. Please read my last post again – I'm talking about display on websites.
Nyquill
If we can go back to something close to a romanization standard for "reading" from the library of congress, they choose to use the double dotted u, which is reasonable. It'll be kind of like placing accents on french words.

https://www.loc.gov/catdir/cpso/romanization/chinese.pdf

The most readable would be this. Maybe we can do something in the search engine side to automatically let both be searched for with u and v. This is a quality of life improvement for non-chinese speakers. You would otherwise be forced to copy and paste to search.

...or we can accept romanization doesn't work perfectly for many languages including chinese, and just use the normal/most popular/most sane latin standard (v) like we would for many other languages.
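
For anyone wondering what "let both be searched for with u and v" could look like on the search side, here is a minimal sketch. It is only an illustration, not how osu! search actually works: the function names `search_spellings` and `title_matches` are made up for this example, and it assumes titles are stored as toneless, space-separated syllables that may contain ü.

```python
import unicodedata

U_UMLAUT = "\u00fc"  # precomposed "ü"


def search_spellings(syllable: str) -> set:
    """Return the spellings a search could accept for one toneless syllable.

    Hypothetical helper: a syllable containing ü is also accepted
    when typed with plain "u" or with the keyboard alias "v".
    """
    s = unicodedata.normalize("NFC", syllable).lower()
    if U_UMLAUT not in s:
        return {s}
    return {s, s.replace(U_UMLAUT, "u"), s.replace(U_UMLAUT, "v")}


def title_matches(query: str, title: str) -> bool:
    """Check a space-separated query against a title, syllable by syllable."""
    q = unicodedata.normalize("NFC", query).lower().split()
    t = title.split()
    return len(q) == len(t) and all(
        word in search_spellings(syl) for word, syl in zip(q, t)
    )
```

With this, a title stored as "Nü Ren Hua" would match both "nv ren hua" and "nu ren hua". Other spellings such as "nyu" would need to be added to the accepted set if that form were chosen for display.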
Fycho
What Nyquill and CXu posted does make sense here: romanization doesn't work perfectly for many languages, and the double-dotted u is even kept until now. In the old Latin systems "v" and "u" are interchangeable. I believe "v" is the most appropriate and popular choice. In the youtube search system, "Lv" is obviously a better keyword than "Lyu" (try searching "Lv Guang"). Using the English names of citizens in passports is one-sided (passports are mostly for customs, while our romanisation is for players who use computers every day).

Let's try to fix the proposal again:


Glossary
Character-by-character Romanisation: Each Chinese character must be Romanised using Hanyu Pinyin system, and each romanised character must be capitalised and separated with a space.

Rules
Songs with Chinese metadata must be Romanised using the Character-by-character method in Romanised fields when there is no Romanisation or translation information listed by a reputable source. The same applies to the Source field if a romanised Source is preferred by the mapper. As they are non-unicode fields, all diacritical tone marks must be omitted. Refer to Thread: Romanisation of Chinese for more information.
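
To make the rule above concrete, here is a rough sketch of the character-by-character step. It assumes the Hanyu Pinyin syllables (with tone marks) are already known for each character; `strip_tones` and `romanise` are hypothetical names, not part of any osu! tooling. It removes only the four tone diacritics and deliberately leaves the two dots of ü alone, since how to replace ü is a separate decision.

```python
import unicodedata
from typing import Iterable

# Combining marks that carry pinyin tones:
# grave (4th tone), acute (2nd), macron (1st), caron (3rd).
TONE_MARKS = {"\u0300", "\u0301", "\u0304", "\u030c"}


def strip_tones(syllable: str) -> str:
    """Drop tone diacritics from one pinyin syllable, keeping the diaeresis of ü."""
    decomposed = unicodedata.normalize("NFD", syllable)
    kept = "".join(ch for ch in decomposed if ch not in TONE_MARKS)
    return unicodedata.normalize("NFC", kept)


def romanise(syllables: Iterable[str]) -> str:
    """Capitalise each toneless syllable and separate syllables with spaces."""
    return " ".join(strip_tones(s).capitalize() for s in syllables)
```

For example, `romanise(["hé"])` gives "He" and `romanise(["bǐ"])` gives "Bi", matching the 河 and 筆 examples earlier in the thread. A syllable like "nǚ" comes out as "Nü", which still needs whichever ü substitution is agreed on before it can go into a non-unicode field.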

The discussion of "Romanisation of Chinese" should be adequate and can stop now. Anyone with concerns is free to contact me for detailed explanations.
CrystilonZ
Wikipedia uses ü in most cases because it has no limitation on umlauts.
The only place where Romanised names are displayed and have the same umlaut limitation is the passport which is basically like this

Fycho wrote:

Lü, Nü, Lüe, and Nüe must be substituted with Lyu, Nyu, Lue, and Nue respectively.
https://en.wikipedia.org/wiki/Lü_(surname) << on wikipedia about Lü, a common surname. Read under the Romanization section.
(in Chinese) Standards set in 2012 about Romanization of Chinese names in passports.
吕女绿 (Lü Nü Lü) should be Romanised as LYU NYU LYU
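
As a small sketch of the passport-style substitution quoted above (only the four syllables listed there, nothing else): the table and the function name `passport_style` are illustrative assumptions, and the input is assumed to be a toneless syllable.

```python
import unicodedata

# The four substitutions quoted above, keyed by lowercase toneless syllable.
# "\u00fc" is the precomposed character "ü".
PASSPORT_SUBSTITUTIONS = {
    "l\u00fc": "lyu",
    "n\u00fc": "nyu",
    "l\u00fce": "lue",
    "n\u00fce": "nue",
}


def passport_style(syllable: str) -> str:
    """Replace a ü syllable with its passport-style spelling, capitalised.

    Syllables that are not in the table are returned unchanged.
    """
    key = unicodedata.normalize("NFC", syllable).lower()
    replaced = PASSPORT_SUBSTITUTIONS.get(key)
    if replaced is None:
        return syllable
    return replaced.capitalize()
```

Applied to the example above, `" ".join(passport_style(s) for s in ["Lü", "Nü", "Lü"]).upper()` gives "LYU NYU LYU".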
Ephemeral
if that passport romanisation is actually the case, then that's enough precedent set for the use of "yu" over "v", it would seem

"v" is not and will not ever be a valid transliteration in english for this particular because its sound is not really approaching "vuh" or "vee". "yu" is closer to the actual overall sound ("yuU") - romanising by IME precedent is ugly in a number of cases even if it is technically easier to search for
Shiguma

peppy wrote:

I'm not talking about typing. You can search using the actual non-romanised characters. Please read my last post again – I'm talking about display on websites.
Except this whole debate is because there is no definite way people romanize ü. If there was, this debate would have been over centuries ago. You talk about display on websites, but every website will display it as ü.

The best solution for this would be to allow ü in the romanization field. Why aren't we allowed to use accented characters in that field in the first place? There are more scenarios besides the Chinese ü where this becomes a problem. (Example: https://osu.ppy.sh/beatmapsets/740535#osu/1562308 The only reason this needed to be romanized is the umlaut a, as the artist's name is Mäe, but really we should have just had Mäe without a need for romanization)

peppy wrote:

It doesn't matter which we choose if we're going for conformity. People will get used to it.

Let's stop this and copy the most settled upon solution elsewhere on the internet.
Please make it easy for us to use accented characters in the metadata then. The solution is there, but it requires the staff's help, honestly.

Ephemeral wrote:

if that passport romanisation is actually the case, then that's enough precedent set for the use of "yu" over "v", it would seem



"v" is not and will not ever be a valid transliteration in english for this particular because its sound is not really approaching "vuh" or "vee". "yu" is closer to the actual overall sound ("yuU") - romanising by IME precedent is ugly in a number of cases even if it is technically easier to search for
You bring up passports, but that is the only scenario where "yu" is used. Chinese passports have used "v" as well, and it is up to the passport holder if they want to keep it as-is or change it to "yu" The only reason this is a thing is because they can't use ü on a passport, but really the easiest solution would be to update their system to allow ü on passports.

Seriously, the best solution is to just allow ü and other accented characters in the romanization field. Basically all websites use ü, there is no reason for us to be stuck in this debate when typing ü on a computer is so easy.
Topic Starter
Okoratu
:D cleaning up open points in progress

open talking points


Chinese:
  1. i'm not willing to link a 4 year old thread on the ranking criteria. either it should be ported to the wiki or that sentence dropped or its information condensed down into more guidelines on the topic
  2. Hanyu Pinyin system needs an external reference if available, someone please provide me a link
Cyrillic:
  1. Cyrillic is now undefined - anyone fancy coming up with some definition if the current thing only works for russian?
Korean:
  1. Input from Koreans as to which standard is used and should be used going forward is needed
Thai
  1. Input from natives required as to which standard is to be used
Arabic
  1. do we currently need this? I have a hard time telling how much monstrata trolls in this post
TV Size label
  1. FORCE the same way or FORCE DROPPING the label?
will update with issues i have as i go through the thread updating open points

Monstrata wrote:

With respect to Korean romanization, I'm wondering if we should continue applying the McCune-Reischauer system for romanizing Korean. This is the system that the Library of Congress is using. Nyquill brought up an excellent point about using romanization systems that other large institutions are currently using, and in most cases it works a lot better than creating our own modified system (unless we are simplifying).

I'm bringing this up because there is also the Revised Romanization of Hangeul system, introduced on July 7th, 2000, which has been applied to various Korean road signs, transportation, etc. The major change, of course, is that the new system eliminates diacritics in favor of digraphs.

A possible rule would look like:

Songs with Korean metadata must be romanised using the McCune-Reischauer system for romanizing Korean when there is no romanisation or translation information listed by a reputable source. The same applies to the Source field if a romanised Source is preferred by the mapper.

Additionally, we could introduce the use of digraphs and two-vowel letters into the proposal:

Vowels ㅓ /ʌ/ and ㅡ /ɯ/ should be written as digraphs in Korean romanization, and romanized to eo and eu respectively.
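
To make the digraph idea concrete, here is a rough Python sketch that pulls the vowel out of a precomposed Hangul syllable using the standard Unicode syllable arithmetic and looks up its spelling. The vowel table is an assumption based on the Revised Romanization vowel spellings, the name `vowel_romanisation` is made up for this example, and consonants and sound-change rules are ignored entirely.

```python
# The 21 medial vowels in Unicode order, with assumed Revised Romanization spellings.
VOWELS = [
    "a", "ae", "ya", "yae", "eo", "e", "yeo", "ye", "o", "wa", "wae",
    "oe", "yo", "u", "wo", "we", "wi", "yu", "eu", "ui", "i",
]

HANGUL_BASE = 0xAC00   # first precomposed syllable
HANGUL_LAST = 0xD7A3   # last precomposed syllable
FINALS_PER_VOWEL = 28  # 27 final consonants plus "no final"


def vowel_romanisation(syllable: str) -> str:
    """Hypothetical helper: return the romanised vowel of one Hangul syllable."""
    code = ord(syllable)
    if not HANGUL_BASE <= code <= HANGUL_LAST:
        raise ValueError(f"not a precomposed Hangul syllable: {syllable!r}")
    # Syllables are laid out as (initial * 21 + vowel) * 28 + final.
    vowel_index = (code - HANGUL_BASE) // FINALS_PER_VOWEL % len(VOWELS)
    return VOWELS[vowel_index]


print(vowel_romanisation("서"))  # eo
print(vowel_romanisation("그"))  # eu
```

The point of the sketch is only that both vowels come out as plain ASCII digraphs, so no diacritics are needed in the romanised field.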

Another language to examine is Thai. The Library of Congress recommends nine additional rules for Thai romanization which are:

Library of Congress wrote:

Romanization
1. Tonal marks are not romanized.
2. The symbol ฯ indicates omission and is shown in romanization by “ … ” the conventional sign for ellipsis.
3. When the repeat symbol ๆ is used, the syllable is repeated in romanization.
4. The symbol ฯลฯ is romanized Ia.
5. Thai consonants are sometimes purely consonantal and sometimes followed by an inherent vowel romanized o, a, or ǭ depending on the pronunciation as determined from an authoritative dictionary, such as the Royal Institute's latest edition (1999).
6. Silent consonants, with their accompanying vowels, if any, are not romanized.
7. When the pronunciation requires one consonant to serve a double function – at the end of one syllable and the beginning of the next – it is romanized twice according to the respective values.
8. The numerals are: ๐ (0), ๑ (1), ๒ (2), ๓ (3), ๔ (4), ๕ (5), ๖ (6), ๗ (7), ๘ (8), and ๙ (9).
9. In Thai, words are not written separately. In romanization, however, text is divided into words according to the guidelines provided in Word Division below.
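
Rule 8 of the quoted list is the one part that is purely mechanical, so it can be sketched in a few lines of Python. This is only an illustration: the function name `romanise_thai_digits` is made up, and it relies on the Thai digits occupying the consecutive code points U+0E50 to U+0E59.

```python
# Thai digits ๐-๙ are the ten consecutive code points U+0E50..U+0E59.
THAI_DIGITS = "".join(chr(0x0E50 + i) for i in range(10))
DIGIT_TABLE = str.maketrans(THAI_DIGITS, "0123456789")


def romanise_thai_digits(text: str) -> str:
    """Hypothetical helper: replace Thai numerals, leave everything else alone."""
    return text.translate(DIGIT_TABLE)


print(romanise_thai_digits("๒๕๔๒"))  # 2542
```

Everything else in the list (inherent vowels, silent consonants, word division) needs a dictionary or a native speaker, which is why the other rules are not attempted here.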
The two rules I am proposing are:

Songs with Thai metadata must be romanised using the Library of Congress system (also known as ISO 11940) for romanizing Thai when there is no romanisation or translation information listed by a reputable source. The same applies to the Source field if a romanised Source is preferred by the mapper.

and

In the romanization of Thai, words should be romanized separately, and separated by a space. Additionally, all words should (or should not?) be uppercased.

Attached are helpful transcription keys for Thai:






Another language that is becoming more and more relevant is Arabic, and there are some issues I would like to bring forth with regards to its romanization.

Here is the table for romanization of Arabic:


As you can see, some issues come up. In the romanization of ص ص ص ص for example, (whether initial, Medial, Final, or Alone) the romanization becomes " ṣ" however, the diacritical mark is not something that can be used by osu because it is still not unicode. I would like to propose that all of these diacritical "," attached to letters be removed for the sake of simplicity and because osu currently does not support them. Therefore something like " ص◌نضوِ◌خ" should be romanized as "sandwich".

Another problem with Arabic is that it is typed in reverse, right to left. Should we also apply this to romanization? In this case "ص◌نضوِ◌خ" would actually be romanized as "hciwdnas" when read left to right as English readers are expected to do.

The rule I am proposing is:

Songs with Arabic metadata must be romanised using the Library of Congress system for romanizing Arabic when there is no romanisation or translation information listed by a reputable source. The same applies to the Source field if a romanised Source is preferred by the mapper.

Additionally:

In the romanization of Arabic, words should be romanized in reverse order, and the last letter should be uppercased. For example in romanizing "◌ س◌ !" the correct romanization should be "!usO"

However, there is also the problem of Judeo-Arabic romanization, which differs slightly from traditional Arabic romanization. Judeo-Arabic, of course, stems from the Jewish Arabs, many of whom live in Iraq and have adopted a slightly different script with respect to certain nouns and verbs. The most common Jewish Arabs are those from Baghdad. Anyways, I digress.

Attached are examples of Judeo-Arabic romanization:


So I would like to propose the following:

Songs with Judeo-Arabic metadata must be romanised using the Library of Congress system for romanizing Judeo-Arabic where Judeo-Arabic nouns and verbs are being used, and where there is no romanisation or translation information listed by a reputable source. Where Judeo-Arabic words and phrases are not used, traditional Arabic romanization will apply. The same applies to the Source field if a romanised Source is preferred by the mapper.
i'm very unsure about these things, can someone with knowledge of the language please clear this up?
find me a song with aegyptic hieroglyphs first that is sung in the language and mapped on osu, i dont think a language with 300k tribal speakers worldwide is relevant to the ranking criteria

-----------

TV Size labels in title: drop universally or force the same labeling on all tv size songs?

-----------

Applied Sieg changes to russian as they were.

draft updated, refer to open talking points

Wafu

Okoratu wrote:

i'm very unsure about these things, can someone with knowledge of the language please clear this up?
find me a song with aegyptic hieroglyphs first that is sung in the language and mapped on osu, i dont think a language with 300k tribal speakers worldwide is relevant to the ranking criteria

-----------

TV Size labels in title: drop universally or force the same labeling on all tv size songs?
1. I agree this may not be necessary. I'd say we should require a particular Romanisation method only if the language needs to be Romanised repeatedly, not once in the history of the game.

2. In my opinion (and this is probably only a matter of opinion), drop it and recommend putting it in the tags. If we use a universal label, I feel like it goes completely against what the artist has chosen. If it is removed, it's not really misrepresenting the artist's choice, it is just us not considering it a part of the title. Not sure if this makes much sense to anyone else, that's just what I think. Edit: Another important point is that, because people like 1-2 min. songs, they will be encouraged to use the in-game song length sorting/grouping (unfortunately this can't be done on the website). Most people probably search for short songs just by using "tv size", which can omit a large portion of maps they could actually enjoy. (There's nothing magical about TV Size songs, the only thing is that they are ~1 min 30 sec, it's just the length that players care about.) As people would get used to it, it could actually be more beneficial. (afaik Ephemeral (sorry if it wasn't you, but I think it was) said he would rename all the songs that are any kind of TV Size, so that it's consistent, so people actually would get used to it)

Chinese: Couldn't really find a reference that would not be misleading. They are usually too concerned with pronunciation and don't really tell you how to Romanise it. Plus, many of them use the originally proposed style (not syllable by syllable, but "word" by "word"). Many are also paid or blocked by the government for some reason. Someone may need to search a bit more to find one that better represents the currently intended system.

Cyrillic: I wouldn't worry that much about Cyrillic, as it's improbable that there will be many songs in it. But the BGN/PCGN systems could be recommended for other Cyrillic languages. The reason is pretty simple: they are nearly the same as the Russian one. Yes, they are intended mainly for geographic names, but it was explained in the past that this is the most common way we see Cyrillic Romanised and the most compatible with modified Hepburn, which is the most widespread Romanisation in osu!. It also avoids the confusion between "ch" and "kh", between "j" and "y", between "c" and "ts", etc. I could elaborate on this a bit more if needed.

Korean: I believe the currently used system is "Revised Romanisation of Korean", at least it looks pretty much like it in the maps here. It's a South Korean standard (probably the one we should follow as the probability of North Korean songs being mapped here is very low). Didn't find an official document about it, but this seems like a nice reference. It explains all the things you need for Romanisation. Except for Hanja, but again, the probability of something with Hanja being mapped is low. May need input from Koreans, I haven't seen any problems with the current system.

That's my opinion at this moment.
CrystilonZ
Delete word-by-word lol

Sites on the Internet unfortunately either provide too much (e.g. joining of syllables) or aren't at all what we agreed on.
If we are going with passport stuff imo just port information in the box below to the wiki. It's pretty similar to the old thread with clearer and better wording and all irrelevant points dropped.

Additional Information on the Romanisation of Chinese.

If there is no information on either Romanisation or translation listed by a reputable source, use the following method to Romanise Chinese metadata.
  1. All Chinese characters must be Romanised using Hanyu Pinyin system. All tone-marking diacritics must be omitted.
    1. 我 : Wo
    2. 三 : San
  2. For any Chinese character that, under the Hanyu Pinyin system, would be Romanised to Lü, Nü, Lüe, or Nüe; Romanise it to Lyu, Nyu, Lue, or Nue respectively instead.
    1. 女 : Nyu
    2. 略 : Lue
  3. Separate each Romanised character with a space and capitalise it. Function words are capitalised as well.
    1. 泠鸢yousa - 没有名字的怪物 : Ling Yuan yousa - Shen De Sui Bo Zhu Liu
    2. 兰梓 - 一百块都不给我 : Lan Zi - Yi Bai Kuai Dou Bu Gei Wo
  4. For loan words from other languages, however, Romanise all characters that make up the word into a single word in its original language.
    1. 張韶涵 - 歐若拉 : Angela Zhang - Aurora
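
Steps 1 to 3 above can be sketched in Python roughly as follows. This is only an illustration: it assumes the input is already Hanyu Pinyin with tone marks, one syllable per Chinese character, so the character-to-pinyin lookup and the loan-word case in step 4 are left out. The names `strip_tones` and `romanise` are made up for this example.

```python
import unicodedata

# Combining marks Hanyu Pinyin uses for the four tones:
# macron, acute, caron, grave. The diaeresis (U+0308) is NOT a tone mark.
TONE_MARKS = {"\u0304", "\u0301", "\u030c", "\u0300"}

# Step 2: substitutes for the syllables containing ü (written as \u00fc here).
U_UMLAUT_SUBSTITUTES = {
    "l\u00fc": "lyu",
    "n\u00fc": "nyu",
    "l\u00fce": "lue",
    "n\u00fce": "nue",
}


def strip_tones(syllable: str) -> str:
    """Step 1: drop tone diacritics but keep the two dots of ü."""
    decomposed = unicodedata.normalize("NFD", syllable.lower())
    kept = "".join(ch for ch in decomposed if ch not in TONE_MARKS)
    return unicodedata.normalize("NFC", kept)


def romanise(syllables: list[str]) -> str:
    """Hypothetical helper: one pinyin syllable per Chinese character."""
    words = []
    for syllable in syllables:
        plain = strip_tones(syllable)
        plain = U_UMLAUT_SUBSTITUTES.get(plain, plain)  # step 2
        words.append(plain.capitalize())                # step 3: capitalise
    return " ".join(words)                              # step 3: space-separate


print(romanise(["yī", "bǎi", "kuài", "dōu", "bù", "gěi", "wǒ"]))  # Yi Bai Kuai Dou Bu Gei Wo
print(romanise(["nǚ"]), romanise(["lüè"]))                        # Nyu Lue
```

Decomposing to NFD first means the tone mark can be removed without touching the diaeresis, so the ü substitution in step 2 stays a simple table lookup.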

In the RC itself
Glossary
  1. Word-by-word Romanisation: Each character must be romanised into a single, capitalised, separated word. Refer to this thread for examples and supplementary information. <<< delete this
  2. Character-by-character Romanisation: Each Chinese character must be Romanised using Hanyu Pinyin system, and each Romanised character must be capitalised and separated with a space. Refer to <this wiki page> for more information.
Rules
  1. Songs with Chinese metadata must be Romanised using the Character-by-character method in Romanised fields when there is no Romanisation or translation information listed by a reputable source. The same applies to the Source field if a romanised Source is preferred by the mapper. As they are non-unicode fields, all diacritical tone marks must be omitted with Lü, Nü, Lüe, and Nüe substituted with Lyu, Nyu, Lue, and Nue respectively.
^ use "substituted" here. ü is already Romanised but we're just replacing it with stuff because of umlauts.
Natsu
Force the same TV Size label, since it makes it easier to identify the length of the song and whether it's the opening of an anime. Also, everyone in osu! is used to it; removing it will just cause some confusion IMO.
Mentai
Every Cyrillic translation system is a mess and has large inconsistencies in vowel treatment as well. I recommend using a right single quotation mark system, either ISO 9985 or BGN/PCGN. ISO is the most widely used for us Armenians, but having the aspirates translated the same way makes BGN/PCGN a more universal solution for all Cyrillic languages.

basically, BGN/PCGN is probably the best for a universal system for Cyrillic
Scarlet Evans
How should one romanize a title if no romanization, or even an unambiguous phonetic transcription, exists, and several completely different readings are suggested as equally valid, so that it would be completely wrong to choose just one of them as the title? That is, it is not possible to transcribe the title phonetically (or romanize it) unless you include all of the possible readings. And even if one includes all of them, should they be separated by slashes, which with 4-5 of them would result in a terribly long monstrosity of a title?

-----

Here's more of explanation:


Artists can name their songs however they want, right? They are not obliged to make it "possible" to pronounce or to have unambiguous title that someone can phonetise.

I don't know what it is like in Chinese, but a great many Japanese kanji characters can be pronounced (and romanized) not in one or two, but in several completely different ways!

I don't know, if someone made song like this, but I find nothing that could stop one from doing this, i.e. what if someone makes the song (which later someone decides to map in osu!), where:

[*] The title has no official romanization and remains "unspoken", i.e. aside from the title written in kanji, even the author never spells it out and refers to the song only indirectly in words, or else by writing it or showing the kanji characters.

[*] It has several possible meanings, at least 3-4 of them, ALL of which are suggested by the lyrics and ALL of which have a COMPLETELY DIFFERENT romanization for every single character used in the title. So you can't phonetise it, as it has several completely different phonetisations!!

[*] The lyrics of the song are used to maintain and express the ambiguity, creating a contradiction around what the title applies or refers to. This makes all of these "possible titles" "equal" candidates for the title, while they also contradict each other, through their contradictory meanings and/or through the lyrics, i.e. all of them are suggested to be the title while at the same time being refuted as such.

[*] It all makes sense when reading the kanji characters and extracting all these meanings, so the title definitely fits perfectly as the title of the song (!!!), with just one but: you can't possibly spell it or phonetise it, as it has several completely different pronunciations, all of which can equally be considered as being and not being the title (because of the contradictory situation and being refuted by the other meanings and/or the lyrics).

-----

So, each of these "possible titles" can be and can't (!) be the title at the same time, but all of them, written down with the very same kanji characters, are definitely the title, with deep meanings that are expressed in the lyrics.

-----

How should metadata for such song look like? :o

I don't know, if such song with "unspoken" title, where the title can have multiple meanings, currently exists, and if so, then to what extreme this ambiguity is brought, but there are titles with more than one meaning and sometimes you can't really find anywhere how the title should be pronounced or phonetised...

Also, looking at this discussion, even if such a song doesn't exist, then depending on the answer I could decide to spend my time improving my poor Japanese and eventually create one in the future, maybe with some voluntary help from someone, just for the sake of trying to rank it :P

-----

But seriously... it's not required from an artist to have a title that is possible to be phonetised or romanized... what should a mapper do in such situation? Could such song be treated as an "exception" and allowed to have no romanization? Or all of several titles should be included in some ultimate title compilation?

And please, don't tell me that it's "impossible case" that's not worth considering, as I believe that the people in this community could definitely help to make it possible, if it will be required to.
Wafu

Scarlet Evans wrote:

Could such song be treated as an "exception" and allowed to have no romanization? Or all of several titles should be included in some ultimate title compilation?
Every song needs to be Romanised so that it can be searched normally. If it's your song, give it a name, if it's not, that probably can't be universally covered by RC. These situations (as they virtually don't happen) would probably be treated case by case.
Fycho
Adding additional Information seems redundant lol, if mappers/BN are unsure, they can ask Metadata QATs/Helpers for help like Modified Hepburn for Japanese.

CrystilonZ wrote:

泠鸢yousa - 没有名字的怪物 : Ling Yuan yousa - Shen De Sui Bo Zhu Liu
it has a mistake, should be:

没有名字的怪物 => Mei You Ming Zi De Guai Wu
神的随波逐流 => Shen De Sui Bo Zhu Liu
should delete "Word-by-word Romanisation" in Glossary.
Glossary
Character-by-character Romanisation: Each Chinese character must be Romanised using Hanyu Pinyin system, and each romanised character must be capitalised and separated with a space.

Rules
Songs with Chinese metadata must be Romanised using the Character-by-character method in Romanised fields when there is no Romanisation or translation information listed by a reputable source. The same applies to the Source field if a romanised Source is preferred by the mapper. As they are non-unicode fields, all diacritical tone marks must be omitted.

Considering there are only 5 ranked maps (really few) that use "ü" so far, and that neither "v" nor "yu" is the best choice unless we allow ü in the romanised field, specific cases will be left to the discretion of the Metadata Team.