[Proposal] Metadata section overhaul

CrystilonZ

Joined January 2013

CrystilonZ 2018-03-28T19:48:58+00:00

Hi
I highly suggest everyone here read both of these https://osu.ppy.sh/forum/p/6557643 https://osu.ppy.sh/forum/p/6557415
I'll elaborate and add some points here.

Fycho wrote:
yu:
English speakers would read it "yoo", which has different tone from "ü" My description of the vowel ü is like this: firstly shape your mouth like you're going to pronounce the word "you," but instead of a vowel that sounds like oo (in moo) do a ee sound (like in bee or he) instead. Do not change your mouth's shape

asked around a bit and Nyu is pronounced like nee-yooh (like a very exaggerated new). The first portion gets the sound right and the second portion gets the mouth shape right. Just sharing information lol, this has no relation to the stuff further down.

My opinion about this is that Romanised texts should be kind of familiar-ish to any English-speaking person and give a rough hint of how to pronounce the actual sound. However, the pronunciation need not be perfectly accurate all the time, and pronunciation does vary a little bit depending on the speaker's mother tongue. Here are some examples:

河 ------> Hé (pinyin) ------> He (osu!)
The "He" here is not pronounced like English he (with an e sound like be or see). It's pronounced like the e in words like "her" or French "le".

筆 ------> bǐ (pinyin) ------> bi (osu!)
This is actually pronounced like a hybrid of bee and pee. B in the pinyin system is an unaspirated p (spit)

If you have zero knowledge about Chinese, of course you're probably going to get a bunch of pronunciations messed up, unfortunately (I did too lmfao, ask Fycho).
But that does not mean we should give people something they can't even pronounce. It's unsettling and will probably be really weird to many people, which should not be a feature of any Romanised text.

also this is mentioned in Fycho's post but it seems like it's skipped over by most.

The current method of substituting ü is based on the system used in Chinese passports to Romanise people's names. This method focuses on the pronunciation because customs needs to read people's names. (Also, Ü can't be printed for some reason.)

Any questions can be directed at me or Fycho.

Linguists and native speakers don't be mad at me pls I know the pronunciation is not exact either and I suck at linguistic stuff sorry

Mentai

472 posts

Joined June 2016

Mentai 2018-03-28T20:14:49+00:00

Fycho wrote:
If saying "v" couldn't be read by foreigners and causes misconceptions, then we probably need to rework the Japanese rule as ra / ri / ru / re / ro are actually pronounced as la / li / lu / le / lo in Japanese, which is kinda unfriendly towards those Latin script users who don't know Japanese. English speakers will pronounce "ra" differently from how it's supposed to be pronounced in Japanese.

this is technically wrong, Japanese uses a pretty happy marriage of both "r" and "l" consonants, using the full mouth formation for "r", but also pressing the tongue very slightly on the roof of the mouth, making a soft "l". since the full formation "r" is used, it more so correlates with English "r" than "l", and Japanese people (varying on dialect, of course) will recognize this, even through the imperfect rendering of those specific characters by westerners.

regardless, that cannot be said about "v" in Chinese. i have almost no background in Chinese, but i can at least assure using a consonant sound that has nowhere near the same mouth formation/articulation of the sound it is actually trying to produce does not work well. there has to be an unfortunate compromise between perceived transliteration, and vocal articulation.

i don't know what the solution would be, as again, i don't have any Chinese background, but going with the options that are based on actual vowels would work better than "v" in general

Monstrata

Elite Mapper: Aspirant

4,642 posts

Joined May 2013

Monstrata 2018-03-28T20:38:54+00:00

Regraz wrote:
Naotoshi wrote:
because literally 0 western readers (who the romanization is aimed at) will read v as a vowel...................................................................................

I think it is still better than misleading players to pronounce it wrong.

v also misleads players to pronounce the vowel wrong. Actually it misleads them a lot more than "yu" would for non-native speakers.

Lv = Lü for Chinese speakers

Lv = Uhlv for English speakers (think, Revolve) / or Luhvuh (think, Olive)

I would use phonetic transcription but then only people who can read IPA can comment so I went with pronunciation and example instead.

LwL

363 posts

Joined November 2013

LwL 2018-03-28T20:44:17+00:00

How about using "yu" for the romanized song title but mandating to put the version with "v" in the tags? That way no one would struggle to search songs if they're searching with roman input (since both ways work), non-Chinese speakers would have a pronounceable title, and reading it might look slightly weird to Chinese speakers but still clear as to what is meant (from what I gathered from this thread, I don't speak a single word of Chinese)?

Shad0w1and

1,588 posts

Joined April 2011

Shad0w1and 2018-03-28T21:15:25+00:00

LwL wrote:
How about using "yu" for the romanized song title but mandating to put the version with "v" in the tags? That way no one would struggle to search songs if they're searching with roman input (since both ways work), non-Chinese speakers would have a pronounceable title, and reading it might look slightly weird to Chinese speakers but still clear as to what is meant (from what I gathered from this thread, I don't speak a single word of Chinese)?

Even if that's the case, what goes in the actual title should still be lv, not lyu. Face the problem: even after romanization, without knowing the language, almost no English speaker can pronounce names from other languages. Romanization is not a way to help English speakers pronounce it but to help them learn it. And why would you make an osu-only standard (which no one outside of osu will accept) that misleads Chinese learners trying to find the song?????
Let's say you find a ranked song titled "nv ren (woman) hua (flower)". You try searching Google for "nv ren hua": ok, you get the result. Then you search for "nyu ren hua": nope, you find nothing. In this sense, I will accept Nu and Lu for Nv and Lv, because you can still search for them and get the result, but definitely not Nyu lmao.
And the problem still exists: Lu and Lv sound totally different. If you want to make the song title readable for both Chinese speakers and Chinese learners, you have to go with Lv Nv, not Lu Nu. And because you can't even find the song with anything other than these two solutions, other options (like Nyu) should be automatically ignored.

Please, everyone in this thread, think about the consequences of your proposals. Most of them are not even viable, just because they will fuck everyone up and no one will be able to tell which song it is. If, after the meta rule change, no one can use osu meta to find songs, then why do we want that change????

Last edited by Shad0w1and 2018-03-28T21:32:07+00:00, edited 2 times in total.

Video Encoding Queue | My Maps

SupaJuke

6 posts

Joined February 2017

SupaJuke 2018-03-28T21:26:29+00:00

Speaking as a person who mainly uses English, even though I have the minimum knowledge to correctly pronounce pinyin including "ü": using "v" to represent "ü" is disastrous and should be avoided. Here is why:

Anyone who does not know Chinese will eventually mispronounce this "COMPLETELY".
Even though "yu" does also lead to mispronunciation, it still makes at least slightly more sense for most non-Chinese speakers.

Also, just as Peppy has already said,

peppy wrote:
romanisation isn't for the people that speak the language. it is for people that can't who wish to (as accurately as possible) pronounce and process what they are reading.

...if native people are offended, they can turn off roman display.

I will not bring up the fact that he clearly supports the "yu" side; I only quote what I want to point out here.

ROMANIZATION is NOT for native speakers. As a Thai, if for any reason I'm trying to search for a Thai song, I wouldn't bother using a single Latin character. Instead, I would rather use Thai characters. I believe that using characters in the desired language is certainly easier than trying to use Latin characters. Hence, this should also apply to native speakers (in this case, Chinese speakers) when trying to search for something WHICH IS AVAILABLE IN THEIR OWN LANGUAGE (again, Chinese for this debate).

If you were to ask me if I'm offended when a Thai song is misinterpreted due to romanization, I would answer "no" without hesitation, and I strongly hope ALL NATIVE SPEAKERS feel the same when they see their own languages being misinterpreted.

Still, if native speakers are feeling offended due to their language being mispronounced or whatsoever, they have the option to turn off romanization which has also already been mentioned by Peppy.

PS. The fact that I keep mentioning Chinese isn't because I hold a grudge against them, but because Chinese is the main topic discussed here. To be honest, I feel like all native speakers should feel the same toward "romanization".

Last edited by SupaJuke 2018-03-28T21:28:31+00:00, edited 1 time in total.

SupaJuke — Literal Osu! Addict

Mentai

472 posts

Joined June 2016

Mentai 2018-03-28T21:32:26+00:00

SupaJuke wrote:
ROMANIZATION is NOT for native speakers.

essentially this

Shad0w1and

1,588 posts

Joined April 2011

Shad0w1and 2018-03-28T21:33:32+00:00

ROMANIZATION is NOT for English speakers to pronounce.

^^^^essentially this

you guys are saying to teach osu players wrong things about Chinese, and it will leave them no way to find the song by title. Realize it.

Mentai

472 posts

Joined June 2016

Mentai 2018-03-28T21:36:21+00:00

Romanization (also spelled romanisation: see spelling differences), in linguistics, is the conversion of writing from a different writing system to the Roman (Latin) script, or a system for doing so. Methods of romanization include transliteration, for representing written text, and transcription, for representing the spoken word, and combinations of both. Transcription methods can be subdivided into phonemic transcription, which records the phonemes or units of semantic meaning in speech, and more strict phonetic transcription, which records speech sounds with precision.

it actually is for Latin alphabet speakers to pronounce, not Chinese, unfortunately;

Shad0w1and

1,588 posts

Joined April 2011

Shad0w1and 2018-03-28T21:37:38+00:00

Mentai wrote:
Romanization (also spelled romanisation: see spelling differences), in linguistics, is the conversion of writing from a different writing system to the Roman (Latin) script, or a system for doing so. Methods of romanization include transliteration, for representing written text, and transcription, for representing the spoken word, and combinations of both. Transcription methods can be subdivided into phonemic transcription, which records the phonemes or units of semantic meaning in speech, and more strict phonetic transcription, which records speech sounds with precision.

it actually is for Latin alphabet speakers to pronounce, not Chinese, unfortunately;

and if a player wants to find the song by romanized title, ask Google whether they accept this proposal

SupaJuke

6 posts

Joined February 2017

SupaJuke 2018-03-28T21:42:00+00:00

Shad0w1and wrote:
ROMANIZATION is NOT for English speakers to pronounce.

^^^^essentially this

you guys are saying to teach osu player wrong things about Chinese and will make them find no way to find the song. Realize it.

For this to eventually be solved, we have to discuss whether we want to write the metadata according to how it is "pronounced", or "used".

If you want it to be written according to how it is actually used by natives (Chinese), then I wouldn't argue that certainly "v" is better.

However, from my point of view, a non-native speaker, I prefer the metadata to be something readable for me. Even if it would be different from how native speakers actually use when typing/writing in roman.

Mentai

472 posts

Joined June 2016

Mentai 2018-03-28T21:44:36+00:00

Shad0w1and wrote:
Mentai wrote:
Romanization (also spelled romanisation: see spelling differences), in linguistics, is the conversion of writing from a different writing system to the Roman (Latin) script, or a system for doing so. Methods of romanization include transliteration, for representing written text, and transcription, for representing the spoken word, and combinations of both. Transcription methods can be subdivided into phonemic transcription, which records the phonemes or units of semantic meaning in speech, and more strict phonetic transcription, which records speech sounds with precision.

it actually is for Latin alphabet speakers to pronounce, not Chinese, unfortunately;
and if a player wants to find the song by romanized title, ask Google whether they accept this proposal

this is literally the issue at hand. the point is that we are valuing actual articulation higher than transliteration. there is a solution for both. having the metadata be the phonetic articulation for Latin alphabet speakers, but also requiring to have the exact Chinese transliteration in the tags. and this would work, but we don't know what to use other than "v." if we could find a happy middle ground for pronunciation between us (that being, non-native Chinese speakers and native Chinese speakers), then the problem would be resolved both ways

please be a bit open minded in respect to how vowels are treated in Latin-based languages

Shad0w1and

1,588 posts

Joined April 2011

Shad0w1and 2018-03-28T21:46:08+00:00

SupaJuke wrote:
Shad0w1and wrote:
ROMANIZATION is NOT for English speakers to pronounce.

^^^^essentially this

you guys are saying to teach osu player wrong things about Chinese and will make them find no way to find the song. Realize it.
For this to eventually be solved, we have to discuss whether we want to write the metadata according to how it is "pronounced", or "used".

If you want it to be written according to how it is actually used by natives (Chinese), then I wouldn't argue that certainly "v" is better.

However, from my point of view, a non-native speaker, I prefer the metadata to be something readable for me. Even if it would be different from how native speakers actually use when typing/writing in roman.

idc about native language or not, I am saying you cannot just make a standard which no one outside of osu will accept. that will just fuck up everyone who wants to know about the song on google. I personally am fine with Lv written as Lu because it is searchable. but Lyu is like a joke. no one will know which song it is.

at the end, I can't see other options than these two:
Title: Nv Ren Hua, Tags: Nu
Title: Nu Ren Hua, Tags: Nv
Other options are not viable, just because it won't help to identify the song in any database or search engine.

Mentai

472 posts

Joined June 2016

Mentai 2018-03-28T21:54:33+00:00

Shad0w1and wrote:
at the end, I can't see other options than these two:
Title: Nv Ren Hua, Tags: Nu
Title: Nu Ren Hua, Tags: Nv
Other options are not viable, just because it won't help to identify the song in any database or search engine.

if this is actually viable, i think the second option would be a perfect compromise between articulation, and allowing for the full transliterated string to be found

Wafu

1,703 posts

Joined June 2011

Wafu 2018-03-28T22:29:25+00:00

Shad0w1and wrote:
idc about native language or not, I am saying you cannot just make a standard which no one outside of osu will accept. that will just fuck up everyone who wants to know about the song on google. I personally am fine with Lv written as Lu because it is searchable. but Lyu is like a joke. no one will know which song it is.

at the end, I can't see other options than these two:
Title: Nv Ren Hua, Tags: Nu
Title: Nu Ren Hua, Tags: Nv
Other options are not viable, just because it won't help to identify the song in any database or search engine.

Actually, some songs that are not really famous, that had to be Romanised by osu! community alone, can't be found by using the Romanised name. Why don't you just copy the original title and search for that? It would actually be best if the original title was available on the website, but you can still find any song as the original title must be here.

And I'm not sure what is wrong with your Google results, because it seems to have found the songs properly using the "yu" version, at least on my end. But even then, don't forget that you can still add the artist to the search. It is very improbable that "yu" will be in all of the characters (of both artist and title), and copying both the artist and the title should find the song even if one character is missing. For example this map can be found by just typing "stefanie guang", not even her full name is required. I think you're making it look more problematic than it is.

CXu

osu! Alumni

3,558 posts

Joined March 2009

CXu 2018-03-28T23:24:43+00:00

du ju rilli vant piipell tu vrait hauv dei pronauns vørds in inglisj in anådder længguej?

I mean, probably not. We don't transcribe how we pronounce things from one Latin-alphabet-using language to English, and I'm sure native English speakers would have a headache reading my "norwegianization" of an English sentence. It's probably a similar feeling many Chinese people are having about this change when they already have a system that uses Latin characters (thus making people able to write and communicate which song they mean) and that works fine except for the one pronunciation of v, which is already a common problem of any language other than English using the Latin alphabet.

peppy

19,301 posts

Here since the beginning

peppy 2018-03-29T02:22:21+00:00

The amount of time being thrown at one character when there's obviously a divide in decision which is *not* going to be resolved is insane.

What's the biggest site on the internet that shows romanisation? Wikipedia? Let's use what they use and call it a day.

It doesn't matter which we choose if we're going for conformity. People will get used to it.

Let's stop this and copy the most settled upon solution elsewhere on the internet.

I will be removing any posts that aren't references to other sites using romanisation in a specific way. Let's find the most popular method and use it.

Last edited by peppy 2018-03-29T02:22:42+00:00, edited 1 time in total.

Mafumafu

Beatmap Nominator

3,367 posts

Joined August 2013

Mafumafu 2018-03-29T04:16:34+00:00

I guess most players in osu! are using keyboard to input and search so I just put link here about the most popular method via keyboard, not by writing on papers or others.

peppy wrote:
Let's find the most popular method and use it.

Okay then simple answer: v is the most popular amongst players.

Wikipedia wrote:
Since the letter "v" is unused in Mandarin pinyin, it is universally used as an alias for ü.

( https://en.wikipedia.org/wiki/Pinyin_input_method )

Also one more link:
https://eastasiastudent.net/china/manda ... yin-input/

peppy

19,301 posts

Here since the beginning

peppy 2018-03-29T04:18:19+00:00

I'm not talking about typing. You can search using the actual non-romanised characters. Please read my last post again – I'm talking about display on websites.

Nyquill

osu! Alumni

1,781 posts

Joined February 2011

Nyquill 2018-03-29T08:08:21+00:00

If we can go back to something close to a romanization standard for "reading" from the Library of Congress, they choose to use the double dotted u, which is reasonable. It'll be kind of like placing accents on French words.

https://www.loc.gov/catdir/cpso/romanization/chinese.pdf

The most readable would be this. Maybe we can do something on the search engine side to automatically let both be searched for with u and v. This is a quality of life improvement for non-Chinese speakers. You would otherwise be forced to copy and paste to search.
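A rough sketch of what that search-side folding could look like. This is not how osu! search works; `search_key` is a hypothetical helper, and folding "yu" in as well is my own assumption on top of the "u and v" suggestion. It only touches syllables that start with l or n, since those are the ones argued about in this thread.

```python
import re
import unicodedata

# Hypothetical helper: fold the stand-ins for ü after l/n (ü, v, yu) to plain "u"
# so that every spelling used in this thread produces the same search key.
_U_VARIANTS = re.compile(r"\b([ln])(?:ü|v|yu)(e?)\b")


def search_key(text: str) -> str:
    """Return a lowercase key in which lü/lv/lyu and nü/nv/nyu all become lu/nu."""
    normalised = unicodedata.normalize("NFC", text).lower()
    return _U_VARIANTS.sub(r"\1u\2", normalised)


print(search_key("Nv Ren Hua"))   # nu ren hua
print(search_key("Nü Ren Hua"))   # nu ren hua
print(search_key("Nyu Ren Hua"))  # nu ren hua
```

The trade-off is that real "lu"/"nu" syllables and "lü"/"nü" syllables end up under the same key, so this helps people find a song but cannot tell the two sounds apart.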

...or we can accept romanization doesn't work perfectly for many languages including Chinese, and just use the normal/most popular/most sane Latin standard (v) like we would for many other languages.

Fycho

Global Moderator

2,479 posts

Joined August 2012

Fycho 2018-03-29T08:54:41+00:00

What Nyquill and CXu posted makes sense here: romanization doesn't work perfectly for many languages, and the double dotted u is even kept until now. In the old Latin systems "v" and "u" are interchangeable. I believe "v" is the most appropriate and popular choice. In the YouTube search system, "Lv" is obviously a better keyword than "Lyu" (try searching "Lv Guang"). Using the English name of a citizen in a passport is one-sided (passports are mostly for customs, while our romanisation is for players who use computers every day).

Let's try to fix the proposal again:

Glossary
Character-by-character Romanisation: Each Chinese character must be Romanised using Hanyu Pinyin system, and each romanised character must be capitalised and separated with a space.

Rules
Songs with Chinese metadata must be Romanised using the Character-by-character method in Romanised fields when there is no Romanisation or translation information listed by a reputable source. The same applies to the Source field if a romanised Source is preferred by the mapper. As they are non-unicode fields, all diacritical tone marks must be omitted. Refer to Thread: Romanisation of Chinese for more information.
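As an illustration only, here is a minimal Python sketch of the character-by-character rule above, assuming the input is already a list of Hanyu Pinyin syllables with tone marks (getting from Chinese characters to pinyin is out of scope). The function names are made up for this example. It drops the four tone marks but deliberately leaves ü alone, since how to substitute ü is exactly what this thread has not settled.

```python
import unicodedata

# Combining marks used for the four pinyin tones:
# macron (1st), acute (2nd), caron (3rd), grave (4th).
TONE_MARKS = {"\u0304", "\u0301", "\u030C", "\u0300"}


def strip_tone_marks(syllable: str) -> str:
    """Remove pinyin tone marks but keep other diacritics, such as the dots on ü."""
    decomposed = unicodedata.normalize("NFD", syllable)
    kept = "".join(ch for ch in decomposed if ch not in TONE_MARKS)
    return unicodedata.normalize("NFC", kept)


def romanise_title(syllables: list[str]) -> str:
    """Capitalise each toneless syllable and separate syllables with a space."""
    return " ".join(strip_tone_marks(s).capitalize() for s in syllables)


print(romanise_title(["hé"]))                # He
print(romanise_title(["nǚ", "rén", "huā"]))  # Nü Ren Hua
```

Whatever ü ends up being replaced with would be one more step after `strip_tone_marks`.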

The discussion of "Romanisation of Chinese" should be adequate and stopped now. Anyone who has concerns is free to contact me for detailed explanations.

Last edited by Fycho 2018-03-29T08:56:35+00:00, edited 1 time in total.

Happy life lies in peaceful mind

CrystilonZ

1,557 posts

Joined January 2013

CrystilonZ 2018-03-29T09:02:14+00:00

Wikipedia uses ü in most cases because it has no limitation on umlauts.
The only place where Romanised names are displayed and have the same umlaut limitation is the passport which is basically like this

Fycho wrote:
Lü, Nü, Lüe, and Nüe must be substituted with Lyu, Nyu, Lue, and Nue respectively.

https://en.wikipedia.org/wiki/Lü_(surname) << on wikipedia about Lü, a common surname. Read under the Romanization section.
(in Chinese) Standards set in 2012 about Romanization of Chinese names in passports.
吕女绿 (Lü Nü Lü) should be Romanised as LYU NYU LYU
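For anyone who wants to see the passport-style substitution spelled out, a small sketch follows. The table contains only the four substitutions quoted above; the function name and the choice to print upper case (as in the LYU NYU LYU example) are assumptions for this illustration.

```python
import unicodedata

# Only the four substitutions quoted in this post (toneless syllables).
PASSPORT_SUBSTITUTIONS = {
    "lü": "lyu",
    "nü": "nyu",
    "lüe": "lue",
    "nüe": "nue",
}


def passport_style(syllable: str) -> str:
    """Apply the quoted ü substitutions and return the syllable in upper case."""
    key = unicodedata.normalize("NFC", syllable).lower()
    return PASSPORT_SUBSTITUTIONS.get(key, key).upper()


print(" ".join(passport_style(s) for s in ["Lü", "Nü", "Lü"]))  # LYU NYU LYU
```

Syllables without ü pass through unchanged apart from the upper-casing.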

Ephemeral

Inland Empire

3,936 posts

Joined April 2009

Ephemeral 2018-03-29T12:02:53+00:00

if that passport romanisation is actually the case, then that's enough precedent set for the use of "yu" over "v", it would seem

"v" is not and will not ever be a valid transliteration in English for this particular vowel because its sound is not really approaching "vuh" or "vee". "yu" is closer to the actual overall sound ("yuU") - romanising by IME precedent is ugly in a number of cases even if it is technically easier to search for

Last edited by Ephemeral 2018-03-29T12:06:19+00:00, edited 1 time in total.

Shiguma

476 posts

Joined May 2014

Shiguma 2018-03-29T16:23:13+00:00

peppy wrote:
I'm not talking about typing. You can search using the actual non-romanised characters. Please read my last post again – I'm talking about display on websites.

Except this whole debate is because there is no definite way people romanize ü. If there was, this debate would have been over centuries ago. You talk about display on websites, but every website will display it as ü.

The best solution for this would be to allow ü in the romanization field. Why aren't we allowed to use accented characters in that field in the first place? There are more scenarios besides the Chinese ü where this becomes a problem. (Example: https://osu.ppy.sh/beatmapsets/740535#osu/1562308 The only reason this needed to be romanized is the umlaut a, as the artist's name is Mäe, but really we should have just had Mäe without a need for romanization.)

peppy wrote:
It doesn't matter which we choose if we're going for conformity. People will get used to it.

Let's stop this and copy the most settled upon solution elsewhere on the internet.

Please make it easy for us to use accented characters in the metadata then. The solution is there, but it requires the staff's help, honestly.

Ephemeral wrote:
if that passport romanisation is actually the case, then that's enough precedent set for the use of "yu" over "v", it would seem

"v" is not and will not ever be a valid transliteration in English for this particular vowel because its sound is not really approaching "vuh" or "vee". "yu" is closer to the actual overall sound ("yuU") - romanising by IME precedent is ugly in a number of cases even if it is technically easier to search for

You bring up passports, but that is the only scenario where "yu" is used. Chinese passports have used "v" as well, and it is up to the passport holder if they want to keep it as-is or change it to "yu". The only reason this is a thing is because they can't use ü on a passport, but really the easiest solution would be to update their system to allow ü on passports.

Seriously, the best solution is to just allow ü and other accented characters in the romanization field. Basically all websites use ü, there is no reason for us to be stuck in this debate when typing ü on a computer is so easy.

Okoratu

Nomination Assessment Team / Elite Nominator

4,913 posts

Joined May 2012

Topic Starter

Okoratu 2018-03-29T18:18:42+00:00

cleaning up open points in progress

open talking points

Chinese:

i'm not willing to link a 4 year old thread on the ranking criteria. either it should be ported to the wiki or that sentence dropped or its information condensed down into more guidelines on the topic
Hanyu Pinyin system needs an external reference if available, someone please provide me a link

Cyrillic:

Cyrillic is now undefined - anyone fancy coming up with some definition if the current thing only works for Russian?

Korean:

Input from Koreans as to which standard is used and should be used going forward is needed

Thai:

Input from natives required as to which standard is to be used

Arabic:

do we currently need this? I have a hard time telling how much monstrata trolls in this post

TV Size label:

FORCE the same way or FORCE DROPPING the label?

will update with issues i have as i go through the thread updating open points

Monstrata wrote:
With respects to Korean romanization, I'm wondering if we should continue applying the McCune-Reischauer system for romanizing Korean. This is the system that the Library of Congress is using. Nyquill brought up an excellent point about using romanization systems that other large institutions are currently using and it works a lot better than creating our own modified system in most cases (unless we are simplifying).

I'm bringing this up because there is also the Revised Romanization of Hangeul system that was introduced on July 7th, 2000 which has been applied to various Korean road signs transportations etc... The major change of course being that the new system eliminates diacritics in favor of digraphs.

A possible rule would look like:

Songs with Korean metadata must be romanised using the McCune-Reischauer system for romanizing Korean when there is no romanisation or translation information listed by a reputable source. The same applies to the Source field if a romanised Source is preferred by the mapper.

Additionally, we could introduce the use of digraphs and two-vowel letters into the proposal:

Vowels ㅓ/ʌ/ and ㅡ/ɯ/ should be written as digraphs in Korean romanization, and romanized to eo and eu respectively.

Another language to examine is Thai. The Library of Congress recommends nine additional rules for Thai romanization which are:

Library of Congress wrote:
Romanization
1. Tonal marks are not romanized.
2. The symbol ฯ indicates omission and is shown in romanization by “ … ” the conventional sign for ellipsis.
3. When the repeat symbol ๆ is used, the syllable is repeated in romanization.
4. The symbol ฯลฯ is romanized Ia.
5. Thai consonants are sometimes purely consonantal and sometimes followed by an inherent vowel romanized o, a, or ǭ depending on the pronunciation as determined from an authoritative dictionary, such as the Royal Institute's latest edition (1999).
6. Silent consonants, with their accompanying vowels, if any, are not romanized.
7. When the pronunciation requires one consonant to serve a double function – at the end of one syllable and the beginning of the next – it is romanized twice according to the respective values.
8. The numerals are: ๐ (0), ๑ (1), ๒ (2), ๓ (3), ๔ (4), ๕ (5), ๖ (6), ๗ (7), ๘ (8), and ๙ (9).
9. In Thai, words are not written separately. In romanization, however, text is divided into words according to the guidelines provided in Word Division below.
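Rule 8 in the list above is the one part that is purely mechanical, so here is a tiny sketch of it. The function name is made up for this example, and it covers the numerals only, none of the other eight rules.

```python
# Thai digits mapped to ASCII digits, in the order given by rule 8.
THAI_TO_ASCII_DIGITS = str.maketrans("๐๑๒๓๔๕๖๗๘๙", "0123456789")


def romanise_thai_digits(text: str) -> str:
    """Replace Thai numerals with 0-9 and leave every other character as it is."""
    return text.translate(THAI_TO_ASCII_DIGITS)


print(romanise_thai_digits("๒๕๖๑"))  # 2561
```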
The two rules I am proposing are:

Songs with Thai metadata must be romanised using the Library of Congress system (also known as ISO 11940) for romanizing Thai when there is no romanisation or translation information listed by a reputable source. The same applies to the Source field if a romanised Source is preferred by the mapper.

and

In the romanization of Thai, words should be romanized separately, and separated by a space. Additionally, all words should (or should not?) be uppercased.

Attached are helpful transcription keys for Thai:

Another language that is becoming more and more relevant is Arabic, and there are some issues I would like to bring forth with regards to its romanization.

Here is the table for romanization of Arabic:

As you can see, some issues come up. In the romanization of ص ص ص ص for example, (whether initial, Medial, Final, or Alone) the romanization becomes " ṣ" however, the diacritical mark is not something that can be used by osu because it is still not unicode. I would like to propose that all of these diacritical "," attached to letters be removed for the sake of simplicity and because osu currently does not support them. Therefore something like " ص◌نضوِ◌خ" should be romanized as "sandwich".

Another problem with Arabic is that it is typed in reverse, right to left. Should we also apply this to romanization? In this case "ص◌نضوِ◌خ" would actually be romanized as "hciwdnas" when read left to right as English readers are expected to do.

The rule I am proposing is:

Songs with Arabic metadata must be romanised using the Library of Congress system for romanizing Arabic when there is no romanisation or translation information listed by a reputable source. The same applies to the Source field if a romanised Source is preferred by the mapper.

Additionally:

In the romanization of Arabic, words should be romanized in reverse order, and the last letter should be uppercased. For example in romanizing "◌ س◌ !" the correct romanization should be "!usO"

However, there is also the problem of Judeo-Arabic romanization which differs slightly from traditional Arabic romanization. Judeo-Arabic of course, stems from the Jewish Arabs many who live in Iraq and have adopted a slightly different script with respect to certain nouns and verbs. The most common Jewish Arabs are those from Baghdad. Anyways, I digress.

Attached are examples of Judeo-Arabic romanization:

So I would like to propose the following:

Songs with Judeo-Arabic metadata must be romanised using the Library of Congress system for romanizing Judeo-Arabic where Judeo-Arabic nouns and verbs are being used, and where there is no romanisation or translation information listed by a reputable source. Where Judeo-Arabic words and phrases are not used, traditional Arabic romanization will apply. The same applies to the Source field if a romanised Source is preferred by the mapper.

i'm very unsure about these things, can someone with knowledge of the language please clear this up?
find me a song with Egyptian hieroglyphs first that is sung in the language and mapped on osu, i don't think a language with 300k tribal speakers worldwide is relevant to the ranking criteria

-----------

TV Size labels in title: drop universally or force the same labeling on all tv size songs?

-----------

Applied Sieg changes to russian as they were.

draft updated, refer to open talking points

░█████╗░██╗░░██╗ ██████╗░██╗░░░██╗██████╗░███████╗
██╔══██╗██║░██╔╝ ██╔══██╗██║░░░██║██╔══██╗██╔════╝
██║░░██║█████═╝░ ██║░░██║██║░░░██║██║░░██║█████╗░░
██║░░██║██╔═██╗░ ██║░░██║██║░░░██║██║░░██║██╔══╝░░
╚█████╔╝██║░╚██╗ ██████╔╝╚██████╔╝██████╔╝███████╗
░╚════╝░╚═╝░░╚═╝ ╚═════╝░░╚═════╝░╚═════╝░╚══════╝

Wafu

1,703 posts

Joined June 2011

Wafu 2018-03-29T19:17:44+00:00

Okoratu wrote:
i'm very unsure about these things, can someone with knowledge of the language please clear this up?
find me a song with Egyptian hieroglyphs first that is sung in the language and mapped on osu, i don't think a language with 300k tribal speakers worldwide is relevant to the ranking criteria

-----------

TV Size labels in title: drop universally or force the same labeling on all tv size songs?

1. I agree this may not be very necessary. I'd say we should only require a specific Romanisation method if the language needs to be Romanised repeatedly, not once in the history of the game.

2. In my opinion (and this is probably only about opinion), drop it and recommend it to tags. If we use a universal label, I feel like it goes completely against what the artist has chosen. If it is removed, it's not really misrepresenting the artist's choice, it is just us not considering it as a part of the title. Not sure if this makes much sense to anyone else, that's just what I think. Edit: Another important point is that, because people like 1-2 min. songs, they will be encouraged to use the in-game song length sorting/grouping (unfortunately can't be done on website). Most people probably search for short songs just by using "tv size", which can omit a large portion of maps they could actually enjoy. (There's nothing magical about TV Size songs, the only thing is that they are ~1 min 30 sec, it's just the length that players care about.) As people would get used to it, it could actually be more beneficial. (afaik Ephemeral (sorry if it wasn't you, but I think it was) said he would rename all the songs that are any kind of TV Size, so that it's consistent, so people actually would get used to it)

Chinese: Couldn't really find a reference that would not be misleading. They are usually too much concerned about the pronunciation part and don't really tell you how to Romanise it. Plus, many of them use the originally proposed style (not syllable by syllable, but "word" by "word"). Many are paid or blocked by government too for some reason. Someone may need to search a bit more to represent the currently intended system better.

Cyrillic: I wouldn't really care that much about Cyrillic, as it's improbable that there will be many songs in it. But it could be recommended to use the BGN/PCGN systems for other Cyrillic languages. The reason is pretty simple: it's nearly the same as the Russian one. Yes, it is intended mainly for geographic names, but it was explained in the past that it is the most common way we see Cyrillic Romanised and is the most compatible with modified Hepburn, which is the most widespread Romanisation in osu!. It also avoids the confusion between "ch" and "kh", between "j" and "y", between "c" and "ts", etc. I could elaborate on this a bit more if needed.

Korean: I believe the currently used system is "Revised Romanisation of Korean", at least it looks pretty much like it in the maps here. It's a South Korean standard (probably the one we should follow as the probability of North Korean songs being mapped here is very low). Didn't find an official document about it, but this seems like a nice reference. It explains all the things you need for Romanisation. Except for Hanja, but again, the probability of something with Hanja being mapped is low. May need input from Koreans, I haven't seen any problems with the current system.

That's my opinion at this moment.

Last edited by Wafu 2018-03-29T22:46:03+00:00, edited 1 time in total.

CrystilonZ

1,557 posts

Joined January 2013

CrystilonZ 2018-03-29T20:45:10+00:00

Delete word-by-word lol

Sites on the Internet unfortunately either provide too much (e.g. joining of syllables) or aren't at all what we agree on.
If we are going with passport stuff imo just port information in the box below to the wiki. It's pretty similar to the old thread with clearer and better wording and all irrelevant points dropped.

Additional Information on the Romanisation of Chinese.

If there is no information on either Romanisation or translation listed by a reputable source, use the following method to Romanise Chinese metadata.

All Chinese characters must be Romanised using Hanyu Pinyin system. All tone-marking diacritics must be omitted.
1. 我 : Wo
2. 三 : San
For any Chinese character that, under the Hanyu Pinyin system, would be Romanised to Lü, Nü, Lüe, or Nüe, Romanise it to Lyu, Nyu, Lue, or Nue respectively instead.
1. 女 : Nyu
2. 略 : Lue
Separate each Romanised character with a space and capitalise it. Function words are capitalised as well.
1. 泠鸢yousa - 没有名字的怪物 : Ling Yuan yousa - Shen De Sui Bo Zhu Liu
2. 兰梓 - 一百块都不给我 : Lan Zi - Yi Bai Kuai Dou Bu Gei Wo
For loan words from other languages, however, Romanise all characters that make up the word into a single word in its original language.
1. 張韶涵 - 歐若拉 : Angela Zhang - Aurora
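
A minimal Python sketch of the steps above, assuming the Hanyu Pinyin reading of each character is already known (looking up readings needs a dictionary and is not covered here). The function names are made up for illustration. Loan words such as 歐若拉 still need manual handling.

```python
import unicodedata
from typing import Iterable

# Combining marks for the four pinyin tones: grave, acute, macron, caron.
TONE_MARKS = {"\u0300", "\u0301", "\u0304", "\u030c"}

# Substitutions for the syllables that still contain "ü" (U+00FC)
# after the tone mark has been removed.
U_UMLAUT_SUBSTITUTES = {
    "l\u00fc": "lyu",
    "n\u00fc": "nyu",
    "l\u00fce": "lue",
    "n\u00fce": "nue",
}


def strip_tone_marks(syllable: str) -> str:
    """Remove tone diacritics but keep the diaeresis of ü."""
    decomposed = unicodedata.normalize("NFD", syllable)
    kept = "".join(ch for ch in decomposed if ch not in TONE_MARKS)
    return unicodedata.normalize("NFC", kept)


def romanise_syllable(syllable: str) -> str:
    """Romanise one pinyin syllable: no tone mark, ü substituted, capitalised."""
    plain = strip_tone_marks(syllable.lower())
    plain = U_UMLAUT_SUBSTITUTES.get(plain, plain)
    return plain.capitalize()


def romanise(syllables: Iterable[str]) -> str:
    """Join the romanised syllables, one per character, with spaces."""
    return " ".join(romanise_syllable(s) for s in syllables)
```

For example, `romanise(["nǚ"])` gives `"Nyu"` and `romanise(["lüè"])` gives `"Lue"`.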

In the RC itself

Glossary

Word-by-word Romanisation: Each character must be romanised into a single, capitalised, separated word. Refer to this thread for examples and supplementary information. <<< delete this
Character-by-character Romanisation: Each Chinese character must be Romanised using Hanyu Pinyin system, and each Romanised character must be capitalised and separated with a space. Refer to <this wiki page> for more information.

Rules

Songs with Chinese metadata must be Romanised using the Character-by-character method in Romanised fields when there is no Romanisation or translation information listed by a reputable source. The same applies to the Source field if a romanised Source is preferred by the mapper. As they are non-unicode fields, all diacritical tone marks must be omitted with Lü, Nü, Lüe, and Nüe substituted with Lyu, Nyu, Lue, and Nue respectively.

^ use "substituted" here. ü is already Romanised but we're just replacing it with stuff because of umlauts.

Natsu

8,074 posts

Joined September 2012

Natsu 2018-03-29T20:58:07+00:00

Force the same TV Size label, since it makes it easier to identify the length of the song and whether it's the opening of an anime. Also, everyone in osu! is used to it; removing it will just cause some confusion IMO.

Mentai

472 posts

Joined June 2016

Mentai 2018-03-29T21:05:31+00:00

Every Cyrillic translation system is a mess, and they have large inconsistencies in vowel treatment as well. I recommend using a system that uses the right single quotation mark, either ISO 9985 or BGN/PCGN. ISO is the most widely used for us Armenians, but having the aspirates translated the same way makes BGN/PCGN a more universal solution for all Cyrillic languages.

basically, BGN/PCGN is probably the best for a universal system for Cyrillic

Scarlet Evans

345 posts

Joined August 2014

Scarlet Evans 2018-03-29T22:17:44+00:00

How should one romanize the title if a romanization, or even an unambiguous phonetic transcription, doesn't exist, while several completely different ones are suggested to be equally valid, and at the same time it would be completely, totally and absolutely wrong to choose just one of them as the title? I.e. it's not possible to transcribe the title phonetically (or romanize it) unless you include all of the several possible meanings? And even if one should include all of them, should it be done by separating them with a slash symbol, which in the case of 4-5 of them would result in a terribly long monstrosity of a title?

-----

Here's more of explanation:

Artists can name their songs however they want, right? They are not obliged to make it "possible" to pronounce or to have unambiguous title that someone can phonetise.

I don't know what it looks like in Chinese, but a great many Japanese kanji characters can be pronounced (and romanized) not in one or two, but even several completely different ways!

I don't know if someone has made a song like this, but I find nothing that could stop one from doing so, i.e. what if someone makes a song (which someone later decides to map in osu!), where:

[*] The title has no official romanization and remains "unspoken", i.e. aside from the title written in kanji, even the author never spells it out and refers to the song only indirectly in words, except by writing it or showing the kanji characters.

[*] It has several possible meanings, at least 3-4 of them, ALL of which are suggested by the lyrics and ALL of which have a COMPLETELY DIFFERENT romanization for every single character used in the title. So you can't phonetise it, as it has several completely different phonetisations!!

[*] The lyrics of the song are used to maintain and express the ambiguity, creating a contradiction around what the title applies or refers to, making all of these "possible titles" "equal" in terms of being the possible title, while at the same time even contradicting each other, either by their contradictory meanings and/or by the lyrics, i.e. all of them are suggested to be the title, while at the same time being refuted from being so.

[*] It all makes sense when reading the kanji characters and extracting all these meanings, definitely making the title fit perfectly as the title of the song (!!!), with just one but: you can't possibly spell it or phonetise it, as it has several completely different pronunciations, and all of them can equally be considered as being and not being (because of the contradictory situation and being refuted by other meanings and/or lyrics) the title.

-----

So, each of these "possible titles" can be and can't (!) be the title at the same time, but all of them, written down with the very same kanji characters, are definitely the title, with deep meanings that are expressed in the lyrics.

-----

What should the metadata for such a song look like?

I don't know if such a song with an "unspoken" title, where the title can have multiple meanings, currently exists, and if so, to what extreme this ambiguity is taken, but there are titles with more than one meaning, and sometimes you can't really find anywhere how the title should be pronounced or phonetised...

Also, looking at this discussion, even if such a song doesn't exist, then depending on the answer I could decide to spend my time improving my poor Japanese and eventually create one in the future, maybe with some voluntary help from someone, just for the sake of trying to rank it

-----

But seriously... it's not required from an artist to have a title that is possible to be phonetised or romanized... what should a mapper do in such situation? Could such song be treated as an "exception" and allowed to have no romanization? Or all of several titles should be included in some ultimate title compilation?

And please, don't tell me that it's "impossible case" that's not worth considering, as I believe that the people in this community could definitely help to make it possible, if it will be required to.

♥ Yuri ♥

Wafu

1,703 posts

Joined June 2011

Wafu 2018-03-29T22:33:14+00:00

Scarlet Evans wrote:
Could such song be treated as an "exception" and allowed to have no romanization? Or all of several titles should be included in some ultimate title compilation?

Every song needs to be Romanised so that it can be searched normally. If it's your song, give it a name, if it's not, that probably can't be universally covered by RC. These situations (as they virtually don't happen) would probably be treated case by case.

Fycho

Global Moderator

2,479 posts

Joined August 2012

Fycho 2018-03-30T01:20:52+00:00

Adding additional Information seems redundant lol, if mappers/BN are unsure, they can ask Metadata QATs/Helpers for help like Modified Hepburn for Japanese.

CrystilonZ wrote:
泠鸢yousa - 没有名字的怪物 : Ling Yuan yousa - Shen De Sui Bo Zhu Liu
it has a mistake, should be:

没有名字的怪物 => Mei You Ming Zi De Guai Wu
神的随波逐流 => Shen De Sui Bo Zhu Liu

should delete "Word-by-word Romanisation" in Glossary.

Glossary
Character-by-character Romanisation: Each Chinese character must be Romanised using Hanyu Pinyin system, and each romanised character must be capitalised and separated with a space.

Rules
Songs with Chinese metadata must be Romanised using the Character-by-character method in Romanised fields when there is no Romanisation or translation information listed by a reputable source. The same applies to the Source field if a romanised Source is preferred by the mapper. As they are non-unicode fields, all diacritical tone marks must be omitted.

Considering there are only 5 ranked maps (really few) that use "ü" so far, and neither "v" nor "yu" is the best choice unless we allow ü in the romanised field, there will be metadata discretion by the Metadata Team in specific cases.

Last edited by Fycho 2018-03-30T01:23:02+00:00, edited 3 times in total.

Happy life lies in peaceful mind

CrystilonZ

1,557 posts

Joined January 2013

CrystilonZ 2018-03-30T03:06:09+00:00

Agree tbh half way through writing that I realised this is redundant af XDD
Wikipedia kinda provides too much but eeh whatever I guess
and somehow I accidentally copied the wrong title lmfao good job me

Last edited by CrystilonZ 2018-03-30T17:22:34+00:00, edited 1 time in total.

Wafu

1,703 posts

Joined June 2011

Wafu 2018-03-30T08:46:05+00:00

@Fycho: We can't use the link provided due to this chapter.

Single meaning: Words with a single meaning, which are usually set up of two characters (sometimes one, seldom three), are written together and not capitalized: rén (人, person); péngyou (朋友, friend); qiǎokèlì (巧克力, chocolate)
etc.

This is what our draft suggested at first and what you argued against. I think this will just make it more confusing if you are supposed to ignore some sections of it.

Fycho

Global Moderator

2,479 posts

Joined August 2012

Fycho 2018-03-30T09:00:13+00:00

Adjusted the draft and simply removed it. I don't think there needs to be any redundant information like modified hepburn for japanese, and most Chinese metadata songs are made by Chinese speakers in the game, they know how pinyin works. Whoever is confused on how to romanize Chinese could ask metadata QATs/Helpers directly without looking through the wall texts in Wiki painfully. Current rule has adequate explanation already.

Happy life lies in peaceful mind

Wafu

1,703 posts

Joined June 2011

Wafu 2018-03-30T09:57:58+00:00

Fycho wrote:
Whoever is confused on how to romanize Chinese could ask metadata QATs/Helpers directly without looking through the wall texts in Wiki painfully.

Yeah, I agree with that. Following the wiki would even make you Romanise it differently than what is intended by the rule.

As there was, finally, an agreement on ü, is it still in discussion or why is it missing in the draft?

For the Japanese Romanisation, I think that giving a link to the Hepburn wiki overall is not the best idea, as multiple Hepburn systems are mixed there. What about just linking the Modified Hepburn document? I think this is the best reference as it only states the rules of the Romanisation without redundancy, plus it's from the Library of Congress, which is probably the most official reference we can have.

Suggestion to make the Romanisation rules a bit shorter and cleaner

First of all, for Russian, change "must be romanised using..." to "must use" as it is with other languages.

All the Romanisation systems, consistently, say "when there is no romanisation or translation information listed by a reputable source" and "The same applies to the Source field if a romanised Source is preferred by the mapper."

Assuming there will still be Korean (it should be because we include it as a language selection on the website), it would be messy to have this in every rule. What about making these rules and removing them from the rules of specific languages:

1. official artist's Romanisations for all languages have priority
2. if you choose (and as of now, you can choose) to use Romanised source, use the same romanisation method

There's no reason to specify the same thing for every language if it works uniformly.
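
To illustrate how the two general rules would work uniformly, here is a rough Python sketch. The function and parameter names are hypothetical, and `fallback` stands in for whichever language-specific system applies (Modified Hepburn, pinyin, BGN/PCGN, and so on).

```python
from typing import Callable, Optional


def choose_romanised_field(
    original: str,
    official: Optional[str],
    fallback: Callable[[str], str],
) -> str:
    """Pick the value for a romanised field (Artist, Title, or a romanised Source).

    Rule 1: an official romanisation or translation from the artist has priority.
    Rule 2: otherwise the same language-specific method is used for every field.
    """
    if official:
        return official
    return fallback(original)
```

The same call is used for the Source field when the mapper prefers a romanised Source, so no per-language rule has to repeat the condition.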

Okoratu

Nomination Assessment Team / Elite Nominator

4,913 posts

Joined May 2012

Topic Starter

Okoratu 2018-03-31T12:46:32+00:00

Someone should give me the agreement on ü then

- will change the link
- will change russian wording,
- will avoid redundancy by just taking the parts that are redundant out and making them general rules for language specific romanisation works, right?

sth like

Romanisation is only to be used when there is no official translation or preferred romanisation provided by the artist. This applies to all fields that can hold romanised data by intent.

Wafu

1,703 posts

Joined June 2011

Wafu 2018-03-31T15:49:14+00:00

Okoratu wrote:
Romanisation is only to be used when there is no official translation or preferred romanisation provided by the artist. This applies to all fields that can hold romanised data by intent.

Seems about right, so now the confirmation about ü, and Korean Romanisation.

Maybe a good reference for Korean (should be the Revised Romanization of Korean, which seems to be used in most maps) might be this.

CrystilonZ

1,557 posts

Joined January 2013

CrystilonZ 2018-04-01T20:21:28+00:00

ü is the matter of preference now lol everyone probably has enough information to choose a side but I don't expect a 100 percent consensus
My suggestion about that is

Official translation and/or Romanisation must be used if able. This applies to all fields that can hold romanised data by intent. If there are multiple official translations and/or Romanisations, the mapper is free to choose any of them with the only exception being when there is a previously ranked mapset of that song. In such case the corresponding guideline applies to it. << feel like all of these are related and should be under one single rule. easier to read imo.
~~If the artist provides a preferred way to romanise their title or name, that is to be followed unless it conflicts with other points of this criteria.~~ << redundant. The only point this can conflict with is the naming convention stuff which is a specific case. adding stuff like "if necessary, ignore the preferred naming conventions of the artists" under that point is better.
If a song or artist are referred to in multiple ways on official sources provided by the artist, the mapper is free to choose any of the romanisations. The only exception to this is if the song already has a mapset in the Ranked Section, in which case the corresponding guideline applies to it. << redundant also doesn't include translation orz

one more point that I wish to bring up is
If the artist field contains artist names with internally conflicting naming conventions (first name - last name and last name - first name formats), they must be normalized to just use the same format throughout.
^ should specify which one is preferred for consistency.
also naming conventions (stuff) --> the stuff inside parentheses should be in the glossary lol
lastly please add this to the glossary

Diacritical tone marks: ˉ, ˊ, ˇ, and ˋ above vowels in the pinyin system.

Okoratu

Nomination Assessment Team / Elite Nominator

4,913 posts

Joined May 2012

Topic Starter

Okoratu 2018-04-01T21:20:26+00:00

the answer to your specification question is neither
i'll apply the rest tomorrow if i can find what the fukc this means
