[Proposal] Metadata section overhaul

abraker

Global Moderator

Joined July 2014

abraker 2018-03-28T01:51:20+00:00

Sieg wrote:
abraker wrote:
Any thoughts about mapping style or patterns the maps have being in tags?
I don't see any restrictions for this right now as long as they are related to the set. Also don't think that this worth specific mentioning.

The reason I mention this is that some maps in 8k mania tend to be either 8k or 7k+1 and it's impossible to know until you download and check them out. Putting down "SV" or "stream" or etc would also allow the added benefit to search for types of maps in any gamemode. I feel like something should be mentioned in guidelines.

Fycho

Global Moderator

2,479 posts

Joined August 2012

Fycho 2018-03-28T02:28:01+00:00

For the TV Size thing, drop some opinions:

For example this song: https://osu.ppy.sh/s/477045
The song has a game version without "~TV animation ver.~", and a later TV version that is labelled "~TV animation ver.~" to distinguish it. The two differ in instrumentation and lyrics. In this case, (TV Size) isn't necessary, but "~TV animation ver.~" is. The popular "~Anime Ban~" is a pretty similar case.

I believe there is room for metadata discretion when handling things like this.

Last edited by Fycho 2018-03-28T02:30:04+00:00, edited 1 time in total.

Happy life lies in peaceful mind

F D Flourite

811 posts

Joined March 2013

F D Flourite 2018-03-28T02:32:21+00:00

I may not have read the whole thread completely, because much of it is about people ignoring each other's points. I just want to share some intuitive thoughts about the Chinese language.

First of all, many Chinese players still type in English to search for the titles of Chinese songs in osu!, for the sake of consistency. Personally, I'm used to typing pinyin to search for song titles, because osu! used to have poor support for unromanised searching (maybe because many maps from 2012 and earlier only have the romanised title, for both Chinese and Japanese songs, as metadata was not strictly enforced at that time).

melloe wrote:
Thirdly, to address the problems of grouping together romanized Chinese syllables into words. It is true that in grouping together syllables there is a lot of ambiguity, but much of that ambiguity should be able to be cleared in context. For instance, taking this charming example provided to us:
Hollow Wings wrote:
"Gu Niang, Shui Jiao Yi Wan Duo Shao Qian?"
this sentence mainly has two meanings:
1. "Hey gril, how much it costs if i buy a bowl of your dumplings?" （姑娘，水饺一碗多少钱？）
2. "Hey gril, how much it costs if i sleep you one night?" （姑娘，睡觉一晚多少钱？）
Context should be able to very easily clear up such ambiguities. What is the song about? What is the rest of the song saying? Context will provide an almost effortless resolution to such conclusions, which I imagine would comprise the vast majority of such instances.

However, some of those ambiguities will be purposely rendered in the form of puns etc., such as here:
Fycho wrote:
For example, specific examples like "他谁都打不过", it's used intentionally to represent two meanings that are "Nobody can beat him" and "He can beat everybody", "Ta / Shui / Dou Da Bu Guo" and "Ta / Shui Dou Da Bu Guo".
These will most likely make up such a negligible percentage of these instances of ambiguity that to go through with the proposed changes and deal with these intentionally ambiguous titles as they come up would not be completely remiss -- but I personally believe that even these hypothetical cases, however rare, should be considered before pushing any changes. That is just my opinion, ultimately it's not up to me.

In fact, for many contemporary Chinese ballads, the titles are deliberately written this way (in the form of puns). As for the first example given here, the song title can still be sexually suggestive even if its formal title is about dumplings. Chinese lyrics are not as logical as everyday language, and people can easily pick up the ambiguous meaning because, without logical context, there is no way to distinguish the pronunciations when they are sung. "Context will provide an almost effortless resolution to such conclusions", as you said, is often not true. Joining Chinese characters into a single word will often cause a loss of meaning in this way. (I have more examples, one of which is my own uploaded map.)

Fourthly, about "v" vs "u." To Chinese speakers of course "v" makes the most sense, as that is the input they use in their everyday lives, but to the western audience, "v" will make absolutely no sense. "u" and "yu" are both inadequate romanizations of "ü," because "yu" will be pronounced "yoo" by most westerners, but "v" will be next to useless for everybody except for Chinese players. "v" is more ambitious in that it serves to correctly represent a specific sound instead of simply approximating it, but for western osu players it is completely counterproductive.

I'm not sure if you read HW's post thoroughly, but it gave an example showing that changing "v" to "u" produces a worse result in certain cases: “绿光” and “露光” would both be romanised as "Lu Guang", while their actual pronunciations are completely different. For non-Chinese speakers, I don't think that makes the title any easier to pronounce or to remember. Of course I understand that "v" has no connection with the actual pronunciation of "ü". I was also confused when I first used a keyboard to type Chinese. However, this is general knowledge for all Chinese users and Chinese learners. That's how we Chinese grow up. So even though we understand that "v" makes no sense in terms of pronunciation, I don't get why non-Chinese speakers should have the privilege of ignoring such knowledge (which is common to us) at all. When you want to memorize a title in a different language, accepting a small rule of that language (and it really is small) is not demanding, is it? In fact, the Japanese romanised "ra" (similarly ri, ru, re, ro) is actually pronounced far from /ra/; it is somewhere between /ra/ and /la/. Personally I'd even say it sounds much closer to /la/ in general. But when you have to memorize it, you simply accept the convention that it is written "ra". That's the same thing.
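To make the “绿光” / “露光” collision concrete, here is a minimal Python sketch. The helper names are made up for illustration, and the toneless pinyin for the two titles is typed in by hand rather than produced by any real romanisation tool.

```python
# Hypothetical helper: fold "ü" into a chosen ASCII replacement.
def fold_u_umlaut(pinyin: str, replacement: str) -> str:
    return pinyin.replace("ü", replacement).replace("Ü", replacement.upper())

# Toneless pinyin for the two titles from the example (entered by hand).
titles = {"绿光": "Lü Guang", "露光": "Lu Guang"}

def romanise_all(replacement: str) -> dict:
    return {hanzi: fold_u_umlaut(p, replacement) for hanzi, p in titles.items()}

def has_collision(romanised: dict) -> bool:
    # True when two different titles end up with the same romanised string.
    return len(set(romanised.values())) < len(romanised)

with_u = romanise_all("u")  # both titles become "Lu Guang"
with_v = romanise_all("v")  # "Lv Guang" and "Lu Guang" stay distinct
```

With "u" the two titles are indistinguishable in a search; with "v" they are not. The sketch only illustrates the search collision and says nothing about which spelling is easier to pronounce.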

Lastly, Chinese is generally referred to as logographic rather than ideographic, as a character represents a morpheme rather than a more nebulous concept, and as ideogram usually refers specifically to a symbol that is independent of any corresponding sound--although of course no logographic writing system is without a phonetic component built into it. The terms themselves are rather fuzzy anyways, so to achieve anything of actual accuracy one has to resort to such ungainly terms as HW's "ideophonographical." However, to call Chinese logographic is not incorrect. In fact, most people, even linguists, do it.

I don't know how you can call Chinese logographic so confidently, so I just want TRUE evidence. And I don't even want to read Wafu's post again, because he simply kept repeating this without compelling support. Anyway, the most intuitive understanding of the Chinese language is still ideographic, based on the Chinese education we receive for more than 12 years. Many words made of two or more characters get their meaning from joining the meanings of those characters. For example, “未来” (future) can be split into “未” (not happening) and “来” (come). The simple combination would be "has not come yet", which is close to the meaning of "future". And the word “银行” (bank) can be split into “银” (silver, which was the general currency in ancient China) and “行” (an organization/commercial firm focusing on a specific field, pronounced Hang). It's obvious that joining those two meanings gives an organization/commercial firm focusing on money, which is a bank.

The third example would be my own map https://osu.ppy.sh/s/598869 “花儿纳吉” (the correct pronunciation should be Hua Er Na Zei, which is different from the normal Mandarin pronunciation Hua Er Na Ji). This title has no direct meaning in Mandarin, as it comes from a minority Chinese language (the Qiang language). The official meaning is "Being happy like a flower". However, the song title still has a similar meaning when read as a combination of Mandarin characters in Chinese culture, which was also part of the song author's intention: “花儿” is flower, “纳” is containing/accepting, “吉” is happiness. If it were wrongly treated as logographic, the song title would lose value, which is what we cannot accept. There are thousands more examples, so I have to stop here.

As a result, I completely don't understand why you guys keep trying to call Chinese logographic. It's highly COUNTER-INTUITIVE. And in fact the change of combining characters is highly impractical (the opposite of what you state below), because simply treating each character as a pronunciation (as logographic implies) will result in MEANING LOSS and CULTURE LOSS, which is definitely the wrong way to approach the Chinese language.

To the crux of the issue.

The real dichotomy here is between practicality and officiality/aesthetics. That is a highly subjective discussion and is conducive to many (as seen here) tetchy discussions. Grouping words together will almost certainly make it more convenient for non-Chinese speakers, there should really be no question about this. I personally don't even pay attention to the name of a Chinese map if it's over three or four characters long; the profusion of capitals and spacing, to my English-speaking mind, is simply inconvenient, and I would rather memorize the mapper's name, the artist's name, and the background instead. Japanese titles, meanwhile, are multisyllabic, and I would rather have a few multisyllabic words than six monosyllabic words. How closely we adhere to "ISO 7098" really should not be a question. We're a small international circle-clicking community, not an official international organization, so shouldn't we rather consider things from a functional, practical perspective?

Sorry, but I think the proposed change is even more impractical, for the reason stated above.

The text after this point does not contain new ideas, so I removed it from my post. But anyway, I'm completely unconvinced that the changes to Mandarin/Chinese metadata would make it more practical. On the contrary, they ignore the general case of Chinese and make things even worse.

We are creating maps, not producing maps.

Shad0w1and

1,588 posts

Joined April 2011

Shad0w1and 2018-03-28T03:06:19+00:00

CrystilonZ wrote:
Shad0w1and wrote:
So let's face the reality, there isn't a standard for Chinese romanization into ANSI code. I can't understand that without a commonly accepted standard, why would you guys try to change the current metadata rule?
We've expressed (thoroughly I believe) what problems the current system has. Please read all the previous points made in this discussion.

no, you don't understand, using nonsense metadata will fuck up all Chinese players and all Chinese learner players. Wtf are you considering making a nonsense international standard and let all the players think wtf is the osu meta?

and if you don't understand why almost all Chinese players oppose this proposal, I am telling you: it will fuck up almost everyone who actually knows Mandarin. We don't want that to happen. Whether it's the lv problem or the ISO noun problem, they make no sense to Chinese learners and Chinese players, and this will make everyone struggle to search for songs. And you cannot just make an osu news post saying we changed the Chinese meta because we redefined an international standard !!!

also, romanization is not for English speakers to be able to read words. Lmao each semester my professors (in the US) struggle to read the names of students from Germany, Ireland, India, Russia and tons of other countries. Even if you romanize their names from the original language, it does NOT mean you can pronounce them. And Lv, Nv are the same case as Ra, Ri, Ru, Re, Ro in Japanese and similar cases in all other languages. You can't expect people who do not know the language to pronounce it correctly. The romanized meta, in this sense, is a way for learners to deal with song searching.

F D Flourite

811 posts

Joined March 2013

F D Flourite 2018-03-28T04:27:31+00:00

And it seems like I have to open another post for Wafu

Wafu wrote:
1. First of all, you did use the ISO document as your argument, but you didn't even know that the citation about "ideophonograph" language was just confirming what CrystilonZ posted. You agree with ISO on the same thing that you disagree with CrystilonZ on. They state the same thing.

We have given many reasons why each Chinese character has its own meaning, and why that meaning would be lost when characters are joined into words. And you're just repeating "same thing", "it's not non-sense", "you're ignoring what I am saying". That may work once, but not many times, because people don't see you supporting your ideas. You are just repeating yourself.

Wafu wrote:
Yes, I agree with that point. Some Chinese characters indeed do use "pictographic and ideographic features". You even quoted me saying that. That doesn't make the language pictographic or ideographic, because even the characters with pictographic or ideographic features are logograms. That makes the language logographic. Why do you call something non-sense and then say the same thing?

Another repeat. First of all, please provide good evidence/support for "That makes the language logographic", or you're just repeating your own words. That would cost you the ground you are trying to stand on. Secondly, you accept that a large part of the language has "pictographic and ideographic features", yet you don't accept that combining characters into words loses those features and can be highly detrimental to the meaning of the language. I just don't get it.

Wafu wrote:
7. If you have problem with me comparing how osu! works for 2 different Romanisation, I think there's a different problem. Stop calling me ignorant if you ignore what I've even written in that paragraph. It's also not non-sense. I literally just say how people work. How can that be non-sense? That is an observation.

The new system you are trying to introduce is hugely different from the romanisation we've had, as HW, Fycho and I have illustrated with lots of evidence and facts ("五环境内", "他谁都打不过", "花儿纳吉"). So seeing how people worked in the past doesn't make you qualified to judge the romanisation of Chinese correctly. In fact, I don't think any automatic system could handle such romanisation correctly. In these cases native Chinese speakers should still have the louder voice.

Wafu wrote:
2. No, "v" doesn't work the best, it doesn't work at all because it has no linguistic basis. That doesn't mean "u" is the best, although we agreed that it generally won't make difference for a regular player, there are still many options that can be considered, but it can't be "v", and probably not "y", because that's associated with a different sound (even in other Romanisation systems we use). "u" pronounced in a certain way will result in the "ü" we are going for, it really is the core sound of it, I described this in the post 2 times already, so I guess I don't have to repeat myself.

I've already explained in my previous post that using "u" will cause terrible ambiguity. You insist that the pronunciation of "v" is different from what we want, while ignoring the fact that ALL OTHER letters already have their own pronunciation role in Chinese pinyin. "v" is the most convenient one when you have to find something new to correspond to "ü". And again, it's just a convention of the language. That's how it has been used for decades. You non-Chinese speakers should not have the privilege of ignoring that. After all, it's not demanding for people to remember such a small thing if they want to approach Chinese. The pronunciation of "v" should not become a barrier to learning any Chinese; anyone stopped by it is bound to fail at learning it anyway.

Wafu wrote:
Third point, not sure why you are personally attacking me. How do you know what my education is, what my job is, what my real life is? You don't know any single thing about my personal life, so don't act like you do.

Answer:

Wafu wrote:
but remember it works vice-versa.

Wafu wrote:
Your false accusations (of us not reading stuff or not being professional) did, indeed, make me send you this message (and it is called exactly that: "Private message"). I'm not making fun of you as it was not public, you making it public doesn't mean I'm making fun of you. I wanted you to know that putting this down to "there's no research" was unfair of you, as you didn't invest your time into the research either. Was I being rude to you in the private message? Yes, as as you were when you clearly did, intentionally ridicule the proposal, except I at least could keep it private.

Last but not least, insulting others in a private message doesn't mean that you were polite at all. It only means you pretended to be polite but failed, and wanted to hide the fact that you were not. So please learn to stay calm and polite consistently.
----------------------------------
Just an observation: no native Chinese speaker has supported these changes. You're trying to prove that the result is easier for non-Chinese speakers to memorize and pronounce, but we have tried super hard to show that you haven't studied Chinese thoroughly, and that there are tons of facts that counter your idea of making the romanised result easier and better for the audience. But I don't see you ever acknowledging that, which is very disappointing in a discussion. The changes literally try to turn the Chinese language into something that Chinese speakers don't recognise, while you are completely indifferent about it. So do not say "conservative" again when you cannot find any other reason why all Chinese speakers disagree. After all, it still sounds ridiculous if no Chinese speaker agrees with Chinese metadata changes.

Monstrata

Elite Mapper: Aspirant

4,642 posts

Joined May 2013

Monstrata 2018-03-28T04:32:13+00:00

If you guys could summarize what rule you want to change/add, that would speed up the process here a lot. Something like In the case of romanizing ü, use v, not yu. Or something like that. This is just an example btw. I disagree with using "v" as romanization since English speakers will pronounce "v" differently from how it's supposed to be pronounced in Mandarin.

I would like to add a few extra rules to the proposal, taking into account other languages that so far haven't been discussed, since everyone's been caught up on the Chinese romanization debate.

With respects to Korean romanization, I'm wondering if we should continue applying the McCune-Reischauer system for romanizing Korean. This is the system that the Library of Congress is using. Nyquill brought up an excellent point about using romanization systems that other large institutions are currently using and it works a lot better than creating our own modified system in most cases (unless we are simplifying).

I'm bringing this up because there is also the Revised Romanization of Hangeul system that was introduced on July 7th, 2000, which has been applied to various Korean road signs, transportation, etc. The major change, of course, is that the new system eliminates diacritics in favor of digraphs.

A possible rule would look like:

Songs with Korean metadata must be romanised using the McCune-Reischauer system for romanizing Korean when there is no romanisation or translation information listed by a reputable source. The same applies to the Source field if a romanised Source is preferred by the mapper.

Additionally, we could introduce the use of digraphs and two-vowel letters into the proposal:

Vowels ㅓ /ʌ/ and ㅡ /ɯ/ should be written as digraphs in Korean romanization, and romanized to eo and eu respectively.
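As a rough illustration of how that vowel rule could be checked mechanically, here is a minimal Python sketch. It relies on the standard Unicode arithmetic for precomposed Hangul syllables; the function name and the two-entry table are assumptions made up for this example, and it is not a full romanizer.

```python
HANGUL_BASE = 0xAC00      # first precomposed syllable, "가"
SYLLABLE_COUNT = 11172    # 19 initials * 21 vowels * 28 finals
VOWEL_COUNT, FINAL_COUNT = 21, 28

# Vowel indices in Unicode order: ㅓ is index 4, ㅡ is index 18.
# Only the two vowels named in the proposed rule are covered here.
DIGRAPHS = {4: "eo", 18: "eu"}

def vowel_digraph(syllable: str):
    """Return "eo"/"eu" if the syllable's vowel is ㅓ/ㅡ, else None."""
    index = ord(syllable) - HANGUL_BASE
    if not 0 <= index < SYLLABLE_COUNT:
        return None  # not a precomposed Hangul syllable
    vowel_index = (index % (VOWEL_COUNT * FINAL_COUNT)) // FINAL_COUNT
    return DIGRAPHS.get(vowel_index)
```

For example, `vowel_digraph("서")` gives "eo" and `vowel_digraph("그")` gives "eu", while a syllable such as "가" returns None because its vowel is not covered by the rule.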

Another language to examine is Thai. The Library of Congress recommends nine additional rules for Thai romanization which are:

Library of Congress wrote:
Romanization
1. Tonal marks are not romanized.
2. The symbol ฯ indicates omission and is shown in romanization by “ … ” the conventional sign for ellipsis.
3. When the repeat symbol ๆ is used, the syllable is repeated in romanization.
4. The symbol ฯลฯ is romanized la.
5. Thai consonants are sometimes purely consonantal and sometimes followed by an inherent vowel romanized o, a, or ǭ depending on the pronunciation as determined from an authoritative dictionary, such as the Royal Institute's latest edition (1999).
6. Silent consonants, with their accompanying vowels, if any, are not romanized.
7. When the pronunciation requires one consonant to serve a double function – at the end of one syllable and the beginning of the next – it is romanized twice according to the respective values.
8. The numerals are: ๐ (0), ๑ (1), ๒ (2), ๓ (3), ๔ (4), ๕ (5), ๖ (6), ๗ (7), ๘ (8), and ๙ (9).
9. In Thai, words are not written separately. In romanization, however, text is divided into words according to the guidelines provided in Word Division below.
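Rule 8 in the list above is the one that is purely mechanical, so it can be sketched in a few lines of Python. The function name is invented for this example; the other rules need a dictionary or human judgment and are not attempted here.

```python
# Thai digits occupy U+0E50 (๐) through U+0E59 (๙) in Unicode.
THAI_DIGITS = "".join(chr(0x0E50 + i) for i in range(10))
DIGIT_TABLE = str.maketrans(THAI_DIGITS, "0123456789")

def romanise_thai_numerals(text: str) -> str:
    """Replace Thai numerals with 0-9 and leave everything else untouched."""
    return text.translate(DIGIT_TABLE)
```

Calling `romanise_thai_numerals` on a title only rewrites the numerals, so it could run before or after whatever process handles the rest of the romanization.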

My question for Thai romanization is whether we should treat them similarly to how we are treating Chinese romanization which is to separate words with spaces, or if we should clump them together, for example: พระนางเจาพระบรมราชินีนาถ romanized as: Phranāng Čhao Phrabǭrommarāchinī Nāt or Phranāngchaophrabǭrommarāchinīnāt. Also, should we make all the separated words uppercase, or only the first? Since Thai chains everything together, there is no indicator for upper and lower case when we split the phrase up (if we do).

The two rules I am proposing are:

Songs with Thai metadata must be romanised using the Library of Congress system (also known as ISO 11940) for romanizing Thai when there is no romanisation or translation information listed by a reputable source. The same applies to the Source field if a romanised Source is preferred by the mapper.

and

In the romanization of Thai, words should be romanized separately, and separated by a space. Additionally, all words should (or should not?) be uppercased.

Attached are helpful transcription keys for Thai:

Another language that is becoming more and more relevant is Arabic, and there are some issues I would like to bring forth with regards to its romanization.

Here is the table for romanization of Arabic:

As you can see, some issues come up. In the romanization of ص ص ص ص for example, (whether initial, Medial, Final, or Alone) the romanization becomes " ṣ" however, the diacritical mark is not something that can be used by osu because it is still not unicode. I would like to propose that all of these diacritical "," attached to letters be removed for the sake of simplicity and because osu currently does not support them. Therefore something like " ص◌نضوِ◌خ" should be romanized as "sandwich".
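If diacritics were to be dropped as suggested, one possible way to do it is Unicode decomposition. This is a minimal Python sketch, assuming the romanised text is available as an ordinary Python string; the function name is made up for illustration, and nothing here reflects how osu! itself processes metadata.

```python
import unicodedata

def strip_diacritics(text: str) -> str:
    """Decompose characters (NFD) and drop the combining marks."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))
```

For example, `strip_diacritics("ṣ")` returns "s". The same helper would also flatten macrons and similar marks in other romanizations, which may or may not be wanted.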

Another problem with Arabic is that it is typed in reverse, right to left. Should we also apply this to romanization? In this case "ص◌نضوِ◌خ" would actually be romanized as "hciwdnas" when read left to right as English readers are expected to do.

The rule I am proposing is:

Songs with Arabic metadata must be romanised using the Library of Congress system for romanizing Arabic when there is no romanisation or translation information listed by a reputable source. The same applies to the Source field if a romanised Source is preferred by the mapper.

Additionally:

In the romanization of Arabic, words should be romanized in reverse order, and the last letter should be uppercased. For example in romanizing "◌ س◌ !" the correct romanization should be "!usO"

However, there is also the problem of Judeo-Arabic romanization, which differs slightly from traditional Arabic romanization. Judeo-Arabic, of course, stems from the Jewish Arabs, many of whom live in Iraq and have adopted a slightly different script with respect to certain nouns and verbs. The most common Jewish Arabs are those from Baghdad. Anyways, I digress.

Attached are examples of Judeo-Arabic romanization:

So I would like to propose the following:

Songs with Judeo-Arabic metadata must be romanised using the Library of Congress system for romanizing Judeo-Arabic where Judeo-Arabic nouns and verbs are being used, and where there is no romanisation or translation information listed by a reputable source. Where Judeo-Arabic words and phrases are not used, traditional Arabic romanization will apply. The same applies to the Source field if a romanised Source is preferred by the mapper.

Yet another language I would like to cover is the Cherokee language, also known as Tsalagi Gawonihisdi, which is an Iroquoian language of the native Cherokee people, of whom there are approximately 300,000 tribal members. In terms of syllabary, I again lean to the ALA-LC Romanization table that was prepared by the Library of Congress attached here:

Because the language does not use capitalization, I am wondering if a rule should be made to force lower case on all songs, titles, artist, and sources with Cherokee origins. Below I propose the two following rules:

Songs with Cherokee metadata must be romanised using the syllabary provided by the ALA-LC Library of Congress system for romanizing Cherokee when there is no romanisation or translation information listed by a reputable source. The same applies to the Source field if a romanised Source is preferred by the mapper.

As well as:

Songs with Cherokee metadata must use lower case across Title, Romanized Title, Artist, Romanized Artist, and Source.

Lastly, I would like to bring attention to another pictographic language. In fact, Chinese is not the only pictographic language left in the world. I'm sorry, Hollow Wings, Chinese is special, but it is not that special. You guys have forgotten about the ancient Egyptian Hieroglyphics. Such a shame.

Below is a chart on monoliteral hieroglyphs and their hieratic equivalents as researched by R. Lepsius in his book Denkmäler aus Aegypten und Aethiopien Abth. II

As you can see, not all hieroglyphs have been translated yet, and some are still in the process of being discovered due to many ancient Egyptian pyramids currently being lost to time. Therefore I have a few rules to propose:

Songs with ancient Egyptian Hieroglyphic metadata must be romanised using the current knowledge of Egyptian Hieroglyphs for romanizing Egyptian Hieroglyphics when there is no romanisation or translation information listed by a reputable source. The same applies to the Source field if a romanised Source is preferred by the mapper.

Additionally, because not all Hieroglyphs have been transcribed yet:

Songs with ancient Egyptian Hieroglyphic metadata that use a hieroglyph that is currently not transcribed should be replaced with "?" until a proper transcription is decided on. The same applies to the Source field if a romanised Source is preferred by the mapper.

Lastly,

Songs with ancient Egyptian Hieroglyphic metadata that use a hieroglyph that has not yet been documented should be sent to the American Research Center in Egypt for proper processing. This rule mainly applies to mappers who are currently on archaeological digs in Egypt and find a new pyramid and want to map the songs that were uncovered.

I hope this will be of use to you guys, and I hope to see some more fruitful and productive discussion come about.

Nao Tomori

Nomination Assessment Team

3,089 posts

Joined December 2014

Nao Tomori 2018-03-28T04:47:00+00:00

from what i can tell, there are two main issues... first: splitting syllables 1 by 1 and second, using v for ü.

for the first one, syllable by syllable makes more sense. that is evident through the fact that creating words out of specific syllables will block other readings. it's true that it might be easier to memorize if artificial words are created but those would necessarily be arbitrary since those divisions between syllables (to create words) are not used in the actual language. at that point the romanization would be inaccurate...

second: the v shit
the point of romanization is to create a "word" in latin script that can be read by westerners. what chinese people do while texting unfortunately is not important to this discussion
v would not be pronounced as anything resembling ü by any westerner. using something that would definitely be pronounced completely wrong doesn't suit the purpose of romanization. using something like "ue" or "yu" which is (as i understand) how ü is supposed to be said makes much more sense.

Hollow Wings

1,514 posts

Joined July 2010

Hollow Wings 2018-03-28T04:54:52+00:00

i'll turn simple now.

------

CrystilonZ wrote:
To be honest I'm very pissed off right now and you have no idea how hard it is for me to post in this calm manner.

You, as Chinese have no priority in this matter, just because it's about Chinese. I believe all people here are civilised people and civilised people argue with reason.Read more about this here <Argument from authority>

i'm already pissed off since idea of "Wafu: 'You, as Chinese have no priority in this matter, just because it's about Chinese,' " came up.

HOW THE HELL YOU CAN SAY OR AGREE WITH THAT?

if any other person out of Chinese has higher priority than Chinese people ever be defined, that'll be a huge humiliating to us.

ANYONE GONNA NEVER GOT ANY HIGHER PRIORITY THAN CHINESE PEOPLE ABOUT CHINESE MATTERS.

you are saying tell me figure out what nonsense means to you, then you gonna figure out what that point of view means to us.
and i'll keep using the word "nonsense" if you keep that point of view like that.
and i'll keep calling your posts based on that points of view "nonsense" if you keep regard Chinese people like "You, as Chinese have no priority in this matter, just because it's about Chinese, ".

wanna "vice-versa"? here it is.

------------------

look at the current topic's direction, full of useless discussion about concepts or grammars, not ever contribute to it any longer.

CrystilonZ or wafu never got the most important point, even they think they know it like "i agree with you that Chinese shall be understood with context".
↑ hell no.

we Chinese who got involved in this topic have already explained this lots of times, but you never address the real trouble in any answer to that point.

the trouble of separating ambiguous words won't just happen during romanisation, because:
Chinese characters may be written to represent various meanings on purpose.
that will affect how we separate words.

AND I'M TIRED OF SHOWING EXAMPLES.
but i'll still give one more if it can eventually let you understand.

↓ EVERYONE NOTICE THIS EXAMPLE PLEASE BECAUSE I THINK THIS IS IMPORTANT. PLEASE. ↓

song title: "爱在上" → "Ai Zai Shang"
1. "爱在上" → "Ai Zai Shang" → "love is above (sth.)"
2. "爱在上" → "Ai Zaishang" → love is paramount.

context:
lyric example: 天苍苍爱在上抬头就仰望
meaning 1: love is above the place higher than sky which is already very high.
meaning 2: love is paramount that sky can't compare to it.

"在上" is an adjective in Chinese which means "the top/supreme/etc."
"在上" is also a short phrase which can be analysed as "A在B上" and means "A is above B"

both of those meanings are correct readings of the original Chinese characters "爱在上".
and also, both meanings need to be represented, because that's the lyric's purpose: to show how great love is, both in a physical, metaphorical way and in a mental way.

↑ EVERYONE NOTICE THIS EXAMPLE PLEASE BECAUSE I THINK THIS IS IMPORTANT. PLEASE. ↑

now answer me on this point: how to build a transcription that can express both of those meanings?
separated romanized words won't do it, because none of them express the original Chinese characters or words or sentences correctly.

you won't build any better transcription system if you can't solve problems like this, which happen widely in Chinese word separation.

this is far away from the question of whether Western people or non-Chinese speakers could understand/remember/search/etc. Chinese things better or not, it's only about the romanisation of Chinese itself.

and no one has answered this, but just keeps arguing about endless useless things.

------------------

i'll give no more examples because i've already shown enough.

the best choice of Chinese romanisation system for the osu community will always be the one-character-at-a-time method, which causes the least trouble in expressing the meanings of Chinese characters.

and that's what ISO did at a worldwide level.

and that's why i want to communicate with international people using standards that have been proven or recognised, to cause less drama like this.

some people just don't get it and want to push things forward, which is really out of the osu community's range and has already been proven not to be a better choice.

if you're going to use some GB/T like "The Basic Rules of the Chinese Phonetic Alphabet Orthography", then i'll be pleased to talk about several other GB/T recognised by the PRC which are all about romanisation.
but it would still be a bad choice to me to rule this community by a single country's standard rather than the international one which already exists and is recognised.

if staff insist on creating new rules aside from any of them, then so be it.
i'm not staff so i can't do anything, maybe just sigh and wait for troubles to occur.

------------------

if nonsense continues, then fine, the problem in this post may never be solved and there's no solid evidence to change the current transcript system for Chinese characters.

i'll ignore all of those arguments or nonsense with concepts or whatever else without standards that can reach international level from now on.

Sorry but i decide to give simple mods to mappers, since the last bn test.
Because i think mappers would easily understand what i'm saying even i don't explain.
If one mapper can't do that, well, he may just wanna deny me rather than my mods.
Of course, i don't think i'm worse than any members in bn team within modding ability area.
to players: If an Extra diff can be dtfc, we mappers usually call that diff "Insane",
no matter how much pp it provides.

Noffy

Nomination Assessment Team

1,656 posts

Joined April 2012

Noffy 2018-03-28T05:05:10+00:00

bless naotoshi for summarizing what others took essays to say into two easy to read paragraphs
this isn't even sarcastic
bless you naotoshi

Monstrata wrote:
Songs with Korean metadata must be romanised using the McCune-Reischauer system for romanizing Korean when there is no romanisation or translation information listed by a reputable source.

May I ask why you chose the McCune-Reischauer system in particular? Some quick research reveals it is currently only used officially in North Korea, with "Revised Romanization of Korean" being the current official system used for South Korea. As uhhh.... 99% of Korean songs, especially those mapped in osu, come from South Korea, it would make far more sense to use that instead, overall. This may need more input from those who are more familiar with Korean.

Monstrata wrote:
Another language to examine is Thai. The Library of Congress recommends nine additional rules for Thai romanization which are:

Could you please cite the documents you mention throughout your post by linking them, in case anyone wants to review them for themselves?

Monstrata wrote:
Another problem with Arabic is that it is typed in reverse, right to left. Should we also apply this to romanization?

No, roman characters are written from left to right. It would obscure the meaning and reading to write them in reverse order. Japanese can also be written from right to left, albeit in vertical lines instead of horizontal, do we romanise it backwards? no.

Overall, besides maybe, yeah we should definitely address Korean as kpop is pretty popular on osu, the rest of this seems like needless bloat to the ranking criteria due to how infrequently songs in Thai, Arabic, or Cherokee, or, in Hieroglyphs, would be mapped. These should continue to be handled case-by-case and use common sense like they currently are.

Also, in general guys: while this discussion about Mandarin and Chinese is definitely important, please be sure to not neglect reviewing the draft itself and pointing out any other areas where it could be improved. I believe both sides of the Chinese debate have at this point said everything that needs to be said to represent their viewpoint, which leaves sorting out things like the romanisation of ü or deciding if anything else should be considered when revising the current draft based upon these discussions.

Edit:
Additionally, please consider making tl;dr versions of your posts, this thread is nearly impossible for most people to read in its current state due to its sheer scale. It's gotten a bit out of hand.

Last edited by Noffy 2018-03-28T05:33:06+00:00, edited 1 time in total.

I may be nowhere close to sparkling stars in the sky, but I at least can light a small candle ablaze in my soul!

Fycho

Global Moderator

2,479 posts

Joined August 2012

Fycho 2018-03-28T05:14:00+00:00

If saying "v" couldn't be read by foreigners and causes misconceptions, then we probably need to rework the Japanese rule as ra / ri / ru / re / ro are actually pronounced as la / li / lu / le / lo in Japanese, which is kinda unfriendly towards Latin script users who don't know Japanese. English speakers will pronounce "ra" differently from how it's supposed to be pronounced in Japanese.

As we all know, Modified Hepburn (Japan gov uses Kunrei) and Pinyin (China gov uses Pinyin) are the international standard systems. People who learn Chinese start with pinyin, and when they later start learning input methods, they get to know "v"; "v" is the most familiar and well-known letter for Chinese speakers and learners. "u" gets mixed up with the vowel "u", and "yu" would be pronounced as "yoo" or some other wrong pronunciation by most English speakers. Both are inadequate for representing "ü". If anyone has a better choice than "v", feel free to advise rather than suggest useless stuff. Otherwise we will keep "v" for "ü".

I'll give a summary for the discussions later. (already did at https://osu.ppy.sh/community/forums/top ... rt=6554329)

Happy life lies in peaceful mind

Monstrata

Elite Mapper: Aspirant

4,642 posts

Joined May 2013

Monstrata 2018-03-28T05:23:14+00:00

Fycho wrote:
If saying "v" couldn't be read by foreigners and causes misconceptions, then we probably need to rework the Japanese rule as ra / ri / ru / re / ro are actually pronounced as la / li / lu / le / lo in Japanese, which is kinda unfriendly towards Latin script users who don't know Japanese. English speakers will pronounce "ra" differently from how it's supposed to be pronounced in Japanese.

As we all know, Modified Hepburn (Japan gov uses Kunrei) and Pinyin (China gov uses Pinyin) are the international standard systems. People who learn Chinese start with pinyin, and when they later start learning input methods, they get to know "v"; "v" is the most familiar and well-known letter for Chinese speakers and learners. "u" gets mixed up with the vowel "u", and "yu" would be pronounced as "yoo" or some other wrong pronunciation by most English speakers. Both are inadequate for representing "ü". If anyone has a better choice than "v", feel free to advise. Otherwise we would keep "v" for "ü".

It's not the same. R and L are pronounced almost the same way across most phonetics. V and u are way different since V is a consonant.

Ask yourself, how would you pronounce ü using english phonetics. The answer should not be "v" because that's a voiced labiodental fricative. Not a vowel.

F D Flourite

811 posts

Joined March 2013

F D Flourite 2018-03-28T05:33:21+00:00

Monstrata wrote:
It's not the same. R and L are pronounced almost the same way across most phonetics. V and u are way different since V is a consonant.

Ask yourself, how would you pronounce ü using english phonetics. The answer should not be "v" because that's a voiced labiodental fricative. Not a vowel.

Even so, it's also a fact that few westerners ever pronounce it correctly in tournament commentary or similar situations. If "l" indicates the actual pronunciation much better, "it should be changed to reflect the pronunciation correctly to be friendly to non-Japanese speakers" (as the pronunciation theory goes, without taking any other factors into account).

So I'm still unsure about how much pronunciation should matter here.

--------------------------------------------------------------------------------------------------------------------------------------------

Proposal wrote:
If the artist or title field exceeds the uploadable maximum length, or both together cause Windows filenames for the .osu files to exceed 255 characters, any additional markers from the fields causing this have to be dropped consistently and if this is still not sufficient, the corresponding fields need to be abbreviated reasonably and end in ... to signal that this song title has been shortened.

I'd love to see this one get added! But maybe we can ask the mapper to provide the whole song name in a specific place? For example, the creator's words or so. Otherwise we just cannot tell exactly what song it is.
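
A rough sketch of the shortening order the proposal describes (drop the additional markers first, abbreviate only if that is not enough, and end in "..."). The function name, the `markers` tuple, and `max_length` are hypothetical; the real upload limit is not stated in this thread, and the 255-character filename check is not modelled here.

```python
def shorten_field(text, max_length, markers=()):
    """Shorten an artist/title field following the proposal's order.

    max_length is a placeholder for the uploadable maximum (assumed >= 3).
    markers are optional labels such as "(TV Size)" that may be dropped.
    """
    if len(text) <= max_length:
        return text

    # Step 1: drop every additional marker consistently.
    for marker in markers:
        text = text.replace(marker, "")
    text = " ".join(text.split())  # collapse leftover whitespace
    if len(text) <= max_length:
        return text

    # Step 2: abbreviate and end in "..." to signal the shortening.
    return text[: max_length - 3].rstrip() + "..."


print(shorten_field("Song Title (TV Size)", 12, markers=("(TV Size)",)))
print(shorten_field("A Very Long Song Title", 10))
```

The full, unshortened title would still have to be recorded somewhere else (for example in the creator's words, as suggested above), since the abbreviation is lossy.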

Last edited by F D Flourite 2018-03-28T05:37:36+00:00, edited 1 time in total.

We are creating maps, not producing maps.

Fycho

Global Moderator

2,479 posts

Joined August 2012

Fycho 2018-03-28T05:34:31+00:00

The discussion of "Romanization of Chinese" should stop until we figure things out from these posts. You are free to discuss after that.

Now, to keep the thread from getting flooded and to return to a normal discussion about other stuff: no shitposts, no unhelpful posts, no flaming or personal insults. Action will be taken if necessary. Behave yourselves, otherwise you bear the responsibility.

Happy life lies in peaceful mind

melloe

268 posts

Joined February 2013

melloe 2018-03-28T06:46:50+00:00

quote="F D Flourite" Maybe I'm not complete to read the whole thread because there are much about ignoring things. I just want to say some intuitive thoughts about the language Chinese.

First of all, still many Chinese type in English to search title of Chinese songs in osu! for the sake of consistency. Personally, I'm used to type pinyin to search for song title because osu! in the past had poor support on unromanised searching (maybe it's because many maps from 2012 and earlier only have their romanised one, both for Chinese and Japanese songs, as metadata at that time was not much forced).

Before I say anything, please don't read what I have to say already thinking about how to refute it, or argue against it, or prove me wrong. Some things I actually agree with you on so please try to actually consider what I'm saying.

melloe wrote:
Thirdly, to address the problems of grouping together romanized Chinese syllables into words. It is true that in grouping together syllables there is a lot of ambiguity, but much of that ambiguity should be able to be cleared in context. For instance, taking this charming example provided to us:
Hollow Wings wrote:
"Gu Niang, Shui Jiao Yi Wan Duo Shao Qian?"
this sentence mainly has two meanings:
1. "Hey gril, how much it costs if i buy a bowl of your dumplings?" （姑娘，水饺一碗多少钱？）
2. "Hey gril, how much it costs if i sleep you one night?" （姑娘，睡觉一晚多少钱？）
Context should be able to very easily clear up such ambiguities. What is the song about? What is the rest of the song saying? Context will provide an almost effortless resolution to such conclusions, which I imagine would comprise the vast majority of such instances.

However, some of those ambiguities will be purposely rendered in the form of puns etc., such as here:
Fycho wrote:
For example, specific examples like "他谁都打不过", it's used intentionally to represent two meanings that are "Nobody can beat him" and "He can beat everybody", "Ta / Shui / Dou Da Bu Guo" and "Ta / Shui Dou Da Bu Guo".
These will most likely make up such a negligible percentage of these instances of ambiguity that to go through with the proposed changes and deal with these intentionally ambiguous titles as they come up would not be completely remiss -- but I personally believe that even these hypothetical cases, however rare, should be considered before pushing any changes. That is just my opinion, ultimately it's not up to me.

In fact for many contemporary Chinese ballads, their titles are deliberately came up as such (in the form of puns). As for the first example given here, the song title can still be sexually suggestive even if its formal title is about dumplings. Because Chinese lyrics are not as logical as daily language,
and people just can easily get the ambiguous meaning because there is no way to distinguish their pronunciation difference in a song-wise tone without logical context. "Context will provide an almost effortless resolution to such conclusions" as you said is often not the truth. Joint of Chinese characters into a single word will often cause loss of meaning in this way. (I have more examples, one of which is my uploaded map)

I think you must misunderstand me, because I am pretty much in agreement with you on this. As for the first example, if the song content itself is not sexual in nature then naturally the song title should be not sexually suggestive. However, I have already said that puns are intentional ambiguities that cannot be resolved by any combination of word groupings, and that that is one reason to not go through with the proposed changes, which is actually what you said--we are in agreement on this. Let me clarify, I am not flat out taking a single stance with every paragraph I type, I am just offering my own opinion on different matters regarding this topic, sometimes in favor of the proposal, sometimes against.

Fourthly, about "v" vs "u." To Chinese speakers of course "v" makes the most sense, as that is the input they use in their everyday lives, but to the western audience, "v" will make absolutely no sense. "u" and "yu" are both inadequate romanizations of "ü," because "yu" will be pronounced "yoo" by most westerners, but "v" will be next to useless for everybody except for Chinese players. "v" is more ambitious in that it serves to correctly represent a specific sound instead of simply approximating it, but for western osu players it is completely counterproductive.

I'm not sure if you go through the HW's post thoroughly but there was an example given to prove that the change from "v" to "u" will result a worse case under certain conditions: “绿光” & “露光” will be both romanised in "Lu Guang" while their actual pronunciation are completely different. For non-Chinese speakers, I don't think it can be a better way either to pronounce it or to remember the title by any means. Ofc I understand that "v" has no connection with the actual pronunciation of "ü". I was also confused when I first used a keyboard to type Chinese. However, this is just a general knowledge for all Chinese users and Chinese learners. That's how we Chinese grow up. So even we may understand that "v" can be senseless in pronunciation manner,
I don't get why non-Chinese speakers have the advantage to ignore such knowledge (which is common to us) at all. When you want to memorize a title in a different language, accepting its small piece of rule/regularity (actually it's really small) is not demanding is it? In fact for the pronunciation of Japanese romanised way of "ra" (similarly, ri, ru, re, ro), the actual pronunciation is far from /ra/, but somehow similar to be in the middle of /ra/ and /la/. Personally I'd even say it sounds much closer to /la/ in general. But when you have to memorize it, you simply accept its setting of being forced "ra". That's the same thing.

Again I am offering a two-sided view, so that anyone who reads my post (if they don't want to slough through the other very, very long posts) can have multiple perspectives to build their own opinion off of, perhaps I should have made that more clear. For proposal: v makes no sense to westerners. Against proposal: u is not the same as ü, so "v" is a better choice if we want to be exactly precise about that particular vowel instead of just approximating it with "u," the tradeoff being of course that westerners will be confused.
Non-Chinese will not even know to look up the usage of "v" at all, unless you go around and tell everyone, they will just accept it as a strange aspect of the Chinese language and continue pronouncing it "el vee" or "lvvv" or "liv," instead of actually pronouncing it "lü."
In Japanese the IPA notation /ɾ/ as in ra ri ru re ro in English is simply marked as "r" instead of getting its own special character--in other words, we are approximating the pronunciation. And I suspect many non-Japanese have not even memorized that fact, and if they know that "ra" is actually /ɾa/ it's only because they have heard it from anime or something. If we were to do the same for Chinese, we would be again be approximating the ü sound by labeling it "u."

Lastly, Chinese is generally referred to as logographic rather than ideographic, as a character represents a morpheme rather than a more nebulous concept, and as ideogram usually refers specifically to a symbol that is independent of any corresponding sound--although of course no logographic writing system is without a phonetic component built into it. The terms themselves are rather fuzzy anyways, so to achieve anything of actual accuracy one has to resort to such ungainly terms as HW's "ideophonographical." However, to call Chinese logographic is not incorrect. In fact, most people, even linguists, do it.

I don't know how you call Chinese logographic so steadily so I just want TRUE evidence. And I don't even want to read Wafu's post again because he was simply doing this once and once again without compelling support. Anyways, the most intuitive thoughts of the language Chinese is still ideographically, based on how we accept Chinese education for more than 12 years. Many words that combined by two or more characters are also generated by the joint of meanings of those characters together. For example, “未来”(future) can be split as “未”(not happening) and “来”(come). And the easy joint would be "has not come yet", which is the close meaning of "future". And the word “银行”(bank) can be split as “银”(silver, which is the general currency in ancient China) and “行” (an organization/commercial firm focusing on specific fields, pronounced as Hang). And it's obvious that the joint of those two meanings an organization/commercial firm focusing on money, which is bank.

Do the words ideograph and logograph mean something to you that they don't to other people? You seem quite adamant on this. I only mentioned this matter as a very unimportant one, which is why I put it last. The terms ideograph and logograph themselves are very nebulous and ill-defined; any usage of either will not be very accurate unless you append extra stuff to make something long and complicated like "ideophonographical." In my experience, an ideogram is something that represents a particular idea or concept. 上 would be an ideograph, and the Chinese characters for 1 2 3 would be ideographs. However, 的 is not really a concrete concept, it's more of a morpheme (the smallest unit of meaning in a language), like the English -ly or -ing. Concepts, too, can be morphemes, so the label "logogram" covers both complete concepts as well as morphemes. However, this topic is ultimately really quite trivial and not all that important to the discussion at hand, so if the words "logogram" and "ideogram" have a definition to you that I'm not aware of and it's extremely important to you that Chinese be called "ideographic," I'm more than happy to comply.

The third example would be my own map https://osu.ppy.sh/s/598869 “花儿纳吉” (The actual correct pronunciation should be Hua Er Na Zei, which is different from normal Mandarin pronunciation Hua Er Na Ji). This title has no direct meaning from Mandarin as it's from minority Chinese language (Qiang language). The official meaning is "Being happy like a flower". However, the song title still has its similar meaning to the combination of Mandarin in Chinese culture , which was also part of intention by the song author: “花儿” is flower, “纳” is containing/accepting, “吉” is happiness. If being wrongly considered as logographic, the song title would be less valuable, which is what we cannot accept. There are just thousands of more examples so I have to stop here.

As a result, I completely don't understand why you guys keep trying to call Chinese logographic by any means. It's highly COUNTER-INTUITIVE. And in fact the change of combining characters is highly impractical (as you wanted to state below the opposite way) in this way, because simply consider each character as pronunciation (as logographic indicates) will result in MEANING LOSS and CULTURE LOSS, which is definitely a wrong way to approach to Chinese language.

Again, with how adamant you are on this topic of logo/ideograph--does ideographic and logographic have a meaning I'm not aware of? I don't have any agenda in calling Chinese logographic, that's just what it is according to the definition of ideo/logographic that I have. Maybe you have a different or more correct definition, in which case feel free to teach me.

If we are to talk about practicality, then you have to consider perspectives from both sides. Impractical to whom? You have to remember that many non-Chinese don't speak Chinese and don't know anything about Chinese meaning or culture to begin with, so joining words will not result in any meaning or culture loss for them. The only difference for them is that it will be easier to remember. Of course it will be impractical to Chinese speakers, but for English or other non-Chinese speakers it will be easier to remember song titles (I have talked about this earlier, and included relevant quote), this is what I mean by practical.

To the crux of the issue.

The real dichotomy here is between practicality and officiality/aesthetics. That is a highly subjective discussion and is conducive to many (as seen here) tetchy discussions. Grouping words together will almost certainly make it more convenient for non-Chinese speakers, there should really be no question about this. I personally don't even pay attention to the name of a Chinese map if it's over three or four characters long; the profusion of capitals and spacing, to my English-speaking mind, is simply inconvenient, and I would rather memorize the mapper's name, the artist's name, and the background instead. Japanese titles, meanwhile, are multisyllabic, and I would rather have a few multisyllabic words than six monosyllabic words. How closely we adhere to "ISO 7098" really should not be a question. We're a small international circle-clicking community, not an official international organization, so shouldn't we rather consider things from a functional, practical perspective?

Sorry but I just think the way of changing is even more impractical for the reason stated above
Yes, it will be impractical for you as a Chinese speaker, but I was talking about practicality for the many non-Chinese people who play this game. Let me bring in what you said later to Wafu and address that point:
Just an observation: You're trying to prove that it's easier to memorize and to pronounce for non-Chinese speakers, but we have tried super hard to prove that you haven't gone through Chinese and there are tons of fact that counters your idea of making the romanised result to audience easier and better. But I don't see your reaction of ever acknowledging that, which is very disappointing in a discussion.
I'm a native English speaker, and even after having learned some Chinese for some years as a child, it is still much easier for me to remember Chinese titles if they are grouped together into longer bundles that, visually speaking, more closely resemble "words" such as one might see in English. I have addressed this in the second point of my original post. (Also I tested it on some non-Chinese speaking friends without context, and here are the results: for some people, a Chinese word is easier to remember when separated by syllable, IF there are about three or fewer syllables overall. For longer titles, maybe four syllables or above, grouped words are easier to remember.)

Context after here is not holding new idea so I delete them in my post. But anyways, I'm completely not convinced how changes on Mandarin/Chinese metadata would help it be more practical. On the contrary, they're ignoring the general case of Chinese and making things even worse.
I think you have to understand the argument from both sides--this is not just for this particular issue, but a life lesson for everywhere. Of course these changes will seem 100% silly to you, because you are Chinese, but you have to consider it from a westerner perspective: they don't know any Chinese, and for them grouped words are much easier to remember, because grouped words more closely resembles our own language.

However, again, just as you should consider everything from different perspectives, I already have considered it from a native Chinese-speaker's perspective and I have already said why I support the status quo, some of those arguments you have repeated here. Even if it doesn't change your opinion, it is good practice to truthfully consider other perspectives and be generous to them.

Also, I did not know that Chinese so heavily depended on romanization to search for Chinese songs in osu, which is why I asked about it, so definitely thank you for providing that information, I'll take your word for it. More reason to keep status quo, I guess?
/quote
can't embed more than 2 quotes within each other lol

Last edited by melloe 2018-03-28T06:49:51+00:00, edited 1 time in total.

Xinnoh

Elite Mapper

1,950 posts

Joined April 2014

Xinnoh 2018-03-28T06:49:30+00:00

Hi, I have some serious concerns regarding the romanisation of emojis, as they are not mentioned anywhere in the proposal.

What would be the most acceptable method of romanising emojis when there are no measures to keep consistency? For example, the 😀 emoji could be romanised as :-) or :), whether the nose is included becomes a subjective matter which needs to be avoided for metadata rules.

Hence I propose for consistency

When romanising emojis that contain faces, avoid using noses as they are fairly uncommonly used

In addition, how are symbols with no clear emoticon meant to be romanised? There is no clear way to express 🍆, 👺 or 💩 with basic latin unicode.
Using discord format such as :weary: :ok_hand: :sweat_drops: really doesn't have the same effect

In addition, there are cases where emoticons have characters that are not accepted under traditional romanisation such as ヽ༼ຈل͜ຈ༽ﾉ, not to mention the presence of kaomoji which have no clear method of romanisation such as (⁎⚈᷀᷁ᴗ⚈᷀᷁⁎). How should these circumstances be handled when dealing with songs that use emoticons like these?

Sieg

3,981 posts

Joined February 2012

Sieg 2018-03-28T07:46:19+00:00

abraker wrote:
The reason I mention this is that some maps in 8k mania tend to be either 8k or 7k+1 and it's impossible to know until you download and check them out.

In that case it probably belongs to the osu!mania RC, anyways you should suggest wording and bring a bit of discussion on the topic if you feel that this is worth to add to the guidelines here or there.

Ephemeral

Inland Empire

3,936 posts

Joined April 2009

Ephemeral 2018-03-28T08:25:54+00:00

are we good on word-by-word romanization for Chinese and if not, can someone cite why in less than 200 words?

keep in mind that the ONLY thing that matters re: romanization is that the title itself can be readily linked back to its source material, we're not after translation or context preservation or anything like that, only transliteration

please and thank you

unicode emoji can be converted to their nearest native equivalent for transliteration's sake, if they dont have one (ie: eggplant, etc), then they dont get ported, ez
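
One way to read the suggestion above, as a loose sketch: emoji with a known ASCII emoticon get converted, everything else in the emoji range is dropped. The `EMOTICONS` table is made up for illustration (noseless, following Xinnoh's suggestion), and treating Unicode category "So" as "emoji" is an approximation that ignores variation selectors and multi-codepoint sequences.

```python
import unicodedata

# Hypothetical lookup table; a real one would need to be agreed on.
EMOTICONS = {
    "😀": ":)",
    "😢": ":(",
}


def transliterate_emoji(text):
    """Convert emoji with an ASCII equivalent; drop those without one."""
    out = []
    for ch in text:
        if ch in EMOTICONS:
            out.append(EMOTICONS[ch])
        elif unicodedata.category(ch) == "So":
            continue  # symbol with no equivalent (e.g. eggplant): not ported
        else:
            out.append(ch)
    return " ".join("".join(out).split())  # tidy leftover whitespace


print(transliterate_emoji("Happy 😀 Song"))
print(transliterate_emoji("Night 🍆"))
```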

Last edited by Ephemeral 2018-03-28T08:35:43+00:00, edited 4 times in total.

Wafu

1,703 posts

Joined June 2011

Wafu 2018-03-28T10:29:08+00:00

Ephemeral wrote:
keep in mind that the ONLY thing that matters re: romanization is that the title itself can be readily linked back to its source material, we're not after translation or context preservation or anything like that, only transliteration

Well that's kinda important. Because in UBKRC, we actually considered the compatibility within osu!. There was that agreement that if it's possible, we should choose what is similar to Modified Hepburn, to keep the metadata consistent etc. (I believe it was for Cyrillic and Chinese)

If that is not the case now, the current system is indeed better, but the "ü" to "v" should be discussed.

Fycho

Global Moderator

2,479 posts

Joined August 2012

Fycho 2018-03-28T10:51:30+00:00

I tried to summarize the whole discussion, and the romanisation method for Chinese will stay status quo (character-by-character method).
I adjusted its wording a bit to avoid misunderstanding. Whoever disagrees with the romanisation rule, feel free to contact me via PM and I'm willing to explain it to you.

Glossary:
Character-by-character Romanisation: Each Chinese character must be Romanised using Hanyu Pinyin system, and each romanised character must be capitalised and separated with a space.

Rules:
Songs with Chinese metadata must be Romanised using the Character-by-character method in Romanised fields when there is no Romanisation or translation information listed by a reputable source. The same applies to the Source field if a romanised Source is preferred by the mapper.
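
To make the glossary entry concrete, here is a small sketch of the character-by-character formatting step. It assumes the toneless pinyin for each character is already known and passed in by the caller (no dictionary lookup is done), and the `u_umlaut` parameter is a placeholder for whatever replacement of "ü" ends up being chosen; the function name is made up.

```python
import unicodedata


def romanise_char_by_char(syllables, u_umlaut="v"):
    """Capitalise one pinyin syllable per character and join with spaces.

    syllables: toneless Hanyu Pinyin, one entry per Chinese character.
    u_umlaut:  ASCII stand-in for "ü" (still under discussion in this thread).
    """
    words = []
    for syllable in syllables:
        # NFC so that "u" + combining diaeresis is treated the same as "ü".
        normalised = unicodedata.normalize("NFC", syllable).lower()
        words.append(normalised.replace("ü", u_umlaut).capitalize())
    return " ".join(words)


print(romanise_char_by_char(["ai", "zai", "shang"]))  # Ai Zai Shang
print(romanise_char_by_char(["lü", "guang"]))         # Lv Guang
```

Because every character becomes its own capitalised token, the output never commits to a word grouping such as "Ai Zaishang", which is the property the status quo is meant to keep.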

Below are things on which we haven't reached an outcome and which still need public discussion:
The romanisation of the vowel "ü" in "女/吕" (Nü / Lü). Currently we have two choices, "v" and "yu". Here are the pros and cons.
v:

It's consistent with currently ranked maps, so players who can't speak Chinese wouldn't struggle to search for songs.
"v" is the most common and familiar way for native Chinese speakers and learners, and it's used in keyboard input. People who learn Chinese will have to know "v" once they start typing pinyin.
It can't be read properly by English speakers, as it's not a vowel in English.

yu:

It can be read easily by English speakers.
English speakers would read it as "yoo", which sounds different from "ü" (I can't find a proper English word to express the correct sound of "ü").
"y" is a consonant, so "yu" is made of one consonant and one vowel, while the original "ü" is a single vowel.

Feel free to drop opinions about it.
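
For anyone weighing the options, a quick check of which replacements keep “绿光” and “露光” (the pair raised earlier in the thread) apart. This is only a sketch: the syllable lists are typed in by hand, and it assumes "yu" would be substituted directly for "ü" (giving "Lyu"), which the thread does not actually specify.

```python
# Toneless pinyin per character for the two example titles.
TITLES = {
    "绿光": ["lü", "guang"],
    "露光": ["lu", "guang"],
}


def romanise(syllables, u_umlaut):
    """Character-by-character output with a chosen stand-in for 'ü'."""
    return " ".join(s.replace("ü", u_umlaut).capitalize() for s in syllables)


def has_collision(u_umlaut):
    """True if two different titles end up with the same romanisation."""
    romanised = [romanise(s, u_umlaut) for s in TITLES.values()]
    return len(set(romanised)) < len(romanised)


for candidate in ("u", "v", "yu"):
    results = [romanise(s, candidate) for s in TITLES.values()]
    print(candidate, results, "collision" if has_collision(candidate) else "distinct")
```

Under these assumptions only plain "u" merges the two titles; "v" and "yu" both keep them searchable as different strings, so the remaining difference between them is readability rather than ambiguity.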

Happy life lies in peaceful mind

Monstrata

Elite Mapper: Aspirant

4,642 posts

Joined May 2013

Monstrata 2018-03-28T10:57:14+00:00

Closest pinyin sound to ü is "yi" btw. I can see why "yu" would be read as "yoo" cuz you naturally make "oo" sound with u. You naturally make "ee" sound with i. "yi" is closest imo as someone who speaks mandarin and english, what do you think?

Wafu

1,703 posts

Joined June 2011

Wafu 2018-03-28T11:31:08+00:00

Monstrata wrote:
Closest pinyin sound to ü is "yi" btw. I can see why "yu" would be read as "yoo" cuz you naturally make "oo" sound with u. You naturally make "ee" sound with i. "yi" is closest imo as someone who speaks mandarin and english, what do you think?

I can agree with that. It's not exact, but way closer than "yu" or "v", at least if you read it as in any other romanisation system we currently use.

@Fycho: ""v" is the most common and familiar way to Chinese native speakers and learners": nobody does that. Majority of people will still write "ü" and if you use the "v" on the pinyin layout, you will end up getting "ü" anyway. "v" is associated with it just so that you can memorize where it is on the layout, not because of any relation to the character.

CXu

osu! Alumni

3,558 posts

Joined March 2009

CXu 2018-03-28T11:45:52+00:00

About the whole v, u, yu, whatever thing. It's not like people who don't know the language would pronounce things right anyway. For example, anyone who doesn't know Spanish wouldn't know that ll is pronounced not as l but as /ʎ/ <- this thing, yet I doubt anyone would change how the Spanish word is written (since it's in the Latin alphabet already I guess?)

Similarly, we have æ ø å in Norwegian/Danish (and ä ö å in Swedish) which usually just get changed to ae, o/oe, aa, at least in Norwegian. The ø kinda sounds like the i in first, while å is like the o in old.

My point is kind of that even in languages that use the Latin alphabet, they have their own sounds and sometimes characters that there's no way to accurately approximate, so the priority should rather be on making it unambiguous and easy for foreigners to type and communicate the title as accurately as possible.

Last edited by CXu 2018-03-28T11:52:42+00:00, edited 1 time in total.

Wafuu~/Boats

~Mod one of these and get a cookie!~
Mod queue.
Feel free to poke me in-game if you have no idea what I'm talking about in my mod.

Kroytz

703 posts

Joined February 2013

Kroytz 2018-03-28T11:50:57+00:00

Why don’t we just allow Unicode characters to be used when submitting beatmaps and we can solve all our problems? (serious)

Wafu

1,703 posts

Joined June 2011

Wafu 2018-03-28T11:55:36+00:00

Kroytz wrote:
Why don’t we just allow Unicode characters to be used when submitting beatmaps and we can solve all our problems? (serious)

Because searching would be much more complicated. Majority of people couldn't type characters such as ǔ, ü, etc. and wouldn't find anything in the end.

CXu wrote:
Similarly, we have æ ø å in Norwegian/Danish (and ä ö å in Swedish) which usually just get changed to ae, o/oe, aa, at least in Norwegian. The ø kinda sounds like the i in first, while å is like the o in old.

Yes, but "ae", "o/oe" and "aa" seem to have a basis in pronunciation, at least. "v" is a choice based on keyboard layout. Which will not be even close if you actually try to pronounce it.

Last edited by Wafu 2018-03-28T12:03:21+00:00, edited 1 time in total.

Mafumafu

Beatmap Nominator

3,361 posts

Joined August 2013

Mafumafu 2018-03-28T11:56:04+00:00

Monstrata wrote:
Closest pinyin sound to ü is "yi" btw. I can see why "yu" would be read as "yoo" cuz you naturally make "oo" sound with u. You naturally make "ee" sound with i. "yi" is closest imo as someone who speaks mandarin and english, what do you think?

In my opinion "yi" is not feasible to be implemented since there already exists "yi" as a syllable in Pin Yin system. Using yi will produce new confusions.

I think "yu" is not a better choice than "v", since when it is combined with, for instance, "l" or other consonants, the pronunciation (from the perspective of a non-Mandarin speaker) deviates quite a lot from what it is supposed to be in Mandarin.

Fycho wrote:
It could be read easily by English speakers.

Also, before you regard "easier to pronounce for non-Mandarin speakers" as an advantage of any choice other than "v", you need to make sure that the pronunciation of the new choice at least leans towards the correct pronunciation in Mandarin; otherwise, the "easier to speak" statement is not a valid reason. Since "yu" is far from being similar to ü, it has already disqualified itself from having an advantage in pronunciation.

ü in Mandarin is quite a special case. Almost no candidate, v included, has any advantage in terms of pronunciation, as it is almost impossible to find an approximate stand-in for ü in the alphabet used as a reference in the Romanization process.

Another example in Pin Yin is “x”, for example “Xue”, whose pronunciation also differs from English. So why is it not brought up in the draft to replace x with other characters during Romanization? Many other examples could be raised here, but I guess this is enough.

One cannot expect Romanization to teach them how to speak a language, though it might give people some insight into its pronunciation. Navel-gazing over alternatives that serve the impossible “pronunciation” task people have presumptuously assigned to the process of Romanization is pointless. Therefore, I would appeal to people to discuss this from other aspects and to treat v, yu, or any other characters equally from this point of view.

When I was writing this reply, I spotted another post in the thread, so I would like to add something more here:

Wafu wrote:
nobody does that. Majority of people will still write "ü" and if you use the "v" on the pinyin layout, you will end up getting "ü" anyway.

Are you sure nobody does that? And are you sure you will end up getting ü? What do you mean by “pinyin” layout? Input method? Software? Human-machine interfaces? If so, why can't osu! do that? Asserting things with words like “nobody” is not convincing; you might need to provide evidence to support your idea.

Wafu wrote:
Majority of people couldn't type characters such as ǔ, ü, etc. and wouldn't find anything in the end.

So what do you actually mean? You posted "Majority of people will still write ü" while "Majority of people couldn't type characters such as ǔ, ü, etc."

Nevo

Global Moderator

1,176 posts

Joined November 2015

Nevo 2018-03-28T12:05:02+00:00

Fycho wrote:
For the TV Size thing, drop some opinions:

For example this song: https://osu.ppy.sh/s/477045
The song has a game ver that without "~TV animation ver.~", and has a TV ver later that labled with "~TV animation ver.~" to distinguish. They are different in Instrument and lyrics. In this case, (TVsize) aren't necessary but not for "~TV animation ver.~". That popular "~Anime Ban~" is pretty similar stuff.

I believe there is a metadata discretion when handling things like this.

Well if it's from the TV version the mapper should use "~TV animation ver.~" and then the game version it shouldn't use "~TV animation ver.~" I think I basically repeated what you said aaaaa

Kroytz

703 posts

Joined February 2013

Kroytz 2018-03-28T12:07:46+00:00

Wafu wrote:
Kroytz wrote:
Why don’t we just allow Unicode characters to be used when submitting beatmaps and we can solve all our problems? (serious)

Because searching would be much more complicated. Majority of people couldn't type characters such as ǔ, ü, etc. and wouldn't find anything in the end.

You could add the alternate romanizations in the tags so that it does pop up when searching tho. If a title had “lüe” then you’d add “lue” and “lve” into the tags and now people can search them. I dunno, seems simpler to do it that way imo. Keep original title, and add the complicated multi-spelling stuff into tags
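The tag idea above could be sketched roughly like this in Python. This is only an illustration under assumptions: `search_tag_variants` is a hypothetical helper, not anything osu! provides, and the two substitutes ("u" and "v") are just the spellings from the “lüe” example.

```python
import unicodedata


def search_tag_variants(word):
    """Return ASCII spellings that could be added as tags for a word containing ü.

    Hypothetical helper: the substitutes "u" and "v" mirror the "lue" / "lve"
    example; they are an assumption, not an agreed rule.
    """
    # Normalise first so that "u" + combining diaeresis becomes a single "ü".
    lowered = unicodedata.normalize("NFC", word).lower()
    if "ü" not in lowered:
        return []
    return [lowered.replace("ü", substitute) for substitute in ("u", "v")]


print(search_tag_variants("lüe"))
```

The title itself would stay untouched; only the tags would carry the extra spellings.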

CXu

osu! Alumni

3,558 posts

Joined March 2009

CXu 2018-03-28T12:09:51+00:00

Wafu wrote:
Kroytz wrote:
Why don’t we just allow Unicode characters to be used when submitting beatmaps and we can solve all our problems? (serious)
Because searching would be much more complicated. Majority of people couldn't type characters such as ǔ, ü, etc. and wouldn't find anything in the end.

CXu wrote:
Similarly, we have æ ø å in Norwegian/Danish (and ä ö å in Swedish) which usually just get changed to ae, o/oe, aa, at least in Norwegian. The ø kinda sounds like the i in first, while å is like the o in old.
Yes, but "ae", "o/oe" and "aa" seem to have a basis in pronunciation, at least. "v" is a choice based on keyboard layout. Which will not be even close if you actually try to pronounce it.

Aa has no relation to the pronunciation of å other than å looking similar to a. There's also still the ll in Spanish, for example, which isn't pronounced the way an English speaker would think.

As for the v, I'm pretty sure in older Latin (I think) the letters u and v were interchangeable, so in that sense it's not too nonsensical if we consider it to be a language-specific thing. Many other languages have things like this that are pronounced differently, so anyone who wishes to pronounce foreign titles properly would have to do at least a minimal amount of research anyway.

Wafuu~/Boats

~Mod one of these and get a cookie!~
Mod queue.
Feel free to poke me in-game if you have no idea what I'm talking about in my mod.

Wafu

1,703 posts

Joined June 2011

Wafu 2018-03-28T12:41:55+00:00

Regraz wrote:
Wafu wrote:
nobody does that. Majority of people will still write "ü" and if you use the "v" on the pinyin layout, you will end up getting "ü" anyway.
Are you sure nobody does that? And are you sure you will end up getting ü? What do you mean by “pinyin” layout? Input method? Software? Human-machine interfaces? If so, why can't osu! do that? Asserting things with words like “nobody” is not convincing; you might need to provide evidence to support your idea.

Wafu wrote:
Majority of people couldn't type characters such as ǔ, ü, etc. and wouldn't find anything in the end.
So what do you actually mean? You posted "Majority of people will still write ü" while "Majority of people couldn't type characters such as ǔ, ü, etc."

Let's take stuff out of the context again, nice. You can't be serious at this point.

In my response to Fycho, we were talking about Chinese speakers and learners. Of course, this is about the keyboard layout used in pinyin input methods. Why would I use layout in relation to software or human-machine interface, when we talk about inputting characters? These people, who actually use the pinyin input method will press "v", which, on its layout, will allow you to write these Latin characters (not only ü, depends on type of input method, there is not only one) that are not available in regularly used input methods.

osu! cannot do that, because players would have to swap their keyboard input method if they wanted to search for Chinese metadata.

Stop asking for evidence for things that don't require it (you can search any text in pinyin for that). Majority of people (who use pinyin input method, not talking about regular users, in case you wanted to take this out of context again) will use the ü characters, because it's correct and they know how to do it with the pinyin input method. The proof is that most of the Chinese transcribed text available on the internet, does indeed use these characters, in fact, I haven't seen any officially transcribed texts that did use "v" rather than "ü". I have only seen either the classic pinyin method using all the special characters, or replaced with the original character.

My response to Kroytz was about osu! players, not about people who do use pinyin input method. osu! players are generally not able to type these characters because they are not using the pinyin input layout. Stop mixing two irrelevant posts together and taking them out of the context just to "prove me wrong".

@CXu: "u" didn't exist in Old/Classical Latin and "v" was pronounced as "u", indeed. But it's not very likely that people know this well enough to avoid pronouncing it as "v" in modern Latin script. Even if they read it as "u", it wouldn't be very close. That's why the discussion is important, and why it's important not to leave the decision to something as minor as a keyboard layout used in pinyin input methods.

Ephemeral

Inland Empire

3,936 posts

Joined April 2009

Ephemeral 2018-03-28T13:01:35+00:00

i'm going to start handing out silences with respective length to overall wordcount in posts if people don't start focusing on discussing things that actually matter and helping to push the draft forward

out of "yu" vs "yi" (with the caveat expressed above that "yi" exists in pinyin already) vs "v" (with the caveat that using v as a vowel is a ludicrous proposition for any english speaker, native or otherwise), which would be the most "expected" use in the context of an everyday, average reader/player with no knowledge or understanding of the language?

answer in <200 words only plz

Last edited by Ephemeral 2018-03-28T13:03:55+00:00, edited 1 time in total.

Wafu

1,703 posts

Joined June 2011

Wafu 2018-03-28T13:31:30+00:00

Ephemeral wrote:
out of "yu" vs "yi" (with the caveat expressed above that "yi" exists in pinyin already) vs "v" (with the caveat that using v as a vowel is a ludicrous proposition for any english speaker, native or otherwise), which would be the most "expected" use in the context of an everyday, average reader/player with no knowledge or understanding of the language?

answer in <200 words only plz

In my opinion, for average reader/player who knows nothing about the language, it would be "yi". "v" without any knowledge will be always pronounced as you see it, "yu" would be pronounced as "yoo", "yi" would be pronounced as "yi". The only thing missing here is the accent needed to pronounce it properly, which player with no knowledge will not use intuitively anyway.

Mafumafu

Beatmap Nominator

3,361 posts

Joined August 2013

Mafumafu 2018-03-28T13:34:33+00:00

Okay keep it short. I would support v since:
1. “v” is how ü is input via keyboard when typing romanised Mandarin, no matter who the speaker is;
2. both “yu” and "yi" are impractical, since their pronunciation is not related to ü, and y is a consonant, which conflicts with ü being a vowel. (This is to clarify that they have no advantage over v in terms of pronunciation or the vowel-consonant thingy);
3. Outside osu!, “v” is more commonly used than “yu” and "yi". Using "yu" or "yi" will be more confusing for everyday, average players when they find "v" in other places.
4. Use of v for ü has a larger user base. Hardly anyone uses yi or something similar for ü, since it is already used elsewhere in the Pin Yin system.

Fycho

Global Moderator

2,479 posts

Joined August 2012

Fycho 2018-03-28T13:41:41+00:00

For the sake of people who don't know how to pronounce "ü" correctly,

open http://hanyu.baidu.com/s?wd=%E6%B7%A4, then click the blue voice logo of the site and hear it. http://hanyu.baidu.com/s?wd=%E9%A9%B4, click blue voice logo to hear the "ü" with a consonant and tone.

Also, we are focusing the discussion on whether it should be "v" or "yu"; whoever is interested in why can contact me via PM and I will tell you.

I will make another summary merging "ü" to the main proposal of romanisation of Chinese when stuff gets discussed adequately.

Last edited by Fycho 2018-03-28T14:11:42+00:00, edited 2 times in total.

Happy life lies in peaceful mind

CXu

osu! Alumni

3,558 posts

Joined March 2009

CXu 2018-03-28T13:47:38+00:00

I'd go with v.

Pronunciation in foreign languages isn't going to be mapped accurately anyway (ll in Spanish, å -> aa in Norwegian), so in my opinion it's better to make it "nonsense", making people who would like to pronounce it properly look it up, rather than pronouncing it wrongly (I wouldn't pronounce yu as ü, for instance), and it distinguishes between lü/lv and lu just fine.

Just as someone reading "llamo" would say "lamo" without prior Spanish knowledge, I think it's fine to expect some knowledge when someone wants to pronounce romanized Chinese properly.

Wafuu~/Boats

~Mod one of these and get a cookie!~
Mod queue.
Feel free to poke me in-game if you have no idea what I'm talking about in my mod.

TiRa

10 posts

Joined February 2017

TiRa 2018-03-28T13:51:53+00:00

@Ephemeral

Not a mapper, but here's my take on it.

Yu and yi both already represent sounds in their own rights (鱼 and 一, for example), so there's space for confusion there. ü represents a different sound that I feel isn't properly conveyed through either yu or yi. The average player pronouncing lyu and lyi would say something like "lew" and "lee" respectively, when the actual sound is more like "lui".

So I think it's better for non-Chinese-literate players to realize that it has a completely different pronunciation by using v. They will eventually memorize the spelling, even if they don't know the word (but they will realize that it IS a different word, which might not be the case for yi or yu). Meanwhile, because v is used in place of ü when typing in pinyin, Chinese people will also have no trouble looking up maps.

melloe

268 posts

Joined February 2013

melloe 2018-03-28T13:56:37+00:00

Both have hefty pros and cons.

"v" is accurate representation of "ü" so for Chinese speakers/typers it is 100% ideal. However, it's ludicrous for anyone without prior knowledge of "v" usage, so many people, especially in the West, will be completely baffled.

"u," "yu," and "yi" are all equally inaccurate ("ü" happens to fall right between "yu" and "yi"), they are only approximations and thus not perfect. "Lu" will be "loo"; "lyu" will be "liu" or "lai yoo"; "lyi" will be god knows what. There's too much ambiguity. However, all of these are far closer to the truth than "v" for non-Chinese speakers.

None of these are ideal and make heavy compromises, so I think we should just keep the status quo "v" until we can find/implement a better solution, such as the one Kroytz suggested earlier: https://puu.sh/zRitU.png

However, out of the two, I personally prefer "u" or "yu," but I only chose "v" because that happens to be the status quo.

peppy

19,301 posts

Here since the beginning

peppy 2018-03-28T14:26:01+00:00

romanisation isn't for the people that speak the language. it is for people that can't who wish to (as accurately as possible) pronounce and process what they are reading.

using "v" should not even be considered, so please do not even consider it. if native people are offended, they can turn off roman display.

i believe "yu" is the only correct answer here.

Last edited by peppy 2018-03-28T14:26:43+00:00, edited 1 time in total.

Fycho

Global Moderator

2,479 posts

Joined August 2012

Fycho 2018-03-28T14:33:53+00:00

Okay, time to make a summary again. Here is the modified draft of the proposal, it’s not final yet, feel free to suggest if I miss something.

Glossary
Character-by-character Romanisation: Each Chinese character must be Romanised using Hanyu Pinyin system, and each romanised character must be capitalised and separated with a space.

Rules
Songs with Chinese metadata must be Romanised using the Character-by-character method in Romanised fields when there is no Romanisation or translation information listed by a reputable source. The same applies to the Source field if a romanised Source is preferred by the mapper. As they are non-unicode fields, all diacritical tone marks must be omitted. Lü, Nü, Lüe, and Nüe must be substituted with Lyu, Nyu, Lue, and Nue respectively.

Lüe and Nüe are substituted with Lue and Nue instead of Lyue and Nyue because of the following reasons:
1. Lue and Nue do not exist in the pinyin system. Substituting Lüe and Nüe with Lue and Nue respectively does not cause any ambiguity.
2. The model of substituting ü in this proposal is based on what is used in Chinese passports. Chinese passports change Lüe and Nüe to Lue and Nue as well. Lyu and Lue are in the GB/T (applied only to personal names).
3. Lue and Nue are friendlier to people who do not know how pinyin works and are easier to pronounce than Lyue and Nyue.
4. "üe" is technically the same as "ue" in terms of pronunciation.

Anyone who has concerns is free to contact me; I will try my best to explain.
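To make the draft rule concrete, here is a minimal Python sketch of how it could be applied to pinyin syllables that have already been looked up. It assumes the input is one pinyin syllable per Chinese character, with or without tone marks; the function names and the lookup table are hypothetical and only encode the four substitutions listed in the draft.

```python
import unicodedata

# The four substitutions named in the draft (assumption: no other ü syllables occur).
U_UMLAUT_SUBSTITUTIONS = {
    "lüe": "lue",
    "nüe": "nue",
    "lü": "lyu",
    "nü": "nyu",
}

# Combining macron, acute, caron and grave: the four pinyin tone marks.
TONE_MARKS = {"\u0304", "\u0301", "\u030c", "\u0300"}


def strip_tone_marks(syllable):
    """Drop tone diacritics but keep the diaeresis of ü."""
    decomposed = unicodedata.normalize("NFD", syllable)
    kept = "".join(ch for ch in decomposed if ch not in TONE_MARKS)
    return unicodedata.normalize("NFC", kept)


def romanise_syllable(syllable):
    """Romanise one pinyin syllable for a non-Unicode metadata field."""
    base = strip_tone_marks(syllable.lower())
    base = U_UMLAUT_SUBSTITUTIONS.get(base, base)
    return base.capitalize()


def romanise_title(syllables):
    """Character-by-character: capitalise each syllable, separate with spaces."""
    return " ".join(romanise_syllable(s) for s in syllables)


print(romanise_title(["lǜ", "sè"]))
print(romanise_title(["nüè", "dài"]))
```

A lookup table is used rather than a blanket replacement of ü so that the Lüe/Nüe exception stays explicit.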

Last edited by Fycho 2018-03-28T16:32:32+00:00, edited 2 times in total.

Happy life lies in peaceful mind

LwL

363 posts

Joined November 2013

LwL 2018-03-28T15:43:31+00:00

romanisation isn't for the people that speak the language. it is for people that can't who wish to (as accurately as possible) pronounce and process what they are reading.

If that was the case, shouldn't titles that are fully in ascii still be romanized? I didn't have the faintest clue how to pronounce "Tijdmachine" until I heard the song. This is also heavily dependent on the readers' native language. To me, "Teidmaschien" would be a very accurate transliteration since I'll pronounce things german when in doubt, to an english speaker it'd (I'm assuming) be "Tidemachine".

The "v" vs "yu" thing is similar, if you want to give me an easy pronunciation guide, just "ue" will do as that's the german non-unicode alternative to "ü" (and the sound is very similar to how it is in chinese). Both "yu" and "v" will give me the wrong idea.

It's true that "v" as a vowel is unpronounceable to most westerners, but CXu has a good point about it being equal to "u" in original old latin script, and I'd say old latin stone engravings are perfectly readable.

Last edited by LwL 2018-03-28T15:43:43+00:00, edited 1 time in total.

emilia

893 posts

Joined October 2012

emilia 2018-03-28T16:51:59+00:00

im hearing stuff about allowing players who speak english to be able to pronounce titles that arent english? i speak both mandarin and english i think im allowed to have some kind of opinion

when people who primarily speak english see the word "sage" or "mote" they'll probably pronounce it as they think in english. couple it with a few japanese words romanized and it'll sound completely different, especially to osu!players. this is mainly due to how osu! is such a largely japanese influenced game. its sort of a cultural difference at this point. those outside of osu! might even butcher the pronunciation of the other romanized japanese words as well, to no real surprise.

i see no real reason for "v" to not be used because its already so established to chinese players. its how the unique sound is pronounced and theres no two ways about it. "-yu" simply looks too unnatural, and it sounds a whole lot more butchered than what its trying to emulate. i dont see why people cant pronounce "lve shi" and "lve yourself" differently. why cant "v" be simply an osu! culture as well?
