I'm not talking about typing. You can search using the actual non-romanised characters. Please read my last post again – I'm talking about display on websites.
https://en.wikipedia.org/wiki/Lü_(surname) << on Wikipedia about Lü, a common surname. Read under the Romanization section.
Fycho wrote:
Lü, Nü, Lüe, and Nüe must be substituted with Lyu, Nyu, Lue, and Nue respectively.
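The substitution quoted above is small enough to write down as a lookup table. What follows is only a rough sketch: the function name `substitute_u_umlaut` and the table name are made up for illustration, and it assumes the input is a single Pinyin syllable with tone marks already removed.

```python
import unicodedata

# Hypothetical table for the four substitutions listed in the quoted rule.
U_UMLAUT_SUBSTITUTIONS = {
    "lü": "lyu",
    "nü": "nyu",
    "lüe": "lue",
    "nüe": "nue",
}


def substitute_u_umlaut(syllable: str) -> str:
    """Return the syllable with ü replaced as the quoted rule describes."""
    # Normalise so that "u" + combining diaeresis compares equal to "ü".
    normalised = unicodedata.normalize("NFC", syllable)
    replacement = U_UMLAUT_SUBSTITUTIONS.get(normalised.lower())
    if replacement is None:
        # Not one of the four affected syllables: leave it untouched.
        return syllable
    # Keep the capitalisation of the original syllable.
    return replacement.capitalize() if normalised[0].isupper() else replacement
```

Any other scheme argued for in this thread (for example "v") would only need a different table.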
Except this whole debate is because there is no definite way people romanize ü. If there was, this debate would have been over centuries ago. You talk about display on websites, but every website will display it as ü.
peppy wrote:
I'm not talking about typing. You can search using the actual non-romanised characters. Please read my last post again – I'm talking about display on websites.
Please make it easy for us to use accented characters in the metadata then. The solution is there, but it requires the staff's help, honestly.
peppy wrote:
It doesn't matter which we choose if we're going for conformity. People will get used to it.
Let's stop this and copy the most settled upon solution elsewhere on the internet.
You bring up passports, but that is the only scenario where "yu" is used. Chinese passports have used "v" as well, and it is up to the passport holder if they want to keep it as-is or change it to "yu". The only reason this is a thing is because they can't use ü on a passport, but really the easiest solution would be to update their system to allow ü on passports.
Ephemeral wrote:
if that passport romanisation is actually the case, then that's enough precedent set for the use of "yu" over "v", it would seem
"v" is not and will not ever be a valid transliteration in english for this particular because its sound is not really approaching "vuh" or "vee". "yu" is closer to the actual overall sound ("yuU") - romanising by IME precedent is ugly in a number of cases even if it is technically easier to search for
i'm very unsure about these things, can someone with knowledge of the language please clear this up?
Monstrata wrote:
With respect to Korean romanization, I'm wondering if we should continue applying the McCune-Reischauer system for romanizing Korean. This is the system that the Library of Congress is using. Nyquill brought up an excellent point about using romanization systems that other large institutions are currently using, and it works a lot better than creating our own modified system in most cases (unless we are simplifying).
I'm bringing this up because there is also the Revised Romanization of Hangeul system that was introduced on July 7th, 2000, which has been applied to various Korean road signs, transportation, etc. The major change, of course, is that the new system eliminates diacritics in favor of digraphs.
A possible rule would look like:
Songs with Korean metadata must be romanised using the McCune-Reischauer system for romanizing Korean when there is no romanisation or translation information listed by a reputable source. The same applies to the Source field if a romanised Source is preferred by the mapper.
Additionally, we could introduce the use of digraphs and two-vowel letters into the proposal:
Vowels ㅓ /ʌ/ and ㅡ /ɯ/ should be written as digraphs in Korean romanization, and romanized to eo and eu respectively.
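To make the eo/eu digraph idea concrete, here is a rough sketch that pulls the vowel out of a precomposed Hangul syllable and looks up its digraph spelling. The function and table names are invented for illustration. The table lists the Revised Romanization vowel values in Unicode order, and the sketch ignores consonants and any sound-change rules entirely.

```python
# Revised Romanization spellings of the 21 medial vowels, in the order
# Unicode uses when composing Hangul syllables (ㅏ ㅐ ㅑ ㅒ ㅓ ... ㅡ ㅢ ㅣ).
RR_VOWELS = [
    "a", "ae", "ya", "yae", "eo", "e", "yeo", "ye", "o", "wa", "wae",
    "oe", "yo", "u", "wo", "we", "wi", "yu", "eu", "ui", "i",
]

HANGUL_BASE = 0xAC00      # code point of the first precomposed syllable, 가
SYLLABLE_COUNT = 11172    # 19 initials * 21 vowels * 28 finals
VOWEL_BLOCK = 21 * 28     # syllables sharing one initial consonant
FINAL_COUNT = 28          # syllables sharing one initial and one vowel


def vowel_romanisation(syllable: str) -> str:
    """Return the digraph-style spelling of the vowel in one Hangul syllable."""
    offset = ord(syllable) - HANGUL_BASE
    if not 0 <= offset < SYLLABLE_COUNT:
        raise ValueError("not a precomposed Hangul syllable")
    vowel_index = (offset % VOWEL_BLOCK) // FINAL_COUNT
    return RR_VOWELS[vowel_index]
```

For example, `vowel_romanisation("어")` gives `"eo"` and `vowel_romanisation("으")` gives `"eu"`, which are the two vowels the proposed rule singles out.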
Another language to examine is Thai. The Library of Congress recommends nine additional rules for Thai romanization, which are:
Library of Congress wrote:
Romanization
1. Tonal marks are not romanized.
2. The symbol ฯ indicates omission and is shown in romanization by “ … ” the conventional sign for ellipsis.
3. When the repeat symbol ๆ is used, the syllable is repeated in romanization.
4. The symbol ฯลฯ is romanized Ia.
5. Thai consonants are sometimes purely consonantal and sometimes followed by an inherent vowel romanized o, a, or ǭ depending on the pronunciation as determined from an authoritative dictionary, such as the Royal Institute's latest edition (1999).
6. Silent consonants, with their accompanying vowels, if any, are not romanized.
7. When the pronunciation requires one consonant to serve a double function – at the end of one syllable and the beginning of the next – it is romanized twice according to the respective values.
8. The numerals are: ๐ (0), ๑ (1), ๒ (2), ๓ (3), ๔ (4), ๕ (5), ๖ (6), ๗ (7), ๘ (8), and ๙ (9).
9. In Thai, words are not written separately. In romanization, however, text is divided into words according to the guidelines provided in Word Division below.
The two rules I am proposing are:
Songs with Thai metadata must be romanised using the Library of Congress system (also known as ISO 11940) for romanizing Thai when there is no romanisation or translation information listed by a reputable source. The same applies to the Source field if a romanised Source is preferred by the mapper.
and
In the romanization of Thai, words should be romanized separately, and separated by a space. Additionally, all words should (or should not?) be uppercased.
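Two of the quoted Library of Congress rules, number 1 (tonal marks are not romanized) and number 8 (the numerals), are mechanical enough to sketch. The snippet below is only an illustration with made-up names. It covers just those two rules as a preprocessing step and does not attempt the actual consonant and vowel romanization.

```python
# The four Thai tone marks (mai ek, mai tho, mai tri, mai chattawa).
THAI_TONE_MARKS = {"\u0e48", "\u0e49", "\u0e4a", "\u0e4b"}

# Thai digits occupy U+0E50..U+0E59; map each one to its ASCII digit.
THAI_DIGIT_TABLE = {0x0E50 + value: str(value) for value in range(10)}


def preprocess_thai(text: str) -> str:
    """Drop tone marks and convert Thai numerals to 0-9."""
    without_tones = "".join(ch for ch in text if ch not in THAI_TONE_MARKS)
    return without_tones.translate(THAI_DIGIT_TABLE)
```

Called on the string "๒๕๔๒" this returns "2542"; everything that is neither a tone mark nor a digit is passed through unchanged.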
Attached are helpful transcription keys for Thai:
Another language that is becoming more and more relevant is Arabic, and there are some issues I would like to bring forth with regards to its romanization.
Here is the table for romanization of Arabic:
As you can see, some issues come up. In the romanization of ص ص ص ص, for example (whether initial, medial, final, or alone), the romanization becomes "ṣ". However, the diacritical mark is not something that can be used by osu! because it is still not Unicode. I would like to propose that all of these diacritical marks attached to letters be removed for the sake of simplicity and because osu! currently does not support them. Therefore something like "ص◌نضوِ◌خ" should be romanized as "sandwich".
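Dropping the marks from an already romanized string can be sketched with Unicode decomposition. This is an illustration only: the function name is made up, and it assumes every mark to be removed is a combining character once the text is decomposed.

```python
import unicodedata


def strip_diacritics(romanised: str) -> str:
    """Remove combining marks so that, for example, "ṣ" becomes "s"."""
    # NFD splits a letter such as "ṣ" into "s" plus a combining dot below.
    decomposed = unicodedata.normalize("NFD", romanised)
    # unicodedata.combining() returns 0 for base characters.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))
```

Standalone symbols that are not combining marks, such as modifier apostrophes, are left alone by this approach and would need their own handling.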
Another problem with Arabic is that it is typed in reverse, right to left. Should we also apply this to romanization? In this case "ص◌نضوِ◌خ" would actually be romanized as "hciwdnas" when read left to right as English readers are expected to do.
The rule I am proposing is:
Songs with Arabic metadata must be romanised using the Library of Congress system for romanizing Arabic when there is no romanisation or translation information listed by a reputable source. The same applies to the Source field if a romanised Source is preferred by the mapper.
Additionally:
In the romanization of Arabic, words should be romanized in reverse order, and the last letter should be uppercased. For example, in romanizing "◌ س◌ !" the correct romanization should be "!usO".
However, there is also the problem of Judeo-Arabic romanization, which differs slightly from traditional Arabic romanization. Judeo-Arabic, of course, stems from the Jewish Arabs, many of whom live in Iraq and have adopted a slightly different script with respect to certain nouns and verbs. The most common Jewish Arabs are those from Baghdad. Anyways, I digress.
Attached are examples of Judeo-Arabic romanization:
So I would like to propose the following:
Songs with Judeo-Arabic metadata must be romanised using the Library of Congress system for romanizing Judeo-Arabic where Judeo-Arabic nouns and verbs are being used, and where there is no romanisation or translation information listed by a reputable source. Where Judeo-Arabic words and phrases are not used, traditional Arabic romanization will apply. The same applies to the Source field if a romanised Source is preferred by the mapper.
1. I agree this may not be very necessary. I'd say we should only require usage of a certain Romanisation method if the language needs to be Romanised repeatedly, not once in the history of the game.
Okoratu wrote:
i'm very unsure about these things, can someone with knowledge of the language please clear this up?
find me a song with aegyptic hieroglyphs first that is sung in the language and mapped on osu, i dont think a language with 300k tribal speakers worldwide is relevant to the ranking criteria
-----------
TV Size labels in title: drop universally or force the same labeling on all tv size songs?
Every song needs to be Romanised so that it can be searched normally. If it's your song, give it a name, if it's not, that probably can't be universally covered by RC. These situations (as they virtually don't happen) would probably be treated case by case.
Scarlet Evans wrote:
Could such song be treated as an "exception" and allowed to have no romanization? Or all of several titles should be included in some ultimate title compilation?
should delete "Word-by-word Romanisation" in Glossary.
CrystilonZ wrote:
泠鸢yousa - 没有名字的怪物 : Ling Yuan yousa - Shen De Sui Bo Zhu Liu
it has a mistake, should be:
没有名字的怪物 => Mei You Ming Zi De Guai Wu
神的随波逐流 => Shen De Sui Bo Zhu Liu
Single meaning: Words with a single meaning, which are usually set up of two characters (sometimes one, seldom three), are written together and not capitalized: rén (人, person); péngyou (朋友, friend); qiǎokèlì (巧克力, chocolate)
This is what our draft suggested at first and what you argued against. I think this will just make it more confusing if you are supposed to ignore some sections of it.
etc.
Yeah, I agree with that. Following the wiki would even make you Romanise it differently than what is intended by the rule.
Fycho wrote:
Whoever is confused on how to romanize Chinese could ask metadata QATs/Helpers directly without painfully looking through the walls of text in the wiki.
Seems about right, so now the confirmation about ü, and Korean Romanisation.
Okoratu wrote:
Romanisation is only to be used when there is no official translation or preferred romanisation provided by the artist. This applies to all fields that can hold romanised data by intent.
Noffy wrote:
ok time for a re-review with slightly fresher eyes
the thing part thirty wrote:
Guest mappers, storyboarders, and hitsounders must be added to the tags of a beatmap set. This is to give credit where credit is due and helping others identify the main contributors of any given beatmap set.
-> + "Skinners should be added if they made the skin specifically for the mapset" (in contrast to someone just borrowing/mixing skin elements that're already out there) (this would be nice)
the thing part forty two wrote:
Commas, vs., &, any variations of feat./ft., CV: must always use a trailing whitespace. Unless it is a comma, leading whitespace is also required.
(CV: blah) vs. ( CV: blah ) . the latter would look silly, so CV: shouldn't require leading whitespace either. Or uhhh... this doesn't apply to sides which have the inside of a bracket next to them? or something. since it'd also apply to like, (feat.) vs. ( feat. ) which isn't.. better really.. hmmm
I'm not sure how to fix the wording for this though
aaaaaaaa~
Idea wrote:
Trailing/leading whitespace is not required if the character next to it is the inner side of a bracket.
Example: Hello (CV: Goodbye) is okay, Hello( CV: Goodbye ) is not.
I believe it should be moved to metadata. Conflicts shouldn't really happen if we're using a common sense approach, the few samples you had laid out earlier seem like they would work for most cases. It is just a much better way to handle determining cut songs vs full songs, unless the length is obvious from the listing as Noffy said.
Lanturn wrote:
Anyways. TV Size time. So basically we'll just apply common sense for these and use whatever marker matches closest to our own preferences. Pretty much going back to our old school methods this way.
I mean we can work with that but it may cause some DQs when it comes to preference or conflicts down the line.
For me, I'd rather push towards something more concrete with the smallest room for error, which is moving them to the tags. I am fine with working with either method though.
So uh. Pick one I guess and we'll move along with it from there.
I do not wish to fuel the fire of this longstanding debate, which somehow has a resolution already. Yet there is a flaw in this rule that compels me to break my silence. It is not about 'v', 'yu', and 'yi', but that this rule does not take into account the existence of another language that adopts Chinese characters but whose pronunciation (and romanisation) is nothing like Mandarin Chinese. This language is Cantonese.
Fycho wrote:
Glossary
Character-by-character Romanisation: Each Chinese character must be Romanised using Hanyu Pinyin system, and each romanised character must be capitalised and separated with a space.
Rules
Songs with Chinese metadata must be Romanised using the Character-by-character method in Romanised fields when there is no Romanisation or translation information listed by a reputable source. The same applies to the Source field if a romanised Source is preferred by the mapper. As they are non-unicode fields, all diacritical tone marks must be omitted. Refer to Thread: Romanisation of Chinese for more information.
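The character-by-character method in the rule above (one capitalised syllable per character, separated by spaces, tone marks omitted) can be sketched as follows. This is an illustration only: the function name is made up, and it assumes the input is already Hanyu Pinyin with one space-separated syllable per Chinese character. It removes only the four tone marks and deliberately leaves ü alone, since how to write ü is a separate question.

```python
import unicodedata

# Combining macron, acute, caron and grave: the four Pinyin tone marks.
PINYIN_TONE_MARKS = {"\u0304", "\u0301", "\u030c", "\u0300"}


def romanise_character_by_character(pinyin: str) -> str:
    """Strip tone marks and capitalise each space-separated syllable."""
    # Decompose so each tone mark becomes a separate combining character.
    decomposed = unicodedata.normalize("NFD", pinyin)
    toneless = "".join(ch for ch in decomposed if ch not in PINYIN_TONE_MARKS)
    # Recompose so that "u" + diaeresis turns back into a single "ü".
    recomposed = unicodedata.normalize("NFC", toneless)
    return " ".join(syllable.capitalize() for syllable in recomposed.split())
```

Given "méi yǒu míng zì de guài wù", this produces "Mei You Ming Zi De Guai Wu", matching the corrected title earlier in the thread.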
The discussion of "Romanisation of Chinese" should be adequate and can stop now. Anyone who has concerns is free to contact me for detailed explanations.
(This post is a continuation of my first post on page 12; the 179th post.)
Wafu wrote:
@nold_1702: I don't really think that such a minority language (in osu!, it is a rarity in beatmaps) needs a specification. Simply because you then need to discuss whether Jyutping is the most appropriate to use. Although its Romanisation doesn't necessarily contain "characters with accents", it does contain numbers that define the accent, which is not understandable by anyone who doesn't know Jyutping specifically.
Fycho wrote:
We have agreed that “Cantonese metadata must be romanized as Cantonese pronunciation”. But we haven’t figured out which romanization method is better and should be used, so I didn’t add it to the proposal. Considering there are fewer than three ranked songs, we can still apply metadata discretion for them.
If you really want to discuss and figure it out, also don’t forget that Cantonese differs internally and has different tones within itself. I don’t know if forcing Jyutping would be a solution, and it may have potential issues with other Cantonese-speaking areas like Guangzhou. I am saying metadata discretion by the mapper would be the best for them.
Okoratu wrote:
can someone translate Skylish's post into readable english for me? I've read it multiple times and dunno what he's trying to get at for half of it like the only sentence that makes sense is "I object a completely 100% Hanyu Pinyin forced on other Chinese languages excluding Mandarin under the PRC standardized Chinese." with reasoning "they're different enough to class as different languages"