[Proposal] Metadata section overhaul

Mafumafu
I guess most osu! players use a keyboard to type and search, so I just linked the most popular keyboard input method here, rather than handwriting or other methods.

peppy wrote:

Let's find the most popular method and use it.
Okay then simple answer: v is the most popular amongst players.

Wikipedia wrote:

Since the letter "v" is unused in Mandarin pinyin, it is universally used as an alias for ü.
( https://en.wikipedia.org/wiki/Pinyin_input_method )

Also one more link:
https://eastasiastudent.net/china/manda ... yin-input/
peppy
I'm not talking about typing. You can search using the actual non-romanised characters. Please read my last post again – I'm talking about display on websites.
Nyquill
If we go back to something close to a romanization standard meant for "reading", the Library of Congress chooses to use the double-dotted u (ü), which is reasonable. It'll be kind of like placing accents on French words.

https://www.loc.gov/catdir/cpso/romanization/chinese.pdf

The most readable would be this. Maybe we can do something on the search engine side so that both can automatically be found when searching with u or v. This is a quality-of-life improvement for non-Chinese speakers, who would otherwise be forced to copy and paste to search.
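A minimal sketch of that search-side idea, assuming a hypothetical `search_key` helper that is applied to both the indexed text and the query. Nothing here is osu!'s actual search code; it only shows how ü, v and u could be folded into one key.

```python
import unicodedata


def search_key(text: str) -> str:
    """Fold a romanised string so that ü, v and u compare equal (hypothetical helper)."""
    # Decompose accented letters (ü becomes u + combining diaeresis).
    decomposed = unicodedata.normalize("NFD", text.lower())
    # Drop every combining mark, which removes the diaeresis and any tone marks.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Treat the keyboard alias "v" as "u" as well.
    return stripped.replace("v", "u")


def matches(query: str, title: str) -> bool:
    """True when the folded query appears in the folded title."""
    return search_key(query) in search_key(title)
```

Folding every "v" into "u" is deliberately blunt: it also affects non-Chinese words, so it trades a few extra search hits for never missing a title because of ü.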

...or we can accept that romanization doesn't work perfectly for many languages, including Chinese, and just use the normal/most popular/most sane Latin standard (v) like we would for many other languages.
Fycho
What Nyquill and CXu posted makes sense here: romanization doesn't work perfectly for many languages, and the double-dotted u is still in use today. In the old Latin systems "v" and "u" are interchangeable. I believe "v" is the most appropriate and popular choice. In YouTube's search, "Lv" is obviously a better keyword than "Lyu" (try searching "Lv Guang"). Using the English name from a citizen's passport is one-sided (passports are mostly for customs, while our romanisation is for players who use computers every day).

Let's try to fix the proposal again:


Glossary
Character-by-character Romanisation: Each Chinese character must be Romanised using Hanyu Pinyin system, and each romanised character must be capitalised and separated with a space.

Rules
Songs with Chinese metadata must be Romanised using the Character-by-character method in Romanised fields when there is no Romanisation or translation information listed by a reputable source. The same applies to the Source field if a romanised Source is preferred by the mapper. As they are non-unicode fields, all diacritical tone marks must be omitted. Refer to Thread: Romanisation of Chinese for more information.

The discussion of "Romanisation of Chinese" has been adequate and should stop now. Anyone with concerns is free to contact me for a detailed explanation.
CrystilonZ
Wikipedia uses ü in most cases because it has no limitation on umlauts.
The only place where Romanised names are displayed under the same umlaut limitation is the passport, which basically works like this:

Fycho wrote:

Lü, Nü, Lüe, and Nüe must be substituted with Lyu, Nyu, Lue, and Nue respectively.
https://en.wikipedia.org/wiki/Lü_(surname) << on wikipedia about Lü, a common surname. Read under the Romanization section.
(in Chinese) Standards set in 2012 about Romanization of Chinese names in passports.
吕女绿 (Lü Nü Lü) should be Romanised as LYU NYU LYU
Ephemeral
if that passport romanisation is actually the case, then that's enough precedent set for the use of "yu" over "v", it would seem

"v" is not and will not ever be a valid transliteration in english for this particular letter because its sound is not really approaching "vuh" or "vee". "yu" is closer to the actual overall sound ("yuU") - romanising by IME precedent is ugly in a number of cases even if it is technically easier to search for
Shiguma

peppy wrote:

I'm not talking about typing. You can search using the actual non-romanised characters. Please read my last post again – I'm talking about display on websites.
Except this whole debate is because there is no definite way people romanize ü. If there was, this debate would have been over centuries ago. You talk about display on websites, but every website will display it as ü.

The best solution for this would be to allow ü in the romanization field. Why aren't we allowed to use accented characters in that field in the first place? There are more scenarios besides the Chinese ü where this becomes a problem. (Example: https://osu.ppy.sh/beatmapsets/740535#osu/1562308 The only reason this needed to be romanized is the umlaut a, as the artist's name is Mäe, but really we should have just had Mäe without any need for romanization.)

peppy wrote:

It doesn't matter which we choose if we're going for conformity. People will get used to it.

Let's stop this and copy the most settled upon solution elsewhere on the internet.
Please make it easy for us to use accented characters in the metadata then. The solution is there, but it requires the staff's help, honestly.

Ephemeral wrote:

if that passport romanisation is actually the case, then that's enough precedent set for the use of "yu" over "v", it would seem

"v" is not and will not ever be a valid transliteration in english for this particular letter because its sound is not really approaching "vuh" or "vee". "yu" is closer to the actual overall sound ("yuU") - romanising by IME precedent is ugly in a number of cases even if it is technically easier to search for
You bring up passports, but that is the only scenario where "yu" is used. Chinese passports have used "v" as well, and it is up to the passport holder whether to keep it as-is or change it to "yu". The only reason this is a thing is that they can't use ü on a passport, but really the easiest solution would be to update their system to allow ü on passports.

Seriously, the best solution is to just allow ü and other accented characters in the romanization field. Basically all websites use ü, there is no reason for us to be stuck in this debate when typing ü on a computer is so easy.
Topic Starter
Okoratu
:D cleaning up open points in progress

open talking points


Chinese:
  1. i'm not willing to link a 4 year old thread on the ranking criteria. either it should be ported to the wiki, or that sentence dropped, or its information condensed down into more guidelines on the topic
  2. Hanyu Pinyin system needs an external reference if available, someone please provide me a link
Cyrillic:
  1. Cyrillic is now undefined - anyone fancy coming up with some definition if the current thing only works for Russian?
Korean:
  1. Input from Koreans as to which standard is used and should be used going forward is needed
Thai:
  1. Input from natives required as to which standard is to be used
Arabic:
  1. do we currently need this? I have a hard time telling how much Monstrata trolls in this post
TV Size label:
  1. FORCE the same way or FORCE DROPPING the label?
will update with issues i have as i go through the thread updating open points

Monstrata wrote:

With respect to Korean romanization, I'm wondering if we should continue applying the McCune-Reischauer system for romanizing Korean. This is the system that the Library of Congress is using. Nyquill brought up an excellent point about using romanization systems that other large institutions are currently using, and it works a lot better than creating our own modified system in most cases (unless we are simplifying).

I'm bringing this up because there is also the Revised Romanization of Hangeul system that was introduced on July 7th, 2000, which has been applied to various Korean road signs, transportation, etc. The major change, of course, is that the new system eliminates diacritics in favor of digraphs.

A possible rule would look like:

Songs with Korean metadata must be romanised using the McCune-Reischauer system for romanizing Korean when there is no romanisation or translation information listed by a reputable source. The same applies to the Source field if a romanised Source is preferred by the mapper.

Additionally, we could introduce the use of digraphs and two-vowel letters into the proposal:

Vowels ㅓ /ʌ/ and ㅡ /ɯ/ should be written as digraphs in Korean romanization, and romanized to eo and eu respectively.

Another language to examine is Thai. The Library of Congress recommends nine additional rules for Thai romanization which are:

Library of Congress wrote:

Romanization
1. Tonal marks are not romanized.
2. The symbol ฯ indicates omission and is shown in romanization by “ … ” the conventional sign for ellipsis.
3. When the repeat symbol ๆ is used, the syllable is repeated in romanization.
4. The symbol ฯลฯ is romanized Ia.
5. Thai consonants are sometimes purely consonantal and sometimes followed by an inherent vowel romanized o, a, or ǭ depending on the pronunciation as determined from an authoritative dictionary, such as the Royal Institute's latest edition (1999).
6. Silent consonants, with their accompanying vowels, if any, are not romanized.
7. When the pronunciation requires one consonant to serve a double function – at the end of one syllable and the beginning of the next – it is romanized twice according to the respective values.
8. The numerals are: ๐ (0), ๑ (1), ๒ (2), ๓ (3), ๔ (4), ๕ (5), ๖ (6), ๗ (7), ๘ (8), and ๙ (9).
9. In Thai, words are not written separately. In romanization, however, text is divided into words according to the guidelines provided in Word Division below.
The two rules I am proposing are:

Songs with Thai metadata must be romanised using the Library of Congress system (also known as ISO 11940) for romanizing Thai when there is no romanisation or translation information listed by a reputable source. The same applies to the Source field if a romanised Source is preferred by the mapper.

and

In the romanization of Thai, words should be romanized separately, and separated by a space. Additionally, all words should (or should not?) be uppercased.

Attached are helpful transcription keys for Thai:






Another language that is becoming more and more relevant is Arabic, and there are some issues I would like to bring forth with regards to its romanization.

Here is the table for romanization of Arabic:


As you can see, some issues come up. In the romanization of ص ص ص ص for example, (whether initial, Medial, Final, or Alone) the romanization becomes " ṣ" however, the diacritical mark is not something that can be used by osu because it is still not unicode. I would like to propose that all of these diacritical "," attached to letters be removed for the sake of simplicity and because osu currently does not support them. Therefore something like " ص◌نضوِ◌خ" should be romanized as "sandwich".

Another problem with Arabic is that it is typed in reverse, right to left. Should we also apply this to romanization? In this case "ص◌نضوِ◌خ" would actually be romanized as "hciwdnas" when read left to right as English readers are expected to do.

The rule I am proposing is:

Songs with Arabic metadata must be romanised using the Library of Congress system for romanizing Arabic when there is no romanisation or translation information listed by a reputable source. The same applies to the Source field if a romanised Source is preferred by the mapper.

Additionally:

In the romanization of Arabic, words should be romanized in reverse order, and the last letter should be uppercased. For example in romanizing "◌ س◌ !" the correct romanization should be "!usO"

However, there is also the problem of Judeo-Arabic romanization, which differs slightly from traditional Arabic romanization. Judeo-Arabic, of course, stems from the Jewish Arabs, many of whom live in Iraq and have adopted a slightly different script with respect to certain nouns and verbs. The most common Jewish Arabs are those from Baghdad. Anyways, I digress.

Attached are examples of Judeo-Arabic romanization:


So I would like to propose the following:

Songs with Judeo-Arabic metadata must be romanised using the Library of Congress system for romanizing Judeo-Arabic where Judeo-Arabic nouns and verbs are being used, and where there is no romanisation or translation information listed by a reputable source. Where Judeo-Arabic words and phrases are not used, traditional Arabic romanization will apply. The same applies to the Source field if a romanised Source is preferred by the mapper.
i'm very unsure about these things, can someone with knowledge of the language please clear this up?
find me a song with Egyptian hieroglyphs first that is sung in the language and mapped on osu, i don't think a language with 300k tribal speakers worldwide is relevant to the ranking criteria

-----------

TV Size labels in title: drop universally or force the same labeling on all tv size songs?

-----------

Applied Sieg changes to russian as they were.

draft updated, refer to open talking points

Wafu

Okoratu wrote:

i'm very unsure about these things, can someone with knowledge of the language please clear this up?
find me a song with Egyptian hieroglyphs first that is sung in the language and mapped on osu, i don't think a language with 300k tribal speakers worldwide is relevant to the ranking criteria

-----------

TV Size labels in title: drop universally or force the same labeling on all tv size songs?
1. I agree this may not be very necessary. I'd say we should only require a certain Romanisation method if the language needs to be Romanised repeatedly, not once in the whole history of the game.

2. In my opinion (and this is probably only about opinion), drop it and recommend it to tags. If we use a universal label, I feel like it goes completely against what the artist has chosen. If it is removed, it's not really misrepresenting the artist's choice, it is just us not considering it as a part of the title. Not sure if this makes much sense to anyone else, that's just what I think. Edit: Another important point is that, because people like 1-2 min. songs, they will be encouraged to use the in-game song length sorting/grouping (unfortunately can't be done on website). Most people probably search for short songs just by using "tv size", which can omit a large portion of maps they could actually enjoy. (There's nothing magical about TV Size songs, the only thing is that they are ~1 min 30 sec, it's just the length that players care about.) As people would get used to it, it could actually be more beneficial. (afaik Ephemeral (sorry if it wasn't you, but I think it was) said he would rename all the songs that are any kind of TV Size, so that it's consistent, so people actually would get used to it)

Chinese: Couldn't really find a reference that would not be misleading. They are usually too concerned with pronunciation and don't really tell you how to Romanise it. Plus, many of them use the originally proposed style (not syllable by syllable, but "word" by "word"). Many are also paid or blocked by the government for some reason. Someone may need to search a bit more to find one that represents the currently intended system better.

Cyrillic: Wouldn't really care that much about Cyrillic, as it's improbable that there will be many songs in it. But it could be recommended to use the BGN/PCGN systems for other Cyrillic languages. The reason is pretty simple: they are nearly the same as the Russian one. Yes, it is intended mainly for geographic names, but it was explained in the past that it is the most common way we see Cyrillic Romanised and is the most compatible with Modified Hepburn, which is the most widespread Romanisation in osu!. It also avoids the confusion between "ch" and "kh", between "j" and "y", between "c" and "ts" etc. Could elaborate on this a bit more if needed.

Korean: I believe the currently used system is "Revised Romanisation of Korean", at least it looks pretty much like it in the maps here. It's a South Korean standard (probably the one we should follow as the probability of North Korean songs being mapped here is very low). Didn't find an official document about it, but this seems like a nice reference. It explains all the things you need for Romanisation. Except for Hanja, but again, the probability of something with Hanja being mapped is low. May need input from Koreans, I haven't seen any problems with the current system.

That's my opinion at this moment.
CrystilonZ
Delete word-by-word lol

Sites on the Internet unfortunately either provide too much (e.g. joining of syllables) or aren't at all what we agreed on.
If we are going with passport stuff imo just port information in the box below to the wiki. It's pretty similar to the old thread with clearer and better wording and all irrelevant points dropped.

Additional Information on the Romanisation of Chinese.

If there is no information on either Romanisation or translation listed by a reputable source, use the following method to Romanise Chinese metadata.
  1. All Chinese characters must be Romanised using the Hanyu Pinyin system. All tone-marking diacritics must be omitted.
    1. 我 : Wo
    2. 三 : San
  2. For any Chinese character that, under the Hanyu Pinyin system, would be Romanised to Lü, Nü, Lüe, or Nüe; Romanise it to Lyu, Nyu, Lue, or Nue respectively instead.
    1. 女 : Nyu
    2. 略 : Lue
  3. Separate each Romanised character with a space and capitalise it. Function words are capitalised as well.
    1. 泠鸢yousa - 没有名字的怪物 : Ling Yuan yousa - Shen De Sui Bo Zhu Liu
    2. 兰梓 - 一百块都不给我 : Lan Zi - Yi Bai Kuai Dou Bu Gei Wo
  4. For loan words from other languages, however, Romanise all characters that make up the word into a single word in its original language.
    1. 張韶涵 - 歐若拉 : Angela Zhang - Aurora
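
Points 1 to 3 above can be sketched roughly as follows. This is only an illustration under a few assumptions: the input is already a list of Hanyu Pinyin syllables with tone marks (one per character, since looking up a character's reading needs a dictionary that is not shown here), the function names are made up, and point 4 (loan words) is not handled because it needs a human decision.

```python
import unicodedata

# Combining forms of the four pinyin tone marks: macron, acute, caron, grave.
TONE_MARKS = {"\u0304", "\u0301", "\u030c", "\u0300"}

# Substitutions from point 2, applied once the tone marks are gone.
U_UMLAUT_SUBSTITUTIONS = {"lü": "lyu", "nü": "nyu", "lüe": "lue", "nüe": "nue"}


def romanise_syllable(pinyin: str) -> str:
    """Turn one toned pinyin syllable into its capitalised, diacritic-free form."""
    # Split letters from their marks so tone marks can be removed individually.
    decomposed = unicodedata.normalize("NFD", pinyin.lower())
    toneless = "".join(ch for ch in decomposed if ch not in TONE_MARKS)
    # Recompose so that u + diaeresis becomes the single letter ü again.
    toneless = unicodedata.normalize("NFC", toneless)
    toneless = U_UMLAUT_SUBSTITUTIONS.get(toneless, toneless)
    return toneless.capitalize()


def romanise(syllables) -> str:
    """Join the romanised syllables with spaces (character-by-character method)."""
    return " ".join(romanise_syllable(s) for s in syllables)
```

For example, `romanise(["nǚ"])` gives "Nyu" and `romanise(["wǒ"])` gives "Wo", matching the examples in the list.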

In the RC itself
Glossary
  1. Word-by-word Romanisation: Each character must be romanised into a single, capitalised, separated word. Refer to this thread for examples and supplementary information. <<< delete this
  2. Character-by-character Romanisation: Each Chinese character must be Romanised using Hanyu Pinyin system, and each Romanised character must be capitalised and separated with a space. Refer to <this wiki page> for more information.
Rules
  1. Songs with Chinese metadata must be Romanised using the Character-by-character method in Romanised fields when there is no Romanisation or translation information listed by a reputable source. The same applies to the Source field if a romanised Source is preferred by the mapper. As they are non-unicode fields, all diacritical tone marks must be omitted with Lü, Nü, Lüe, and Nüe substituted with Lyu, Nyu, Lue, and Nue respectively.
^ use "substituted" here. ü is already Romanised but we're just replacing it with stuff because of umlauts.
Natsu
Force the same TV Size label, since it makes it easier to identify the length of the song and whether it's the opening of an anime. Also, everyone in osu! is used to it, so removing it will just cause some confusion IMO.
Mentai
Every Cyrillic translation system is a mess, and has large inconsistencies in vowel treatment as well. I recommend using one of the right single quotation mark systems, either ISO 9985 or BGN/PCGN. ISO is the most widely used for us Armenians, but having the aspirates translated the same way makes BGN/PCGN a more universal solution for all Cyrillic languages.

basically, BGN/PCGN is probably the best for a universal system for Cyrillic
Scarlet Evans
How should one romanize a title if no romanization, or even an unambiguous phonetic transcription, exists, while several completely different ones are suggested as equally valid, and it would be completely, totally and absolutely wrong to choose just one of them as the title? I.e. it's not possible to transcribe the title phonetically (or romanize it) unless you include all of the several possible meanings? And even if one should include all of them, should they be separated by a slash symbol, which with 4-5 of them would result in a terribly long monstrosity of a title?

-----

Here's more of explanation:


Artists can name their songs however they want, right? They are not obliged to make it "possible" to pronounce or to have unambiguous title that someone can phonetise.

I don't know how it looks like in Chinese language, but a great deal of Japanese kanji characters can be pronounced (and romanized) not in one or two, but even several completely different ways!

I don't know if someone has made a song like this, but I see nothing that could stop one from doing so, i.e. what if someone makes a song (which someone later decides to map in osu!) where:

  • The title has no official romanization and remains "unspoken", i.e. aside from the title written in kanji, even the author never spells it out and only refers to the song indirectly, or by writing it or showing the kanji characters.

  • It has several possible meanings, at least 3-4 of them, ALL of which are suggested by the lyrics and ALL of which have a COMPLETELY DIFFERENT romanization for every single character used in the title. So you can't phonetise it, as it has several completely different phonetisations!!

  • The lyrics of the song are used to maintain and express the ambiguity, creating a contradiction around what the title applies or refers to. This makes all of these "possible titles" "equal" in terms of being the possible title, while they contradict each other, either through their contradictory meanings and/or through the lyrics, i.e. all of them are suggested to be the title while at the same time being refuted as such.

  • It all makes sense when reading the kanji characters and extracting all these meanings, so the title definitely fits the song perfectly (!!!), with just one but: you can't possibly spell it or phonetise it, as it has several completely different pronunciations, all of which can equally be considered as being and not being the title (because of the contradictory situation and being refuted by the other meanings and/or the lyrics).

-----

So, each of these "possible titles" can be and can't (!) be the title at the same time, but all of them, written down with the very same kanji characters, are definitely the title, with deep meanings that are expressed in the lyrics.

-----

How should metadata for such song look like? :o

I don't know, if such song with "unspoken" title, where the title can have multiple meanings, currently exists, and if so, then to what extreme this ambiguity is brought, but there are titles with more than one meaning and sometimes you can't really find anywhere how the title should be pronounced or phonetised...

Also, looking at this discussion, even if such song doesn't exist, then depending on the answer I could decide to spend my time to improve my poor Japanese much better and eventually create one in future, maybe with some voluntary help of someone, just for the sake of trying to rank it in future :P

-----

But seriously... it's not required from an artist to have a title that is possible to be phonetised or romanized... what should a mapper do in such situation? Could such song be treated as an "exception" and allowed to have no romanization? Or all of several titles should be included in some ultimate title compilation?

And please, don't tell me that it's "impossible case" that's not worth considering, as I believe that the people in this community could definitely help to make it possible, if it will be required to.
Wafu

Scarlet Evans wrote:

Could such song be treated as an "exception" and allowed to have no romanization? Or all of several titles should be included in some ultimate title compilation?
Every song needs to be Romanised so that it can be searched normally. If it's your song, give it a name, if it's not, that probably can't be universally covered by RC. These situations (as they virtually don't happen) would probably be treated case by case.
Fycho
Adding additional Information seems redundant lol, if mappers/BN are unsure, they can ask Metadata QATs/Helpers for help like Modified Hepburn for Japanese.

CrystilonZ wrote:

泠鸢yousa - 没有名字的怪物 : Ling Yuan yousa - Shen De Sui Bo Zhu Liu
it has a mistake, should be:

没有名字的怪物 => Mei You Ming Zi De Guai Wu
神的随波逐流 => Shen De Sui Bo Zhu Liu
should delete "Word-by-word Romanisation" in Glossary.
Glossary
Character-by-character Romanisation: Each Chinese character must be Romanised using Hanyu Pinyin system, and each romanised character must be capitalised and separated with a space.

Rules
Songs with Chinese metadata must be Romanised using the Character-by-character method in Romanised fields when there is no Romanisation or translation information listed by a reputable source. The same applies to the Source field if a romanised Source is preferred by the mapper. As they are non-unicode fields, all diacritical tone marks must be omitted.

Considering there are only 5 ranked maps (really few) that use "ü" so far, and that neither "v" nor "yu" is the best choice unless we allow ü in the romanised field, the Metadata Team will use its discretion on specific cases.
CrystilonZ
Agree tbh half way through writing that I realised this is redundant af XDD
Wikipedia kinda provides too much but eeh whatever I guess
and somehow I accidentally copied the wrong title lmfao good job me
Wafu
@Fycho: We can't use the link provided due to this chapter.

Single meaning: Words with a single meaning, which are usually set up of two characters (sometimes one, seldom three), are written together and not capitalized: rén (人, person); péngyou (朋友, friend); qiǎokèlì (巧克力, chocolate)
etc.
This is what our draft suggested at first and what you argued against. I think this will just make it more confusing if you are supposed to ignore some sections of it.
Fycho
Adjusted the draft and simply removed it. I don't think there needs to be any redundant information, just as with Modified Hepburn for Japanese, and most songs with Chinese metadata in the game are mapped by Chinese speakers, who know how pinyin works. Whoever is confused on how to romanize Chinese could ask metadata QATs/Helpers directly without painfully looking through the walls of text in the wiki. The current rule has an adequate explanation already.
Wafu

Fycho wrote:

Whoever is confused on how to romanize Chinese could ask metadata QATs/Helpers directly without painfully looking through the walls of text in the wiki.
Yeah, I agree with that. Following the wiki would even make you Romanise it differently than what is intended by the rule.

As there was, finally, an agreement on ü, is it still in discussion or why is it missing in the draft?

For the Japanese Romanisation, I think that linking the Hepburn wiki page as a whole is not the best idea, as multiple Hepburn systems are mixed there. What about just linking the Modified Hepburn document? I think this is the best reference as it only states the rules of the Romanisation without redundancy, and it's from the Library of Congress, which is probably the most official reference we can have.

Suggestion to make the Romanisation rules a bit shorter and cleaner

First of all, for Russian, change "must be romanised using..." to "must use" as it is with other languages.

All the Romanisation systems, consistently, say "when there is no romanisation or translation information listed by a reputable source" and "The same applies to the Source field if a romanised Source is preferred by the mapper."

Assuming there will still be Korean (it should be because we include it as a language selection on the website), it would be messy to have this in every rule. What about making these rules and removing them from the rules of specific languages:
  1. official artist's Romanisations for all languages have priority
  2. if you choose (and as of now, you can choose) to use Romanised source, use the same romanisation method
There's no reason to specify the same thing for every language if it works uniformly.
Topic Starter
Okoratu
Someone should give me the agreement on ü then

- will change the link
- will change russian wording,
- will avoid redundancy by just taking the parts that are redundant out and making them general rules for language specific romanisation works, right?

sth like
Romanisation is only to be used when there is no official translation or preferred romanisation provided by the artist. This applies to all fields that can hold romanised data by intent.
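
As a rough sketch of how that general rule reads as a precedence check (all names here are hypothetical, and `fallback` stands in for whichever language-specific system applies):

```python
from typing import Callable, Optional


def romanised_value(
    original: str,
    official: Optional[str],
    fallback: Callable[[str], str],
) -> str:
    """Return the official romanisation or translation if one exists, else the fallback."""
    # An official version provided by the artist always takes priority.
    if official is not None and official.strip():
        return official.strip()
    # Otherwise apply the language-specific system (Pinyin, Modified Hepburn, ...).
    return fallback(original)
```

The same function would be used for every field that holds romanised data, which is the point of stating the rule once instead of repeating it per language.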
Wafu

Okoratu wrote:

Romanisation is only to be used when there is no official translation or preferred romanisation provided by the artist. This applies to all fields that can hold romanised data by intent.
Seems about right, so now the confirmation about ü, and Korean Romanisation.

Maybe a good reference for Korean (should be the Revised Romanization of Korean, which seems to be used in most maps) might be this.
CrystilonZ
ü is the matter of preference now lol everyone probably has enough information to choose a side but I don't expect a 100 percent consensus
My suggestion about that is
  1. Official translation and/or Romanisation must be used if able. This applies to all fields that can hold romanised data by intent. If there are multiple official translations and/or Romanisations, the mapper is free to choose any of them with the only exception being when there is a previously ranked mapset of that song. In such case the corresponding guideline applies to it. << feel like all of these are related and should be under one single rule. easier to read imo.
  2. If the artist provides a preferred way to romanise their title or name, that is to be followed unless it conflicts with other points of this criteria. << redundant. The only point this can conflict with is the naming convention stuff which is a specific case. adding stuff like "if necessary, ignore the preferred naming conventions of the artists" under that point is better.
  3. If a song or artist are referred to in multiple ways on official sources provided by the artist, the mapper is free to choose any of the romanisations. The only exception to this is if the song already has a mapset in the Ranked Section, in which case the corresponding guideline applies to it. << redundant also doesn't include translation orz

one more point that I wish to bring up is
If the artist field contains artist names with internally conflicting naming conventions (first name - last name and last name - first name formats), they must be normalized to just use the same format throughout.
^ should specify which one is preferred for consistency.
also naming conventions (stuff) --> the stuff inside parentheses should be in the glossary lol
lastly please add this to the glossary
Diacritical tone marks: ˉ, ˊ, ˇ, and ˋ above vowels in the pinyin system.
Topic Starter
Okoratu
the answer to your specification question is neither
i'll apply the rest tomorrow if i can find what the fukc this means
Lanturn
Hi. I'm late...

About TV Size:
If we do decide to keep the TV Size label, we should probably set standards for other common markers related to song length such as Short Ver. The whole reason TV Size metadata ended up like it is was due to a short ver map actually. By the sounds of the discussion, we've already decided to stop looking for TV Sizes on official sites and be consistent, which is a step in the right direction.

Personally, I'm siding with dropping it into the tags. As mentioned above, we should also consider dropping Short ver (and any others that describe the length of a song) as well since they're essentially the same as a TV Size being special cuts and all. I don't see why TV Size should be the only one with special treatment when the idea as a whole should be related to all song cuts.
Natsu
It would be really annoying without the label. For example, when you map the TV Size and the Full version, the mapsets merge together. Also, if you're looking for a normal length song, you are going to get a bunch of TV Sizes or vice versa... and to be honest, people rarely care about tags, so dropping it into tags isn't going to work.

BTW what about using a single label for everything?
Wafu
Can absolutely agree with Lanturn on this.

@Natsu If you add "tv size" to the tags and then search for a map and append "tv size", you should get the one that is TV Size (in game; the website will probably show you both in both scenarios). There was also a suggestion that "TV Size" or basically "Short version" could be a search option on the website (or maybe just the length). I think that unification should be the last resort; there are plenty of other ways to resolve this issue. Just showing the time on the website would be enough, especially when there are maps that are "TV Size" but are ~30 sec. That's probably not what most people look for; most people probably imagine the standard 1:30 songs. The confusion is present whether you use a unified label or just put it into the tags. Except the latter can lead to solving this issue entirely, rather than placing a bandage on it.
Noffy
I'm of the opinion that displaying (TV Size) uniformly for all TV Size anime openings/endings would be much clearer to players searching maps of a song. It may take only seconds to check a single map to see what length it is, but what about when there's 5+ mapsets for a really popular song? Then you would have to check each individually to find what you're looking for instead of it being presented on the listing straightaway. People that have not seen TV Size in anime song titles in the past would not be likely to think to add it to their search terms. They'd probably be more likely to think something like, "theme song" or "opening" or "ending" or "short" if they're an English speaker, for instance. This can be seen by searching anime themes on YouTube and seeing how relatively rarely they use (TV Size) for their video titles.
However, if there were any way to have the length of a song displayed on a map's panel in the search listing, I'd totally agree with adding it to tags instead. But that's a website feature that would need to be considered separately...
And would also require begging peppy / flyte to add it

Also about what Lanturn brought up concerning making uniform tags for different types of length indicators like Short ver. Game ver. etc.: this sounds like a good idea, honestly. Then the main concern for metadata would be the accuracy of the actual main title, and not worrying about having to find out what particular variation of a label some obscure official source happens to use.
Topic Starter
Okoratu
it can work either way

we'd just need to set a standard and stick to it
Lanturn
Alright. So let's say we want to keep TV Size, Short Ver. etc. Let me attempt to write up a list. We should probably be discussing this in advance if we do end up keeping it. Since classifying types is another big job in itself.

TV Size -> (TV Size, Anime Ver, Opening Ver, Ending Ver, TV Edit... This list is endless)
This would be used in all OP/Endings and specific cuts used in the show. This includes non-japanese songs like cartoons, sitcoms, etc. (The Friends opening as an example would use TV Size)
Example: https://osu.ppy.sh/b/1533376 (Anime Version labeled as a TV Size on its official release)

Short Ver. / Extended Ver.-> (Every other Song, Visual Novel Opening/Endings)
A song that has had its song time cut officially from the original and doesn't meet the criteria of the other cuts. This also includes Visual Novels as they are mostly labeled with short versions over game size.

Game Size -> (Game Opening, Endings, Insert Songs, Some BGM tracks)
A specific cut when dealing with video games. (not including visual novels) Similar to TV Size, but when dealing with games. Example: https://www.youtube.com/watch?v=qAQUparDhtg
In addition. I'm wondering what we should do with Rhythm Game cuts.
Element of SPADA for example has a version made for the rhythm games and a full version. In the majority of cases, these songs are released without any sort of markers, but then this contradicts with the whole labeling based on version.


Short Cut / Extended Cut (Unofficial cuts/additions made to any song.)
Pretty self-explanatory. There are basically two options here: add a marker to show that it's a specific cut, or use the original version's title (cutting a Full Ver. to roughly the length of a TV Size would result in using the Full Ver. title).
Example: https://osu.ppy.sh/b/211503 (I cut this from the full version and it is different from the TV Size)

Full Ver. -> (The full official release.)
Drop the marker outright, this includes songs that are officially marked with (Full Ver.) in the title.
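
For anyone who wants to see how a uniform-label scheme like the one listed above could behave, here is a minimal Python sketch. The mapping table, the function name `normalise_title`, and the sample titles are all hypothetical and only cover the TV Size and Full Ver. entries; nothing here reflects an agreed ruling.

```python
import re

# Hypothetical mapping from markers seen on official sources to one
# uniform label. None means "drop the marker outright" (Full Ver.).
CANONICAL = {
    "tv size": "TV Size",
    "anime ver": "TV Size",
    "opening ver": "TV Size",
    "ending ver": "TV Size",
    "tv edit": "TV Size",
    "full ver": None,
}

# A parenthesised marker at the very end of the title.
TRAILING_MARKER = re.compile(r"\s*\(([^()]*)\)\s*$")


def normalise_title(title):
    """Replace a known trailing length marker with its uniform label."""
    match = TRAILING_MARKER.search(title)
    if not match:
        return title
    key = match.group(1).strip().rstrip(".").lower()
    if key not in CANONICAL:
        # Unknown markers (remixes, etc.) are left untouched.
        return title
    base = title[:match.start()]
    label = CANONICAL[key]
    return base if label is None else f"{base} ({label})"
```

With this table, `normalise_title("Song (Anime Ver.)")` gives `Song (TV Size)` and `normalise_title("Song (Full Ver.)")` gives `Song`. Anything the table does not know about would still need a human decision.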

Nightcore, Speed Up Ver, and other edits that alter BPM will probably be left in the title field.

I'm sure I've missed quite a bit, but here's a start I guess.

Alternatively, there is the whole convert everything into Short Ver. / Extended Ver. that Natsu suggested.

A few topics for discussion from this if you missed them while reading: (The TL;DR)
Rhythm Video Game cuts. What label to add?
Should unofficial cuts be differentiated from official version releases?
Should unofficial cuts use the original title or the version they closely represent?
Markers I may have missed or any that should be removed.
Topic Starter
Okoratu
@Lanturn: i dont think we can be exhaustive with these lists fwiw

i'd just apply pareto principle to this and go with the 20% of work for 80% of cases and handle the rest via guidelines and establish new rulings as we go along?
Noffy

Noffy wrote:

ok time for a re-review with slightly fresher eyes

the thing part thirty wrote:

Guest mappers, storyboarders, and hitsounders must be added to the tags of a beatmap set. This is to give credit where credit is due and helping others identify the main contributors of any given beatmap set.

-> + "Skinners should be added if they made the skin specifically for the mapset" (in contrast to someone just borrowing/mixing skin elements that're already out there) (this would be nice)


the thing part forty two wrote:

Commas, vs., &, any variations of feat./ft., CV: must always use a trailing whitespace. Unless it is a comma, leading whitespace is also required.

(CV: blah) vs. ( CV: blah ) . the latter would look silly, so CV: shouldn't require leading whitespace either. Or uhhh... this doesn't apply to sides which have the inside of a bracket next to them? or something. since it'd also apply to like, (feat.) vs. ( feat. ) which isn't.. better really.. hmmm
I'm not sure how to fix the wording for this though
aaaaaaaa~



Repost because it got kind of buried before ahah
Mainly concerned about the confusion whitespace as it's currently written could cause

Idea wrote:

Trailing/leading whitespace is not required if the character next to it is the inner side of a bracket.
Example: Hello (CV: Goodbye) is okay, Hello( CV: Goodbye ) is not.
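
To make the bracket exception concrete, here is a rough Python sketch of a checker for the whitespace rule plus the idea above. The marker list, the function name `spacing_problems`, and the message strings are assumptions for illustration; it is not an existing tool and it does plain substring matching, so it will misfire on things like "Swift." or "1,000".

```python
import re

# Markers from the quoted rule. The list is illustrative, not exhaustive.
MARKERS = ["feat.", "ft.", "vs.", "CV:", "&", ","]
MARKER_RE = re.compile("|".join(re.escape(m) for m in MARKERS))
OPENERS = "([{"
CLOSERS = ")]}"
# Whitespace directly inside a bracket, e.g. "( CV:" or "Goodbye )".
PADDING_RE = re.compile(r"[(\[{]\s|\s[)\]}]")


def spacing_problems(text):
    """Return a list of human-readable spacing problems found in text."""
    problems = []
    for match in MARKER_RE.finditer(text):
        marker = match.group()
        before = text[match.start() - 1] if match.start() > 0 else ""
        after = text[match.end()] if match.end() < len(text) else ""
        # Leading whitespace: required unless the marker is a comma,
        # starts the string, or sits right after an opening bracket.
        if marker != "," and before and before != " " and before not in OPENERS:
            problems.append(f"missing space before {marker!r}")
        # Trailing whitespace: required unless the marker ends the
        # string or sits right before a closing bracket.
        if after and after != " " and after not in CLOSERS:
            problems.append(f"missing space after {marker!r}")
    for match in PADDING_RE.finditer(text):
        problems.append(f"padding inside bracket at index {match.start()}")
    return problems
```

Under these assumptions `Hello (CV: Goodbye)` comes back clean, while `Hello( CV: Goodbye )` is flagged for the padding inside the brackets.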
Lanturn
As for what Noffy wrote. I guess this was one of the things I was going to bring up in the next proposal for some guidelines. Anything that has some sort of opening and closing shouldn't require a space after the opener and before the closer.

*This* -that- <then> [thus] etc.

I also had something to address when it concerns romanizing from languages that don't typically use spaces like Japanese. It's something the metadata team has been pushing as of late when it comes to these languages. However, I'll probably save this for a later date since it's too late to be pushing this guideline through right now. The metadata team still recommends this though.

The guideline (in a nutshell) if anyone was interested
ジョジョ~その血の運命~ Archetype MIX Ver.
JoJo ~Sono Chi no Sadame~ Archetype MIX Ver.

When a symbol stands alone and has no spacing around it, the romanization should have a whitespace before and after it. (Ex. if the title was "ジョジョ~その" we'd use "JoJo ~ Sono" when romanizing.)

When a symbol comes in pairs (as in the example above), use a space before the first symbol and after the last symbol (not needed if the symbol is the last character). (Ex. if the title was "ジョジョ~その血の運命~" we would use "JoJo ~Sono Chi no Sadame~".)
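
The two cases can be sketched in a few lines of Python. This is only an illustration of the recommendation as worded here: the function name `space_symbol` is made up, it assumes the text is already romanised with the symbol carried over unspaced, and it only handles one or two occurrences of a single-character symbol.

```python
def space_symbol(text, symbol="~"):
    """Apply the lone-symbol / paired-symbol spacing to romanised text."""
    count = text.count(symbol)
    if count == 1:
        # Lone symbol: whitespace on both sides.
        left, right = text.split(symbol)
        return f"{left.strip()} {symbol} {right.strip()}".strip()
    if count == 2:
        # Paired symbols: space before the first and after the last,
        # none between the symbols and the text they enclose.
        first = text.index(symbol)
        last = text.rindex(symbol)
        before = text[:first].strip()
        inner = text[first + len(symbol):last].strip()
        after = text[last + len(symbol):].strip()
        parts = [p for p in (before, f"{symbol}{inner}{symbol}", after) if p]
        return " ".join(parts)
    # Anything else is left for a human to decide.
    return text
```

So `space_symbol("JoJo~Sono")` gives `JoJo ~ Sono`, and `space_symbol("JoJo~Sono Chi no Sadame~Archetype MIX Ver.")` gives `JoJo ~Sono Chi no Sadame~ Archetype MIX Ver.`.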


-----------------------------------------------------------------

Anyways. TV Size time. So basically we'll just apply common sense for these and use whatever marker matches closest to our own preferences. Pretty much going back to our old school methods this way.

I mean we can work with that but it may cause some DQs when it comes to preference or conflicts down the line.

For me, I'd rather push towards something more concrete with the smallest room for error, which is moving them to the tags. I am fine with working with either method though.

So uh. Pick one I guess and we'll move along with it from there.
Shiguma

Lanturn wrote:

Anyways. TV Size time. So basically we'll just apply common sense for these and use whatever marker matches closest to our own preferences. Pretty much going back to our old school methods this way.

I mean we can work with that but it may cause some DQs when it comes to preference or conflicts down the line.

For me, I'd rather push towards something more concrete with the smallest room for error, which is moving them to the tags. I am fine with working with either method though.

So uh. Pick one I guess and we'll move along with it from there.
I believe it should be moved to metadata. Conflicts shouldn't really happen if we're using a common sense approach, the few samples you had laid out earlier seem like they would work for most cases. It is just a much better way to handle determining cut songs vs full songs, unless the length is obvious from the listing as Noffy said.
Monstrata
Have a question regarding covers. I remember mentioning this to Eph but nothing was really concluded so maybe we can get some opinions here?

Currently metadata rules seem to suggest that if a song is covered by someone, that the entire metadata field should be taken from the cover's source. Example: https://osu.ppy.sh/s/658919 vs https://osu.ppy.sh/s/637445

If an artist is clearly covering or replicating another song, I think we should be taking metadata from the original song in cases where metadata is somehow different. After all, the melodies, rhythms, lyrics, etc... are all pretty much the same. It's the same song. Just sung by someone else. So shouldn't the only thing to change be the Artist/Romanized Artist field?

I'm actually not sure when this change happened because I remember at one point the practice was to just change the Artist and keep Title/Source the same as original song.
Ulysses

Fycho wrote:

Glossary
Character-by-character Romanisation: Each Chinese character must be Romanised using Hanyu Pinyin system, and each romanised character must be capitalised and separated with a space.

Rules
Songs with Chinese metadata must be Romanised using the Character-by-character method in Romanised fields when there is no Romanisation or translation information listed by a reputable source. The same applies to the Source field if a romanised Source is preferred by the mapper. As they are non-unicode fields, all diacritical tone marks must be omitted. Refer to Thread: Romanisation of Chinese for more information.

The discussion of "Romanisation of Chinese" should be adequate and stopped now. Anyone has concerns are free to contact me for detail explanations.
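
For reference, the quoted character-by-character rule (one syllable per character, capitalised, space-separated, tone marks omitted) can be sketched in Python as below. The function names are hypothetical and the input is assumed to be a list of Hanyu Pinyin syllables that someone has already looked up. The sketch removes only the four tone marks and deliberately leaves ü alone, since how to write ü in a non-unicode field is exactly the v/yu question argued earlier in the thread.

```python
import unicodedata

# Combining forms of the four pinyin tone marks:
# macron, acute, caron, grave.
TONE_MARKS = {"\u0304", "\u0301", "\u030c", "\u0300"}


def strip_tone_marks(syllable):
    """Remove pinyin tone marks but keep any other diacritic (e.g. on ü)."""
    decomposed = unicodedata.normalize("NFD", syllable)
    kept = "".join(ch for ch in decomposed if ch not in TONE_MARKS)
    return unicodedata.normalize("NFC", kept)


def romanise_by_character(syllables):
    """Capitalise each toneless syllable and join them with spaces."""
    return " ".join(strip_tone_marks(s).capitalize() for s in syllables)
```

For example, `romanise_by_character(["jiāng", "nán"])` gives `Jiang Nan`. A syllable containing ü keeps it, so a further replacement step would be needed once a convention is settled.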
I do not wish to fuel the fire of this longstanding debate, which seems to have reached a resolution already. Yet there is a flaw in this rule that compels me to break my silence. It is not about 'v', 'yu', and 'yi', but that this rule does not take into account the existence of another language that uses Chinese characters but whose pronunciation (and romanisation) is nothing like that of Mandarin Chinese. This language is Cantonese.

I will keep this post as short as possible.

Background

There are Cantonese songs in ranked maps, Kevin Cheng - Syut Ha Si -Thinking Under the Snow, Elanne Kwong - Sa Jiao, and JJ Lin - Jiang Nan (Cantonese Ver.), for instance.[1] Some of the titles are romanised in a wrong way. Elanne Kwong - Sa Jiao is a paradigm of errors of such kind. Its Cantonese romanisation is saat giu, not sa jiao. Whereas the former is the Cantonese and the correct romanisation of the title, the latter, Mandarin. The song is not a Mandarin song, but a Cantonese one.

In light of this, I propose that the rule be altered, below is the proposed wording of the rule after adjustment:

"Character-by-character Romanisation: [e]ach Chinese character must be Romanised [by] using Hanyu Pinyin system if the song is a Mandarin one; Jyutping if the song is a Cantonese one, and each romanised character must be capitalised and separated with a space."



Differences between Mandarin and Cantonese Romanisation

There are many ways wherein the two languages are different. From pronunciations and semantics of individual Chinese words to romanisation of each word. Since this is not a post about the teaching of the languages but the romanisation of the Cantonese language, the semantics needs not concern us. The other two will be discussed.

Nearly all Chinese characters are pronounced in a different way in Cantonese. Taking the word '我' (meaning 'I' as well as 'me' and sometimes, 'my') as an illustration, it is pronounced 'wo' in Mandarin but 'ngo' in Cantonese.

Not all Cantonese words have a Mandarin romanisation. For example, the word '冇' (meaning 'no' and 'nothing') has no Mandarin pinyin because it is non-existent in Mandarin. Its Jyutping is, however, 'mou'. Therefore, some Cantonese words cannot be romanised by using Hanyu Pinyin.


Arguments for the Proposed Change


1. Misrepresentation. Cantonese and Mandarin Chinese are very different in pronunciation (and meanings). Romanising Cantonese with Hanyu Pinyin will result in misrepresentation.

2. Unable to search online. Searching the Cantonese song with the Romanised pinyin online will most likely get you a Mandarin version of the song (with lyrics changed, different pronunciations of each word and very often, different singers), effectively two different songs.

3. Wrongness. It is plainly wrong to romanise Cantonese by using pinyin. It is like romanising Japanese kanji by using pinyin.


Romanisation of Cantonese


Jyutping does not contain any characters with accents.[2] Therefore, there will not be the slightest difficulty rendering the original Jyutping words into Romanised words.

Jyutping is easily obtainable. Mappers can easily obtain the Jyutping of individual words from the Chinese University of Hong Kong Dictionary[3].





I hope you can take this post into consideration.

[1] For more examples, see https://osu.ppy.sh/p/beatmaplist?q=cantonese
[2] For the list of consonants and vowels of Cantonese, see http://www.cantonese.sheik.co.uk/essays/jyutping.htm
[3] The website of the Chinese University of Hong Kong Dictionary: http://humanum.arts.cuhk.edu.hk/Lexis/lexi-can/ ; to obtain the Jyutping of a word, type the word in question into the search box at the top-left corner of the page. Once searched, you can obtain the Jyutping from the first box on the left labelled 'syllable'.
Wafu
@nold_1702: I don't really think that such a minority language (in osu!, it is a rarity in beatmaps) needs a specification. Simply because you then need to discuss whether Jyutping is the most appropriate to use. Although its Romanisation doesn't necessarily contain "characters with accents", it does contain numbers that define the accent, which is not understandable by anyone who doesn't know Jyutping specifically. You could as well Romanise it using Cantonese pinyin, but you don't really say why that's bad here.

Anyway, the proposal was actually addressing this wording issue (we specified that when we talk about Chinese, we mean Mandarin, this draft doesn't say anything like that), but it seems like the error is back again. There shouldn't be "each Chinese character", but "each Mandarin character".
Skylish
The standard of romanization should be defined as: translating a language into Latin words. Under the rationale of transcription, the romanization of letters (in Latin letters) should be followed by the language. Dialects or different languages should be considered as independent language system since they have completely different pronunciation methods and systems.

I object a completely 100% Hanyu Pinyin forced on other Chinese languages excluding Mandarin under the PRC standardized Chinese.

EDIT:

Another issue is the usage of Traditional Chinese and Simplified Chinese. Simply speaking, only Taiwan region, HKSAR and Macau use Traditional Chinese. Hence the metadata from these regions should be Traditional Chinese, no matter what languages are in the songs.
Ulysses

Wafu wrote:

@nold_1702: I don't really think that such a minority language (in osu!, it is a rarity in beatmaps) needs a specification. Simply because you then need to discuss whether Jyutping is the most appropriate to use. Although its Romanisation doesn't necessarily contain "characters with accents", it does contain numbers that define the accent, which is not understandable by anyone who doesn't know Jyutping specifically.
(This post is a continuation of my first post on page 12; the 179th post.)

You made some interesting points. However, I still believe that there is a need to Romanise Cantonese titles by using Jyutping, not Pinyin. Your arguments can be summarised as follows:

Wafu's Arguments


1. Cantonese is a minority language (on osu!). As such, we do not need to Romanise Cantonese.
2. To Romanise Cantonese the community needs to discuss which Jyutping is the most appropriate to use, therefore, in avoidance of this inconvenience, it is better for the community not to discuss.
3. The tones are to be taken into consideration, and it is troublesome. Thus, the community should not discuss it.

Rebuttals


Much as Wafu's arguments are interesting, they are not valid points. It appears to me plainly obvious that arguments (2) and (3) are fallacious. The mere fact that something is troublesome and inconvenient should not bar the community from discussing it. In fact, most of the rules here on osu! are the products of fierce debates. If we were to avoid any debate that might potentially arise, we had better not discuss any rule at all, and such a conclusion is, indeed, absurd.

Arguments (2) and (3) are also flawed in that they are not factually correct. For (2), Jyutping is the most widely used and the most authoritative. It is supported by a university dictionary and is the most accessible. Yale Cantonese Romanisation is also backed by Yale University, but it does not have an online dictionary and is not widely used at all. For (3), the tones in Mandarin are not taken into account when romanising, so I see no reason anyone would be interested in the tones when romanising Cantonese with Jyutping.
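
To illustrate point (3): if tone numbers are dropped the same way pinyin tone marks are, the remaining step is tiny. Below is a minimal Python sketch, assuming the input is a list of Jyutping syllables with an optional trailing tone digit from 1 to 6; the function name `jyutping_without_tones` and the sample syllables are purely illustrative.

```python
import re

# A single trailing tone digit (Jyutping uses 1 to 6).
TONE_DIGIT = re.compile(r"[1-6]$")


def jyutping_without_tones(syllables):
    """Drop trailing tone digits, capitalise, and join with spaces."""
    return " ".join(TONE_DIGIT.sub("", s).capitalize() for s in syllables)
```

For example, `jyutping_without_tones(["ngo5", "mou5"])` gives `Ngo Mou`, which contains nothing outside plain ASCII letters.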

Argument (1) is also wrong on several levels. On a factual level, it is unsound. Admittedly, Cantonese songs are not a popular song type on osu!; nonetheless, there are more than 20 maps of Cantonese songs (note that the list I provided in my previous post is not exhaustive because not every Cantonese song has the word 'Cantonese' in tags), whilst there are 24 German songs ranked on osu!. If the premise is that the Romanisation of German titles is necessary, by analogy, I see no convincing reason why Cantonese should not be Romanised.

On a logical level, argument (1) is invalid. The premise 'Cantonese is a minority language' does not entail 'Cantonese does not need Romanisation', let alone the implicit conclusion that 'Cantonese Romanisation shall be replaced by Mandarin Romanisation Pinyin'. It is one thing not to Romanise Cantonese, but it is quite another thing to Romanise Cantonese in a wrong way by using a different language system to Romanise it. Perhaps I am not clear enough. What I mean is that in the old days, when the rule that presently concerns us was not yet in existence, mappers had the liberty to use Jyutping to Romanise Cantonese titles. However, once this rule is passed, all Cantonese titles will need to be Romanised by using the Mandarin Pinyin system. In other words, this rule in its current state will coerce something that is wrong to happen. And that's why I suggest the change in question.

Wafu also mentioned that one could Romanise Cantonese by using Cantonese Pinyin. This is a misconception. Cantonese Pinyin does not exist. The equivalent is Cantonese Jyutping, and that is why I am proposing this change.
Topic Starter
Okoratu
idk why everyone here has to sound like they just had a dictionary for breakfast but whatever

to me the points raised by nold make sense, if there's words that cannot be romanised using pinyin then pinyin maybe shouldnt be used - as such the mandarin metadata ruling would need to specify what it applies to, i dunno if this is common enough to deserve its own ruling tho

edit: can someone translate Skylish's post into readable english for me? I've read it multiple times and dunno what he's trying to get at for half of it like the only sentence that makes sense is "I object a completely 100% Hanyu Pinyin forced on other Chinese languages excluding Mandarin under the PRC standardized Chinese." with reasoning "they're different enough to class as different languages"
Fycho
We have agreed that “Cantonese metadata must be romanized as Cantonese pronunciation”. But we haven’t figured out which romanization method is better and should be used, so I didn’t add it to the proposal. Considering there are fewer than three ranked songs, we can still apply metadata discretion for them.

If you really want to discuss and figure this out, also don’t forget that Cantonese differs internally and has different tones within itself. I don’t know if forcing Jyutping would be a solution, and it may have potential issues with other Cantonese-speaking areas like Guangzhou. I am saying that metadata discretion by the mapper would be the best for them.