[Proposal] Metadata section overhaul

Topic Starter
Okoratu
:D cleaning up open points in progress

open talking points


  • Chinese:
    1. i'm not willing to link a 4 year old thread in the ranking criteria. either it should be ported to the wiki, or that sentence dropped, or its information condensed down into more guidelines on the topic
    2. Hanyu Pinyin system needs an external reference if available, someone please provide me a link
  • Cyrillic:
    1. Cyrillic is now undefined - anyone fancy coming up with some definition if the current thing only works for russian?
  • Korean:
    1. Input from Koreans as to which standard is used and should be used going forward is needed
  • Thai:
    1. Input from natives required as to which standard is to be used
  • Arabic:
    1. do we currently need this? I have a hard time telling how much monstrata trolls in this post
  • TV Size label:
    1. FORCE the same label everywhere or FORCE DROPPING the label?
will update with issues i have as i go through the thread updating open points

Monstrata wrote:

With respects to Korean romanization, I'm wondering if we should continue applying the McCune-Reischauer system for romanizing Korean. This is the system that the Library of Congress is using. Nyquill brought up an excellent point about using romanization systems that other large institutions are currently using and it works a lot better than creating our own modified system in most cases (unless we are simplifying).

I'm bringing this up because there is also the Revised Romanization of Hangeul system that was introduced on July 7th, 2000 which has been applied to various Korean road signs transportations etc... The major change of course being that the new system eliminates diacritics in favor of digraphs.

A possible rule would look like:

Songs with Korean metadata must be romanised using the McCune-Reischauer system for romanizing Korean when there is no romanisation or translation information listed by a reputable source. The same applies to the Source field if a romanised Source is preferred by the mapper.

Additionally, we could introduce the use of digraphs and two-vowel letters into the proposal:

Vowels ㅓ/ʌ/ and ㅡ/ɯ/ should be written as digraphs in Korean romanization, and romanized to eo and eu respectively.

Another language to examine is Thai. The Library of Congress recommends nine additional rules for Thai romanization which are:

Library of Congress wrote:

Romanization
1. Tonal marks are not romanized.
2. The symbol ฯ indicates omission and is shown in romanization by “ … ” the conventional sign for
ellipsis.
3. When the repeat symbol ๆ is used, the syllable is repeated in romanization.
4. The symbol ฯลฯ is romanized Ia.
5. Thai consonants are sometimes purely consonantal and sometimes followed by an inherent vowel
romanized o, a, or ǭ depending on the pronunciation as determined from an authoritative
dictionary, such as the Royal Institute's latest edition (1999).
6. Silent consonants, with their accompanying vowels, if any, are not romanized.
7. When the pronunciation requires one consonant to serve a double function – at the end of
one syllable and the beginning of the next – it is romanized twice according to the
respective values.
8. The numerals are: ๐ (0), ๑ (1), ๒ (2), ๓ (3), ๔ (4), ๕ (5), ๖ (6), ๗ (7), ๘ (8), and ๙ (9).
9. In Thai, words are not written separately. In romanization, however, text is divided into words
according to the guidelines provided in Word Division below.
The two rules I am proposing are:

Songs with Thai metadata must be romanised using the Library of Congress system (also known as ISO 11940) for romanizing Thai when there is no romanisation or translation information listed by a reputable source. The same applies to the Source field if a romanised Source is preferred by the mapper.

and

In the romanization of Thai, words should be romanized separately, and separated by a space. Additionally, all words should (or should not?) be uppercased.

Attached are helpful transcription keys for Thai:






Another language that is becoming more and more relevant is Arabic, and there are some issues I would like to bring forth with regards to its romanization.

Here is the table for romanization of Arabic:


As you can see, some issues come up. In the romanization of ص ص ص ص for example, (whether initial, Medial, Final, or Alone) the romanization becomes " ṣ" however, the diacritical mark is not something that can be used by osu because it is still not unicode. I would like to propose that all of these diacritical "," attached to letters be removed for the sake of simplicity and because osu currently does not support them. Therefore something like " ص◌نضوِ◌خ" should be romanized as "sandwich".

Another problem with Arabic is that it is typed in reverse, right to left. Should we also apply this to romanization? In this case "ص◌نضوِ◌خ" would actually be romanized as "hciwdnas" when read left to right as English readers are expected to do.

The rule I am proposing is:

Songs with Arabic metadata must be romanised using the Library of Congress system for romanizing Arabic when there is no romanisation or translation information listed by a reputable source. The same applies to the Source field if a romanised Source is preferred by the mapper.

Additionally:

In the romanization of Arabic, words should be romanized in reverse order, and the last letter should be uppercased. For example in romanizing "◌ س◌ !" the correct romanization should be "!usO"

However, there is also the problem of Judeo-Arabic romanization which differs slightly from traditional Arabic romanization. Judeo-Arabic of course, stems from the Jewish Arabs many who live in Iraq and have adopted a slightly different script with respect to certain nouns and verbs. The most common Jewish Arabs are those from Baghdad. Anyways, I digress.

Attached are examples of Judeo-Arabic romanization:


So I would like to propose the following:

Songs with Judeo-Arabic metadata must be romanised using the Library of Congress system for romanizing Judeo-Arabic where Judeo-Arabic nouns and verbs are being used, and where there is no romanisation or translation information listed by a reputable source. Where Judeo-Arabic words and phrases are not used, traditional Arabic romanization will apply. The same applies to the Source field if a romanised Source is preferred by the mapper.
i'm very unsure about these things, can someone with knowledge of the language please clear this up?
find me a song with aegyptic hieroglyphs first that is sung in the language and mapped on osu, i dont think a language with 300k tribal speakers worldwide is relevant to the ranking criteria

-----------

TV Size labels in title: drop universally or force the same labeling on all tv size songs?

-----------

Applied Sieg changes to russian as they were.

draft updated, refer to open talking points

Wafu

Okoratu wrote:

i'm very unsure about these things, can someone with knowledge of the language please clear this up?
find me a song with aegyptic hieroglyphs first that is sung in the language and mapped on osu, i dont think a language with 300k tribal speakers worldwide is relevant to the ranking criteria

-----------

TV Size labels in title: drop universally or force the same labeling on all tv size songs?
1. I agree this may not be very necessary. I'd say we should only require usage of certain Romanisation method only if the language needs to be Romanised repetitively, not once per history of the game.

2. In my opinion (and this is probably only about opinion), drop it and recommend it to tags. If we use a universal label, I feel like it goes completely against what the artist has chosen. If it is removed, it's not really misrepresenting the artist's choice, it is just us not considering it as a part of the title. Not sure if this makes much sense to anyone else, that's just what I think. Edit: Another important point is that, because people like 1-2 min. songs, they will be encouraged to use the in-game song length sorting/grouping (unfortunately can't be done on website). Most people probably search for short songs just by using "tv size", which can omit a large portion of maps they could actually enjoy. (There's nothing magical about TV Size songs, the only thing is that they are ~1 min 30 sec, it's just the length that players care about.) As people would get used to it, it could actually be more beneficial. (afaik Ephemeral (sorry if it wasn't you, but I think it was) said he would rename all the songs that are any kind of TV Size, so that it's consistent, so people actually would get used to it)

Chinese: Couldn't really find a reference that would not be misleading. They are usually too much concerned about the pronunciation part and don't really tell you how to Romanise it. Plus, many of them use the originally proposed style (not syllable by syllable, but "word" by "word"). Many are paid or blocked by government too for some reason. Someone may need to search a bit more to represent the currently intended system better.

Cyrillic: Wouldn't really care that much for Cyrillic as it's improbable that there will be many songs in it. But it could be recommended to use the BGN/PCGN systems for other Cyrillic. The reason being pretty simple, it's nearly the same as the Russian one. Yes, it is intended majorly for geographic names, but it was explained in past that it was the most common way we see Cyrillic Romanised and is the most compatible with modified Hepburn, which is the most spread Romanisation in osu!. It also avoided the confusion between "ch" and "kh", between "j" and "y", between "c" and "ts" etc. Could elaborate this a bit more if needed.

Korean: I believe the currently used system is "Revised Romanisation of Korean", at least it looks pretty much like it in the maps here. It's a South Korean standard (probably the one we should follow as the probability of North Korean songs being mapped here is very low). Didn't find an official document about it, but this seems like a nice reference. It explains all the things you need for Romanisation. Except for Hanja, but again, the probability of something with Hanja being mapped is low. May need input from Koreans, I haven't seen any problems with the current system.

That's my opinion at this moment.
CrystilonZ
Delete word-by-word lol

Sites on the Internet unfortunately either provide too much (e.g. joining of syllables) or aren't at all what we agreed on.
If we are going with passport stuff imo just port information in the box below to the wiki. It's pretty similar to the old thread with clearer and better wording and all irrelevant points dropped.

Additional Information on the Romanisation of Chinese.

If there is no information on either Romanisation or translation listed by a reputable source, use the following method to Romanise Chinese metadata.
  1. All Chinese characters must be Romanised using Hanyu Pinyin system. All tone-marking diacritics must be omitted.
    1. 我 : Wo
    2. 三 : San
  2. For any Chinese character that, under the Hanyu Pinyin system, would be Romanised to Lü, Nü, Lüe, or Nüe, Romanise it to Lyu, Nyu, Lue, or Nue respectively instead.
    1. 女 : Nyu
    2. 略 : Lue
  3. Separate each Romanised character with a space and capitalise it. Function words are capitalised as well.
    1. 泠鸢yousa - 没有名字的怪物 : Ling Yuan yousa - Shen De Sui Bo Zhu Liu
    2. 兰梓 - 一百块都不给我 : Lan Zi - Yi Bai Kuai Dou Bu Gei Wo
  4. For loan words from other languages, however, Romanise all characters that make up the word into a single word in its original language.
    1. 張韶涵 - 歐若拉 : Angela Zhang - Aurora
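Rules 1 to 3 in the box can be sketched in a few lines of Python. This is only an illustration, not existing osu! tooling: the function names are made up, and it assumes the input is already split into pinyin syllables that still carry their tone marks. Loan words (rule 4) need a human and are not handled.

```python
import unicodedata

# Hypothetical helper names; nothing here is part of osu! or the wiki.
# The four pinyin tone diacritics: macron, acute, caron, grave.
TONE_MARKS = {"\u0304", "\u0301", "\u030c", "\u0300"}

# Rule 2: substitutions for syllables containing ü (checked after tone removal).
U_SUBSTITUTIONS = {"lü": "lyu", "nü": "nyu", "lüe": "lue", "nüe": "nue"}


def strip_tone_marks(syllable):
    """Rule 1: drop the tone diacritics but keep the umlaut on ü."""
    decomposed = unicodedata.normalize("NFD", syllable)
    kept = "".join(ch for ch in decomposed if ch not in TONE_MARKS)
    return unicodedata.normalize("NFC", kept)


def romanise_syllable(syllable):
    plain = strip_tone_marks(syllable.lower())
    plain = U_SUBSTITUTIONS.get(plain, plain)
    # Rule 3: every romanised character is capitalised.
    return plain.capitalize()


def romanise(syllables):
    # Rule 3: one space between romanised characters.
    return " ".join(romanise_syllable(s) for s in syllables)


print(romanise(["méi", "yǒu", "míng", "zi", "de", "guài", "wu"]))
```

With the pinyin for 没有名字的怪物 as input this prints "Mei You Ming Zi De Guai Wu", and `romanise(["nǚ"])` gives "Nyu".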

In the RC itself
Glossary
  1. Word-by-word Romanisation: Each character must be romanised into a single, capitalised, separated word. Refer to this thread for examples and supplementary information. <<< delete this
  2. Character-by-character Romanisation: Each Chinese character must be Romanised using Hanyu Pinyin system, and each Romanised character must be capitalised and separated with a space. Refer to <this wiki page> for more information.
Rules
  1. Songs with Chinese metadata must be Romanised using the Character-by-character method in Romanised fields when there is no Romanisation or translation information listed by a reputable source. The same applies to the Source field if a romanised Source is preferred by the mapper. As they are non-unicode fields, all diacritical tone marks must be omitted with Lü, Nü, Lüe, and Nüe substituted with Lyu, Nyu, Lue, and Nue respectively.
^ use "substituted" here. ü is already Romanised but we're just replacing it with stuff because of umlauts.
Natsu
Force the same TV Size label, since it makes it easier to identify the length of the song and whether it's the opening of an anime. Also, everyone in osu! is used to it; removing it will just cause confusion IMO.
Mentai
Every Cyrillic translation system is a mess, and each has large inconsistencies in vowel treatment as well. I recommend using one of the systems that use the right single quotation mark, either ISO 9985 or BGN/PCGN. ISO is the most widely used for us Armenians, but having the aspirates translated the same way makes BGN/PCGN a more universal solution for all Cyrillic languages.

basically, BGN/PCGN is probably the best for a universal system for Cyrillic
Scarlet Evans
How should one romanize a title if no romanization, or even an unambiguous phonetic transcription, exists, while several completely different readings are equally suggested, and it would be completely, totally and absolutely wrong to choose just one of them as the title? I.e. what if it's not possible to transcribe the title phonetically (or romanize it) unless you include all of the several possible meanings? And even if one includes all of them, should they be separated by a slash, which with 4-5 of them would result in a terribly long monstrosity of a title?

-----

Here's more of explanation:


Artists can name their songs however they want, right? They are not obliged to make it "possible" to pronounce or to have unambiguous title that someone can phonetise.

I don't know how it looks like in Chinese language, but a great deal of Japanese kanji characters can be pronounced (and romanized) not in one or two, but even several completely different ways!

I don't know, if someone made song like this, but I find nothing that could stop one from doing this, i.e. what if someone makes the song (which later someone decides to map in osu!), where:

  • The title has no official romanization and remains "unspoken", i.e. aside from the title written in kanji, even the author never spells it out and refers to the song only indirectly in words, unless by writing it or showing the kanji characters.

  • It has several possible meanings, at least 3-4 of them, ALL of which are suggested by the lyrics and ALL of which have a COMPLETELY DIFFERENT romanization for every single character used in the title. So you can't phonetise it, as it has several completely different phonetisations!!

  • The lyrics of the song are used to maintain and express the ambiguity, creating a contradiction around what the title applies or refers to. This makes all of these "possible titles" equally valid candidates while they contradict each other, either through their contradictory meanings and/or through the lyrics, i.e. all of them are suggested to be the title while at the same time being refuted.

  • It all makes sense when reading the kanji characters and extracting all these meanings, and the title definitely fits the song perfectly (!!!), with just one but: you can't possibly spell it or phonetise it, as it has several completely different pronunciations, all of which can equally be considered as being and not being the title (because of the contradictory situation and being refuted by the other meanings and/or the lyrics).

-----

So, each of these "possible titles" can be and can't (!) be the title at the same time, but all of them written down with the very same kanji characters are definitely the title, with deep meanings that are expressed in the lyrics.

-----

How should metadata for such song look like? :o

I don't know, if such song with "unspoken" title, where the title can have multiple meanings, currently exists, and if so, then to what extreme this ambiguity is brought, but there are titles with more than one meaning and sometimes you can't really find anywhere how the title should be pronounced or phonetised...

Also, looking at this discussion, even if such song doesn't exist, then depending on the answer I could decide to spend my time to improve my poor Japanese much better and eventually create one in future, maybe with some voluntary help of someone, just for the sake of trying to rank it in future :P

-----

But seriously... it's not required from an artist to have a title that is possible to be phonetised or romanized... what should a mapper do in such situation? Could such song be treated as an "exception" and allowed to have no romanization? Or all of several titles should be included in some ultimate title compilation?

And please, don't tell me that it's "impossible case" that's not worth considering, as I believe that the people in this community could definitely help to make it possible, if it will be required to.
Wafu

Scarlet Evans wrote:

Could such song be treated as an "exception" and allowed to have no romanization? Or all of several titles should be included in some ultimate title compilation?
Every song needs to be Romanised so that it can be searched normally. If it's your song, give it a name, if it's not, that probably can't be universally covered by RC. These situations (as they virtually don't happen) would probably be treated case by case.
Fycho
Adding additional Information seems redundant lol, if mappers/BN are unsure, they can ask Metadata QATs/Helpers for help like Modified Hepburn for Japanese.

CrystilonZ wrote:

泠鸢yousa - 没有名字的怪物 : Ling Yuan yousa - Shen De Sui Bo Zhu Liu
it has a mistake, should be:

没有名字的怪物 => Mei You Ming Zi De Guai Wu
神的随波逐流 => Shen De Sui Bo Zhu Liu
should delete "Word-by-word Romanisation" in Glossary.
Glossary
Character-by-character Romanisation: Each Chinese character must be Romanised using Hanyu Pinyin system, and each romanised character must be capitalised and separated with a space.

Rules
Songs with Chinese metadata must be Romanised using the Character-by-character method in Romanised fields when there is no Romanisation or translation information listed by a reputable source. The same applies to the Source field if a romanised Source is preferred by the mapper. As they are non-unicode fields, all diacritical tone marks must be omitted.

Considering that only 5 ranked maps (really few) use "ü" so far, and that neither "v" nor "yu" is the best choice unless we allow ü in the romanised field, specific cases will be left to the discretion of the Metadata Team.
CrystilonZ
Agree tbh half way through writing that I realised this is redundant af XDD
Wikipedia kinda provides too much but eeh whatever I guess
and somehow I accidentally copied the wrong title lmfao good job me
Wafu
@Fycho: We can't use the link provided due to this chapter.

Single meaning: Words with a single meaning, which are usually set up of two characters (sometimes one, seldom three), are written together and not capitalized: rén (人, person); péngyou (朋友, friend); qiǎokèlì (巧克力, chocolate)
etc.
This is what our draft suggested at first and what you argued against. I think this will just make it more confusing if you are supposed to ignore some sections of it.
Fycho
Adjusted the draft and simply removed it. I don't think there needs to be any redundant information like modified hepburn for japanese, and most Chinese metadata songs are made by Chinese speakers in the game, they know how pinyin works. Whoever is confused on how to romanize Chinese could ask metadata QATs/Helpers directly without looking through the wall texts in Wiki painfully. Current rule has adequate explanation already.
Wafu

Fycho wrote:

Whoever is confused on how to romanize Chinese could ask metadata QATs/Helpers directly without looking through the wall texts in Wiki painfully.
Yeah, I agree with that. Following the wiki would even make you Romanise it differently than what is intended by the rule.

As there was, finally, an agreement on ü, is it still in discussion or why is it missing in the draft?

For the Japanese Romanisation, I think that giving link to Hepburn wiki overall is not the best idea as multiple Hepburn systems are mixed here. What about just linking the Modified Hepburn document? I think this is the best reference as it only says the rules of the Romanisation without redundancy + it's from Library of Congress, which is probably the most official reference we can have.

Suggestion to make the Romanisation rules a bit shorter and cleaner

First of all, for Russian, change "must be romanised using..." to "must use" as it is with other languages.

All the Romanisation systems, consistently, say "when there is no romanisation or translation information listed by a reputable source" and "The same applies to the Source field if a romanised Source is preferred by the mapper."

Assuming there will still be Korean (it should be because we include it as a language selection on the website), it would be messy to have this in every rule. What about making these rules and removing them from the rules of specific languages:
  1. official artist's Romanisations for all languages have priority
  2. if you choose (and as of now, you can choose) to use Romanised source, use the same romanisation method
There's no reason to specify the same thing for every language if it works uniformly.
Topic Starter
Okoratu
Someone should give me the agreement on ü then

- will change the link
- will change russian wording,
- will avoid redundancy by just taking the parts that are redundant out and making them general rules for language specific romanisation works, right?

sth like
Romanisation is only to be used when there is no official translation or preferred romanisation provided by the artist. This applies to all fields that can hold romanised data by intent.
Wafu

Okoratu wrote:

Romanisation is only to be used when there is no official translation or preferred romanisation provided by the artist. This applies to all fields that can hold romanised data by intent.
Seems about right, so now the confirmation about ü, and Korean Romanisation.

Maybe a good reference for Korean (should be the Revised Romanization of Korean, which seems to be used in most maps) might be this.
CrystilonZ
ü is the matter of preference now lol everyone probably has enough information to choose a side but I don't expect a 100 percent consensus
My suggestion about that is
  1. Official translation and/or Romanisation must be used if able. This applies to all fields that can hold romanised data by intent. If there are multiple official translations and/or Romanisations, the mapper is free to choose any of them with the only exception being when there is a previously ranked mapset of that song. In such case the corresponding guideline applies to it. << feel like all of these are related and should be under one single rule. easier to read imo.
  2. If the artist provides a preferred way to romanise their title or name, that is to be followed unless it conflicts with other points of this criteria. << redundant. The only point this can conflict with is the naming convention stuff which is a specific case. adding stuff like "if necessary, ignore the preferred naming conventions of the artists" under that point is better.
  3. If a song or artist are referred to in multiple ways on official sources provided by the artist, the mapper is free to choose any of the romanisations. The only exception to this is if the song already has a mapset in the Ranked Section, in which case the corresponding guideline applies to it. << redundant also doesn't include translation orz

one more point that I wish to bring up is
If the artist field contains artist names with internally conflicting naming conventions (first name - last name and last name - first name formats), they must be normalized to just use the same format throughout.
^ should specify which one is preferred for consistency.
also naming conventions (stuff) --> the stuff inside parentheses should be in the glossary lol
lastly please add this to the glossary
Diacritical tone marks: ˉ, ˊ, ˇ, and ˋ above vowels in the pinyin system.
Topic Starter
Okoratu
the answer to your specification question is neither
i'll apply the rest tomorrow if i can find what the fukc this means
Lanturn
Hi. I'm late...

About TV Size:
If we do decide to keep the TV Size label, we should probably set standards for other common markers related to song length such as Short Ver. The whole reason TV Size metadata ended up like it is was due to a short ver map actually. By the sounds of the discussion, we've already decided to stop looking for TV Sizes on official sites and be consistent, which is a step in the right direction.

Personally, I'm siding with dropping it into the tags. As mentioned above, we should also consider dropping Short ver (and any others that describe the length of a song) as well since they're essentially the same as a TV Size being special cuts and all. I don't see why TV Size should be the only one with special treatment when the idea as a whole should be related to all song cuts.
Natsu
It would be really annoying without the label. For example, when you map both the TV Size and the Full version, the mapsets merge together. Also, if you're looking for a normal length song you are going to get a bunch of TV Sizes, or vice versa... and to be honest, people rarely care about tags, so dropping it into tags isn't going to work.

BTW what about using a single label for everything?
Wafu
Can absolutely agree with Lanturn on this.

@Natsu If you add "tv size" to the tags and then search for a map and append "tv size", you should get the one that is TV Size (in game, website will show you probably both in both scenarios). There was also a suggestion that "TV Size" or basically "Short version" could be as an option to search on the website (or maybe just the length). I think that unification should be the last resort, there are way too many ways how to resolve this issue. Just showing the time on the website would be enough. Especially when there are maps that are "TV Size", but are ~30 sec. That's probably not what most people look for, most people probably imagine the standard 1:30 songs. The mystification is present both if you include unified tag or you just put it into the tags. Except the latter can lead to solving this issue entirely, rather than placing a bandage on it.
Noffy
I'm of the opinion that displaying (TV Size) uniformly for all TV Size anime openings/endings would be much clearer to players searching maps of a song. It may take only seconds to check a single map to see what length it is, but what about when there's 5+ mapsets for a really popular song? Then you would have to check each individually to find what you're looking for instead of it being presented on the listing straightaway. People that have not seen TV Size in anime song titles in the past would not be likely to think to add it to their search terms. They'd probably be more likely to think something like, "theme song" or "opening" or "ending" or "short" if they're an English speaker, for instance. This can be seen by searching anime themes on YouTube and seeing how relatively rarely they use (TV Size) for their video titles.
However, if there were any way to have the length of a song displayed on a map's panel in the search listing, I'd totally agree with adding it to tags instead. But that's a website feature that would need to be considered separately...
And would also require begging peppy / flyte to add it

Also about what Lanturn brought up concerning making uniform tags for different types of length indicators like Short ver. Game ver. etc.: this sounds like a good idea, honestly. Then the main concern for metadata would be the accuracy of the actual main title, and not worrying about having to find out what particular variation of a label some obscure official source happens to use.
Topic Starter
Okoratu
it can work either way

we'd just need to set a standard and stick to it
Lanturn
Alright. So let's say we want to keep TV Size, Short Ver. etc. Let me attempt to write up a list. We should probably be discussing this in advance if we do end up keeping it. Since classifying types is another big job in itself.

TV Size -> (TV Size, Anime Ver, Opening Ver, Ending Ver, TV Edit... This list is endless)
This would be used in all OP/Endings and specific cuts used in the show. This includes non-japanese songs like cartoons, sitcoms, etc. (The Friends opening as an example would use TV Size)
Example: https://osu.ppy.sh/b/1533376 (Anime Version labeled as a TV Size on its official release)

Short Ver. / Extended Ver.-> (Every other Song, Visual Novel Opening/Endings)
A song that has had its song time cut officially from the original and doesn't meet the criteria of the other cuts. This also includes Visual Novels as they are mostly labeled with short versions over game size.

Game Size -> (Game Opening, Endings, Insert Songs, Some BGM tracks)
A specific cut when dealing with video games. (not including visual novels) Similar to TV Size, but when dealing with games. Example: https://www.youtube.com/watch?v=qAQUparDhtg
In addition. I'm wondering what we should do with Rhythm Game cuts.
Element of SPADA for example has a version made for the rhythm games and a full version. In the majority of cases, these songs are released without any sort of markers, but then this contradicts with the whole labeling based on version.


Short Cut / Extended Cut (Unofficial cuts/additions made to any song.)
Pretty self-explanatory. There's basically two options here: Add a marker to show that it's a specific cut, or use the original versions length (Cutting a Full ver to roughly the length of a TV Size would result in using the Full Ver title.)
Example: https://osu.ppy.sh/b/211503 (I cut this from the full version and it is different from the TV Size)

Full Ver. -> (The full official release.)
Drop the marker outright, this includes songs that are officially marked with (Full Ver.) in the title.

Nightcore, Speed Up Ver, and other edits that alter BPM will probably be left in the title field.
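
If the list above were adopted, mapping variant markers onto the standard labels could look roughly like the Python sketch below. The variant spellings are just the ones listed above (the list is explicitly not exhaustive), and the function and table names are hypothetical. It only looks at a marker in parentheses at the very end of the title, which is an assumption about how markers are usually written.

```python
import re

# Variant spellings taken from the list above; illustrative only, not exhaustive.
# A value of None means "drop the marker outright" (the Full Ver. case).
CANONICAL_MARKERS = {
    "tv size": "TV Size",
    "anime ver": "TV Size",
    "opening ver": "TV Size",
    "ending ver": "TV Size",
    "tv edit": "TV Size",
    "short ver": "Short Ver.",
    "extended ver": "Extended Ver.",
    "game size": "Game Size",
    "full ver": None,
}

# Assumption: the length marker sits in parentheses at the end of the title.
TRAILING_MARKER = re.compile(r"\s*\(([^()]*)\)\s*$")


def normalise_length_marker(title):
    match = TRAILING_MARKER.search(title)
    if not match:
        return title
    key = match.group(1).strip().rstrip(".").lower()
    if key not in CANONICAL_MARKERS:
        # Unknown markers (Nightcore, Speed Up Ver, ...) stay in the title.
        return title
    base = title[:match.start()]
    label = CANONICAL_MARKERS[key]
    return base if label is None else f"{base} ({label})"


print(normalise_length_marker("Some Song (Anime Ver.)"))
print(normalise_length_marker("Some Song (Full Ver.)"))
```

The first call prints "Some Song (TV Size)" and the second prints "Some Song". Unofficial cuts and rhythm game cuts are left out on purpose, since those are still open questions.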

I'm sure I've missed quite a bit, but here's a start I guess.

Alternatively, there is the whole convert everything into Short Ver. / Extended Ver. that Natsu suggested.

A few topics for discussion from this if you missed them while reading: (The TL;DR)
Rhythm Video Game cuts. What label to add?
Should unofficial cuts be differentiated from official version releases?
Should unofficial cuts use the original title or the version they closely represent?
Markers I may have missed or any that should be removed.
Topic Starter
Okoratu
@Lanturn: i dont think we can be exhaustive with these lists fwiw

i'd just apply pareto principle to this and go with the 20% of work for 80% of cases and handle the rest via guidelines and establish new rulings as we go along?
Noffy

Noffy wrote:

ok time for a re-review with slightly fresher eyes

the thing part thirty wrote:

Guest mappers, storyboarders, and hitsounders must be added to the tags of a beatmap set. This is to give credit where credit is due and helping others identify the main contributors of any given beatmap set.

-> + "Skinners should be added if they made the skin specifically for the mapset" (in contrast to someone just borrowing/mixing skin elements that're already out there) (this would be nice)


the thing part forty two wrote:

Commas, vs., &, any variations of feat./ft., CV: must always use a trailing whitespace. Unless it is a comma, leading whitespace is also required.

(CV: blah) vs. ( CV: blah ) . the latter would look silly, so CV: shouldn't require leading whitespace either. Or uhhh... this doesn't apply to sides which have the inside of a bracket next to them? or something. since it'd also apply to like, (feat.) vs. ( feat. ) which isn't.. better really.. hmmm
I'm not sure how to fix the wording for this though
aaaaaaaa~



Repost because it got kind of buried before ahah
Mainly concerned about the confusion whitespace as it's currently written could cause

Idea wrote:

Trailing/leading whitespace is not required if the character next to it is the inner side of a bracket.
Example: Hello (CV: Goodbye) is okay, Hello( CV: Goodbye ) is not.
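
A rough sketch of how the idea above could be checked mechanically, in case it helps the wording discussion. The function name and the set of bracket characters are assumptions made for illustration only; nothing here is an existing osu! tool.

```python
import re

# Assumed bracket set for illustration: (), [], {} and <>.
# Matches whitespace right after an opener or right before a closer.
_INNER_PADDING = re.compile(r"(?<=[(\[{<])\s+|\s+(?=[)\]}>])")


def inner_bracket_padding(text):
    """Return the start index of every run of whitespace that hugs
    the inner side of a bracket."""
    return [m.start() for m in _INNER_PADDING.finditer(text)]


print(inner_bracket_padding("Hello (CV: Goodbye)"))   # no padding found
print(inner_bracket_padding("Hello( CV: Goodbye )"))  # two padded spots
```

This only covers the bracket exception; the leading/trailing whitespace requirement for the markers themselves would need a separate check.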
Lanturn
As for what Noffy wrote. I guess this was one of the things I was going to bring up in the next proposal for some guidelines. Anything that has some sort of opening and closing shouldn't require a space after the opener and before the closer.

*This* -that- <then> [thus] etc.

I also had something to address when it concerns romanizing from languages that don't typically use spaces like Japanese. It's something the metadata team has been pushing as of late when it comes to these languages. However, I'll probably save this for a later date since it's too late to be pushing this guideline through right now. The metadata team still recommends this though.

The guideline (in a nutshell) if anyone was interested
ジョジョ~その血の運命~ Archetype MIX Ver.
JoJo ~Sono Chi no Sadame~ Archetype MIX Ver.

When a symbol is alone and doesn't have spacing, the romanization should have a whitespace before and after it. (Ex. if the title was "ジョジョ~その" we'd use "JoJo ~ Sono" when romanizing.)

When a symbol comes in pairs (like mentioned above), use a space before the first symbol and after the last symbol (not needed if the symbol is the last character). (Ex. if the title was "ジョジョ~その血の運命~" we would use "JoJo ~Sono Chi no Sadame~".)
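
The two cases above can be sketched in a few lines of Python. This is only an illustration of the guideline, not an agreed implementation: the function name `space_symbol` is hypothetical, it assumes the romanised text uses a plain ASCII `~`, and it leaves the text untouched when the number of symbols is odd and greater than one, since the guideline does not say how to pair them.

```python
def space_symbol(romanised, symbol="~"):
    """Apply the lone-symbol / paired-symbol spacing guideline (sketch)."""
    parts = [p.strip() for p in romanised.split(symbol)]
    count = len(parts) - 1  # how many times the symbol occurs
    if count == 0:
        return romanised
    if count == 1:
        # Lone symbol: whitespace before and after.
        return f"{parts[0]} {symbol} {parts[1]}".strip()
    if count % 2:
        # Odd count above one: pairing is ambiguous, so leave as-is.
        return romanised
    out = []
    for i, part in enumerate(parts):
        if i % 2:
            # Odd index: text enclosed by a pair, symbols hug the text.
            out.append(f"{symbol}{part}{symbol}")
        elif part:
            # Even index: text outside the pairs (empty pieces skipped).
            out.append(part)
    return " ".join(out)


print(space_symbol("JoJo~Sono"))
print(space_symbol("JoJo~Sono Chi no Sadame~Archetype MIX Ver."))
```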


-----------------------------------------------------------------

Anyways. TV Size time. So basically we'll just apply common sense for these and use whatever marker matches closest to our own preferences. Pretty much going back to our old school methods this way.

I mean we can work with that but it may cause some DQs when it comes to preference or conflicts down the line.

For me, I'd rather push towards something more concrete with the smallest room for error, which is moving them to the tags. I am fine with working with either method though.

So uh. Pick one I guess and we'll move along with it from there.
Shiguma

Lanturn wrote:

Anyways. TV Size time. So basically we'll just apply common sense for these and use whatever marker matches closest to our own preferences. Pretty much going back to our old school methods this way.

I mean we can work with that but it may cause some DQs when it comes to preference or conflicts down the line.

For me, I'd rather push towards something more concrete with the smallest room for error, which is moving them to the tags. I am fine with working with either method though.

So uh. Pick one I guess and we'll move along with it from there.
I believe it should be moved to metadata. Conflicts shouldn't really happen if we're using a common sense approach, the few samples you had laid out earlier seem like they would work for most cases. It is just a much better way to handle determining cut songs vs full songs, unless the length is obvious from the listing as Noffy said.
Monstrata
Have a question regarding covers. I remember mentioning this to Eph but nothing was really concluded so maybe we can get some opinions here?

Currently metadata rules seem to suggest that if a song is covered by someone, that the entire metadata field should be taken from the cover's source. Example: https://osu.ppy.sh/s/658919 vs https://osu.ppy.sh/s/637445

If an artist is clearly covering or replicating another song, I think we should be taking metadata from the original song in cases where metadata is somehow different. After all, the melodies, rhythms, lyrics, etc... are all pretty much the same. It's the same song. Just sung by someone else. So shouldn't the only thing to change be the Artist/Romanized Artist field?

I'm actually not sure when this change happened because I remember at one point the practice was to just change the Artist and keep Title/Source the same as original song.
Ulysses

Fycho wrote:

Glossary
Character-by-character Romanisation: Each Chinese character must be Romanised using Hanyu Pinyin system, and each romanised character must be capitalised and separated with a space.

Rules
Songs with Chinese metadata must be Romanised using the Character-by-character method in Romanised fields when there is no Romanisation or translation information listed by a reputable source. The same applies to the Source field if a romanised Source is preferred by the mapper. As they are non-unicode fields, all diacritical tone marks must be omitted. Refer to Thread: Romanisation of Chinese for more information.

The discussion of "Romanisation of Chinese" should be adequate and stopped now. Anyone has concerns are free to contact me for detail explanations.
I do not wish to fuel the fire of this longstanding debate, which somehow already has a resolution. Yet there is a flaw in this rule that I cannot resist breaking my silence over. It is not about 'v', 'yu', and 'yi', but that this rule does not take into account the existence of another language that adopts Chinese characters but whose pronunciation (and romanisation) is nothing like Mandarin Chinese. This language is Cantonese.

I will keep this post as short as possible.

Background

There are Cantonese songs in ranked maps, Kevin Cheng - Syut Ha Si -Thinking Under the Snow, Elanne Kwong - Sa Jiao, and JJ Lin - Jiang Nan (Cantonese Ver.), for instance.[1] Some of the titles are romanised in a wrong way. Elanne Kwong - Sa Jiao is a paradigm of errors of such kind. Its Cantonese romanisation is saat giu, not sa jiao. Whereas the former is the Cantonese and the correct romanisation of the title, the latter, Mandarin. The song is not a Mandarin song, but a Cantonese one.

In light of this, I propose that the rule be altered, below is the proposed wording of the rule after adjustment:

"Character-by-character Romanisation: [e]ach Chinese character must be Romanised [by] using Hanyu Pinyin system if the song is a Mandarin one; Jyutping if the song is a Cantonese one, and each romanised character must be capitalised and separated with a space."



Differences between Mandarin and Cantonese Romanisation

There are many ways wherein the two languages are different. From pronunciations and semantics of individual Chinese words to romanisation of each word. Since this is not a post about the teaching of the languages but the romanisation of the Cantonese language, the semantics needs not concern us. The other two will be discussed.

Nearly all Chinese characters are pronounced in a different way in Cantonese. Taking the word '我' (meaning 'I' as well as 'me' and sometimes, 'my') as an illustration, it is pronounced 'wo' in Mandarin but 'ngo' in Cantonese.

Not all Cantonese words have Mandarin romanisation. For example, the word '冇' (meaning 'no' and 'nothing') has no Mandarin pinyin because it is non-existent in Mandarin. Its Jyutping is, however, 'mou'. Therefore, some Cantonese words cannot be romanised by using Hanyu Pinyin.


Arguments for the Proposed Change


1. Misrepresentation. Cantonese and Mandarin Chinese are very different in pronunciation (and meanings). Romanising Cantonese by using pinyin will result in misrepresentation.

2. Unable to search online. Searching the Cantonese song with the Romanised pinyin online will most likely get you a Mandarin version of the song (with lyrics changed, different pronunciations of each word and very often, different singers), effectively two different songs.

3. Wrongness. It is plainly wrong to romanise Cantonese by using pinyin. It is as if romanising Japanese kanji by using pinyin.


Romanisation of Cantonese


Jyutping does not contain any characters with accents.[2] Therefore, there will not be the slightest difficulty rendering the original Jyutping words into Romanised words.

Jyutping is easily obtainable. Mappers can easily obtain the Jyutping of individual words from the Chinese University of Hong Kong Dictionary[3].





I hope you can take this post into consideration.

[1] For more examples, see https://osu.ppy.sh/p/beatmaplist?q=cantonese
[2] For the list of consonants and vowels of Cantonese, see http://www.cantonese.sheik.co.uk/essays/jyutping.htm
[3] The website of the Chinese University of Hong Kong Dictionary: http://humanum.arts.cuhk.edu.hk/Lexis/lexi-can/ ; to obtain the Jyutping of a word, type the word in question into the search box at the top-left corner of the page. Once searched, you can obtain the Jyutping from the first box on the left labelled 'syllable'.
Wafu
@nold_1702: I don't really think that such a minority language (in osu!, it is a rarity in beatmaps) needs a specification. Simply because you then need to discuss whether Jyutping is the most appropriate to use. Although its Romanisation doesn't necessarily contain "characters with accents", it does contain numbers that define the accent, which is not understandable by anyone who doesn't know Jyutping specifically. You could as well Romanise it using Cantonese pinyin, but you don't really say why that's bad here.

Anyway, the proposal was actually addressing this wording issue (we specified that when we talk about Chinese, we mean Mandarin, this draft doesn't say anything like that), but it seems like the error is back again. There shouldn't be "each Chinese character", but "each Mandarin character".
Skylish
The standard of romanization should be defined as: translating a language into Latin words. Under the rationale of transcription, the romanization of letters (in Latin letters) should be followed by the language. Dialects or different languages should be considered as independent language system since they have completely different pronunciation methods and systems.

I object a completely 100% Hanyu Pinyin forced on other Chinese languages excluding Mandarin under the PRC standardized Chinese.

EDIT:

Another issue is the usage of Traditional Chinese and Simplified Chinese. Simply speaking, only Taiwan region, HKSAR and Macau use Traditional Chinese. Hence the metadata from these regions should be Traditional Chinese, no matter what languages are in the songs.
Ulysses

Wafu wrote:

@nold_1702: I don't really think that such a minority language (in osu!, it is a rarity in beatmaps) needs a specification. Simply because you then need to discuss whether Jyutping is the most appropriate to use. Although its Romanisation doesn't necessarily contain "characters with accents", it does contain numbers that define the accent, which is not understandable by anyone who doesn't know Jyutping specifically.
(This post is a continuation of my first post on page 12; the 179th post.)

You made some interesting points. However, I still believe that there is a need to Romanise Cantonese titles by using Jyutping, not Pinyin. Your arguments can be summarised as follows:

Wafu's Arguments


1. Cantonese is a minority language (on osu!). As such, we do not need to Romanise Cantonese.
2. To Romanise Cantonese the community needs to discuss which Jyutping is the most appropriate to use, therefore, in avoidance of this inconvenience, it is better for the community not to discuss.
3. The tones are to be taken into consideration, and it is troublesome. Thus, the community should not discuss it.

Rebuttals


Much as Wafu's arguments are interesting, they are not valid points. It appears to me plainly obvious that arguments (2) and (3) are fallacious. That something is troublesome and inconvenient should not bar the community from discussing it. In fact, most of the rules here on osu! are products of fierce debates. If we are to avoid any debate that may potentially arise, we had better not discuss any rule, and such a conclusion is, indeed, absurd.

Arguments (2) and (3) are also flawed in that they are not factually correct. For (2), Jyutping is the most widely used and the most authoritative. It is supported by a University dictionary and is the most accessible. Yale Cantonese Romanisation is also backed by Yale University, but it does not have an online dictionary and is not widely used at all. For (3), the tones in Mandarin are not taken into account when romanising, so I see no reason one would be interested in the tones in the romanisation of Cantonese Jyutping.

Argument (1) is also wrong on several levels. On a factual level, it is unsound. Admittedly, Cantonese songs are not a popular song type on osu!; nonetheless, there are more than 20 maps of Cantonese songs (note that the list I provided in my previous post is not exhaustive because not every Cantonese song has the word 'Cantonese' in tags), whilst there are 24 German songs ranked on osu!. If the premise is that the Romanisation of German titles is necessary, by analogy, I see no convincing reason why Cantonese should not be Romanised.

On a logical level, argument (1) is invalid. The premise 'Cantonese is a minority language' does not entail 'Cantonese does not need Romanisation', much less the implicit conclusion that 'Cantonese Romanisation shall be replaced by Mandarin Romanisation Pinyin'. It is one thing not to Romanise Cantonese, but it is quite another thing to Romanise Cantonese in a wrong way by using a different language system to Romanise it. Perhaps I am not clear enough. What I mean is that in the old days, when the rule that presently concerns us was not yet in existence, mappers had the liberty to use Jyutping to Romanise Cantonese titles. However, once this rule is passed, all Cantonese titles will need to be Romanised by using the Mandarin Pinyin system. In other words, this rule in its current state will force something that is wrong to happen. And that's why I suggest the change in question.

And Wafu mentioned that one could Romanise Cantonese by using Cantonese Pinyin. This is a misconception. Cantonese Pinyin does not exist. The equivalent is Cantonese Jyutping and that is why I am proposing this change.
Topic Starter
Okoratu
idk why everyone here has to sound like they just had a dictionary for breakfast but whatever

to me the points raised by nold make sense, if there's words that cannot be romanised using pinyin then pinyin maybe shouldnt be used - as such the mandarin metadata ruling would need to specify what it applies to, i dunno if this is common enough to deserve its own ruling tho

edit: can someone translate Skylish's post into readable english for me? I've read it multiple times and dunno what he's trying to get at for half of it like the only sentence that makes sense is "I object a completely 100% Hanyu Pinyin forced on other Chinese languages excluding Mandarin under the PRC standardized Chinese." with reasoning "they're different enough to class as different languages"
Fycho
We have agreed that “Cantonese metadata must be romanized as Cantonese pronunciation”. But we haven’t figured out which romanization method is better and should be used, so I didn’t add it to the proposal. Considering there are fewer than three ranked songs, we can still apply metadata discretion for them.

If you really want to discuss and figure it out, also don’t forget that Cantonese differs internally and has different tones in itself. I don’t know if forcing Jyutping would be a solution, and it may have potential issues with other Cantonese-speaking areas like Guangzhou. I am saying metadata discretion by the mapper would be best for them.
Ulysses

Fycho wrote:

We have agreed that “Cantonese metadata must be romanized as Cantonese pronunciation”. But we haven’t figured out which romanization method is better and should be used, so I didn’t add it to the proposal. Considering there are fewer than three ranked songs, we can still apply metadata discretion for them.

If you really want to discuss and figure it out, also don’t forget that Cantonese differs internally and has different tones in itself. I don’t know if forcing Jyutping would be a solution, and it may have potential issues with other Cantonese-speaking areas like Guangzhou. I am saying metadata discretion by the mapper would be best for them.

(This post is a continuation of my first post on page 12; the 179th post.)

(There are not only 3 ranked Cantonese songs. See my first post.)

I am delighted to learn that you have actually agreed that 'Cantonese metadata must be romanized as Cantonese pronunciation'. However, it is not reflected in the rule, and the rule is in fact, by its wording, against this proposition as it suggests that every Chinese character (including Chinese characters in Cantonese) is to be Romanised by using Pinyin.
Nonetheless, if the consensus is here, the only thing left is the wording of the rule. The rule's wording in its present state is not in compliance with what we earlier agreed. (And I have no intention whatsoever to argue against the adoption of other Cantonese Latinisation)

Therefore, shall the wording of the rule be:
"Character-by-character Romanisation: [e]ach Chinese character must be Romanised [by] using Hanyu Pinyin system [if the song is a Mandarin one; one of the standard Cantonese Romanisations to the mapper's discretion if the song is a Cantonese one], each first romanised character [of a Chinese word] must be capitalised and [the Romanised Chinese words are to be] separated with a space."

As to the tones, the romanisation can either ignore the Cantonese tones entirely, the same as what the community decided to do for Mandarin pinyin, or the tones can be added at the end of each Romanised word, for example, saat3 giu1, in accordance with all the Cantonese Romanisation standards. I am in favour of the former because it is consistent with Mandarin Romanisation.
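
To make the two options concrete, here is a minimal sketch in Python. It is an illustration only: the function name `format_jyutping` is made up, it assumes the input is Jyutping with a tone digit (1 to 6) at the end of each syllable, and it treats every space-separated syllable as its own word, which the proposed wording does not settle.

```python
import re


def format_jyutping(text, keep_tones=False):
    """Capitalise each space-separated syllable and, by default,
    drop the trailing Jyutping tone digit (1-6)."""
    words = []
    for syllable in text.split():
        if not keep_tones:
            # Remove a single trailing tone number, if present.
            syllable = re.sub(r"[1-6]$", "", syllable)
        words.append(syllable[:1].upper() + syllable[1:])
    return " ".join(words)


print(format_jyutping("saat3 giu1"))                   # tones ignored
print(format_jyutping("saat3 giu1", keep_tones=True))  # tones kept
```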


Okoratu wrote:

can someone translate Skylish's post into readable english for me? I've read it multiple times and dunno what he's trying to get at for half of it like the only sentence that makes sense is "I object a completely 100% Hanyu Pinyin forced on other Chinese languages excluding Mandarin under the PRC standardized Chinese." with reasoning "they're different enough to class as different languages"

Skylish is basically saying that

1. Romanisation should be based upon the language, not the characters. French and English both use (roughly) the same alphabet, but they are different languages. If we were to translate them into another language, they should not follow the same format; we should not, say, use the English pronunciation format to translate French. It is the same for Mandarin and Cantonese. They share the same characters, but are different (arguably languages/dialects) and mutually unintelligible.

2. Apart from the Mandarin and Cantonese division of the Chinese LANGUAGES, Chinese CHARACTERS (not the same as Chinese languages) also have a division, which is Simplified and Traditional Chinese characters. Skylish is saying that if the song is from China, [Singapore, and Malaysia], Simplified Chinese characters are to be used; if from Hong Kong, Taiwan, and Macau, Traditional Chinese characters are to be used. This is because China, Singapore, and Malaysia use Simplified Chinese characters, whereas Hong Kong, Taiwan, and Macau use Traditional.

The distinction between Chinese languages and Chinese characters may be very confusing to some people. But to simplify the matter, you can think of Mandarin and Cantonese as being like German and English; Simplified and Traditional Chinese are like a and à; ç and c, (but with more differences). The word ‘naïve’ is originally written with two dots on top of the ‘i’, but people are quite used to spelling it ‘naive’ instead of ‘naïve’. So ‘naïve’ is like Traditional Chinese, (in reality it would be something like ñâïvę or most of the time ñâvūrê)(these words don’t exist, I made them up); ‘naive’ without two dots is like Simplified Chinese.
Wafu

nold_1702 wrote:

Wafu's Arguments


1. Cantonese is a minority language (on osu!). As such, we do not need to Romanise Cantonese.
2. To Romanise Cantonese the community needs to discuss which Jyutping is the most appropriate to use, therefore, in avoidance of this inconvenience, it is better for the community not to discuss.
3. The tones are to be taken into consideration, and it is troublesome. Thus, the community should not discuss it.

Okay. First of all, if you call someone's argument "fallacious", you're on a high wire and should be absolutely sure that you read the arguments properly. And that's how you simply flipped two of my arguments into something they are absolutely not supposed to mean.

1. Yes, I think that. And I think that despite the argument about German. The reason for that is because with German, you are replacing a couple of characters, that's it. You're not implementing a Romanisation system that needs quite some discussion.
2. I never said that you should not discuss it. I implied it wouldn't be worth doing, considering how many Cantonese maps there are. Yes, 20 maps is rare (and there are 44 German songs, we need to consider this language just because we can choose "German" in search, it's implemented on the website). Most Cantonese maps are quite old, and I think that people who check metadata (e.g. metadata helpers) of qualified maps are able to see someone is Romanising a different language than they are supposed to. Never said you shouldn't discuss it, I said it would need discussion if we wanted to implement it, so that everyone agrees we can use Jyutping. (and as you've seen in this thread, it's not very easy to reach an agreement. Chinese was a big deal, not as much as Cantonese, that's why I don't think we should waste the time discussing another Romanisation system)
3. Quote me. I never said anything even close to that. I never said community shouldn't discuss it. I never said tones are to be taken into consideration, I never said it is troublesome. You suggested a system that doesn't have characters we can't type into "Romanised title". That's good, but Jyutping does use numbers for accents, which makes the Romanised text hard to understand for anyone who doesn't know Jyutping. The question is, ignore the numbers or keep them? Again, agreement from the community is needed.

nold_1702 wrote:

Arguments (2) and (3) are also flawed in that they are not factually correct. For (2), Jyutping is the most widely used and the most authoritative. It is supported by a University dictionary and is the most accessible. Yale Cantonese Romanisation is also backed by Yale University, but it does not have an online dictionary and is not widely used at all. For (3), the tones in Mandarin are not taken into account when romanising, so I see no reason one would be interested in the tones in the romanisation of Cantonese Jyutping.

Yup, I agree with the first part about Jyutping and Yale. Never said anything against that. You, however, need the community to agree on that. If we were about to implement a system that you and I would agree on, it would be pretty ignorant of the community. Regarding the second part, you can't just suggest a system that does use tones (which are the numbers) and act like they are not there. You didn't say if we just ignore the numbers or include them (which as I said, wouldn't be understandable for the majority of people, that was my argument, not that tones are needed and shouldn't be discussed). If you don't say anything about a feature used in the system you suggested, I will assume you are for including it. Otherwise it would be hypocritical to suggest a system and then talk about its modified version. That's why I pointed out the numbers.

The next paragraph, as I said, German is officially added in the list of searchable languages, that's why we need to care about how we handle it. And there are 44 German songs according to it.

nold_1702 wrote:

It is one thing not to Romanise Cantonese, but it is quite another thing to Romanise Cantonese in a wrong way by using a different language system to Romanise it. Perhaps I am not clear enough. What I mean is that in the old days, when the rule that presently concerns us was not yet in existence, mappers had the liberty to use Jyutping to Romanise Cantonese titles. However, once this rule is passed, all Cantonese titles will need to be Romanised by using the Mandarin Pinyin system.

As I said, the wording is incorrect, it should say "Mandarin", this is an error coming from the fact that someone forgot to replace it because the proposal about Chinese and Metadata overall were split. The wording will be fixed, obviously. Yes, it is important to differentiate Mandarin and Cantonese, and as I said, these days, people who check metadata of qualified maps are intelligent enough to check that you are using a Romanisation system for a correct language. Yes, current wording leads to mixing Cantonese and Mandarin, but that is an error that will be fixed. You still, in every situation should use a system designed for the language of the song.

Assuming that something just simply doesn't exist because it's not used in Hong Kong is quite funny, especially if you want to make someone look bad. There is an equivalent of Pinyin for Cantonese that is different from Jyutping.
Ulysses
Wafu my friend, I do not mean to 'make you look bad' or that I am doing anything against you, I am arguing against what you suggested, not yourself. Do not try to make things too personal as you are not in the spotlight, the issue is. Pardon me but I want to avoid starting another unnecessarily lengthy debate. It seems to me you are not quite focusing on the right thing (and I do not mean to be aggressive). See my bullet points below

- If you think community consensus is important, let the people discuss here, don't try to stop it.
- Mandarin is not an official language recognised by osu! either. Osu! recognises one big language group 'Chinese', which includes both Mandarin and Cantonese. If Mandarin is within the scope of discussion, so is Cantonese.
Topic Starter
Okoratu
i think no one ever questioned that romanisation should be based on the language lol

shall i change the current wording to mandarin while we figure the rest out?
Ulysses

Okoratu wrote:

i think no one ever questioned that romanisation should be based on the language lol

shall i change the current wording to mandarin while we figure the rest out?

Yes please
Topic Starter
Okoratu
ok done
can you debate the actual points instead of semantics then?
i dont have any power to nuke any posts here (thankfully for some retarded debates on the earlier pages) but please keep it short and concise

-> can someone sum up briefly where we stand currently on Cantonese? If we want to make people understand how it's pronounced / read then we'd need something different from the mandarin method anyways

anything that is reliable and produces readable text for people that are not cantonese should be good fwiw
Topic Starter
Okoratu

Monstrata wrote:

Have a question regarding covers. I remember mentioning this to Eph but nothing was really concluded so maybe we can get some opinions here?

Currently metadata rules seem to suggest that if a song is covered by someone, that the entire metadata field should be taken from the cover's source. Example: https://osu.ppy.sh/s/658919 vs https://osu.ppy.sh/s/637445

If an artist is clearly covering or replicating another song, I think we should be taking metadata from the original song in cases where metadata is somehow different. After all, the melodies, rhythms, lyrics, etc... are all pretty much the same. It's the same song. Just sung by someone else. So shouldn't the only thing to change be the Artist/Romanized Artist field?

I'm actually not sure when this change happened because I remember at one point the practice was to just change the Artist and keep Title/Source the same as original song.
uh not too sure id agree on covers that just do same voice but if it's a drastically different song id say you should be following the artist