
[Proposal] Metadata section overhaul

Total Posts: 216
Scarlet Evans
How should one romanize a title if no romanization, or even unambiguous phonetic transcription, exists, while several completely different ones are suggested as equally valid, and at the same time it would be completely, totally and absolutely wrong to choose just one of them as the title? I.e. it's not possible to transcribe (or romanize) the title phonetically unless you include all of its possible readings. And even if one includes all of them, should they be separated by slashes, which with 4-5 of them would result in a terribly long monstrosity of a title?

-----

Here's more explanation:


Artists can name their songs however they want, right? They are not obliged to make the title "possible" to pronounce or to give it an unambiguous reading that someone can phonetise.

I don't know what it's like in Chinese, but a great deal of Japanese kanji characters can be pronounced (and romanized) not in one or two, but in several completely different ways!

I don't know if anyone has made a song like this, but I see nothing that could stop someone from doing so. What if someone makes a song (which someone later decides to map in osu!) where:

[*] The title has no official romanization and remains "unspoken", i.e. apart from the title written in kanji, even the author never spells it out and refers to the song only indirectly, except by writing it or showing the kanji characters.

[*] It has several possible meanings, at least 3-4 of them, ALL of which are suggested by the lyrics and ALL of which have COMPLETELY DIFFERENT romanizations for every single character used in the title. So you can't phonetise it, as it has several completely different phonetisations!!

[*] The lyrics of the song are used to maintain and express the ambiguity, creating a contradiction around what the title applies or refers to. This makes all of these "possible titles" "equal" in terms of being the title, while they even contradict each other through their contradictory meanings and/or through the lyrics, i.e. all of them are suggested to be the title while at the same time being refuted as such.

[*] It all makes sense when reading the kanji characters and extracting all these meanings, which makes the title fit the song perfectly (!!!), with just one catch: you can't possibly spell or phonetise it, as it has several completely different pronunciations, all of which can equally be considered as being and not being the title (because of the contradictions and the refutation by other meanings and/or the lyrics).

-----

So each of these "possible titles" both can and can't (!) be the title at the same time, but all of them, written with the very same kanji characters, definitely are the title, with deep meanings that are expressed in the lyrics.

-----

What should the metadata for such a song look like? :o

I don't know if such a song with an "unspoken" title that can have multiple meanings currently exists, and if so, to what extreme the ambiguity is taken, but there are titles with more than one meaning, and sometimes you can't find anywhere how the title should be pronounced or phonetised...

Also, even if such a song doesn't exist, depending on the answer I might spend my time improving my poor Japanese and eventually create one in the future, maybe with some voluntary help from someone, just for the sake of trying to rank it :P

-----

But seriously... artists aren't required to give their songs titles that can be phonetised or romanized... what should a mapper do in such a situation? Could such a song be treated as an "exception" and allowed to have no romanization? Or should all of the possible titles be included in some ultimate title compilation?

And please don't tell me that it's an "impossible case" not worth considering, as I believe the people in this community could definitely help make it possible if required.
Wafu

Scarlet Evans wrote:

Could such song be treated as an "exception" and allowed to have no romanization? Or all of several titles should be included in some ultimate title compilation?
Every song needs to be Romanised so that it can be searched normally. If it's your song, give it a name; if it's not, that probably can't be universally covered by the RC. These situations (as they virtually never happen) would probably be treated case by case.
Fycho
Adding additional information (like the Modified Hepburn reference for Japanese) seems redundant lol. If mappers/BNs are unsure, they can ask the Metadata QATs/Helpers for help.

CrystilonZ wrote:

泠鸢yousa - 没有名字的怪物 : Ling Yuan yousa - Shen De Sui Bo Zhu Liu
It has a mistake; it should be:

没有名字的怪物 => Mei You Ming Zi De Guai Wu
神的随波逐流 => Shen De Sui Bo Zhu Liu
"Word-by-word Romanisation" should be deleted from the Glossary.
Glossary
Character-by-character Romanisation: Each Chinese character must be Romanised using Hanyu Pinyin system, and each romanised character must be capitalised and separated with a space.

Rules
Songs with Chinese metadata must be Romanised using the Character-by-character method in Romanised fields when there is no Romanisation or translation information listed by a reputable source. The same applies to the Source field if a romanised Source is preferred by the mapper. As they are non-unicode fields, all diacritical tone marks must be omitted.

Considering there are only 5 ranked maps (really just a few) that use "ü" so far, and that neither "v" nor "yu" is the best choice unless we allow ü in romanised fields, the Metadata Team will use its discretion on specific cases.
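A minimal Python sketch of the Character-by-character rule above, assuming the mapper already has the pinyin for each character (one syllable per character, space-separated); looking the pinyin up is out of scope. The helper names are hypothetical. It drops only the four tone marks and deliberately keeps the diaeresis on ü, since ü is left to Metadata Team discretion.

```python
import unicodedata

# Combining marks for the four pinyin tones: macron (1st), acute (2nd),
# caron (3rd) and grave (4th). The diaeresis on ü (U+0308) is deliberately
# NOT listed, because ü handling is left to Metadata Team discretion.
TONE_MARKS = {"\u0304", "\u0301", "\u030c", "\u0300"}


def strip_tone_marks(syllable: str) -> str:
    """Remove pinyin tone marks from one syllable, keeping any ü."""
    decomposed = unicodedata.normalize("NFD", syllable)
    kept = "".join(ch for ch in decomposed if ch not in TONE_MARKS)
    # Recompose so that u + U+0308 becomes a single ü again.
    return unicodedata.normalize("NFC", kept)


def romanise_character_by_character(pinyin: str) -> str:
    """Strip tones, capitalise each syllable and join them with single spaces."""
    return " ".join(strip_tone_marks(s).capitalize() for s in pinyin.split())
```

For example, `romanise_character_by_character("méi yǒu míng zi de guài wu")` gives `Mei You Ming Zi De Guai Wu`. If "v" or "yu" were chosen for ü instead, that substitution would be an extra step not shown here.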
CrystilonZ
Agree tbh, halfway through writing that I realised this is redundant af XDD
Wikipedia kinda provides too much but eeh whatever I guess
and somehow I accidentally copied the wrong title lmfao good job me
Wafu
@Fycho: We can't use the link provided because of this section:

Single meaning: Words with a single meaning, which are usually set up of two characters (sometimes one, seldom three), are written together and not capitalized: rén (人, person); péngyou (朋友, friend); qiǎokèlì (巧克力, chocolate)
etc.
This is what our draft suggested at first and what you argued against. I think this will just make it more confusing if you are supposed to ignore some sections of it.
Fycho
Adjusted the draft and simply removed it. I don't think there needs to be any redundant information like Modified Hepburn for Japanese, and most maps of songs with Chinese metadata are made by Chinese-speaking players, who know how pinyin works. Whoever is confused about how to romanize Chinese can ask the metadata QATs/Helpers directly instead of painfully looking through the walls of text in the Wiki. The current rule already has an adequate explanation.
Wafu

Fycho wrote:

Whoever is confused on how to romanize Chinese could ask metadata QATs/Helpers directly without looking through the wall texts in Wiki painfully.
Yeah, I agree with that. Following the wiki would even make you Romanise it differently than what is intended by the rule.

As there was finally an agreement on ü, is it still under discussion, or why is it missing from the draft?

For the Japanese Romanisation, I think that giving a link to the Hepburn wiki page overall is not the best idea, as multiple Hepburn systems are mixed there. What about just linking the Modified Hepburn document? I think this is the best reference, as it only states the rules of the Romanisation without redundancy, and it's from the Library of Congress, which is probably the most official reference we can have.

Suggestion to make the Romanisation rules a bit shorter and cleaner

First of all, for Russian, change "must be romanised using..." to "must use" as it is with other languages.

The rules for all the Romanisation systems consistently say "when there is no romanisation or translation information listed by a reputable source" and "The same applies to the Source field if a romanised Source is preferred by the mapper."

Assuming there will still be Korean (there should be, because we include it as a language selection on the website), it would be messy to have this in every rule. What about making these general rules and removing them from the rules for specific languages:
  1. Official artist Romanisations have priority for all languages
  2. If you choose (and as of now, you can choose) to use a Romanised Source, use the same Romanisation method
There's no reason to specify the same thing for every language if it works uniformly.
Topic Starter
Okoratu
Someone should give me the agreement on ü then

- will change the link
- will change russian wording,
- will avoid redundancy by taking the redundant parts out and making them general rules for language-specific romanisation, right?

sth like
Romanisation is only to be used when there is no official translation or preferred romanisation provided by the artist. This applies to all fields that can hold romanised data by intent.
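The general rule above can be read as a per-field precedence check. Here is a hypothetical Python sketch (the function name and arguments are illustrative, not part of any rule text): if the artist provides any translations or romanisations, those are the only acceptable values and the mapper may pick any of them; otherwise the language-specific romanisation is used. The exception for songs that already have a Ranked mapset is not modelled.

```python
from typing import List


def allowed_romanised_values(official: List[str], fallback: str) -> List[str]:
    """Return the values a mapper may use for one romanised field.

    official: translations/romanisations provided by the artist (may be empty).
    fallback: the language-specific romanisation (Modified Hepburn, pinyin, ...).
    """
    # Ignore blank entries so an empty official field doesn't count as data.
    provided = [value for value in official if value.strip()]
    if provided:
        # Any artist-provided value is acceptable; the mapper chooses.
        return provided
    # Romanisation is only used when nothing official exists.
    return [fallback]
```

With a hypothetical official translation "Nameless Monster", the function returns only that value; with no official data it falls back to something like "Mei You Ming Zi De Guai Wu".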
Wafu

Okoratu wrote:

Romanisation is only to be used when there is no official translation or preferred romanisation provided by the artist. This applies to all fields that can hold romanised data by intent.
Seems about right, so now we just need confirmation on ü and on Korean Romanisation.

Maybe a good reference for Korean (should be the Revised Romanization of Korean, which seems to be used in most maps) might be this.
CrystilonZ
ü is a matter of preference now lol, everyone probably has enough information to choose a side, but I don't expect a 100 percent consensus
My suggestion about that is
  1. Official translation and/or Romanisation must be used if able. This applies to all fields that can hold romanised data by intent. If there are multiple official translations and/or Romanisations, the mapper is free to choose any of them with the only exception being when there is a previously ranked mapset of that song. In such case the corresponding guideline applies to it. << feel like all of these are related and should be under one single rule. easier to read imo.
  2. If the artist provides a preferred way to romanise their title or name, that is to be followed unless it conflicts with other points of this criteria. << redundant. The only point this can conflict with is the naming convention stuff which is a specific case. adding stuff like "if necessary, ignore the preferred naming conventions of the artists" under that point is better.
  3. If a song or artist are referred to in multiple ways on official sources provided by the artist, the mapper is free to choose any of the romanisations. The only exception to this is if the song already has a mapset in the Ranked Section, in which case the corresponding guideline applies to it. << redundant, and it also doesn't include translation orz

one more point that I wish to bring up is
If the artist field contains artist names with internally conflicting naming conventions (first name - last name and last name - first name formats), they must be normalized to just use the same format throughout.
^ should specify which one is preferred for consistency.
also naming conventions (stuff) --> the stuff inside parentheses should be in the glossary lol
lastly please add this to the glossary
Diacritical tone marks: ˉ, ˊ, ˇ, and ˋ above vowels in the pinyin system.
Topic Starter
Okoratu
the answer to your specification question is neither
i'll apply the rest tomorrow if i can find what the fukc this means
Lanturn
Hi. I'm late...

About TV Size:
If we do decide to keep the TV Size label, we should probably set standards for other common markers related to song length, such as Short Ver. The whole reason TV Size metadata ended up the way it is was actually a Short Ver. map. By the sounds of the discussion, we've already decided to stop looking for TV Sizes on official sites and to be consistent, which is a step in the right direction.

Personally, I'm siding with dropping it into the tags. As mentioned above, we should also consider dropping Short ver (and any others that describe the length of a song) as well since they're essentially the same as a TV Size being special cuts and all. I don't see why TV Size should be the only one with special treatment when the idea as a whole should be related to all song cuts.
Natsu
It would be really annoying without the label. For example, when you map the TV Size and the Full version, the mapsets merge together. Also, if you're looking for a normal-length song you are going to get a bunch of TV Sizes, or vice versa... and to be honest, people rarely care about tags, so dropping it into the tags isn't going to work.

BTW what about using a single label for everything?
Wafu
Can absolutely agree with Lanturn on this.

@Natsu If you add "tv size" to the tags and then search for a map with "tv size" appended, you should get the TV Size one (in game; the website will probably show you both in either scenario). There was also a suggestion that "TV Size", or basically "Short version", could be offered as a search option on the website (or maybe just the length). I think unification should be the last resort; there are many other ways to resolve this issue. Just showing the length on the website would be enough, especially since there are maps that are "TV Size" but only ~30 sec long. That's probably not what most people are looking for; most people probably imagine the standard 1:30 songs. The ambiguity is present whether you use a unified label or just put it into the tags, except that the latter can lead to solving this issue entirely rather than placing a bandage on it.
Noffy
I'm of the opinion that displaying (TV Size) uniformly for all TV Size anime openings/endings would be much clearer to players searching maps of a song. It may take only seconds to check a single map to see what length it is, but what about when there's 5+ mapsets for a really popular song? Then you would have to check each individually to find what you're looking for instead of it being presented on the listing straightaway. People that have not seen TV Size in anime song titles in the past would not be likely to think to add it to their search terms. They'd probably be more likely to think something like, "theme song" or "opening" or "ending" or "short" if they're an English speaker, for instance. This can be seen by searching anime themes on YouTube and seeing how relatively rarely they use (TV Size) for their video titles.
However, if there were any way to have the length of a song displayed on a map's panel in the search listing, I'd totally agree with adding it to tags instead. But that's a website feature that would need to be considered separately...
And would also require begging peppy / flyte to add it

Also about what Lanturn brought up concerning making uniform tags for different types of length indicators like Short ver. Game ver. etc.: this sounds like a good idea, honestly. Then the main concern for metadata would be the accuracy of the actual main title, and not worrying about having to find out what particular variation of a label some obscure official source happens to use.
Topic Starter
Okoratu
it can work either way

we'd just need to set a standard and stick to it
Lanturn
Alright. So let's say we want to keep TV Size, Short Ver. etc. Let me attempt to write up a list. We should probably discuss this in advance in case we do end up keeping it, since classifying types is another big job in itself.

TV Size -> (TV Size, Anime Ver, Opening Ver, Ending Ver, TV Edit... This list is endless)
This would be used for all openings/endings and specific cuts used in the show. This includes non-Japanese songs like those from cartoons, sitcoms, etc. (The Friends opening, for example, would use TV Size.)
Example: https://osu.ppy.sh/b/1533376 (Anime Version labeled as a TV Size on its official release)

Short Ver. / Extended Ver.-> (Every other Song, Visual Novel Opening/Endings)
A song that has had its song time cut officially from the original and doesn't meet the criteria of the other cuts. This also includes Visual Novels as they are mostly labeled with short versions over game size.

Game Size -> (Game Opening, Endings, Insert Songs, Some BGM tracks)
A specific cut when dealing with video games. (not including visual novels) Similar to TV Size, but when dealing with games. Example: https://www.youtube.com/watch?v=qAQUparDhtg
In addition, I'm wondering what we should do with Rhythm Game cuts.
Element of SPADA, for example, has a version made for the rhythm games and a full version. In the majority of cases, these songs are released without any sort of marker, but that contradicts the whole idea of labeling based on version.


Short Cut / Extended Cut (Unofficial cuts/additions made to any song.)
Pretty self-explanatory. There are basically two options here: add a marker to show that it's a specific cut, or use the title of the version it was cut from (cutting a Full Ver. to roughly the length of a TV Size would result in using the Full Ver. title).
Example: https://osu.ppy.sh/b/211503 (I cut this from the full version and it is different from the TV Size)

Full Ver. -> (The full official release.)
Drop the marker outright, this includes songs that are officially marked with (Full Ver.) in the title.

Nightcore, Speed Up Ver, and other edits that alter BPM will probably be left in the title field.
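To make the draft list above easier to compare, here is a hypothetical lookup in Python. The category keys and the "Title (Marker)" formatting are assumptions for illustration only. Rhythm Game cuts have no entry because they are still an open question, and BPM edits such as Nightcore are not handled since the draft leaves them in the title field.

```python
from typing import Dict, Optional

# Hypothetical category keys mirroring the draft list; None means "add no marker".
DRAFT_MARKERS: Dict[str, Optional[str]] = {
    "tv": "TV Size",  # anime/cartoon/sitcom openings, endings and show-specific cuts
    "game": "Game Size",  # video game openings, endings, insert songs (not VNs)
    "visual_novel": "Short Ver.",  # VNs are mostly labelled with short versions
    "official_short": "Short Ver.",
    "official_extended": "Extended Ver.",
    "unofficial_short": "Short Cut",
    "unofficial_extended": "Extended Cut",
    "full": None,  # Full Ver.: the draft drops the marker outright
}


def title_with_marker(title: str, category: str) -> str:
    """Append the draft's length marker; raises KeyError for unlisted categories."""
    marker = DRAFT_MARKERS[category]
    return title if marker is None else f"{title} ({marker})"
```

Keeping the markers in one table means a decision like merging everything into Short Ver. / Extended Ver. would only change the values, not the lookup.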

I'm sure I've missed quite a bit, but here's a start I guess.

Alternatively, there is the option Natsu suggested of converting everything into Short Ver. / Extended Ver.

A few topics for discussion from this if you missed them while reading: (The TL;DR)
Rhythm Video Game cuts. What label to add?
Should unofficial cuts be differentiated from official version releases?
Should unofficial cuts use the original title or the version they closely represent?
Markers I may have missed or any that should be removed.
Topic Starter
Okoratu
@Lanturn: i dont think we can be exhaustive with these lists fwiw

i'd just apply the pareto principle to this: do the 20% of work that covers 80% of cases, handle the rest via guidelines and establish new rulings as we go along?
Noffy

Noffy wrote:

ok time for a re-review with slightly fresher eyes

the thing part thirty wrote:

Guest mappers, storyboarders, and hitsounders must be added to the tags of a beatmap set. This is to give credit where credit is due and helping others identify the main contributors of any given beatmap set.

-> + "Skinners should be added if they made the skin specifically for the mapset" (in contrast to someone just borrowing/mixing skin elements that're already out there) (this would be nice)


the thing part forty two wrote:

Commas, vs., &, any variations of feat./ft., CV: must always use a trailing whitespace. Unless it is a comma, leading whitespace is also required.

(CV: blah) vs. ( CV: blah ) . the latter would look silly, so CV: shouldn't require leading whitespace either. Or uhhh... this doesn't apply to sides which have the inside of a bracket next to them? or something. since it'd also apply to like, (feat.) vs. ( feat. ) which isn't.. better really.. hmmm
I'm not sure how to fix the wording for this though
aaaaaaaa~



Repost because it got kind of buried before ahah
Mainly concerned about the confusion the whitespace rule, as currently written, could cause

Idea wrote:

Trailing/leading whitespace is not required if the character next to it is the inner side of a bracket.
Example: Hello (CV: Goodbye) is okay, Hello( CV: Goodbye ) is not.
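A rough Python checker for the whitespace rule plus the proposed bracket exception, as a sketch only: the function name is hypothetical, it recognises just the separators named in the draft (commas, &, vs., feat., ft., CV:), and it does not check spacing outside brackets such as the missing space in "Hello(".

```python
import re
from typing import List

# Separators named in the draft rule: commas, &, vs., feat., ft. and CV:.
SEPARATOR = re.compile(r",|&|\b(?:vs\.|feat\.|ft\.|CV:)")
OPENING = "([{<"
CLOSING = ")]}>"
# Whitespace directly inside a bracket, e.g. "( CV:" or "Goodbye )".
INNER_PADDING = re.compile(r"[(\[{<]\s|\s[)\]}>]")


def whitespace_problems(text: str) -> List[str]:
    """List spacing problems under the draft rule with the bracket exception."""
    problems = []
    for match in SEPARATOR.finditer(text):
        sep, start, end = match.group(), match.start(), match.end()
        before = text[start - 1] if start > 0 else ""
        after = text[end] if end < len(text) else ""
        # Trailing whitespace, unless at the end or next to a closing bracket.
        if after and not after.isspace() and after not in CLOSING:
            problems.append(f"missing space after {sep!r} at index {start}")
        # Leading whitespace (not for commas), unless at the start
        # or right after an opening bracket.
        if sep != "," and before and not before.isspace() and before not in OPENING:
            problems.append(f"missing space before {sep!r} at index {start}")
    # The example above also treats padding just inside brackets as wrong.
    for match in INNER_PADDING.finditer(text):
        problems.append(f"whitespace just inside a bracket at index {match.start()}")
    return problems
```

`whitespace_problems("Hello (CV: Goodbye)")` returns an empty list, while "Hello( CV: Goodbye )" is flagged for the padding inside the brackets.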
Lanturn
As for what Noffy wrote, I guess this was one of the things I was going to bring up in the next proposal for some guidelines. Anything that has some sort of opening and closing shouldn't require a space after the opener or before the closer.

*This* -that- <then> [thus] etc.

I also had something to address when it concerns romanizing from languages that don't typically use spaces like Japanese. It's something the metadata team has been pushing as of late when it comes to these languages. However, I'll probably save this for a later date since it's too late to be pushing this guideline through right now. The metadata team still recommends this though.

The guideline (in a nutshell) if anyone was interested
ジョジョ~その血の運命~ Archetype MIX Ver.
JoJo ~Sono Chi no Sadame~ Archetype MIX Ver.

When a symbol stands alone without spacing, the romanization should have whitespace before and after it (e.g. if the title was "ジョジョ~その" we'd use "JoJo ~ Sono" when romanizing).

When a symbol comes in pairs (like mentioned above), use a space before the first symbol and after the last symbol (not needed if the symbol is the last character). (E.g. if the title was "ジョジョ~その血の運命~" we would use "JoJo ~Sono Chi no Sadame~".)
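The two spacing cases in that guideline can be sketched in Python, assuming the text is already romanised and still has the original unspaced symbols (the function name is hypothetical). Occurrences are paired left to right; an odd one out is treated as a lone symbol. Spacing inside a pair is left untouched.

```python
def space_symbol(text: str, sym: str = "~") -> str:
    """Apply the lone/paired symbol spacing guideline to one symbol type."""
    positions = [i for i, ch in enumerate(text) if ch == sym]
    pairs = len(positions) // 2
    # 1st pairs with 2nd, 3rd with 4th, ...; a leftover last occurrence is lone.
    openers = set(positions[0::2][:pairs])
    closers = set(positions[1::2])
    out = []
    for i, ch in enumerate(text):
        if ch != sym:
            out.append(ch)
            continue
        lone = i not in openers and i not in closers
        # Space before a lone symbol or an opener, unless at the very start.
        if (lone or i in openers) and out and out[-1] != " ":
            out.append(" ")
        out.append(ch)
        # Space after a lone symbol or a closer, unless it is the last character.
        if (lone or i in closers) and i + 1 < len(text) and text[i + 1] != " ":
            out.append(" ")
    return "".join(out)
```

For instance, "JoJo~Sono" becomes "JoJo ~ Sono", and "JoJo~Sono Chi no Sadame~ Archetype MIX Ver." becomes "JoJo ~Sono Chi no Sadame~ Archetype MIX Ver.", matching the example above. Input that is already spaced is returned unchanged.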


-----------------------------------------------------------------

Anyways. TV Size time. So basically we'll just apply common sense for these and use whatever marker matches closest to our own preferences. Pretty much going back to our old school methods this way.

I mean we can work with that but it may cause some DQs when it comes to preference or conflicts down the line.

For me, I'd rather push towards something more concrete with the smallest room for error, which is moving them to the tags. I am fine with working with either method though.

So uh. Pick one I guess and we'll move along with it from there.
Shiguma

Lanturn wrote:

Anyways. TV Size time. So basically we'll just apply common sense for these and use whatever marker matches closest to our own preferences. Pretty much going back to our old school methods this way.

I mean we can work with that but it may cause some DQs when it comes to preference or conflicts down the line.

For me, I'd rather push towards something more concrete with the smallest room for error, which is moving them to the tags. I am fine with working with either method though.

So uh. Pick one I guess and we'll move along with it from there.
I believe it should be moved to metadata. Conflicts shouldn't really happen if we're using a common sense approach, the few samples you had laid out earlier seem like they would work for most cases. It is just a much better way to handle determining cut songs vs full songs, unless the length is obvious from the listing as Noffy said.
Monstrata
Have a question regarding covers. I remember mentioning this to Eph but nothing was really concluded so maybe we can get some opinions here?

Currently metadata rules seem to suggest that if a song is covered by someone, that the entire metadata field should be taken from the cover's source. Example: https://osu.ppy.sh/s/658919 vs https://osu.ppy.sh/s/637445

If an artist is clearly covering or replicating another song, I think we should be taking metadata from the original song in cases where metadata is somehow different. After all, the melodies, rhythms, lyrics, etc... are all pretty much the same. It's the same song. Just sung by someone else. So shouldn't the only thing to change be the Artist/Romanized Artist field?

I'm actually not sure when this change happened because I remember at one point the practice was to just change the Artist and keep Title/Source the same as original song.
Ulysses

Fycho wrote:

Glossary
Character-by-character Romanisation: Each Chinese character must be Romanised using Hanyu Pinyin system, and each romanised character must be capitalised and separated with a space.

Rules
Songs with Chinese metadata must be Romanised using the Character-by-character method in Romanised fields when there is no Romanisation or translation information listed by a reputable source. The same applies to the Source field if a romanised Source is preferred by the mapper. As they are non-unicode fields, all diacritical tone marks must be omitted. Refer to Thread: Romanisation of Chinese for more information.

The discussion of "Romanisation of Chinese" should be adequate and stopped now. Anyone has concerns are free to contact me for detail explanations.
I do not wish to fuel the fire of this longstanding debate, which seems to have been resolved already. Yet there is a flaw in this rule that compels me to break my silence. It is not about 'v', 'yu', and 'yi', but that this rule does not take into account another language that uses Chinese characters but whose pronunciation (and romanisation) is nothing like Mandarin Chinese. This language is Cantonese.

I will keep this post as short as possible.

Background

There are ranked maps of Cantonese songs, for instance Kevin Cheng - Syut Ha Si -Thinking Under the Snow, Elanne Kwong - Sa Jiao, and JJ Lin - Jiang Nan (Cantonese Ver.).[1] Some of these titles are romanised incorrectly. Elanne Kwong - Sa Jiao is a prime example of this kind of error: its Cantonese romanisation is saat giu, not sa jiao. The former is the correct, Cantonese romanisation of the title; the latter is Mandarin. The song is not a Mandarin song, but a Cantonese one.

In light of this, I propose that the rule be altered, below is the proposed wording of the rule after adjustment:

"Character-by-character Romanisation: [e]ach Chinese character must be Romanised [by] using Hanyu Pinyin system if the song is a Mandarin one; Jyutping if the song is a Cantonese one, and each romanised character must be capitalised and separated with a space."



Differences between Mandarin and Cantonese Romanisation

The two languages differ in many ways, from the pronunciation and semantics of individual Chinese words to the romanisation of each word. Since this is not a post about teaching the languages but about the romanisation of Cantonese, semantics need not concern us. The other two will be discussed.

Nearly all Chinese characters are pronounced in a different way in Cantonese. Taking the word '我' (meaning 'I' as well as 'me' and sometimes, 'my') as an illustration, it is pronounced 'wo' in Mandarin but 'ngo' in Cantonese.

Not all Cantonese words have a Mandarin romanisation. For example, the word '冇' (meaning 'no' and 'nothing') has no Mandarin pinyin because it does not exist in Mandarin. Its Jyutping, however, is 'mou'. Therefore, some Cantonese words cannot be romanised using Hanyu Pinyin.


Arguments for the Proposed Change


1. Misrepresentation. Cantonese and Mandarin Chinese are very different in pronunciation (and meaning). Romanising Cantonese with pinyin will result in misrepresentation.

2. Unable to search online. Searching online for a Cantonese song by its pinyin romanisation will most likely get you a Mandarin version of the song (with changed lyrics, different pronunciations of each word and, very often, different singers), effectively a different song.

3. Wrongness. It is plainly wrong to romanise Cantonese using pinyin. It is like romanising Japanese kanji using pinyin.


Romanisation of Cantonese


Jyutping does not contain any characters with accents.[2] Therefore, there will not be the slightest difficulty rendering the original Jyutping words into Romanised words.

Jyutping is easily obtainable. Mappers can easily obtain the Jyutping of individual words from the Chinese University of Hong Kong Dictionary[3].
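The proposed rule can be sketched in Python as a language switch over per-character lookups. This is a minimal sketch under loud assumptions: the function and table names are hypothetical, the four-entry tables are toy data built from the examples in this post (a real lookup would use a dictionary such as the CUHK one), 撒嬌 is assumed to be the title behind "Sa Jiao", and dropping Jyutping's tone digits is an assumption that mirrors how pinyin tone marks are omitted.

```python
import re
from typing import Dict, Optional

# Toy tables for illustration only. Pinyin is stored with tones already
# omitted; Jyutping keeps its tone digit as dictionaries show it.
# 冇 has no pinyin here, following the example in this post.
PINYIN: Dict[str, Optional[str]] = {"我": "wo", "冇": None, "撒": "sa", "嬌": "jiao"}
JYUTPING: Dict[str, str] = {"我": "ngo5", "冇": "mou5", "撒": "saat3", "嬌": "giu1"}


def romanise(text: str, language: str) -> str:
    """Character-by-character: Pinyin for Mandarin, Jyutping for Cantonese."""
    syllables = []
    for ch in text:
        if language == "mandarin":
            syllable = PINYIN.get(ch)
        elif language == "cantonese":
            jyutping = JYUTPING.get(ch)
            # Drop the trailing tone number (1-6), as pinyin tone marks are dropped.
            syllable = re.sub(r"[1-6]$", "", jyutping) if jyutping else None
        else:
            raise ValueError(f"unsupported language: {language}")
        if syllable is None:
            raise KeyError(f"no {language} romanisation for {ch!r}")
        syllables.append(syllable.capitalize())
    return " ".join(syllables)
```

Here `romanise("撒嬌", "cantonese")` gives "Saat Giu" while `romanise("撒嬌", "mandarin")` gives "Sa Jiao", and asking for the Mandarin romanisation of 冇 raises a KeyError, which is the gap the post describes.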





I hope you can take this post into consideration.

[1] For more examples, see https://osu.ppy.sh/p/beatmaplist?q=cantonese
[2] For the list of consonants and vowels of Cantonese, see http://www.cantonese.sheik.co.uk/essays/jyutping.htm
[3] The website of the Chinese University of Hong Kong Dictionary: http://humanum.arts.cuhk.edu.hk/Lexis/lexi-can/ ; to obtain the Jyutping of a word, type the word in question into the search box at the top-left corner of the page. Once searched, you can obtain the Jyutping from the first box on the left labelled 'syllable'.
Wafu
@nold_1702: I don't really think that such a minority language (in osu!, it is a rarity in beatmaps) needs its own specification, not least because you would then need to discuss whether Jyutping is the most appropriate system to use. Although its Romanisation doesn't necessarily contain "characters with accents", it does contain numbers that mark the tone, which are not understandable to anyone who doesn't know Jyutping specifically. You could just as well Romanise it using Cantonese pinyin, but you don't really say why that's bad here.

Anyway, the proposal was actually addressing this wording issue (we specified that when we talk about Chinese, we mean Mandarin, this draft doesn't say anything like that), but it seems like the error is back again. There shouldn't be "each Chinese character", but "each Mandarin character".
Skylish
Romanization should be defined as transcribing a language into Latin letters. Under the rationale of transcription, the Latin letters used should follow the language being romanized. Dialects or different languages should be considered independent language systems, since they have completely different pronunciations and systems.

I object a completely 100% Hanyu Pinyin forced on other Chinese languages excluding Mandarin under the PRC standardized Chinese.

EDIT:

Another issue is the use of Traditional versus Simplified Chinese. Simply put, only the Taiwan region, HKSAR and Macau use Traditional Chinese. Hence metadata from these regions should be in Traditional Chinese, regardless of the language of the song.
Ulysses

Wafu wrote:

@nold_1702: I don't really think that such a minority language (in osu!, it is a rarity in beatmaps) needs a specification. Simply because you then need to discuss whether Jyutping is the most appropriate to use. Although its Romanisation doesn't necessarily contain "characters with accents", it does contain numbers that define the accent, which is not understandable by anyone who doesn't know Jyutping specifically.
(This post is a continuation of my first post on page 12; the 179th post.)

You made some interesting points. However, I still believe that Cantonese titles need to be Romanised using Jyutping, not Pinyin. Your arguments can be summarised as follows:

Wafu's Arguments


1. Cantonese is a minority language (on osu!). As such, we do not need to Romanise Cantonese.
2. To Romanise Cantonese the community needs to discuss which Jyutping is the most appropriate to use, therefore, in avoidance of this inconvenience, it is better for the community not to discuss.
3. The tones are to be taken into consideration, and it is troublesome. Thus, the community should not discuss it.

Rebuttals


Much as Wafu's arguments are interesting, they are not valid points. It appears plainly obvious to me that arguments (2) and (3) are fallacious. That something is troublesome and inconvenient should not bar the community from discussing it. In fact, most of the rules here on osu! are products of fierce debates. If we were to avoid every debate that might arise, we had better not discuss any rule at all, and such a conclusion is, indeed, absurd.

Arguments (2) and (3) are also flawed in that they are not factually correct. For (2), Jyutping is the most widely used and the most authoritative system. It is supported by a university dictionary and is the most accessible. Yale Romanisation of Cantonese is also backed by Yale University, but it does not have an online dictionary and is not widely used at all. For (3), the tones in Mandarin are not taken into account when romanising, so I see no reason why anyone would be interested in the tones when romanising Cantonese with Jyutping.

Argument (1) is also wrong on several levels. On a factual level, it is unsound. Admittedly, Cantonese songs are not a popular song type on osu!; nonetheless, there are more than 20 maps of Cantonese songs (note that the list I provided in my previous post is not exhaustive, because not every Cantonese song has the word 'Cantonese' in its tags), whilst there are 24 German songs ranked on osu!. If the premise is that Romanisation of German titles is necessary, then by analogy I see no convincing reason why Cantonese should not be Romanised.

On a logical level, argument (1) is invalid. The premise 'Cantonese is a minority language' does not entail 'Cantonese does not need Romanisation', nor does it entail the implicit conclusion that 'Cantonese Romanisation shall be replaced by the Mandarin Romanisation, Pinyin'. It is one thing not to Romanise Cantonese, but it is quite another thing to Romanise Cantonese in a wrong way by using a different language's system to Romanise it. Perhaps I have not been clear enough. What I mean is that in the old days, when the rule that presently concerns us was not yet in existence, mappers had the liberty to use Jyutping to Romanise Cantonese titles. However, once this rule is passed, all Cantonese titles will need to be Romanised using the Mandarin Pinyin system. In other words, this rule in its current state will force something wrong to happen. And that's why I suggest the change in question.

And Wafu mentioned that one could Romanise Cantonese by using Cantonese Pinyin. This is a misconception. Cantonese Pinyin does not exist. The equivalent is Cantonese Jyutping, and that is why I am proposing this change.
Topic Starter
Okoratu
idk why everyone here has to sound like they just had a dictionary for breakfast but whatever

to me the points raised by nold make sense: if there are words that cannot be romanised using pinyin then pinyin maybe shouldn't be used - as such the mandarin metadata ruling would need to specify what it applies to, i dunno if this is common enough to deserve its own ruling tho

edit: can someone translate Skylish's post into readable english for me? I've read it multiple times and dunno what he's trying to get at for half of it like the only sentence that makes sense is "I object a completely 100% Hanyu Pinyin forced on other Chinese languages excluding Mandarin under the PRC standardized Chinese." with reasoning "they're different enough to class as different languages"
Fycho
We have agreed that “Cantonese metadata must be romanized as Cantonese pronunciation”. But we haven’t figured out which romanization method is better and should be used, so I didn’t add it to the proposal. Considering there are fewer than three ranked songs, we can still apply metadata discretion to them.

If you really want to discuss and figure it out, don’t forget that Cantonese varies internally and has different tones within itself. I don’t know if forcing Jyutping would be a solution, and it may have potential issues with other Cantonese-speaking areas like Guangzhou. I am saying that leaving the metadata to the mapper's discretion would be best for them.
Ulysses

Fycho wrote:

We have agreed that “Cantonese metadata must be romanized as Cantonese pronunciation”. But we haven’t figured out which romanization method is better and should be used, so I didn’t add it to the proposal. Considering there are fewer than three ranked songs, we can still apply metadata discretion to them.

If you really want to discuss and figure it out, don’t forget that Cantonese varies internally and has different tones within itself. I don’t know if forcing Jyutping would be a solution, and it may have potential issues with other Cantonese-speaking areas like Guangzhou. I am saying that leaving the metadata to the mapper's discretion would be best for them.

(This post is a continuation of my first post on page 12; the 179th post.)

(There are not only 3 ranked Cantonese songs. See my first post.)

I am delighted to learn that you have actually agreed that 'Cantonese metadata must be romanized as Cantonese pronunciation'. However, this is not reflected in the rule, and the rule is in fact, by its wording, against this proposition, as it suggests that every Chinese character (including Chinese characters in Cantonese) is to be Romanised using Pinyin.
Nonetheless, if there is consensus on this, the only thing left is the wording of the rule. The rule's wording in its present state does not comply with what we agreed earlier. (And I have no intention whatsoever of arguing against the adoption of other Cantonese romanisations.)

Therefore, should the wording of the rule be:
"Character-by-character Romanisation: [e]ach Chinese character must be Romanised [by] using Hanyu Pinyin system [if the song is a Mandarin one; one of the standard Cantonese Romanisations at the mapper's discretion if the song is a Cantonese one], each first romanised character [of a Chinese word] must be capitalised and [the Romanised Chinese words are to be] separated with a space."

As to the tones, the romanisation can either ignore the Cantonese tones entirely, as the community decided to do for Mandarin Pinyin, or add the tone numbers at the end of each Romanised word (for example, saat3 giu1), in accordance with the Cantonese Romanisation standards. I am in favour of the former because it is consistent with Mandarin Romanisation.
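The "ignore the tones" option amounts to dropping the trailing tone digit (1 to 6) from each Jyutping syllable and capitalising what is left. A minimal Python sketch, assuming the mapper already has one Jyutping syllable per character (looking the syllables up is out of scope); `jyutping_title` and its `keep_tones` flag are hypothetical names for illustration, not part of any osu! tool:

```python
import re

# Jyutping marks tone with a single trailing digit 1-6, e.g. "saat3", "giu1".
TONE_DIGIT = re.compile(r"[1-6]$")


def jyutping_title(syllables, keep_tones=False):
    """Join one Jyutping syllable per Chinese character into a Romanised title.

    Each syllable becomes a capitalised word separated by a space
    (character-by-character). Tone digits are dropped unless keep_tones is set.
    """
    words = []
    for syllable in syllables:
        if not keep_tones:
            syllable = TONE_DIGIT.sub("", syllable)
        words.append(syllable.capitalize())
    return " ".join(words)


print(jyutping_title(["saat3", "giu1"]))                   # Saat Giu
print(jyutping_title(["saat3", "giu1"], keep_tones=True))  # Saat3 Giu1
```

The second call shows the alternative that keeps the numbers, which may be harder to read for people who don't know Jyutping.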


Okoratu wrote:

can someone translate Skylish's post into readable english for me? I've read it multiple times and dunno what he's trying to get at for half of it like the only sentence that makes sense is "I object a completely 100% Hanyu Pinyin forced on other Chinese languages excluding Mandarin under the PRC standardized Chinese." with reasoning "they're different enough to class as different languages"

Skylish is basically saying that:

1. Romanisation should be based upon the language, not the characters. French and English both use (roughly) the same alphabet, but they are different languages. If we were to transcribe them into another script, they should not follow the same format; we should not, say, use English pronunciation rules to transcribe French. It is the same for Mandarin and Cantonese: they share the same characters, but they are different (arguably languages, arguably dialects) and are mutually unintelligible.

2. Apart from the Mandarin and Cantonese division of the Chinese LANGUAGES, Chinese CHARACTERS (not the same as Chinese languages) also have a division: Simplified and Traditional Chinese characters. Skylish is saying that if the song is from China, [Singapore, and Malaysia], Simplified Chinese characters are to be used; if it is from Hong Kong, Taiwan, or Macau, Traditional Chinese characters are to be used. This is because China, Singapore, and Malaysia use Simplified Chinese characters, whereas Hong Kong, Taiwan, and Macau use Traditional ones.

The distinction between Chinese languages and Chinese characters may be very confusing to some people. But to simplify the matter, you can think of Mandarin and Cantonese as being like German and English, and of Simplified and Traditional Chinese as being like a and à, or c and ç (but with more differences). The word ‘naïve’ was originally written with two dots over the ‘i’, but people are quite used to spelling it ‘naive’ instead of ‘naïve’. So ‘naïve’ is like Traditional Chinese (in reality it would be something like ñâïvę, or most of the time ñâvūrê; these words don’t exist, I made them up), and ‘naive’ without the two dots is like Simplified Chinese.
Wafu

nold_1702 wrote:

Wafu's Arguments


1. Cantonese is a minority language (on osu!). As such, we do not need to Romanise Cantonese.
2. To Romanise Cantonese the community needs to discuss which Jyutping is the most appropriate to use, therefore, in avoidance of this inconvenience, it is better for the community not to discuss.
3. The tones are to be taken into consideration, and it is troublesome. Thus, the community should not discuss it.

Okay. First of all, if you call someone's argument "fallacious", you're on a high wire and should be absolutely sure that you have read the arguments properly. Yet you simply flipped two of my arguments into something they were never supposed to mean.

1. Yes, I think that, and I still think it despite the argument about German. The reason is that with German, you are only replacing a couple of characters, that's it. You're not implementing a Romanisation system that needs quite some discussion.
2. I never said that you should not discuss it. I implied it wouldn't be worth doing, considering how few Cantonese maps there are. Yes, 20 maps is rare (and there are 44 German songs; we need to consider German because it is implemented on the website and you can choose "German" in the search). Most Cantonese maps are quite old, and I think that people who check the metadata of qualified maps (e.g. metadata helpers) are able to see when someone is Romanising a different language than they are supposed to. I never said you shouldn't discuss it; I said it would need discussion if we wanted to implement it, so that everyone agrees we can use Jyutping. (And as you've seen in this thread, it's not very easy to reach an agreement. Chinese was a big deal; Cantonese is not nearly as much of one, which is why I don't think we should spend time discussing another Romanisation system.)
3. Quote me. I never said anything even close to that. I never said the community shouldn't discuss it. I never said tones are to be taken into consideration, and I never said it is troublesome. You suggested a system that doesn't use characters we can't type into "Romanised title". That's good, but Jyutping does use numbers to mark tones, which makes the Romanised text hard to understand for anyone who doesn't know Jyutping. The question is: ignore the numbers or keep them? Again, agreement from the community is needed.

nold_1702 wrote:

Arguments (2) and (3) are also flawed in that they are factually incorrect. For (2), Jyutping is the most widely used and the most authoritative system. It is supported by a university dictionary and is the most accessible. Yale Romanisation of Cantonese is also backed by Yale University, but it does not have an online dictionary and is not widely used at all. For (3), since the tones in Mandarin are not taken into account when romanising, I see no reason why anyone would be interested in the tones when romanising Cantonese with Jyutping.

Yup, I agree with the first part about Jyutping and Yale. I never said anything against that. You, however, need the community to agree on that. If we were to implement a system that only you and I agreed on, we would be ignoring the community. Regarding the second part, you can't just suggest a system that uses tones (the numbers) and act like they aren't there. You didn't say whether we ignore the numbers or include them (which, as I said, wouldn't be understandable for the majority of people; that was my argument, not that tones are needed and shouldn't be discussed). If you don't say anything about a feature of the system you suggested, I will assume you are in favour of including it. Otherwise it would be hypocritical to suggest a system and then talk about a modified version of it. That's why I pointed out the numbers.

As for the next paragraph: as I said, German is officially in the list of searchable languages, which is why we need to care about how we handle it. And there are 44 German songs according to that list.

nold_1702 wrote:

It is one thing not to Romanise Cantonese, but it is quite another thing to Romanise Cantonese in a wrong way by using a different language's system to Romanise it. Perhaps I have not been clear enough. What I mean is that in the old days, when the rule that presently concerns us was not yet in existence, mappers had the liberty to use Jyutping to Romanise Cantonese titles. However, once this rule is passed, all Cantonese titles will need to be Romanised using the Mandarin Pinyin system.

As I said, the wording is incorrect; it should say "Mandarin". This error comes from someone forgetting to replace the word when the proposals about Chinese and about metadata overall were split. The wording will be fixed, obviously. Yes, it is important to differentiate Mandarin and Cantonese, and as I said, these days people who check the metadata of qualified maps are intelligent enough to check that you are using a Romanisation system for the correct language. Yes, the current wording leads to mixing Cantonese and Mandarin, but that is an error that will be fixed. In every situation, you should still use a system designed for the language of the song.

Assuming that something simply doesn't exist because it's not used in Hong Kong is quite funny, especially if you want to make someone look bad. There is an equivalent of Pinyin for Cantonese that is different from Jyutping.
Ulysses
Wafu my friend, I do not mean to 'make you look bad', nor am I doing anything against you; I am arguing against what you suggested, not against you. Please don't make things personal: you are not in the spotlight, the issue is. Pardon me, but I want to avoid starting another unnecessarily lengthy debate. It seems to me you are not quite focusing on the right thing (and I do not mean to be aggressive). See my bullet points below:

- If you think community consensus is important, let people discuss it here; don't try to stop it.
- Mandarin is not an official language recognised by osu! either. osu! recognises one big language group, 'Chinese', which includes both Mandarin and Cantonese. If Mandarin is within the scope of the discussion, so is Cantonese.
Topic Starter
Okoratu
i think no one ever questioned that romanisation should be based on the language lol

shall i change the current wording to mandarin while we figure the rest out?
Ulysses

Okoratu wrote:

i think no one ever questioned that romanisation should be based on the language lol

shall i change the current wording to mandarin while we figure the rest out?

Yes please
Topic Starter
Okoratu
ok done
can you debate the actual points instead of semantics then?
i dont have any power to nuke any posts here (lucky for some pointless debates on the earlier pages) but please keep it short and concise

-> can someone sum up briefly where we stand currently on Cantonese? If we want to make people understand how it's pronounced / read then we'd need something different from the mandarin method anyways

anything that is reliable and produces readable text for people that are not cantonese should be good fwiw
Topic Starter
Okoratu

Monstrata wrote:

Have a question regarding covers. I remember mentioning this to Eph but nothing was really concluded so maybe we can get some opinions here?

Currently metadata rules seem to suggest that if a song is covered by someone, that the entire metadata field should be taken from the cover's source. Example: https://osu.ppy.sh/s/658919 vs https://osu.ppy.sh/s/637445

If an artist is clearly covering or replicating another song, I think we should be taking metadata from the original song in cases where metadata is somehow different. After all, the melodies, rhythms, lyrics, etc... are all pretty much the same. It's the same song. Just sung by someone else. So shouldn't the only thing to change be the Artist/Romanized Artist field?

I'm actually not sure when this change happened because I remember at one point the practice was to just change the Artist and keep Title/Source the same as original song.
uh not too sure id agree on covers that just do same voice but if it's a drastically different song id say you should be following the artist
xxdeathx

Natsu wrote:

It would be really annoying without the label: for example, when you map the TV Size and the Full version, the mapsets merge together. Also, if you're looking for a normal-length song you are going to get a bunch of TV sizes, or vice versa... and to be honest, people rarely care about tags, so dropping it into the tags isn't going to work.
strongly agree :) :)
Fycho
To complete the Cantonese part, nold_1702 and I have reworded the proposal about Chinese and Cantonese a bit.
We think Chinese / Standard Chinese / Written Vernacular Chinese actually refer to the same thing, and Mandarin is kind of a tone of them rather than a language. So we went back to just using "Chinese".

Glossary
Character-by-character Romanisation: each Chinese character must be romanised as a capitalised word and separated with a space.

Rules
Songs with Chinese metadata must be romanised in accordance with the Character-by-character method using the Hanyu Pinyin system in Romanised fields when there is no Romanisation or translation information listed by a reputable source. The same applies to the Source field if a romanised Source is preferred by the mapper. As they are non-unicode fields, all diacritical tone marks must be omitted. Songs with Cantonese metadata must be romanised using the Jyutping system.
Ulysses
As a speaker of both languages, I struggle to understand why you keep emphasising that Mandarin and Cantonese are just different tones. They are languages that use Chinese characters, not just different tones. But whatever, it is not something that we have to discuss here. (For those of you who may be interested, here is a fun and short YouTube video explaining the differences and similarities between the two: https://youtu.be/s2km_z4-1T8 )


Anyway, I modified the grammar and changed some words to more precise ones:

Glossary
Character-by-character Romanisation: each Chinese character must be Romanised as a capitalised word and separated with a space.


Rules
Songs with metadata in Chinese must be Romanised in accordance with the Character-by-character method using the Hanyu Pinyin system in Romanised fields if there is no Romanisation or translation information listed in a credible source. The same applies to the Source field if a romanised Source is preferred by the mapper. As they are non-unicode fields, all diacritical tone marks must be omitted. Songs with Cantonese metadata must be Romanised using the Jyutping system.
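For illustration, the Character-by-character method with tone marks omitted can be sketched in Python. It assumes the Pinyin syllables (one per character) are already known and written with tone diacritics; `strip_tone_marks` and `pinyin_title` are hypothetical helpers, not part of any osu! tooling. Only the four Pinyin tone diacritics are removed, so 'ü' is kept; how to write 'ü' in a non-unicode field is not settled by the rule above.

```python
import unicodedata

# Combining marks Hanyu Pinyin uses for the four tones:
# macron (1st), acute (2nd), caron (3rd), grave (4th).
TONE_MARKS = {"\u0304", "\u0301", "\u030c", "\u0300"}


def strip_tone_marks(syllable):
    """Remove Pinyin tone diacritics but keep the diaeresis on 'ü'."""
    # NFD splits a precomposed letter such as "ǒ" into "o" + combining caron.
    decomposed = unicodedata.normalize("NFD", syllable)
    kept = "".join(ch for ch in decomposed if ch not in TONE_MARKS)
    # Recompose so that "u" + combining diaeresis becomes a single "ü" again.
    return unicodedata.normalize("NFC", kept)


def pinyin_title(syllables):
    """Character-by-character: one capitalised word per character, space-separated."""
    return " ".join(strip_tone_marks(s).capitalize() for s in syllables)


print(pinyin_title(["wǒ", "ài", "nǐ"]))  # Wo Ai Ni
print(pinyin_title(["lǜ"]))             # Lü
```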
Topic Starter
Okoratu
Removed the part where it talked about unavailability of preferred romanisation: "If the artist provides a preferred way to romanise their title or name, that is to be followed unless it conflicts with other points of this criteria." handles that.

Reverted Glossary

Decluttered the rule into the following statements:
  1. Songs with Chinese metadata are to be handled with respect to the tones and dialects of Chinese they belong to. In any case, all diacritical tone marks must be omitted:
    1. Mandarin metadata must be romanised using the character-by-character method.
    2. Cantonese metadata must be romanised using the Jyutping system.
    3. If the song falls into neither category, this choice is left up to the mapper's discretion.


i hope this is more clear and captures the spirit of what you wanted to say while being more straightforward to digest

ToDo:
- Fix silly loopholes in the spacing of special characters
- !where Korea
- common markers rules
- CrystilionZ point needs to be applied but idk how

If the artist provides a preferred way to romanise their title or name, that is to be followed unless it conflicts with other points of this criteria.
nope, this refers to the special character and title formatting rulings above, you can't stuff that into one thing
how is a translation not officially referring to the song in multiple ways?
Ulysses
Hmmm you missed the Hanyu pinyin thing in the Mandarin part and the character-by-character part in the Cantonese part.

So both of them use the character-by-character method.
Mandarin uses the Hanyu Pinyin system to romanise Chinese characters,
whereas Cantonese uses the Jyutping system to romanise Chinese characters.

So:

Mandarin metadata must be romanised using the Hanyu Pinyin system.

Cantonese metadata must be romanised using the Jyutping system.

In both cases, the character-by-character method is to be adopted.