[Proposal] Metadata section overhaul

Wafu
Can absolutely agree with Lanturn on this.

@Natsu If you add "tv size" to the tags and then append "tv size" when searching for a map, you should get the one that is TV Size (in game; the website will probably show both in either scenario). There was also a suggestion that "TV Size", or basically "Short version", could be a search option on the website (or maybe just the length). I think that unification should be the last resort, since there are plenty of other ways to resolve this issue. Just showing the length on the website would be enough, especially since there are maps that are "TV Size" but only ~30 seconds long. That's probably not what most people are looking for; most people probably imagine the standard 1:30 songs. The confusion is there whether you include a unified tag or just put it into the tags, except that the latter can lead to solving this issue entirely, rather than placing a bandage on it.
Noffy
I'm of the opinion that displaying (TV Size) uniformly for all TV Size anime openings/endings would be much clearer to players searching maps of a song. It may take only seconds to check a single map to see what length it is, but what about when there's 5+ mapsets for a really popular song? Then you would have to check each individually to find what you're looking for instead of it being presented on the listing straightaway. People that have not seen TV Size in anime song titles in the past would not be likely to think to add it to their search terms. They'd probably be more likely to think something like, "theme song" or "opening" or "ending" or "short" if they're an English speaker, for instance. This can be seen by searching anime themes on YouTube and seeing how relatively rarely they use (TV Size) for their video titles.
However, if there were any way to have the length of a song displayed on a map's panel in the search listing, I'd totally agree with adding it to tags instead. But that's a website feature that would need to be considered separately...
And would also require begging peppy / flyte to add it

Also about what Lanturn brought up concerning making uniform tags for different types of length indicators like Short ver. Game ver. etc.: this sounds like a good idea, honestly. Then the main concern for metadata would be the accuracy of the actual main title, and not worrying about having to find out what particular variation of a label some obscure official source happens to use.
Topic Starter
Okoratu
it can work either way

we'd just need to set a standard and stick to it
Lanturn
Alright. So let's say we want to keep TV Size, Short Ver. etc. Let me attempt to write up a list. We should probably be discussing this in advance if we do end up keeping it. Since classifying types is another big job in itself.

TV Size -> (TV Size, Anime Ver, Opening Ver, Ending Ver, TV Edit... This list is endless)
This would be used for all OPs/EDs and specific cuts used in the show. This includes songs from non-Japanese shows such as cartoons, sitcoms, etc. (The Friends opening, for example, would use TV Size.)
Example: https://osu.ppy.sh/b/1533376 (Anime Version labeled as a TV Size on its official release)

Short Ver. / Extended Ver.-> (Every other Song, Visual Novel Opening/Endings)
A song that has had its song time cut officially from the original and doesn't meet the criteria of the other cuts. This also includes Visual Novels as they are mostly labeled with short versions over game size.

Game Size -> (Game Opening, Endings, Insert Songs, Some BGM tracks)
A specific cut when dealing with video games. (not including visual novels) Similar to TV Size, but when dealing with games. Example: https://www.youtube.com/watch?v=qAQUparDhtg
In addition, I'm wondering what we should do with rhythm game cuts.
Element of SPADA, for example, has a version made for the rhythm games and a full version. In the majority of cases these songs are released without any sort of marker, which conflicts with the whole idea of labeling based on version.


Short Cut / Extended Cut (Unofficial cuts/additions made to any song.)
Pretty self-explanatory. There are basically two options here: add a marker to show that it's a specific cut, or use the original version's length (cutting a Full Ver. to roughly the length of a TV Size would result in using the Full Ver. title).
Example: https://osu.ppy.sh/b/211503 (I cut this from the full version and it is different from the TV Size)

Full Ver. -> (The full official release.)
Drop the marker outright. This includes songs that are officially marked with (Full Ver.) in the title.

Nightcore, Speed Up Ver, and other edits that alter BPM will probably be left in the title field.

I'm sure I've missed quite a bit, but here's a start I guess.
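To make the idea of uniform markers a bit more concrete, here is a minimal Python sketch of how the list above could be applied to a title. The table `UNIFORM_MARKERS`, the function `normalise_title`, and the "Game Ver." / "Short Version" variants are hypothetical names and entries chosen for illustration. It is not an agreed ruling and definitely not exhaustive.

```python
import re

# Hypothetical table: uniform marker -> variants that would be folded into it.
# The TV Size variants come from the list above; the rest are assumptions.
UNIFORM_MARKERS = {
    "TV Size": ["TV Size", "Anime Ver.", "Opening Ver.", "Ending Ver.", "TV Edit"],
    "Game Size": ["Game Size", "Game Ver."],
    "Short Ver.": ["Short Ver.", "Short Version"],
}


def _key(text):
    """Lowercase and strip punctuation so 'Anime Ver' and 'anime ver.' compare equal."""
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()


# Reverse lookup from each normalised variant to its uniform marker.
_LOOKUP = {
    _key(variant): uniform
    for uniform, variants in UNIFORM_MARKERS.items()
    for variant in variants
}


def normalise_title(title):
    """Replace a trailing '(marker)' with its uniform form; drop '(Full Ver.)'."""
    match = re.search(r"\s*\(([^()]*)\)\s*$", title)
    if not match:
        return title  # no trailing marker at all
    marker = _key(match.group(1))
    base = title[:match.start()]
    if marker in ("full ver", "full version"):
        return base  # full releases carry no marker
    uniform = _LOOKUP.get(marker)
    # Unknown markers (Nightcore, Speed Up Ver., ...) are left in the title untouched.
    return f"{base} ({uniform})" if uniform else title
```

For example, `normalise_title("Song (Anime Ver.)")` gives `"Song (TV Size)"`, while a title ending in `(Full Ver.)` loses the marker and anything not in the table stays as it is.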

Alternatively, there is the whole convert everything into Short Ver. / Extended Ver. that Natsu suggested.

A few topics for discussion from this if you missed them while reading: (The TL;DR)
Rhythm Video Game cuts. What label to add?
Should unofficial cuts be differentiated from official version releases?
Should unofficial cuts use the original title or the version they closely represent?
Markers I may have missed or any that should be removed.
Topic Starter
Okoratu
@Lanturn: i dont think we can be exhaustive with these lists fwiw

i'd just apply pareto principle to this and go with the 20% of work for 80% of cases and handle the rest via guidelines and establish new rulings as we go along?
Noffy

Noffy wrote:

ok time for a re-review with slightly fresher eyes

the thing part thirty wrote:

Guest mappers, storyboarders, and hitsounders must be added to the tags of a beatmap set. This is to give credit where credit is due and helping others identify the main contributors of any given beatmap set.

-> + "Skinners should be added if they made the skin specifically for the mapset" (in contrast to someone just borrowing/mixing skin elements that're already out there) (this would be nice)


the thing part forty two wrote:

Commas, vs., &, any variations of feat./ft., CV: must always use a trailing whitespace. Unless it is a comma, leading whitespace is also required.

(CV: blah) vs. ( CV: blah ) . the latter would look silly, so CV: shouldn't require leading whitespace either. Or uhhh... this doesn't apply to sides which have the inside of a bracket next to them? or something. since it'd also apply to like, (feat.) vs. ( feat. ) which isn't.. better really.. hmmm
I'm not sure how to fix the wording for this though
aaaaaaaa~



Repost because it got kind of buried before ahah
Mainly concerned about the confusion the whitespace rule could cause as it's currently written

Idea wrote:

Trailing/leading whitespace is not required if the character next to it is the inner side of a bracket.
Example: Hello (CV: Goodbye) is okay, Hello( CV: Goodbye ) is not.
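A rough Python sketch of how the whitespace rule with this bracket exception could be checked. The function name `whitespace_problems` and the exact separator list are assumptions for illustration, and the word-like separators are only recognised when they start at a word boundary, so something like "Hellofeat." would slip through.

```python
import re

# Separators named in the quoted rule. Commas only need trailing whitespace.
SEPARATORS = re.compile(r",|&|\bvs\.|\bfeat\.|\bft\.|\bCV:")
# Whitespace directly inside a bracket, e.g. "( CV: x )".
PADDED_BRACKET = re.compile(r"[(\[{]\s|\s[)\]}]")
OPENERS, CLOSERS = "([{", ")]}"


def whitespace_problems(text):
    """Return a list of human-readable whitespace problems found in text."""
    problems = []
    for m in SEPARATORS.finditer(text):
        token = m.group(0)
        before = text[m.start() - 1] if m.start() > 0 else ""
        after = text[m.end()] if m.end() < len(text) else ""
        # Leading whitespace: skipped for commas, at the very start of the
        # text, and right after an opening bracket (the proposed exception).
        if token != "," and before and not before.isspace() and before not in OPENERS:
            problems.append(f"missing space before {token!r}")
        # Trailing whitespace: skipped at the very end of the text and right
        # before a closing bracket (the proposed exception).
        if after and not after.isspace() and after not in CLOSERS:
            problems.append(f"missing space after {token!r}")
    for m in PADDED_BRACKET.finditer(text):
        problems.append(f"padding inside bracket at index {m.start()}")
    return problems
```

With this, `whitespace_problems("Hello (CV: Goodbye)")` returns an empty list, while `"Hello( CV: Goodbye )"` gets flagged for the padding inside the brackets.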
Lanturn
As for what Noffy wrote. I guess this was one of the things I was going to bring up in the next proposal for some guidelines. Anything that has some sort of opening and closing shouldn't require a space after the opener and before the closer.

*This* -that- <then> [thus] etc.

I also had something to address when it concerns romanizing from languages that don't typically use spaces like Japanese. It's something the metadata team has been pushing as of late when it comes to these languages. However, I'll probably save this for a later date since it's too late to be pushing this guideline through right now. The metadata team still recommends this though.

The guideline (in a nutshell) if anyone was interested
ジョジョ~その血の運命~ Archetype MIX Ver.
JoJo ~Sono Chi no Sadame~ Archetype MIX Ver.

When a symbol stands alone with no spacing around it, the romanization should have whitespace before and after it. (Ex. if the title was "ジョジョ~その" we'd use "JoJo ~ Sono" when romanizing.)

When a symbol comes in pairs (as in the example above), use a space before the first symbol and after the last symbol (not needed if the symbol is the last character). (Ex. if the title was "ジョジョ~その血の運命~" we would use "JoJo ~Sono Chi no Sadame~".)
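The two cases can be sketched in Python as below. The helper `space_symbol` is a hypothetical name, and it assumes the text has already been romanised and contains the symbol at most twice; anything else is returned unchanged.

```python
def space_symbol(romanised, symbol="~"):
    """Apply the lone-symbol / paired-symbol spacing guideline to romanised text."""
    parts = [part.strip() for part in romanised.split(symbol)]
    count = len(parts) - 1  # how many times the symbol occurs
    if count == 1:
        # Lone symbol: whitespace before and after.
        return f" {symbol} ".join(parts).strip()
    if count == 2:
        # Paired symbols: space before the first and after the last,
        # none on the inside. strip() drops the trailing space when the
        # closing symbol is the last character.
        before, inside, after = parts
        return f"{before} {symbol}{inside}{symbol} {after}".strip()
    return romanised  # zero or 3+ symbols: out of scope for this sketch
```

So `space_symbol("JoJo~Sono")` gives `"JoJo ~ Sono"`, and `space_symbol("JoJo~Sono Chi no Sadame~Archetype MIX Ver.")` gives `"JoJo ~Sono Chi no Sadame~ Archetype MIX Ver."`.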


-----------------------------------------------------------------

Anyways. TV Size time. So basically we'll just apply common sense for these and use whatever marker matches closest to our own preferences. Pretty much going back to our old school methods this way.

I mean we can work with that but it may cause some DQs when it comes to preference or conflicts down the line.

For me, I'd rather push towards something more concrete with the smallest room for error, which is moving them to the tags. I am fine with working with either method though.

So uh. Pick one I guess and we'll move along with it from there.
Shiguma

Lanturn wrote:

Anyways. TV Size time. So basically we'll just apply common sense for these and use whatever marker matches closest to our own preferences. Pretty much going back to our old school methods this way.

I mean we can work with that but it may cause some DQs when it comes to preference or conflicts down the line.

For me, I'd rather push towards something more concrete with the smallest room for error, which is moving them to the tags. I am fine with working with either method though.

So uh. Pick one I guess and we'll move along with it from there.
I believe it should be moved to metadata. Conflicts shouldn't really happen if we're using a common sense approach, the few samples you had laid out earlier seem like they would work for most cases. It is just a much better way to handle determining cut songs vs full songs, unless the length is obvious from the listing as Noffy said.
Monstrata
Have a question regarding covers. I remember mentioning this to Eph but nothing was really concluded so maybe we can get some opinions here?

Currently metadata rules seem to suggest that if a song is covered by someone, that the entire metadata field should be taken from the cover's source. Example: https://osu.ppy.sh/s/658919 vs https://osu.ppy.sh/s/637445

If an artist is clearly covering or replicating another song, I think we should be taking metadata from the original song in cases where metadata is somehow different. After all, the melodies, rhythms, lyrics, etc... are all pretty much the same. It's the same song. Just sung by someone else. So shouldn't the only thing to change be the Artist/Romanized Artist field?

I'm actually not sure when this change happened because I remember at one point the practice was to just change the Artist and keep Title/Source the same as original song.
Ulysses

Fycho wrote:

Glossary
Character-by-character Romanisation: Each Chinese character must be Romanised using Hanyu Pinyin system, and each romanised character must be capitalised and separated with a space.

Rules
Songs with Chinese metadata must be Romanised using the Character-by-character method in Romanised fields when there is no Romanisation or translation information listed by a reputable source. The same applies to the Source field if a romanised Source is preferred by the mapper. As they are non-unicode fields, all diacritical tone marks must be omitted. Refer to Thread: Romanisation of Chinese for more information.

The discussion of "Romanisation of Chinese" should be adequate by now and can stop here. Anyone who has concerns is free to contact me for a detailed explanation.
I do not wish to fuel the fire of this longstanding debate, which has somehow already been resolved. Yet there is a flaw in this rule that compels me to break my silence. It is not about 'v', 'yu', and 'yi', but that this rule does not take into account the existence of another language that uses Chinese characters but whose pronunciation (and romanisation) is nothing like that of Mandarin Chinese. This language is Cantonese.

I will keep this post as short as possible.

Background

There are Cantonese songs in ranked maps, Kevin Cheng - Syut Ha Si -Thinking Under the Snow, Elanne Kwong - Sa Jiao, and JJ Lin - Jiang Nan (Cantonese Ver.), for instance.[1] Some of the titles are romanised in a wrong way. Elanne Kwong - Sa Jiao is a paradigm of errors of such kind. Its Cantonese romanisation is saat giu, not sa jiao. Whereas the former is the Cantonese and the correct romanisation of the title, the latter, Mandarin. The song is not a Mandarin song, but a Cantonese one.

In light of this, I propose that the rule be altered, below is the proposed wording of the rule after adjustment:

"Character-by-character Romanisation: [e]ach Chinese character must be Romanised [by] using Hanyu Pinyin system if the song is a Mandarin one; Jyutping if the song is a Cantonese one, and each romanised character must be capitalised and separated with a space."



Differences between Mandarin and Cantonese Romanisation

There are many ways in which the two languages differ, from the pronunciation and semantics of individual Chinese words to the romanisation of each word. Since this post is about the romanisation of Cantonese rather than the teaching of the languages, the semantics need not concern us. The other two will be discussed.

Nearly all Chinese characters are pronounced in a different way in Cantonese. Taking the word '我' (meaning 'I' as well as 'me' and sometimes, 'my') as an illustration, it is pronounced 'wo' in Mandarin but 'ngo' in Cantonese.

Not all Cantonese words have Mandarin romanisation. For example, the word '冇' (meaning 'no' and 'nothing') has no Mandarin pinyin because it is non-existent in Mandarin. Its Jyutping is, however, 'mou'. Therefore, some Cantonese words cannot be romanised by using Hanyu Pinyin.
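As a small illustration in Python, here is what choosing the romanisation system by the song's language would look like. The table `READINGS` and the function `romanise` are hypothetical and contain only the two characters used as examples in this post. A real tool would need a proper dictionary source.

```python
# Hypothetical sample table with only the two characters discussed above.
# '冇' is listed without a Mandarin reading, following this post.
READINGS = {
    "我": {"mandarin": "wo", "cantonese": "ngo"},
    "冇": {"cantonese": "mou"},
}


def romanise(text, language):
    """Character-by-character romanisation: capitalised, space-separated."""
    syllables = []
    for char in text:
        reading = READINGS.get(char, {}).get(language)
        if reading is None:
            # No reading in this language: the title cannot be romanised this way.
            raise KeyError(f"no {language} reading for {char!r} in the sample table")
        syllables.append(reading.capitalize())
    return " ".join(syllables)
```

`romanise("我", "mandarin")` gives `"Wo"` and `romanise("我冇", "cantonese")` gives `"Ngo Mou"`, while asking for a Mandarin romanisation of `"冇"` raises an error with this table.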


Arguments for the Proposed Change


1. Misrepresentation. Cantonese and Mandarin Chinese are very different in pronunciation (and meanings). Romanising Cantonese with pinyin will result in misrepresentation.

2. Unable to search online. Searching the Cantonese song with the Romanised pinyin online will most likely get you a Mandarin version of the song (with lyrics changed, different pronunciations of each word and very often, different singers), effectively two different songs.

3. Wrongness. It is plainly wrong to romanise Cantonese by using pinyin. It is like romanising Japanese kanji by using pinyin.


Romanisation of Cantonese


Jyutping does not contain any characters with accents.[2] Therefore, there will not be the slightest difficulty rendering the original Jyutping words into Romanised words.

Jyutping is easily obtainable. Mappers can easily obtain the Jyutping of individual words from the Chinese University of Hong Kong Dictionary[3].





I hope you can take this post into consideration.

[1] For more examples, see https://osu.ppy.sh/p/beatmaplist?q=cantonese
[2] For the list of consonants and vowels of Cantonese, see http://www.cantonese.sheik.co.uk/essays/jyutping.htm
[3] The website of the Chinese University of Hong Kong Dictionary: http://humanum.arts.cuhk.edu.hk/Lexis/lexi-can/ ; to obtain the Jyutping of a word, type the word in question into the search box at the top-left corner of the page. Once searched, you can obtain the Jyutping from the first box on the left labelled 'syllable'.
Wafu
@nold_1702: I don't really think that such a minority language (in osu!, it is a rarity in beatmaps) needs a specification. Simply because you then need to discuss whether Jyutping is the most appropriate to use. Although its Romanisation doesn't necessarily contain "characters with accents", it does contain numbers that define the accent, which is not understandable by anyone who doesn't know Jyutping specifically. You could as well Romanise it using Cantonese pinyin, but you don't really say why that's bad here.

Anyway, the proposal was actually addressing this wording issue (we specified that when we talk about Chinese, we mean Mandarin, this draft doesn't say anything like that), but it seems like the error is back again. There shouldn't be "each Chinese character", but "each Mandarin character".
Skylish
The standard of romanization should be defined as: translating a language into Latin words. Under the rationale of transcription, the romanization of letters (in Latin letters) should be followed by the language. Dialects or different languages should be considered as independent language system since they have completely different pronunciation methods and systems.

I object a completely 100% Hanyu Pinyin forced on other Chinese languages excluding Mandarin under the PRC standardized Chinese.

EDIT:

Another issue is the usage of Traditional Chinese and Simplified Chinese. Simply speaking, only Taiwan region, HKSAR and Macau use Traditional Chinese. Hence the metadata from these regions should be Traditional Chinese, no matter what languages are in the songs.
Ulysses

Wafu wrote:

@nold_1702: I don't really think that such a minority language (in osu!, it is a rarity in beatmaps) needs a specification. Simply because you then need to discuss whether Jyutping is the most appropriate to use. Although its Romanisation doesn't necessarily contain "characters with accents", it does contain numbers that define the accent, which is not understandable by anyone who doesn't know Jyutping specifically.
(This post is a continuation of my first post on page 12; the 179th post.)

You made some interesting points. However, I still believe that there is a need to Romanise Cantonese titles by using Jyutping, not Pinyin. Your arguments can be summarised as follows:

Wafu's Arguments


1. Cantonese is a minority language (on osu!). As such, we do not need to Romanise Cantonese.
2. To Romanise Cantonese the community needs to discuss which Jyutping is the most appropriate to use, therefore, in avoidance of this inconvenience, it is better for the community not to discuss.
3. The tones are to be taken into consideration, and it is troublesome. Thus, the community should not discuss it.

Rebuttals


Much as Wafu's arguments are interesting, they are not valid points. It appears to me plainly obvious that arguments (2) and (3) are fallacious. The mere fact that something is troublesome and inconvenient should not bar the community from discussing it. In fact, most of the rules here on osu! are the products of fierce debates. If we are to avoid any debate that may potentially arise, we had better not discuss any rule at all, and such a conclusion is, indeed, absurd.

Arguments (2) and (3) are also flawed in that they are not factually correct. For (2), Jyutping is the most widely used and the most authoritative. It is supported by a university dictionary and is the most accessible. Yale Cantonese Romanisation is also backed by Yale University, but it does not have an online dictionary and is not widely used at all. For (3), the tones in Mandarin are not taken into account when romanising, so I see no reason why anyone would be interested in the tones when romanising Cantonese with Jyutping.

Argument (1) is also wrong on several levels. On a factual level, it is unsound. Admittedly, Cantonese songs are not a popular song type on osu!; nonetheless, there are more than 20 maps of Cantonese songs (note that the list I provided in my previous post is not exhaustive because not every Cantonese song has the word 'Cantonese' in tags), whilst there are 24 German songs ranked on osu!. If the premise is that the Romanisation of German titles is necessary, then by analogy I see no convincing reason why Cantonese should not be Romanised.

On a logical level, argument (1) is invalid. The premise 'Cantonese is a minority language' does not entail 'Cantonese does not need Romanisation', let alone the implicit conclusion that 'Cantonese Romanisation shall be replaced by Mandarin Romanisation Pinyin'. It is one thing not to Romanise Cantonese, but it is quite another thing to Romanise Cantonese in a wrong way by using a different language system to Romanise it. Perhaps I am not clear enough. What I mean is that in the old days, when the rule that presently concerns us was not yet in existence, mappers had the liberty to use Jyutping to Romanise Cantonese titles. However, once this rule is passed, all Cantonese titles will need to be Romanised by using the Mandarin Pinyin system. In other words, this rule in its present state will force something wrong to happen. And that's why I suggest the change in question.

And Wafu mentioned that one could Romanise Cantonese by using Cantonese Pinyin. This is a misconception. Cantonese Pinyin does not exist. The equivalent is Cantonese Jyutping and that is why I am proposing this change.
Topic Starter
Okoratu
idk why everyone here has to sound like they just had a dictionary for breakfast but whatever

to me the points raised by nold make sense, if there's words that cannot be romanised using pinyin then pinyin maybe shouldnt be used - as such the mandarin metadata ruling would need to specify what it applies to, i dunno if this is common enough to deserve its own ruling tho

edit: can someone translate Skylish's post into readable english for me? I've read it multiple times and dunno what he's trying to get at for half of it like the only sentence that makes sense is "I object a completely 100% Hanyu Pinyin forced on other Chinese languages excluding Mandarin under the PRC standardized Chinese." with reasoning "they're different enough to class as different languages"
Fycho
We have agreed that “Cantonese metadata must be romanized as Cantonese pronunciation”. But we haven’t figured out which romanization method is better and should be used, so I didn’t add it to the proposal. Considering there are less than three ranked songs, we can still apply metadata discretion for them.

If you really want to discuss and figure it out, also don’t forget that Cantonese differs internally and has different tones within itself. I don’t know if forcing Jyutping would be a solution, and it may have potential issues with other Cantonese-speaking areas like Guangzhou. I am saying that metadata discretion by the mapper would be the best for them.
Ulysses

Fycho wrote:

We have agreed that “Cantonese metadata must be romanized as Cantonese pronunciation”. But we haven’t figured out which romanization method is better and should be used, so I didn’t add it to the proposal. Considering there are less than three ranked songs, we can still apply metadata discretion for them.

If you really want to discuss and figure it out, also don’t forget that Cantonese differs internally and has different tones within itself. I don’t know if forcing Jyutping would be a solution, and it may have potential issues with other Cantonese-speaking areas like Guangzhou. I am saying that metadata discretion by the mapper would be the best for them.

(This post is a continuation of my first post on page 12; the 179th post.)

(There are not only 3 ranked Cantonese songs. See my first post.)

I am delighted to learn that you have actually agreed that 'Cantonese metadata must be romanized as Cantonese pronunciation'. However, it is not reflected in the rule, and the rule is in fact, by its wording, against this proposition, as it suggests that every Chinese character (including Chinese characters in Cantonese) is to be Romanised by using Pinyin.
Nonetheless, if the consensus is here, the only thing left is the wording of the rule. The rule's wording in its present state is not in compliance with what we earlier agreed. (And I have no intention whatsoever to argue against the adoption of other Cantonese Latinisation)

Therefore, shall the wording of the rule be:
"Character-by-character Romanisation: [e]ach Chinese character must be Romanised [by] using Hanyu Pinyin system [if the song is a Mandarin one; one of the standard Cantonese Romanisations to the mapper's discretion if the song is a Cantonese one], each first romanised character [of a Chinese word] must be capitalised and [the Romanised Chinese words are to be] separated with a space."

As to the tones, the romanisation can either ignore the Cantonese tones entirely, the same as what the community decided to do for Mandarin pinyin, or the tones can be added at the end of each Romanised word, for example, saat3 giu1, in accordance with all the Cantonese Romanisation standards. I am in favour of the former because it is consistent with Mandarin Romanisation.
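Both options are easy to produce from the same Jyutping input, as this short Python sketch shows. The function name `format_jyutping` and its `keep_tones` flag are hypothetical, and it assumes tones are written as a single trailing digit from 1 to 6 on each syllable.

```python
import re


def format_jyutping(jyutping, keep_tones=False):
    """Capitalise each Jyutping syllable, optionally dropping the tone digit."""
    syllables = []
    for syllable in jyutping.split():
        if not keep_tones:
            # Remove the trailing tone number (Jyutping uses 1-6).
            syllable = re.sub(r"[1-6]$", "", syllable)
        syllables.append(syllable.capitalize())
    return " ".join(syllables)
```

`format_jyutping("saat3 giu1")` gives `"Saat Giu"` (the option I prefer), and `format_jyutping("saat3 giu1", keep_tones=True)` gives `"Saat3 Giu1"`.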


Okoratu wrote:

can someone translate Skylish's post into readable english for me? I've read it multiple times and dunno what he's trying to get at for half of it like the only sentence that makes sense is "I object a completely 100% Hanyu Pinyin forced on other Chinese languages excluding Mandarin under the PRC standardized Chinese." with reasoning "they're different enough to class as different languages"

Skylish is basically saying that

1. Romanisation should be based upon the language, not the characters. French and English both use (roughly) the same alphabet, but they are different languages. If we were to translate them into another language, they should not follow the same format; we should not, say, use the English pronunciation format to translate French. It is the same for Mandarin and Cantonese. They share the same characters, but are different (arguably languages or dialects) and mutually unintelligible.

2. Apart from the Mandarin and Cantonese division of the Chinese LANGUAGES, Chinese CHARACTERS (not the same as Chinese languages) also have a division, which is Simplified and Traditional Chinese characters. Skylish is saying that if the song is from China, [Singapore, and Malaysia], Simplified Chinese characters are to be used; if from Hong Kong, Taiwan, and Macau, Traditional Chinese characters are to be used. This is because China, Singapore, and Malaysia use Simplified Chinese characters, whereas Hong Kong, Taiwan, and Macau use Traditional ones.

The distinction between Chinese languages and Chinese characters may be very confusing to some people. But to simplify the matter, you can think of Mandarin and Cantonese as being like German and English, and Simplified and Traditional Chinese as being like a and à, or ç and c (but with more differences). The word ‘naïve’ is originally written with two dots on top of the ‘i’, but people are quite used to spelling it ‘naive’ instead of ‘naïve’. So ‘naïve’ is like Traditional Chinese (in reality it would be something like ñâïvę or most of the time ñâvūrê) (these words don’t exist, I made them up); ‘naive’ without the two dots is like Simplified Chinese.
Wafu

nold_1702 wrote:

Wafu's Arguments


1. Cantonese is a minority language (on osu!). As such, we do not need to Romanise Cantonese.
2. To Romanise Cantonese the community needs to discuss which Jyutping is the most appropriate to use, therefore, in avoidance of this inconvenience, it is better for the community not to discuss.
3. The tones are to be taken into consideration, and it is troublesome. Thus, the community should not discuss it.

Okay. First of all, if you call someone's argument "fallacious", you're on a high wire and should be absolutely sure that you read the arguments properly. And that's how you simply flipped two of my arguments into something they are absolutely not supposed to mean.

1. Yes, I think that. And I think that despite the argument about German. The reason for that is because with German, you are replacing a couple of characters, that's it. You're not implementing a Romanisation system that needs quite some discussion.
2. I never said that you should not discuss it. I implied it wouldn't be worth doing, considering how many Cantonese maps there are. Yes, 20 maps is rare (and there are 44 German songs; we need to consider that language simply because we can choose "German" in search, as it's implemented on the website). Most Cantonese maps are quite old, and I think that people who check metadata (e.g. metadata helpers) of qualified maps are able to see when someone is Romanising a different language than they are supposed to. I never said you shouldn't discuss it; I said it would need discussion if we wanted to implement it, so that everyone agrees we can use Jyutping. (And as you've seen in this thread, it's not very easy to reach an agreement. Chinese was a big deal, Cantonese not as much, and that's why I don't think we should waste the time discussing another Romanisation system.)
3. Quote me. I never said anything even close to that. I never said community shouldn't discuss it. I never said tones are to be taken into consideration, I never said it is troublesome. You suggested a system that doesn't have characters we can't type into "Romanised title". That's good, but Jyutping does use numbers for accents, which makes the Romanised text hard to understand for anyone who doesn't know Jyutping. The question is, ignore the numbers or keep them? Again, agreement from the community is needed.

nold_1702 wrote:

Arguments (2) and (3) are also flawed in a way that they are not factually correct. For (2), Jyutping is the most widely used and the most authoritative. It is supported by a University dictionary and is the most accessible. Yale Cantonese Romanisation is also backed by Yale University, but it does not have an online dictionary and is not widely used at all. For (3), the tones in Mandarin are not taken into account when romanising, I see no reason one will be interested in the tones in the romanisation of Cantonese Jyutping.

Yup, I agree with the first part about Jyutping and Yale. Never said anything against that. You do, however, need the community to agree on that. If we were to implement a system that just you and I agree on, it would be pretty ignorant of the community. Regarding the second part, you can't just suggest a system that does use tones (which are the numbers) and act like they are not there. You didn't say whether we should ignore the numbers or include them (which, as I said, wouldn't be understandable for the majority of people; that was my argument, not that tones are needed and shouldn't be discussed). If you don't say anything about a feature used in the system you suggested, I will assume you are for including it. Otherwise it would be hypocritical to suggest a system and then talk about its modified version. That's why I pointed out the numbers.

The next paragraph, as I said, German is officially added in the list of searchable languages, that's why we need to care about how we handle it. And there are 44 German songs according to it.

nold_1702 wrote:

It is one thing not to Romanise Cantonese, but it is quite another thing to Romanise Cantonese in a wrong way by using a different language system to Romanise it. Perhaps I am not clear enough. What I mean is that in the old days, where the rule that presently concerns us was not yet in existence, mappers had the liberty to use Jyutping to Romanise Cantonese titles. However, once this rule is passed, all Cantonese titles will be in need to be Romanised by using the Mandarin Pinyin system.

As I said, the wording is incorrect, it should say "Mandarin", this is an error coming from the fact that someone forgot to replace it because the proposal about Chinese and Metadata overall were split. The wording will be fixed, obviously. Yes, it is important to differentiate Mandarin and Cantonese, and as I said, these days, people who check metadata of qualified maps are intelligent enough to check that you are using a Romanisation system for a correct language. Yes, current wording leads to mixing Cantonese and Mandarin, but that is an error that will be fixed. You still, in every situation should use a system designed for the language of the song.

Assuming that something just simply doesn't exist because it's not used in Hong Kong is quite funny, especially if you want to make someone look bad. There is an equivalent of Pinyin for Cantonese that is different from Jyutping.
Ulysses
Wafu my friend, I do not mean to 'make you look bad' or that I am doing anything against you, I am arguing against what you suggested, not yourself. Do not try to make things too personal as you are not in the spotlight, the issue is. Pardon me but I want to avoid starting another unnecessarily lengthy debate. It seems to me you are not quite focusing on the right thing (and I do not mean to be aggressive). See my bullet points below

- If you think community consensus is important, let the people discuss here, don't try to stop it.
- Mandarin is not an official language recognised by osu! either. Osu! recognises one big language group 'Chinese', which includes both Mandarin and Cantonese. If Mandarin is within the scope of discussion, so is Cantonese.
Topic Starter
Okoratu
i think no one ever questioned that romanisation should be based on the language lol

shall i change the current wording to mandarin while we figure the rest out?
Ulysses

Okoratu wrote:

i think no one ever questioned that romanisation should be based on the language lol

shall i change the current wording to mandarin while we figure the rest out?

Yes please
Topic Starter
Okoratu
ok done
can you debate the actual points instead of semantics then?
i dont have any power to nuke any posts here (thankfully for some retarded debates on the earlier pages) but please keep it short and concise

-> can someone sum up briefly where we stand currently on Cantonese? If we want to make people understand how it's pronounced / read then we'd need something different from the mandarin method anyways

anything that is reliable and produces readable text for people that are not cantonese should be good fwiw
Topic Starter
Okoratu

Monstrata wrote:

Have a question regarding covers. I remember mentioning this to Eph but nothing was really concluded so maybe we can get some opinions here?

Currently metadata rules seem to suggest that if a song is covered by someone, that the entire metadata field should be taken from the cover's source. Example: https://osu.ppy.sh/s/658919 vs https://osu.ppy.sh/s/637445

If an artist is clearly covering or replicating another song, I think we should be taking metadata from the original song in cases where metadata is somehow different. After all, the melodies, rhythms, lyrics, etc... are all pretty much the same. It's the same song. Just sung by someone else. So shouldn't the only thing to change be the Artist/Romanized Artist field?

I'm actually not sure when this change happened because I remember at one point the practice was to just change the Artist and keep Title/Source the same as original song.
uh not too sure id agree on covers that just do same voice but if it's a drastically different song id say you should be following the artist
xxdeathx

Natsu wrote:

It would be really annoying without the label. For example, when you map the TV Size and the Full version, the mapsets merge together. Also, if you're looking for a normal length song you are going to get a bunch of TV sizes, or vice versa... and being honest, people rarely care about tags, so dropping it into tags isn't going to work.
strongly agree :) :)
Fycho
To complete the Cantonese part, nold_1702 and I reworded the proposal about Chinese and Cantonese a bit.
We think Chinese / Standard Chinese / Written vernacular Chinese all refer to the same thing, and Mandarin is a kind of tone of it rather than a language. So we went back to using "Chinese".

Glossary
Character-by-character Romanisation: each Chinese character must be romanised as a capitalised word and separated with a space.

Rules
Songs with Chinese metadata must be romanised in accordance with the Character-by-character method by using Hanyu Pinyin system in Romanised fields when there is no Romanisation or translation information listed by a reputable source. The same applies to the Source field if a romanised Source is preferred by the mapper. As they are non-unicode fields, all diacritical tone marks must be omitted. Songs with Cantonese metadata must be romanised by using Jyutping system.
Ulysses
As a speaker of the two languages, I struggle to understand why you keep emphasising that Mandarin and Cantonese are just different tones. Because they are languages that use Chinese characters not just different tones. But whatever, it is not something that we have to discuss here. (For some of you who may be interested, here is a fun and short youtube video explaining the differences and similarities between the two: https://youtu.be/s2km_z4-1T8 )


Anyway, I modified the grammar and changed some words to more precise ones:

Glossary
Character-by-character Romanisation: each Chinese character must be Romanised as a capitalised word and separated with a space.


Rules
Songs with metadata in Chinese must be Romanised in accordance with the Character-by-character method by using Hanyu Pinyin system in Romanised fields if there is no Romanisation or translation information listed in a credible source. The same applies to the Source field if a romanised Source is preferred by the mapper. As they are non-unicode fields, all diacritical tone marks must be omitted. Songs with Cantonese metadata must be Romanised by using Jyutping system.
Topic Starter
Okoratu
Removed the part where it talked about unavailability of preferred romanisation: "If the artist provides a preferred way to romanise their title or name, that is to be followed unless it conflicts with other points of this criteria." handles that.

Reverted Glossary

Decluttered the rule into the following statements:
  1. Songs with Chinese metadata are to be handled with respect to the tones and dialects of Chinese they belong to. In any case, all diacritical tone marks must be omitted:
    1. Mandarin metadata must be romanised using the character-by-character method.
    2. Cantonese metadata must be romanised using the Jyutping system.
    3. If the song falls into neither category, this choice is left up to the mapper's discretion


i hope this is more clear and captures the spirit of what you wanted to say while being more straightforward to digest

ToDo:
- Spacing of special characters retarded loopholes fixing
- !where Korea
- common markers rules
- CrystilionZ point needs to be applied but idk how

If the artist provides a preferred way to romanise their title or name, that is to be followed unless it conflicts with other points of this criteria.
nope, this refers to special characters and title formatting rulings above you cant stuff that into one thing
how is a translation not officially referring to the song in multiple ways?
Ulysses
Hmmm you missed the Hanyu pinyin thing in the Mandarin part and the character-by-character part in the Cantonese part.

So both of them use the character by character method.
Mandarin uses the Hanyu Pinyin system to romanise Chinese characters,
whereas Cantonese uses the Jyutping system to romanise Chinese characters.

So:

Mandarin metadata must be romanised using the Hanyu Pinyin system.

Cantonese metadata must be romanised using the Jyutping system.

In both cases, the character-by-character method is to be adopted.
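
For anyone who wants to sanity-check their metadata, here is a minimal Python sketch of the character-by-character method with tone marks dropped. It assumes you already have one Pinyin syllable per character (the lookup from characters to syllables is not shown), and the function name is made up for illustration. It only strips the four Pinyin tone marks, so ü keeps its dots and would still need separate handling in a non-unicode field.

```python
import unicodedata

# Combining marks for the four Pinyin tones: macron, acute, caron, grave.
TONE_MARKS = {"\u0304", "\u0301", "\u030c", "\u0300"}

def romanise_char_by_char(syllables):
    """Join one syllable per character as capitalised, space-separated words.

    Hypothetical helper: 'syllables' is assumed to hold exactly one
    romanised syllable for each Chinese character, in reading order.
    """
    words = []
    for syllable in syllables:
        # Split accented letters into base letter + combining marks.
        decomposed = unicodedata.normalize("NFD", syllable)
        # Drop only the tone marks; other diacritics are left alone.
        stripped = "".join(ch for ch in decomposed if ch not in TONE_MARKS)
        words.append(unicodedata.normalize("NFC", stripped).capitalize())
    return " ".join(words)

print(romanise_char_by_char(["zhōu", "jié", "lún"]))  # Zhou Jie Lun
```

The same joining step would apply to Cantonese; only the source of the syllables changes.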
Topic Starter
Okoratu
fixed!
_PhiLL
should one also append (TV Size) to the end of songs which are tv size but don't indicate it in the title? as it stands now, the proposal seems to point to no. what's up with that?
Lanturn
Seeing how discussions have died, I want to post some ideas I was planning on bringing up later since the time limit was close (half a month ago). This also has a few rule changes and guidelines. Some may not even need to be guidelines, but I wanted to spark discussion on them anyways and decide whether or not they are worth adding.

Regarding Full Width Special Characters:
When it comes down to adding spaces for special characters, there is one more issue that I think should be addressed. Some languages like Japanese, Chinese, and the like don't use spaces when reading or writing. Japanese is one of the most common languages here in osu!, and it normally writes its special characters in full-width. The full-width comma (、，), colon (：), brackets (（）), as well as some others, wouldn't need a space. The current rule doesn't really mention these full-width characters.

For example:
チト(CV:水瀬いのり)、ユーリ(CV:久保ユリカ) (Official)
チト (CV: 水瀬いのり)、 ユーリ (CV: 久保ユリカ) (Proposal)
チト (CV：水瀬いのり)、ユーリ (CV：久保ユリカ) (Full-width without spaces; follows the proposal otherwise, including the parenthesis guideline. The parentheses are half-width, so they would naturally have a leading whitespace.)
Chito (CV: Minase Inori), Yuuri (CV: Kubo Yurika) (Romanized Proposal)

http://www.bjd.com.cn/ A Chinese newspaper site. All special characters are written in full-width and it doesn't utilize spacing.

The tl;dr is that certain special characters in full-width don't need to utilize spaces since they are somewhat naturally included in them. This is not the case with all characters and should be used accordingly.

-----------------------------------------------

Regarding half-width & full-width usages of characters in the Unicode & source fields:
(Brought up to me by S o h)
Special characters should retain their original full-width/half-width form in the Unicode fields. An exception to this is when they are used for additional complementary info like the CV section or mix descriptors. Improper usage can result in errors while searching. https://osu.ppy.sh/ss/10623085
Example using "カラフル。(Extended edit)"
The full-width period cannot be substituted with its half-width counterpart. "カラフル.(Extended edit)" is not acceptable.
The parentheses may be either half-width or full-width. "カラフル。(Extended edit)" is acceptable.

Original width usages should still be prioritized in the unicode field when possible.
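
To illustrate why the period and the parentheses behave differently, here is a small Python sketch using Unicode NFKC normalization from the standard library. This is only an illustration of the width issue, not a description of how osu! search actually works, and the helper name is hypothetical.

```python
import unicodedata

def fold_width(text):
    """Fold full-width forms to their half-width equivalents (NFKC)."""
    return unicodedata.normalize("NFKC", text)

# \uff08 and \uff09 are the full-width parentheses.
full_width_parens = "カラフル。\uff08Extended edit\uff09"
half_width_parens = "カラフル。(Extended edit)"
ascii_period = "カラフル.(Extended edit)"

# Full-width parentheses fold to the ASCII ones, so these two match.
print(fold_width(full_width_parens) == fold_width(half_width_parens))  # True
# The ideographic period has no ASCII equivalent, so this never matches.
print(fold_width(half_width_parens) == fold_width(ascii_period))  # False
```

In other words, the parentheses are interchangeable under width folding while 。 and . stay distinct characters, which lines up with treating the period as part of the title itself.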


------------------------------

Regarding Special Characters and Spacing:
(I posted this earlier, but I might as well add it here)
ジョジョ~その血の運命~ Archetype MIX Ver.
JoJo ~Sono Chi no Sadame~ Archetype MIX Ver.

When a symbol is alone and doesn't have spacing, the romanization should have a whitespace before and after. (Ex. if the title was "ジョジョ~その" we'd use "JoJo ~ Sono" when romanizing.)

When a symbol comes in pairs (like mentioned above), use a space before the first symbol and after the last symbol (not needed if the symbol is the last character). (Ex. if the title was "ジョジョ~その血の運命~" we would use "JoJo ~Sono Chi no Sadame~".)

This can be excluded if the song has a good enough reason not to use it.
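
The two spacing cases above can be sketched in Python roughly like this. The function name and the choice to leave titles with three or more symbols untouched are assumptions for the sake of the example; it works on the already-romanized text.

```python
def space_symbol(text, symbol="~"):
    """Apply the lone-symbol and paired-symbol spacing to romanized text."""
    parts = [part.strip() for part in text.split(symbol)]
    count = len(parts) - 1  # how many times the symbol occurs
    if count == 1:
        # Lone symbol: one space before and one after.
        spaced = f"{parts[0]} {symbol} {parts[1]}"
    elif count == 2:
        # Pair: space before the first and after the last symbol only.
        spaced = f"{parts[0]} {symbol}{parts[1]}{symbol} {parts[2]}"
    else:
        # No symbol, or a pattern this sketch does not try to guess.
        return text
    # Trailing space is dropped when the symbol is the last character.
    return spaced.strip()

print(space_symbol("JoJo~Sono"))                 # JoJo ~ Sono
print(space_symbol("JoJo~Sono Chi no Sadame~"))  # JoJo ~Sono Chi no Sadame~
```

Anything with a good reason to deviate would still be handled by hand, as noted above.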

----------------------------------------

Standardizing the Romanised Artist Field Order:
Another topic I want to bring up is one from a few years ago. Since we're trying to 'standardize' metadata, I feel like pushing this old thread: Romanized Artist Preferences, as it would actually fit well with the current proposals.
Right now we basically have to search high and low to find an obscure reference for a preferred romanization when a much simpler method that most database and wiki sites use is a simple standardization of "Family Given" or "Given Family" and such. In the end, our artist fields end up messy to the point that you can't tell which order is which anymore.

Fycho also brought up the point that artists sometimes have an official translated or English name, so we'd have to figure out if those would get more priority or not. Ex. 周杰伦 is Jay Chou in English, but Zhou Jie Lun when romanized.

Right now this is my current proposal:

When romanizing the artist field, it must be written out in the order the Unicode field would be read. The sole exception to this is if the artist has an official translation and is widely known by this name. (Please English this better. The idea is simply that we type any order out on how it would be read.)

The second line would be in cases like Girls' Generation where 소녀시대 is romanized as Sonyeo Sidae (I believe). We'd still use Girls' Generation in this case. This also includes the Chinese example mentioned earlier.

Pros:
- Consistent metadata with their Unicode counterparts and we no longer have to check for preferred romanization order anymore.
- It standardizes the romanized artist field for every language, not just Eastern.

Cons:
- It will conflict with some artists' preferred romanization (Kurosaki Maon will be used instead of Maon Kurosaki and such. A lot of famous video game composers are more recognized by Given - Family as well.)

If we're going to standardize things here in osu!, we might as well tackle this since it's also fairly inconsistent at times. Hi Shimotsuki Haruka Shimotsuki.

-----------------------------------------------------

Regarding TV Size:

Even if we were to open this to say, a community vote, (and I might be jumping the gun here) I'm sure the majority would rather include the length markers, so I'll try to keep it simple.

(TV Size) is used for cuts that are used in the show. (Anime/TV Show OP/ED, Insert Songs if shortened, etc)
(Short / Extended Ver.) for everything else. (Game Size is rarely used anyways now I think about it.)
Manually cut songs that closely resemble a (TV Size) on an applicable song would use (TV Size), otherwise, they should use (Short Ver.) or (Extended Ver.)

That's about as simple as I can make it I guess so it's as standardized as possible. The biggest downside to this is that it's difficult to tell Cuts and Official releases apart, but this makes it so we don't have to be direct when it comes to the versions, and it still does mention the length appropriately. The alternative is to use whatever the original release was before the cut, but then it contradicts the point of having a marker to reference the maps length on sight.

The main goal here is to treat the labels more as identifiers and less as official titles, where that makes sense.

--------------------------

Regarding songs that have multiple sources:

When a song has appeared in multiple media, it may use the source that the mapset is themed around (Backgrounds, Storyboards, Videos, etc.) as long as the song itself appeared in it. These should use the direct source instead of the franchise source if applied.
Examples:
https://osu.ppy.sh/s/446547 may use Grand Theft Auto Vice City as the map is themed around it and the song appears in-game.
https://www.youtube.com/watch?v=UrJcQ2nZips may not use Naruto as a source as the song doesn’t appear in any Naruto media, even if the map itself is themed around Naruto. These can be placed in the tags.

------------------------------------

Regarding Original Releases without a source:
This will have to be mostly case by case, but if a song has had a noticeable gap between its original release and then eventually ends up on another media, (take that GTA song mentioned above) the source field isn't required and can be moved to the tags instead.
This may not have to be so much of a time-gap as well. We could try focusing more on if the first source released has any major significance.

----------------------------------------------------

Repeated words in romanization:
When a song uses repeat words in the title (one in unicode, and the other as a basic romanization), the romanized field should omit the repeated word.
Examples:
AIRI-愛離- would normally be AIRI -Airi- as a romanization. This proposal would have the romanized field just be AIRI. The Unicode would still be AIRI-愛離- as it originally is.

A more severe example of this would be:
Normal: (Unicode) 花簪 HANAKANZASHI -> (Romanized) HANAKANZASHI HANAKANZASHI
Proposed: (U)花簪 HANAKANZASHI -> (R)HANAKANZASHI
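
As a rough Python sketch of this idea, the helper below drops a word from the romanized field when the same word already appeared earlier, ignoring case and surrounding punctuation. The name is hypothetical, and a check like this would also hit titles that repeat a word on purpose, so it is better seen as a way to flag candidates than as an automatic fix.

```python
def drop_repeated_words(romanised):
    """Keep only the first occurrence of each word in a romanized title."""
    seen = set()
    kept = []
    for token in romanised.split():
        # Compare words without punctuation and without regard to case.
        key = "".join(ch for ch in token if ch.isalnum()).casefold()
        if key and key in seen:
            continue  # same word as an earlier one, so omit it
        seen.add(key)
        kept.append(token)
    return " ".join(kept)

print(drop_repeated_words("AIRI -Airi-"))                # AIRI
print(drop_repeated_words("HANAKANZASHI HANAKANZASHI"))  # HANAKANZASHI
```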

--------------------------------------------------------

Using LOGOS to determine stylization choices:
Sometimes a non-roman language gives little to no information on how to romanize the artist's name. Where the only reference is a logo on a website or a CD cover writing it in all capitals, we should use standard capitalization methods, as we generally would in any standard title or name ( https://capitalizemytitle.com/ ).
Artist preference in any other case must still be followed over this.

In other words, this will hopefully prevent ITO KASHITARO cases from happening again. This is more of a case-by-case guideline, but the idea of romanizing based on what may possibly be just a font has led to some unfavorable romanizations in the past.

----------------------------------

Regarding covers and use of original metadata over the covers
Brought up originally by Monstrata. Sometimes a cover by another singer may be listed with slightly incorrect metadata compared to the original. We should probably use common sense when approaching this and judge them case by case. If the cover itself has very minor errors, then the original title would be recommended. If the cover feels like more of a remix or has been altered in some major way, the cover title would be recommended.


Umm yeah. Sorry I've been kinda absent on this proposal. I'm gonna try to be a bit more active so we can get this pushed forward as it was due 2 weeks ago. Hopefully, we can get this finalized by the end of the month (My goal now)

Anyways. Happy reading. Smack me if anything seems unreasonable. I mostly just want to spark a bit more discussion before we push this forward, and I wanted to attempt to merge a few more ideas I was originally planning on bringing up after this proposal went through.
pw384
Sorry for disturbing, but I would like to confirm whether character-by-character method is also applied to the Romanization of Chinese artists name (if s/he hasn't provided an official Romanization). If so, I suggest mentioning it in the proposal as it differs from native users' daily practice, and may result in confusion if not specifically mentioned in ranking criteria. Like this: "Songs with metadata in Chinese, including both Artist and Title, must be Romanised in accordance with the Character-by-character method..."
Wafu

pw384 wrote:

Sorry for disturbing, but I would like to confirm whether character-by-character method is also applied to the Romanization of Chinese artists name (if s/he hasn't provided an official Romanization). If so, I suggest mentioning it in the proposal as it differs from native users' daily practice, and may result in confusion if not specifically mentioned in ranking criteria. Like this: "Songs with metadata in Chinese, including both Artist and Title, must be Romanised in accordance with the Character-by-character method..."
Can you explain how it differs from "native users' daily practice"? Metadata = artist + title. Every language in osu! is (and always has been) using the same system for artists and titles (unless preferred Romanisation for one exists)
pw384

Wafu wrote:

pw384 wrote:

Sorry for disturbing, but I would like to confirm whether character-by-character method is also applied to the Romanization of Chinese artists name (if s/he hasn't provided an official Romanization). If so, I suggest mentioning it in the proposal as it differs from native users' daily practice, and may result in confusion if not specifically mentioned in ranking criteria. Like this: "Songs with metadata in Chinese, including both Artist and Title, must be Romanised in accordance with the Character-by-character method..."
Can you explain how it differs from "native users' daily practice"? Metadata = artist + title. Every language in osu! is (and always has been) using the same system for artists and titles (unless preferred Romanisation for one exists)
Officially, native speakers never romanize their names via the character-by-character method under any circumstance (e.g. 曹雪芹 is always romanized as Cao Xueqin in daily practice, instead of Cao Xue Qin or Cao Xue qin). So I am not sure whether the character-by-character method is applied to the Artist name, since it contradicts our habits. If it is, a specific clarification in the wording would be better in my opinion.
Topic Starter
Okoratu
@Chinese clarify pls im confused


Can someone sit down with me trying to digest Lanturn's post into rulings
CrystilonZ
@chinese in a nutshell Chinese names are usually romanised like "family given (or given family idk)" with only one space separating given name and family name. osu uses character-by-character cuz of the difficulty to separate words but there is no difficulty separating given name from family name and vice versa so Chinese names shouldn't be romanised character-by-character and instead should be romanised normally, with one space separating given name and family name.

Lanturn just brings stuff that he thinks they're worth considering up. I'll try to simplify it here I guess

1. full width special chars already have built-in space so they don't need whitespace before or after those chars. Proposal should add this statement about full width chars.
2. Full width chars are handled differently. example: "。" is full width period "." and they are not interchangeable in the unicode field but half-width brackets "(" and full-width counterparts "（" are. << should be fixed
3. specify stuff regarding spacing when there are special characters involved. (romanisation)
4. screw artist's romanisation preferences. All names should be romanised like how they are read in their original languages. e.g. Japanese names will be romanised with Family-given order only regardless of artist's preference.
5. (TV size) and other designators.
6. what should we do when a song is featured in a lot of medias (like featured in a lot of games/movies/animes). Lanturn proposed that source should be designated according to what's the map is themed around (sb/bg etc.)
7. if source doesn't have major significance it can be moved to tags instead.
8. ignore repeated words when romanising stuff.
9. Logos aren't reliable when it comes to capitalisation. should use a standard method instead.
10. Covers often get metadata wrong. Compare data with the original release and use common sense when dealing with covers.
Topic Starter
Okoratu
Hi~

Noffy, Lanturn and I sat down and got this worked out as a draft implementing all the above points.

The draft as a whole is available over there: https://gist.github.com/Okorin/c551fd42 ... f51ffb2736

if nothing else is brought up i'll PR this in a week ok

thank
Fycho
a minor stuff, For the consistency, romanize => romanise
TheKingHenry

Fycho wrote:

a minor stuff, For the consistency, romanize => romanise
Pretty sure both ways of typing it out are fine (same with romanization = romanisation) though surely it'd be nice to use it consistently if that's what you meant with this ¯\_(ツ)_/¯
Kurai

Okoratu wrote:

Hi~

Noffy, Lanturn and I sat down and got this worked out as a draft implementing all the above points.

The draft as a whole is available over there: https://gist.github.com/Okorin/c551fd4263e437e0ffcbd3f51ffb2736

if nothing else is brought up i'll PR this in a week ok

thank

Romanisation, Romanise, Romanised, etc. should always be capitalised.

"Lenticular brackets should be romanised to either quotation marks or square brackets depending on the context they are used in."
A bit confusing, what are the two contexts of use?

In "Russian" Romanisation: "ё should be romanised to ye, however, use yo or o to avoid usage of special characters."
Don't even mention it should be Romanised to ye if we're not doing that, it's confusing. Also, using yo or o ? It should only be yo, it is never pronounced o?
TheKingHenry
Songs with German metadata must romanise umlauts into two-letter equivalents (ue, oe, ae and ss).
Is this supposed to contain the nordic equivalents of these too? As well as the additional Ø Æ Å and whatever there are, how's the deal with them?
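
For reference, the quoted German rule is mechanical enough to sketch in Python. The table below only covers the characters the rule names, plus their uppercase forms (which is an assumption); the Nordic letters asked about are deliberately left out, since nothing has been decided for them. The function name is made up for illustration.

```python
# Two-letter equivalents named in the quoted rule; uppercase forms assumed.
GERMAN_MAP = str.maketrans({
    "ä": "ae", "ö": "oe", "ü": "ue", "ß": "ss",
    "Ä": "Ae", "Ö": "Oe", "Ü": "Ue",
})

def romanise_german(text):
    """Replace German umlauts and sharp s with two-letter equivalents."""
    return text.translate(GERMAN_MAP)

print(romanise_german("Über den Wolken"))  # Ueber den Wolken
print(romanise_german("Straße"))           # Strasse
```

Characters like Ø, Æ and Å pass through unchanged here, which makes it easy to spot where the rule would still need a decision.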