[Proposal] Clarification on romanisation of stylized unicode metadata

Topic Starter
Dignan
Currently there's no clear rule or allowance for romanising unicode metadata for stylized artists/titles which do not belong to a foreign language.

A common example in osu! is "Rigël Theatre", which is romanised as "Rigel Theatre". They also have a song called "Erlija Gäte", where the a with trema/umlaut is used for stylistic purposes rather than as a specific non-English letter.

Another example would be "Blue Öyster Cult", who consistently use the Ö everywhere (http://www.blueoystercult.com/). This should be romanised as "Blue Oyster Cult", but could be misinterpreted as "Blue Oeyster Cult" without such a clarification/allowance.

Proposed wording under Allowances:

If a Unicode song title or artist uses Unicode symbols as a stylized replacement of Latin alphabet characters, the corresponding Latin alphabet character may be used in the Romanised fields.
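
As a rough illustration of what this allowance would mean in practice, here is a minimal Python sketch that maps a stylised letter back to its base Latin letter by stripping combining marks. The function name `strip_stylistic_diacritics` is hypothetical and not part of any osu! tooling, and deciding whether a character is stylistic in the first place would still be a human judgment call.

```python
import unicodedata


def strip_stylistic_diacritics(text: str) -> str:
    """Replace accented letters with their base Latin letters.

    Hypothetical helper for illustration only, not part of any osu! tooling.
    """
    # NFD splits e.g. "ö" into "o" + U+0308 (combining diaeresis).
    decomposed = unicodedata.normalize("NFD", text)
    # Drop the combining marks (category "Mn") and keep the base letters.
    base_only = "".join(
        ch for ch in decomposed if unicodedata.category(ch) != "Mn"
    )
    return unicodedata.normalize("NFC", base_only)


for name in ("Rigël Theatre", "Erlija Gäte", "Blue Öyster Cult"):
    print(name, "->", strip_stylistic_diacritics(name))
```

This prints "Rigel Theatre", "Erlija Gate" and "Blue Oyster Cult". Letters that have no Unicode decomposition, such as ø, æ or ß, pass through unchanged, so they would still need a manual lookalike mapping.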
SaltyLucario
does this really need to be clarified? i always thought this is just common sense (see aureole or humer songs, first things that come to mind)
i guess it wont hurt to add it though
lewski
Based on how it was deemed good enough to use instead of a separate rule for Greek, I think this might be supposed to be covered by the catch-all romanisation rule:

Ranking Criteria wrote:

Metadata in other languages not specifically covered in this section and lacking official romanisation from the artist must use a system common and recognisable for the language.
It wasn't at all obvious to me just from the RC, though; I had to read both linked threads to come to that conclusion.

What caught my eye far better was the rule that actually mentions umlauts:

Ranking Criteria wrote:

Umlauts must be romanised into two-letter equivalents: ü to ue, ö to oe, ä to ae and ß to ss.
This rule completely contradicts the common-sense way of romanising letters used for stylistic purposes only. Based on that and how hard it is to interpret the catch-all as an exception to this, I'd say a clarification like the one in the proposal has the potential to dispel a lot of confusion. Of course, if this interpretation of the catch-all is incorrect, the allowance definitely needs to be added, since without it the RC contradicts the way this stuff is romanised in practice right now.


I have to wonder, though: have there been any actual cases of this being an issue, or has common sense always triumphed so far? I'm mainly just personally curious, but anecdotes would also help your case.
Topic Starter
Dignan
I suppose the following could also technically apply here:

Ranking Criteria wrote:

Special unicode characters must be filtered to their nearest standard equivalent or removed from the romanised fields within a .osu file. ★ ☆ ⚝ ✪ and the likes are substituted to an asterisk (*). Other special characters are to be romanised or dropped on case-by-case basis.
I'm not aware of any cases where the incorrect version was ranked, but in the thread you linked, and in this map for example (beatmapsets/1647145/discussion/-/generalAll#/3278710), there's clearly potential for people to misunderstand.
Ryu Sei
Necroposting this because I wanted to open the same topic, but apparently this one already exists.

lewski wrote:

I have to wonder, though: have there been any actual cases of this being an issue, or has common sense always triumphed so far? I'm mainly just personally curious, but anecdotes would also help your case.
There have been no issues so far. However, as the current rules are written, there is an ambiguity and a technical "rule break", because the current romanisation rule names specific characters regardless of how they are used, whether as actual letters of a language or as lookalike characters for artistic purposes. Here are some examples that technically break the rules but were ranked/loved anyway:
  1. beatmapsets/2041478 (APOCALYPSE RAY instead of APOECALYPSE RAY)
  2. beatmapsets/325867 (Slit O instead of OE O)
  3. beatmapsets/2028207 (LaureLs ~the Angelus~ instead of LamreLs ~the Angelus~)
SOUND VOLTEX songs are the most frequent offenders, because their titles are heavily stylised with lookalike characters. The nominators of these maps also broke the romanisation rules for some specific characters, which I quote below.

Dignan wrote:

I suppose the following could also technically apply here:

Ranking Criteria wrote:

Special unicode characters must be filtered to their nearest standard equivalent or removed from the romanised fields within a .osu file. ★ ☆ ⚝ ✪ and the likes are substituted to an asterisk (*). Other special characters are to be romanised or dropped on case-by-case basis.
That is incorrect. The romanisation rules explicitly cover umlauts and specific Nordic letters, regardless of how they are used. The only exception is when the song has an official romanisation, which overrides the rules.

Ranking Criteria wrote:

  1. Umlauts must be romanised into two-letter equivalents: ü to ue, ö to oe, ä to ae, and ß to ss. In Swedish and Finnish, ö and ä should instead be romanised to o and a respectively.
  2. Metadata containing Nordic letters must be romanised to the following: æ to ae, ø to oe, and å to aa. In Swedish, å should instead be romanised to a.
Personally, I would add an allowance for songs that use non-ASCII characters purely to stylise the artist or title. This should be applied on a case-by-case basis, so modders and nominators should check whether these characters are used as letters of their respective language or simply as lookalike stylisation.
Metadata containing stylised non-ASCII characters may be filtered to their nearest character lookalikes within a .osu file on a case-by-case basis.

This way, the current rules won't penalise songs whose stylised metadata is caught between the romanisation rule as written and what the artist intended. The allowance also helps with heavily stylised songs that have no official romanisation, since it clarifies how the title should be read just from the lookalike characters.
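
To make the conflict concrete, here is a minimal Python sketch of the two quoted rules as lookup tables, with the proposed allowance as an opt-in flag. The names `romanise`, `language` and `stylised` are hypothetical, and the tables only cover the characters the quoted Ranking Criteria mention. Whether a title counts as stylised is assumed to be decided by a person, case by case.

```python
import unicodedata

# Defaults taken from the two quoted Ranking Criteria rules.
UMLAUT_RULE = {"ü": "ue", "ö": "oe", "ä": "ae", "ß": "ss"}
NORDIC_RULE = {"æ": "ae", "ø": "oe", "å": "aa"}
# Per-language exceptions named in the same rules.
OVERRIDES = {
    "swedish": {"ö": "o", "ä": "a", "å": "a"},
    "finnish": {"ö": "o", "ä": "a"},
}


def romanise(text, language=None, stylised=False):
    """Hypothetical sketch, not actual osu! tooling."""
    text = unicodedata.normalize("NFC", text)
    if stylised:
        # Proposed allowance: fall back to the base Latin lookalike
        # by dropping combining marks (category "Mn").
        decomposed = unicodedata.normalize("NFD", text)
        return "".join(
            ch for ch in decomposed if unicodedata.category(ch) != "Mn"
        )
    table = {**UMLAUT_RULE, **NORDIC_RULE, **OVERRIDES.get(language, {})}
    out = []
    for ch in text:
        rep = table.get(ch.lower())
        if rep is None:
            out.append(ch)
        else:
            # Simplification: an uppercase letter becomes e.g. "Oe";
            # all-caps titles are not handled specially.
            out.append(rep.capitalize() if ch.isupper() else rep)
    return "".join(out)


print(romanise("Blue Öyster Cult"))                 # rules as written
print(romanise("Blue Öyster Cult", stylised=True))  # with the allowance
```

Applied as written, the rules give "Blue Oeyster Cult"; with the allowance the result is "Blue Oyster Cult". The stylised branch only handles letters that decompose into a base letter plus a mark, so characters such as ø would still need a manual lookalike choice.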
Okoratu
community/forums/topics/1894663 would address this issue because it specifies what umlauts do in which languages, and if it's not a language you should be using something sensible: https://gist.github.com/Okorin/25240445fdbce1febc603553e5eb94ae#language-and-writing-system-romanisation-rules which i think the consensus of this thread would agree to