[added] [Proposal] Add Nordic characters to Romanisation Metadata Rules

Topic Starter
Protastic101
This was brought up to me by Tailsdk regarding a lack of Nordic character romanisation rules. Currently, the only advice is to just "ask a native speaker." So here, he's proposed adding the following rules for romanising Nordic characters:

Æ -> Ae
æ -> ae
Ø -> Oe
ø -> oe
Å -> Aa
å -> aa

Change
Umlauts must be romanised into two-letter equivalents: ü to ue, ö to oe, ä to ae and ß to ss.
to
Umlauts must be romanised into two-letter equivalents: ü to ue, ö to oe, ä to ae and ß to ss. In Swedish and Finnish, ö and ä should be romanised to o and a respectively.
Add
Metadata containing Nordic letters must be romanised to the following: æ to ae, ø to oe, å to aa. If it is Swedish, å should instead be romanised to a, and Finnish should romanise å to o.
Posting here as a sanity check from others to make sure I didn't miss anything.

PR here https://github.com/ppy/osu-wiki/pull/9423
oyu
Imo it'd be enough to list only the lowercase examples just like the other romanisation rules do (e.g. umlauts)
P_O
Agree +1.

This would clarify romanisation for this region a lot. This is a part of the RC that has been pretty unclear, and I think this would be good for old and newer mappers as well.
also

Ä -> Ae
ä -> ae

Ö -> Oe
ö -> oe
Topic Starter
Protastic101

P_O wrote:

Agree +1.

This would clarify romanisation for this region a lot. This is a part of the RC that has been pretty unclear, and I think this would be good for old and newer mappers as well.
also

Ä -> Ae
ä -> ae

Ö -> Oe
ö -> oe
Umlauts are already covered in the RC:
Umlauts must be romanised into two-letter equivalents: ü to ue, ö to oe, ä to ae and ß to ss.
lewski

Protastic101 wrote:

Currently, the only advice is to just "ask a native speaker."
I wonder if the reason that rule was added applies here, too:

community/forums/topics/1357437?n=1

Noffy wrote:

Currently the metadata RC covers [Russian/Cyrillic, Japanese, and Chinese] as they are frequently ranked and are worth covering on their own
community/forums/topics/1256836?n=15

Noffy wrote:

I do not think we should be adding dedicated sections to RC for languages which are rarely ranked
Right now, there are only 39 ranked sets across the five major Nordic languages. Of those, 16 are in Finnish, which doesn't have Æ or Ø at all and only uses Å in Swedish names. I'm not sure where the line should be drawn.

I'm also a bit concerned about å -> aa, because although afaik it makes sense in Danish and Norwegian, I don't think Swedish has nearly as strong of a tradition of using "aa" for that sound. Å apparently replaced it over 400 years ago. I can't really think of a better option, though.


side note regarding umlauts: that rule doesn't and shouldn't apply to Ä and Ö in Finnish because 1. they're technically separate letters, not umlaut-A and umlaut-O, and 2. it creates abominations like hääyö -> haeaeyoe (looks horrible) and hän -> haen (completely different word)
P_O

lewski wrote:

Protastic101 wrote:

Currently, the only advice is to just "ask a native speaker."
I wonder if the reason that rule was added applies here, too:

community/forums/topics/1357437?n=1

Noffy wrote:

Currently the metadata RC covers [Russian/Cyrillic, Japanese, and Chinese] as they are frequently ranked and are worth covering on their own
community/forums/topics/1256836?n=15

Noffy wrote:

I do not think we should be adding dedicated sections to RC for languages which are rarely ranked
Right now, there are only 39 ranked sets across the five major Nordic languages. Of those, 16 are in Finnish, which doesn't have Æ or Ø at all and only uses Å in Swedish names. I'm not sure where the line should be drawn.

I'm also a bit concerned about å -> aa, because although afaik it makes sense in Danish and Norwegian, I don't think Swedish has nearly as strong of a tradition of using "aa" for that sound. Å apparently replaced it over 400 years ago. I can't really think of a better option, though.


side note regarding umlauts: that rule doesn't and shouldn't apply to Ä and Ö in Finnish because 1. they're technically separate letters, not umlaut-A and umlaut-O, and 2. it creates abominations like hääyö -> haeaeyoe (looks horrible) and hän -> haen (completely different word)
Yeah, it would probably be better to just use a/o instead of the ae/oe variants for the sake of clarity. Also, I think at least in the case of Finnish/Swedish, Å/å could be written as O/o.
Tailsdk
So from what I have gathered

Umlauts must be romanised into two-letter equivalents: ü to ue, ö to oe, ä to ae and ß to ss.

Should be changed to something like this

Umlauts must be romanised into two-letter equivalents: ü to ue, ö to oe, ä to ae and ß to ss. For Finnish they should instead be romanised to the following: ü to u, ö to o, ä to a.

and this rule should be added

Metadata containing Nordic letters must be romanised to the following: æ to ae, ø to oe, å to aa. If it is Swedish, å should instead be romanised to o.

This should in general help clear up a lot of confusion
Topic Starter
Protastic101
I have updated the proposal to reflect Lewski's and P_O's concerns as suggested by Tails.
oyu

Tailsdk wrote:

Umlauts must be romanised into two-letter equivalents: ü to ue, ö to oe, ä to ae and ß to ss. For Finnish they should instead be romanised to the following: ü to u, ö to o, ä to a.
Dunno about this wording since as lewski already said, Finnish does not use umlauts
beatmapsets/1240966/discussion/-/generalAll#/2190388/6064846
https://en.wikipedia.org/wiki/Two_dots_(diacritic)
Topic Starter
Protastic101
What about this then:

Umlauts must be romanised into two-letter equivalents: ü to ue, ö to oe, ä to ae and ß to ss. In Finnish, what appears to be an umlaut is actually a diaeresis, and should be romanised to the following single-letter equivalents: ü to u, ö to o, ä to a.
oyu

Protastic101 wrote:

What about this then:

Umlauts must be romanised into two-letter equivalents: ü to ue, ö to oe, ä to ae and ß to ss. In Finnish, what appears to be an umlaut is actually a diaeresis, and should be romanised to the following single-letter equivalents: ü to u, ö to o, ä to a.
I was thinking more like

Umlauts must be romanised into two-letter equivalents: ü to ue, ö to oe, ä to ae and ß to ss. The Finnish letters ö and ä should be romanised as o and a respectively.
Maybe Swedish or Estonian should be mentioned also?

https://en.wikipedia.org/wiki/Two_dots_(diacritic)#Vowels
The Swedish, Finnish and Estonian languages use ⟨Ä⟩ and ⟨Ö⟩ to represent [æ] and [ø]
Topic Starter
Protastic101

oyu wrote:

I was thinking more like

Umlauts must be romanised into two-letter equivalents: ü to ue, ö to oe, ä to ae and ß to ss. The Finnish letters ö and ä should be romanised as o and a respectively.
I know nothing about Finnish, does Finnish not use ü at all?

oyu wrote:

Maybe Swedish or Estonian should be mentioned also?

https://en.wikipedia.org/wiki/Two_dots_(diacritic)#Vowels
The Swedish, Finnish and Estonian languages use ⟨Ä⟩ and ⟨Ö⟩ to represent [æ] and [ø]
This should already be included in the top post about adding Nordic letters. I can't read. Are Ä and Ö used interchangeably with æ and ø? Cause I already included how to romanise those.
lewski

Protastic101 wrote:

does Finnish not use ü at all?
Yeah, there's no Ü in Finnish (or any other Nordic language, for that matter). Estonian does have it, but it's not a U-umlaut.

Protastic101 wrote:

Are Ä and Ö used interchangeably with æ and ø? Cause I already included how to romanise those.
The "[æ] and [ø]" part is most likely referring to the sounds in IPA format. Ä and Ö aren't used interchangeably with Æ and Ø per se, they're just different letters referring to similar sounds (the exact sounds vary slightly from language to language and sometimes by the lengths of the vowels as well). Swedish and Finnish only use the former, while Danish and Norwegian only use the latter.

Interestingly, Icelandic uses Æ and Ö, but not Ä or Ø, but that language is a whole new can of worms because their alphabet has 10 non-ASCII letters. Estonian uses Ä and Ö like Finnish and Swedish, but mentioning it may open a small can of worms as well, because it also uses the non-umlaut Ü I mentioned above as well as Š, Ž, and Õ.

Maybe the best way to avoid this mess would be not to mention any specific languages in the umlaut rule and to opt for a general note about the existence of non-umlaut Ä, Ö, and Ü instead? It's a bit offtopic for a proposal about Æ and Ø, though. I might open a new one for this.

Umlauts must be romanised into two-letter equivalents: ü to ue, ö to oe, ä to ae and ß to ss. This does not apply in cases where ü, ö, and ä represent separate letters or letters with other kinds of double dots instead of umlauts.
Topic Starter
Protastic101
That looks like it works. Regarding Ä and Ö, should this be included in the "Metadata containing Nordic letters must be romanised to the following: æ to ae, ø to oe, å to aa. If it is Swedish, å should instead be romanised to o." addition as well?
lewski
not quite sure what you're referring to with "this", but if it's the non-umlaut clause, it doesn't need to be in the Nordic letter addition cause the addition doesn't mention any umlauts

also fwiw it would be really good to get a comment on the Å->O part from a Swede as well cause right now it's kinda just based on what P_O and I said and we're Finns
Topic Starter
Protastic101
By "this", I meant the letters Ä and Ö into the non-umlaut clause, although I think the current umlaut clause probably needs a reword since the "Umlauts must be romanised into two-letter equivalents" is kind of unclear with how dieresis marks are different from umlauts. Definitely would appreciate any Swedish speakers chiming in too.
lewski
I guess Ä and Ö wouldn't be too crazy in a rule with Æ and Ø, since Finnish and Swedish have 16 ranked sets each, while Danish has 4, Norwegian has 2, and Icelandic has 1.

I'm just worried that the complexity of the rule might get out of hand compared to how rarely it would be used, as I mentioned in my first post. We've already got specific instructions for both Finnish and Swedish in the proposal, and Ä and Ö in Swedish aren't even covered yet.

It feels kind of weird to go back to this after all this discussion, but right now, I'm leaning towards not adding the rule at all. I feel like this is exactly the type of thing the "just ask a native speaker" rule exists for: small languages with few ranked sets.

@Tailsdk: Out of curiosity, do people have disputes about Danish/Norwegian romanisation often, or was this more of a "there's no specific rule about this yet" type of deal?


(also just to be clear diaereses aren't technically part of the main topic here because they're their own concept, not just a fancy word for any two dots)
Tailsdk
There was a map with Æ going for ranked that I noticed had the wrong romanisation, and nobody really bothered to come ask me, so it's not something people will do.

So a map with ä in Finnish or Swedish is pretty likely to just go through the umlaut rule currently, even if that's wrong, which is why I want to specify it.

I don't think extra info hurts, especially with Æ and Ø being used every now and then for edgy EDM titles.

Please do say what you think it should be for Swedish and Finnish; I do think more info is better.
lewski
Yeah, I guess if people just don't care, metadata is pretty doomed without easily accessible romanisation rules

idk how Swedes handle romanisation, but for Finnish it's just ä->a and ö->o. Over here, å is only used in names of Swedish origin, so lowkey I don't care about it, but I'm tempted to say the same as P_O, i.e. å->o, because of how you pronounce it with a Finnish accent
Tailsdk
å -> o is what I gathered from Swedish Wikipedia. Seems like we need someone from Sweden to give their input on this to really finalize anything.
lewski
I asked a few people and got a clear majority for Ä->A and Ö->O

Å is a bit more ambiguous; I got both Å->A and Å->Aa (no Å->O)
Tailsdk
Aa is probably better for consistency then
roufou
I dunno if it matters but Å into A instead of Aa seems weird to me from the perspective of a Norwegian. (dunno if others will agree, though)

I'm in favor of Aa

edit: yeah I feel like Aa is unanimous for norwegian after asking around a bit (not extensively though), a or o are more likely for sweden though I'm pretty sure.
lewski
to be clear I only asked people about Swedish, should have specified in my post
roufou
oh shoot, my bad. Though after looking at some Swedish words with Å, I have no idea if you could even come up with a standard romanization cause it probably depends on the word... unless you decide to standardize it even if it would be pronounced differently. I dunno, I think it'd be hard to get a uniform opinion on it.

fwiw it'd normally be A I'm pretty sure

edit: yeah, I feel like it's gonna be hard to get a catch all opinion, but I would probably suggest if you really want a standard option then Å into A would probably be the safest choice for swedish... obviously would need input whether any Swedish people could think of words where it doesn't make sense to do Å into A. I discussed it a little with someone who is Swedish.
Tailsdk
Agreements so far then

Nordic characters:
  1. General:
    1. æ -> ae
    2. ø -> oe
    3. å -> aa
  2. Swedish:
    1. å -> a
  3. Finnish:
    1. å -> o
Umlaut:
  1. General:
    1. ö -> oe
    2. ä -> ae
    3. ü -> ue
  2. Finnish & Swedish:
    1. ä -> a
    2. ö -> o
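
For anyone who wants to sanity-check a title against the list above, here is a minimal Python sketch of those agreements as a lookup table. It is only an illustration under assumptions: the function name `romanise`, the language codes `"sv"` and `"fi"`, and the choice to capitalise only the first letter of a replacement (Æ -> Ae, as in the first post) are hypothetical and not part of the RC or any osu! tooling.

```python
from typing import Optional

# General rules from the list above (Nordic letters plus umlauts).
GENERAL = {"æ": "ae", "ø": "oe", "å": "aa", "ö": "oe", "ä": "ae", "ü": "ue"}

# Per-language overrides from the list above. The keys "sv" (Swedish)
# and "fi" (Finnish) are assumed codes chosen for this sketch.
OVERRIDES = {
    "sv": {"å": "a", "ä": "a", "ö": "o"},
    "fi": {"å": "o", "ä": "a", "ö": "o"},
}


def romanise(text: str, language: Optional[str] = None) -> str:
    """Replace each listed letter, applying language overrides if given."""
    table = {**GENERAL, **OVERRIDES.get(language, {})}
    out = []
    for ch in text:
        repl = table.get(ch.lower())
        if repl is None:
            out.append(ch)  # not a listed letter: keep it unchanged
        elif ch.isupper():
            out.append(repl.capitalize())  # Æ -> Ae (all-caps titles not handled)
        else:
            out.append(repl)
    return "".join(out)


print(romanise("hääyö"))        # general umlaut rule
print(romanise("hääyö", "fi"))  # Finnish override
```

With the general rule, `hääyö` comes out as `haeaeyoe`, which is the result lewski objected to earlier; with the Finnish override it becomes `haayo`.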
Topic Starter
Protastic101
Apologies for the delay, I've edited the original post to reflect this consensus.
lewski
oops, I'm also giga late, but I forgot about the å -> o part

there's actually no need to include a Finnish guideline for å because the letter isn't used in the actual language, we just have it since it shows up in Swedish names, which are fairly common here
Topic Starter
Protastic101
https://github.com/ppy/osu-wiki/pull/9423#issue-1718287849 Merged! Gonna poke a mod to move this to finalized and add the appropriate forum tag