Seeing how discussions have died, I want to post some ideas I was planning on bringing up later since the time limit was close (half a month ago). This also has a few rule changes and guidelines. Some may not even need to be guidelines, but I wanted to spark discussion on them anyways and decide whether or not they are worth adding.
Regarding Full Width Special Characters:
When it comes to adding spaces around special characters, there is one more issue that I think should be addressed. Some languages, like Japanese, Chinese, and the like, don't use spaces when reading or writing. Since Japanese is one of the most common languages here in osu!, it matters that Japanese titles normally write their special characters in full-width. The full-width comma (、), colon (:), and brackets (()), as well as some others, wouldn't need a space. The current rule doesn't really mention these full-width characters.
For example:
チト(CV:水瀬いのり)、ユーリ(CV:久保ユリカ) (Official)
チト (CV: 水瀬いのり)、 ユーリ (CV: 久保ユリカ) (Proposal)
チト (CV:水瀬いのり)、ユーリ (CV:久保ユリカ) (Full-width colon without spaces; otherwise follows the proposal, including the parenthesis guideline. The parentheses are half-width, so they naturally take a leading space.)
Chito (CV: Minase Inori), Yuuri (CV: Kubo Yurika) (Romanized Proposal)
http://www.bjd.com.cn/ A Chinese newspaper site. All special characters are written in full-width and it doesn't utilize spacing.
The tl;dr is that certain special characters in full-width don't need to utilize spaces since they are somewhat naturally included in them. This is not the case with all characters and should be used accordingly.
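To make the distinction concrete, here is a minimal Python sketch. It assumes the rule is "add a trailing space after a half-width comma or colon, leave full-width forms alone". The helper names `is_full_width` and `space_half_width_separators` are made up for illustration and are not part of any osu! tool.

```python
import unicodedata

# Assumption: only the half-width comma and colon get a trailing space.
HALF_WIDTH_SEPARATORS = {",", ":"}


def is_full_width(ch: str) -> bool:
    """True for characters Unicode classifies as Fullwidth (F) or Wide (W)."""
    return unicodedata.east_asian_width(ch) in ("F", "W")


def space_half_width_separators(text: str) -> str:
    """Add a space after half-width separators; full-width ones are untouched."""
    out = []
    for i, ch in enumerate(text):
        out.append(ch)
        nxt = text[i + 1] if i + 1 < len(text) else ""
        if ch in HALF_WIDTH_SEPARATORS and nxt and not nxt.isspace():
            out.append(" ")
    return "".join(out)


# \uFF1A is the full-width colon, \u3001 the ideographic comma.
full_width_title = "チト(CV\uFF1A水瀬いのり)\u3001ユーリ"
print(space_half_width_separators(full_width_title))
print(space_half_width_separators("Chito (CV:Minase Inori),Yuuri"))
```

The full-width title comes back unchanged, while the romanized one gains the spaces after the half-width colon and comma.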
-----------------------------------------------
Regarding half-width & full-width usages of characters in the Unicode & source fields:
(Brought up to me by S o h)
Special characters should retain their original full-width/half-width form in the Unicode fields. An exception to this is when they are used for additional complementary info like the CV section or mix descriptors. Improper usage can result in errors while searching.
https://osu.ppy.sh/ss/10623085
Example using "カラフル。(Extended edit)"
The period cannot be substituted for its counterpart. "カラフル.(Extended edit)" is not acceptable.
The parentheses may be either half-width or full-width. "カラフル。(Extended edit)" is acceptable.
Original width usage should still be prioritized in the Unicode field when possible.
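As a rough sketch of how such a check could look in Python: the function below compares a candidate Unicode title against the original and only tolerates a width swap on parentheses. Treating parentheses as the only interchangeable characters is my assumption based on the example above, and `widths_preserved` is a hypothetical name.

```python
# Map full-width parentheses (U+FF08, U+FF09) to their half-width forms.
# Assumption: these are the only characters allowed to change width.
PAREN_FOLD = {"\uFF08": "(", "\uFF09": ")"}


def widths_preserved(original: str, candidate: str) -> bool:
    """True if candidate matches original, ignoring only parenthesis width."""
    if len(original) != len(candidate):
        return False
    for o, c in zip(original, candidate):
        if PAREN_FOLD.get(o, o) != PAREN_FOLD.get(c, c):
            return False
    return True


original = "カラフル\u3002(Extended edit)"  # \u3002 is the ideographic full stop

print(widths_preserved(original, "カラフル.(Extended edit)"))
print(widths_preserved(original, "カラフル\u3002\uFF08Extended edit\uFF09"))
```

The first call fails because the period was swapped for its half-width counterpart; the second passes because only the parentheses changed width.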
------------------------------
Regarding Special Characters and Spacing:
(I posted this earlier, but I might as well add it here)
ジョジョ~その血の運命~ Archetype MIX Ver.
JoJo ~Sono Chi no Sadame~ Archetype MIX Ver.
When a symbol is alone and doesn't have spacing, the romanization should have a whitespace before and after it. (Ex. if the title was "ジョジョ~その" we'd use "JoJo ~ Sono" when romanizing.)
When a symbol comes in pairs (like mentioned above), use a space before the first symbol and after the last symbol (not needed if the symbol is the last character). (Ex. if the title was "ジョジョ~その血の運命~" we would use "JoJo ~Sono Chi no Sadame~".)
This can be excluded if the song has a good enough reason not to use it.
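Here is a small Python sketch of the two spacing rules, just to show they can be applied mechanically. It assumes the input is an already-romanized string, that symbols pair up left to right, and that a leftover unpaired symbol counts as "alone". `space_symbol` is a hypothetical helper.

```python
def space_symbol(title: str, symbol: str = "~") -> str:
    """Space a lone symbol on both sides; space a pair only on the outside."""
    parts = [p.strip() for p in title.split(symbol)]
    count = len(parts) - 1  # number of symbols found
    out = parts[0]
    i = 1
    while i <= count:
        if i + 1 <= count:
            # Paired symbols: parts[i] is the enclosed text.
            enclosed, after = parts[i], parts[i + 1]
            out = (out + " " if out else "") + symbol + enclosed + symbol
            if after:
                out += " " + after
            i += 2
        else:
            # Lone symbol: whitespace before and after.
            out = " ".join(x for x in (out, symbol, parts[i]) if x)
            i += 1
    return out


print(space_symbol("JoJo~Sono"))
print(space_symbol("JoJo~Sono Chi no Sadame~Archetype MIX Ver."))
```

Songs with a good reason not to follow the rule would simply skip this step.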
----------------------------------------
Standardizing the Romanised Artist Field Order:
Another topic I want to bring up is one from a few years ago. Since we're trying to 'standardize' metadata, I feel like pushing this old thread, Romanized Artist Preferences, as it would actually benefit from the current proposals.
Right now we basically have to search high and low to find an obscure reference for a preferred romanization. Most database and wiki sites use a much simpler method: they standardize on either "Family Given" or "Given Family". Without that, our artist fields end up messy to the point that you can't tell which order is which anymore.
Fycho also brought up the point that artists sometimes have an official translated or English name, so we'd have to figure out whether those would get more priority or not. Ex. 周杰伦 is Jay Chou in English, but Zhou Jie Lun when romanized.
Right now this is my current proposal:
When romanizing the artist field, the name must be written in the order in which the Unicode field is read. The sole exception is when the artist has an official translation and is widely known by that name. (Please English this better. The idea is simply that we type the name out in whatever order it would be read.)
The second line would be in cases like Girls' Generation where 소녀시대 is romanized as Sonyeo Sidae (I believe). We'd still use Girls' Generation in this case. This also includes the Chinese example mentioned earlier.
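A minimal Python sketch of that lookup order, assuming we keep some list of widely known official names. The `OFFICIAL_NAMES` table and the `romanized_artist` function are hypothetical; only the two entries mentioned above are filled in.

```python
# Hypothetical table of artists with a widely known official name.
OFFICIAL_NAMES = {
    "周杰伦": "Jay Chou",
    "소녀시대": "Girls' Generation",
}


def romanized_artist(unicode_name: str, reading_order: str,
                     official_names: dict = OFFICIAL_NAMES) -> str:
    """Use the official name if one is listed, else the reading-order romanization."""
    return official_names.get(unicode_name, reading_order)


print(romanized_artist("周杰伦", "Zhou Jie Lun"))
print(romanized_artist("소녀시대", "Sonyeo Sidae"))
print(romanized_artist("黒崎真音", "Kurosaki Maon"))
```

Anything not in the table falls back to the order in which the Unicode name is read, so no per-artist preference hunting is needed.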
Pros:
- Metadata stays consistent with its Unicode counterpart, and we no longer have to check for a preferred romanization order.
- It standardizes the romanized artist field for every language, not just Eastern.
Cons:
- It will conflict with some artists' preferred romanization (Kurosaki Maon will be used instead of Maon Kurosaki and such. A lot of famous video game composers are more recognized by Given - Family as well.)
If we're going to standardize things here in osu!, we might as well tackle this since it's also fairly inconsistent at times. Hi Shimotsuki Haruka Shimotsuki.
-----------------------------------------------------
Regarding TV Size:
Even if we were to open this to say, a community vote, (and I might be jumping the gun here) I'm sure the majority would rather include the length markers, so I'll try to keep it simple.
(TV Size) is used for cuts that are used in the show. (Anime/TV Show OP/ED, Insert Songs if shortened, etc)
(Short / Extended Ver.) for everything else. (Game Size is rarely used anyways now I think about it.)
Manually cut songs that closely resemble a (TV Size) on an applicable song would use (TV Size), otherwise, they should use (Short Ver.) or (Extended Ver.)
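The three rules above boil down to a short decision, sketched here in Python. The parameter names are made up, and treating "Extended" as "longer than the original release" is my assumption.

```python
def length_marker(official_show_cut: bool,
                  manual_cut_resembles_tv_size: bool = False,
                  longer_than_original: bool = False) -> str:
    """Pick a length marker following the proposed TV Size rules."""
    # Cuts used in the show itself (OP/ED, shortened insert songs, etc.).
    if official_show_cut:
        return "(TV Size)"
    # Manual cuts that closely resemble a TV Size on an applicable song.
    if manual_cut_resembles_tv_size:
        return "(TV Size)"
    # Everything else. Assumption: "Extended" means longer than the original.
    return "(Extended Ver.)" if longer_than_original else "(Short Ver.)"


print(length_marker(official_show_cut=True))
print(length_marker(official_show_cut=False, manual_cut_resembles_tv_size=True))
print(length_marker(official_show_cut=False))
print(length_marker(official_show_cut=False, longer_than_original=True))
```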
That's about as simple as I can make it, I guess, so that it's as standardized as possible. The biggest downside to this is that it's difficult to tell cuts and official releases apart, but this makes it so we don't have to be direct when it comes to the versions, and it still mentions the length appropriately. The alternative is to use whatever the original release was before the cut, but then it contradicts the point of having a marker to reference the map's length on sight.
The main goal here is to make the labels act more as identifiers and less as official titles, which I think makes more sense.
--------------------------
Regarding songs that have multiple sources:
When a song has appeared in multiple media, it may use the source that the mapset is themed around (backgrounds, storyboards, videos, etc.) as long as the song itself appeared in it. These should use the direct source instead of the franchise source if applicable.
Examples:
https://osu.ppy.sh/s/446547 may use Grand Theft Auto Vice City as the map is themed around it and the song appears in-game.
https://www.youtube.com/watch?v=UrJcQ2nZips may not use Naruto as a source as the song doesn’t appear in any Naruto media, even if the map itself is themed around Naruto. These can be placed in the tags.
------------------------------------
Regarding Original Releases without a source:
This will have to be mostly case by case, but if a song has had a noticeable gap between its original release and eventually ending up in another medium (take that GTA song mentioned above), the source field isn't required and the source can be moved to the tags instead.
This may not have to be so much of a time-gap as well. We could try focusing more on if the first source released has any major significance.
----------------------------------------------------
Repeated words in romanization:
When a song uses repeated words in the title (one in Unicode, and the other as a basic romanization), the romanized field should omit the repeated word.
Examples:
AIRI-愛離- would normally be AIRI -Airi- as a romanization. This proposal would have the romanized field just be AIRI. The Unicode would still be AIRI-愛離- as it originally is.
A more severe example of this would be:
Normal: (Unicode) 花簪 HANAKANZASHI -> (Romanized) HANAKANZASHI HANAKANZASHI
Proposed: (U)花簪 HANAKANZASHI -> (R)HANAKANZASHI
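Here is a rough Python sketch of the proposed cleanup, working on the "normal" romanization. It assumes a repeat is a word that matches the word right before it once symbols and case are ignored, and that symbols attached to the dropped word go with it. `drop_repeated_words` is a hypothetical helper, and it would need a human check for titles that repeat a word on purpose.

```python
def drop_repeated_words(romanized: str) -> str:
    """Drop a word that repeats the previous one, ignoring symbols and case."""
    kept = []
    previous_key = None
    for token in romanized.split():
        # Compare on letters and digits only, so "-Airi-" matches "AIRI".
        key = "".join(ch for ch in token if ch.isalnum()).casefold()
        if key and key == previous_key:
            continue  # repeated word: leave it out of the romanized field
        kept.append(token)
        previous_key = key
    return " ".join(kept)


print(drop_repeated_words("AIRI -Airi-"))
print(drop_repeated_words("HANAKANZASHI HANAKANZASHI"))
```

The Unicode field is not touched by this; only the romanized field is shortened.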
--------------------------------------------------------
Using LOGOS to determine stylization choices:
Sometimes the romanization of a non-roman language will lead to little to no info on how to romanize the artist's name. In the case where a logo is only found on a website or a CD cover that writes the name in all capitals, we should be using standard capitalization methods (https://capitalizemytitle.com/ as we generally would in any standard title or name).
Artist preference in any other case must still be followed over this.
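As a simple Python sketch of the idea: if there is a known artist preference it wins, and otherwise an all-caps logo spelling is folded back to standard name capitalization. `stylize_artist` is a hypothetical helper, and capitalizing the first letter of each word is a simplification of full title-case rules.

```python
from typing import Optional


def stylize_artist(logo_text: str, artist_preference: Optional[str] = None) -> str:
    """Prefer the artist's own stylization; otherwise undo all-caps logo text."""
    if artist_preference:
        return artist_preference
    if logo_text.isupper():
        # Assumption: plain per-word capitalization is enough for names.
        return " ".join(word.capitalize() for word in logo_text.split())
    return logo_text


print(stylize_artist("ITO KASHITARO"))
```

Mixed-case text is returned as-is, since the rule only targets names known solely from an all-caps logo.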
In other words, this will hopefully prevent ITO KASHITARO cases from happening again. This is more of a case-by-case guideline, but the idea of romanizing based on what may possibly be just a font has led to some unfavorable romanizations in the past.
----------------------------------
Regarding covers and use of original metadata over the covers
Brought up originally by Monstrata. Sometimes a cover by another singer may be listed with slightly incorrect metadata compared to the original. We should probably use common sense when approaching this and judge them case by case. If the cover itself has very minor errors, then the original title would be recommended. If the cover feels like more of a remix or has been altered in some major way, the cover title would be recommended.
Umm yeah. Sorry I've been kinda absent on this proposal. I'm gonna try to be a bit more active so we can get this pushed forward as it was due 2 weeks ago. Hopefully, we can get this finalized by the end of the month (My goal now)
Anyways. Happy reading. Smack me if anything seems unreasonable. I mostly just want to spark a bit more discussion before we push this forward, and I wanted to attempt to merge a few more ideas I was originally planning on bringing up after this proposal went through.