[Proposal] Global osu! Metadata Standard

Topic Starter
Ephemeral
This has been a pet peeve of mine for a while: over the past few years, with 1:1 source-to-osu! metadata being prevalent in the game, the wildly varying formats used by artists across the globe have been wreaking havoc on osu!'s beatmap listing.

We're so accurate with this now that the vast inconsistencies in how artists and producers handle various aspects of song metadata are now pretty plain to see. The metadata crew have done pretty remarkably well in that regard.

Sources which have never considered the need for any sort of standard have created situations where we have tracks ranked on a daily basis with grammatical and syntactical errors. It looks horrible, and we have encountered situations where mappers have been forced to use flagrantly incorrect (from a grammatical and syntax standpoint) metadata so as to adhere to the current rule.

I think we can do better.

Thus, I propose the beginning of a global osu! metadata standard, wherein we effectively parse the metadata available for a given track and move it to something more uniform and consistent going forward. The reasoning behind this is that osu! is rapidly becoming more than just a game - the changes to osu!lazer align it towards the goal of becoming a platform, and a platform should aim to have clean and consistent presentation across all fronts wherever possible.

You may read the current draft of the metadata standard here.

To put it shortly, this is what the draft entails:

  1. A general return to 'common sense first' formatting in ambiguous situations
  2. Enforced whitespace and syntax correctness in all metadata titles
  3. TV Size designation for anime OP/ED cuts enforced to TV Size only, regardless of source metadata
  4. Default to using source tagging on mashup tracks where appropriate
  5. Default to using producer track name for Vocaloid or Vocaloid-like tracks
  6. Syntax standard for common metadata formats and terms, such as feat., vs., with, and (&)
  7. Standard for handling Character Voice (CV) designations
  8. Standards for handling character replacement for common unicode characters
Please read the full draft thoroughly and suggest any amendments or point out any problems, with the goal in mind of creating a clean and concise standard for metadata presentation.
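To make items 2 and 6 a bit more concrete, here is a minimal sketch of what an automated pass could look like. It is not part of the draft: the function name `normalize_field` and the canonical spellings chosen here ("feat." and "vs.") are assumptions for illustration only, and it makes no attempt to detect deliberate artistic stylization.

```python
import re

# Hypothetical canonical spellings; the actual standard may choose differently.
_CANONICAL = {
    "featuring": "feat.",
    "feat": "feat.",
    "ft": "feat.",
    "versus": "vs.",
    "vs": "vs.",
}

# A connector must start a word and be followed by a period and/or whitespace,
# then more text. Longer alternatives come first so "featuring" beats "feat".
_CONNECTOR_RE = re.compile(
    r"(?<!\S)(featuring|feat|ft|versus|vs)(?:\.\s*|\s+)(?=\S)",
    re.IGNORECASE,
)


def normalize_field(text: str) -> str:
    """Collapse stray whitespace and rewrite connector terms uniformly."""
    # Trim the ends and squeeze any run of whitespace down to one space.
    collapsed = " ".join(text.split())
    # Rewrite each connector as its canonical form plus one trailing space.
    return _CONNECTOR_RE.sub(
        lambda m: _CANONICAL[m.group(1).lower()] + " ", collapsed
    )
```

A pass like this would only ever be a first step; anything it changes would still need a human to confirm that the source was not stylized on purpose.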
-Makishima S-
I would also like to propose a rule to tag "cut-down" full-size songs somehow. There are maps whose songs have been shortened to TV Size, yet those songs don't have any official short release.

At the current stage, if a song is not tagged and doesn't contain any metadata marking it as an unofficial fan-made short version, it counts as misleading information about the song and its author.

Example: https://osu.ppy.sh/s/479811
Ripped from this video: https://www.youtube.com/watch?v=-g4Q6-q9YEM
The audio has the same acoustic spectrum (YouTube doesn't recognize the acoustic fingerprint), yet the original song differs in terms of spectrum.
Nitrous
Will there be standardized designations on how to handle the artist field when it can't fit within the character limit, or will this fall under various artists?

For instance, this beatmap: Cocoa, Chino, Rize, Chiya, Syaro, Maya, Megu - Honjitsu wa Makoto ni Rarirurein
All artists named are just character names from an anime and their character voices are listed under tags.
Topic Starter
Ephemeral

-Nitrous wrote:

Will there be standardized designations on how to handle the artist field when it can't fit within the character limit, or will this fall under various artists?

For instance, this beatmap: Cocoa, Chino, Rize, Chiya, Syaro, Maya, Megu - Honjitsu wa Makoto ni Rarirurein
All artists named are just character names from an anime and their character voices are listed under tags.
This would be covered under the "Various Artists" clause listed in the draft.
Nitrous
How about extreme cases about songs with no metadata at all? One case happened to this beatmap specifically:
  1. Thyro Alfaro - Champion Energy 2.5mins 062017
Wherein the only considered valid metadata was from the mp3 file's metadata which only contained the title (which doesn't look good at all). The artist is possibly inaccurate as it was only done through searches from Google.
Venix

-Nitrous wrote:

How about extreme cases about songs with no metadata at all? One case happened to this beatmap specifically:
  1. Thyro Alfaro - Champion Energy 2.5mins 062017
Wherein the only considered valid metadata was from the mp3 file's metadata which only contained the title (which doesn't look good at all). The artist is possibly inaccurate as it was only done through searches from Google.
I think it did have an official source. This was the name of the .mp3 downloaded from the official site.

But honestly I think this "2.5mins 062017" should be removed from the title, because these numbers are probably just data about this specific sample.
Nitrous

Venix wrote:

-Nitrous wrote:

How about extreme cases about songs with no metadata at all? One case happened to this beatmap specifically:
  1. Thyro Alfaro - Champion Energy 2.5mins 062017
Wherein the only considered valid metadata was from the mp3 file's metadata which only contained the title (which doesn't look good at all). The artist is possibly inaccurate as it was only done through searches from Google.
I think it did have an official source. This was the name of the .mp3 downloaded from the official site.
No artist was specified though.
Venix

-Nitrous wrote:

No artist was specified though.
I think it should be changed to Unknown Artist tbh
Nitrous

Venix wrote:

-Nitrous wrote:

No artist was specified though.
I think it should be changed to Unknown Artist tbh
I wish extreme cases like these would be considered. This one caused havoc for the set.
Topic Starter
Ephemeral

-Nitrous wrote:

How about extreme cases about songs with no metadata at all? One case happened to this beatmap specifically:
  1. Thyro Alfaro - Champion Energy 2.5mins 062017
Wherein the only considered valid metadata was from the mp3 file's metadata which only contained the title (which doesn't look good at all). The artist is possibly inaccurate as it was only done through searches from Google.
It defies common sense to think that "2.5mins 062017" is actually part of the song name in this context, so under the proposed standard, it would be parsed as just "Thyro Alfaro - Champion Energy" instead.
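As a rough illustration of that kind of parse, here is a small sketch that trims trailing export residue from a title. The helper name `strip_residue` and the residue shapes it looks for (a duration such as "2.5mins" and a bare 6- or 8-digit stamp such as "062017") are assumptions based only on this one example, not something the draft specifies.

```python
import re

# Assumed residue shapes: a duration like "2.5mins" or "30 sec", or a bare
# 6- or 8-digit stamp like "062017". One or more of them at the end of a title.
_RESIDUE_RE = re.compile(
    r"(?:\s+(?:\d+(?:\.\d+)?\s?(?:mins?|secs?)|\d{6}|\d{8}))+$",
    re.IGNORECASE,
)


def strip_residue(title: str) -> str:
    """Remove trailing duration/date tokens that look like file-export data."""
    return _RESIDUE_RE.sub("", title).strip()


print(strip_residue("Champion Energy 2.5mins 062017"))
```

This is a heuristic and would wrongly trim a song whose real title ends in a six-digit number, so its output should be treated as a suggestion for a human to check rather than a final answer.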
xtrem3x

Ephemeral wrote:

-Nitrous wrote:

Will there be standardized designations on how to handle the artist field when it can't fit within the character limit, or will this fall under various artists?

For instance, this beatmap: Cocoa, Chino, Rize, Chiya, Syaro, Maya, Megu - Honjitsu wa Makoto ni Rarirurein
All artists named are just character names from an anime and their character voices are listed under tags.
This would be covered under the "Various Artists" clause listed in the draft.

2 or more artists in a song compressed to V.A. (Various Artists) -w-
Noffy
Both examples regarding vocaloids have the (ft.) bit in their titles, wouldn't it be better to have one example where it's in the title and one where it's in the artist?

Additionally, the way it is worded + the examples are very confusing.
If the track presents itself from an "official" source as being listed as having a Vocaloid artist, then it is likely safe to use the Vocaloid as the artist.
Does that mean if a source says "Artist: Hatsune Miku" then just Hatsune Miku in the artist field would be acceptable? That sounds like a slippery slope when a majority of vocaloid videos are formatted along the lines of "[Hatsune Miku] Song Title [Original Song]", since it could be argued that the "[Hatsune Miku]" is specifying her as the artist of the song.. and such..
Or is this meaning to refer to the things such as "ft. Hatsune Miku" as part of a larger artist or title field?

It is always expressed as a lowercase vs., and must always be followed by a trailing whitespace, clearly dividing the two artists.
Artists should not be relied on for exact syntactical representation of their work unless it is made extremely obvious that such inconsistencies are an artistic liberty with the track itself.
Saying "always" in one rule and then allowing changes where it is clearly artistic stylization puts the two rules in conflict with each other, which can cause additional ambiguity and confusion.
In any of these rules, if it is EXTREMELY OBVIOUS that it is the artist's intent, variation should be allowed.
For example, uppercase "VS" when it is done to stand out from entirely lowercase artists.

It is also unclear whether this standard allows either "." or no "." after the "vs", or if the period/full-stop is always required. The current examples make it appear that it is always required.
The use of "." after "vs" is largely preference. Parts of the world will often drop the "." entirely. "vs." is more common in American English, while "vs" is often found in British English, due to differing rules regarding abbreviations. Dropping the "." is especially common when the addition of the period/full-stop would ruin the visual balance of the text. This should be a matter of preference for the song's artist where the metadata follows what they choose to use.
If we are to prefer one version of English rules over another, this should be made clear from the start and throughout the draft.

In conclusion, I believe the only part of this rule that should be kept is the requirement for proper whitespace before and after.

Sumitsukikakko (【 】) and Kagikakko (or quchixing de yinhao (曲尺形的引号)) (「」) must be replaced with standard quotation marks. ("")
【 】 should typically be romanized as [ ], as they are often used the same way regular brackets would be, in titles and headings. Whether they should be romanized as " " depends on context, such as when they are clearly being used as a quotation from a song lyric or similar. Using " " in all cases can change the intended formatted appearance and potentially even the meaning.

「」 are occasionally used in place of [] as well, although this is much rarer.

Additionally, these symbols have English names as well, officially specified in unicode standards.
【 】 Lenticular Brackets
「 Left Corner Bracket
」 Right Corner Bracket
I don't see why they are being called by their Japanese/Chinese names in a set of standards written in English when English names are available.
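For reference, a small sketch of how the replacement and the naming could be handled together. The mapping below is an assumption that follows my suggestion above (lenticular brackets to square brackets, corner brackets to quotation marks), not the draft, and it deliberately ignores context, which a real pass could not do. `BRACKET_MAP`, `romanize_brackets` and `unicode_names` are made-up names; `unicodedata.name` is the standard-library lookup for a character's official Unicode name.

```python
import unicodedata

# Assumed default mapping; context-dependent cases would need a human decision.
BRACKET_MAP = {
    "【": "[",
    "】": "]",
    "「": '"',
    "」": '"',
}

_TABLE = str.maketrans(BRACKET_MAP)


def romanize_brackets(text: str) -> str:
    """Replace each mapped CJK bracket with its assumed ASCII counterpart."""
    return text.translate(_TABLE)


def unicode_names(chars: str) -> dict:
    """Look up the official Unicode name of each character."""
    return {c: unicodedata.name(c) for c in chars}


for char, name in unicode_names("【】「」").items():
    print(char, name)
```

Running the lookup shows the English names straight from the Unicode data, which is where a standard written in English could take them from.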


edit:
Also, it's not clear whether this proposal is meant to be developed alongside the metadata ruleset draft, or as a replacement for it. If it's the latter, it's missing quite a bit. If it's the former, I feel this should be worked into the ruleset draft, rather than separately.
Stefan
A little question related to mash-ups: a lot of people release mash-ups under their artist name (for example: I am Jemboy) and give their mash-ups their own names (for example: Sacred Encore). Would we still label the artists/songs used in that mash-up, or would the format "I am Jemboy - Sacred Encore" be correct? Since people vary between giving "song 1 vs. song 2" titles and their own names, I am curious how this is handled.
Topic Starter
Ephemeral
this has been revived and is being discussed in the Metadata Heap discord

so far, we're mulling over the following topics:

* regearing the standard to function as a 'guarantee of quick lookup' by ensuring every beatmapset title can be ctrl-c/ctrl-v'd into google and yield a link to an artist page/reseller even if it may not be the most "correct" result
* consolidating formatting aspects of the standard into a smaller section to produce a more concise standard
* offloading extra data into tags

we're also pursuing several metadata display features in the new osu! website to help better handle metadata situations, and also ultimately push towards a distinct metadata format that allows changes to be made to maps without needing to DQ them

aiming to get at the very least, the standard part ironed out before january's end, hopefully!
Noffy
An invite link to said Metadata Heap may be found here for those who wish to contribute in those real-time discussions:
https://discord.gg/9Y4EdyM
pishifat
archiving thread

content here is promoted in https://osu.ppy.sh/community/forums/topics/719568