Hey all! I wanted to share a project I've been working on for the past month or so with the tentative name "osu! Beatmap Atlas":
https://osu-map.ameo.design/
It's a web-based tool for exploring the world of osu! beatmaps via an embedding visualization. I'm hoping it will be useful for finding maps to play that are similar to your favorites, comparing your playstyle and progression to those of other players, and just browsing around to see how the meta has evolved over the years.
If you enter your osu! username, it will highlight scores that have been in your top 100 as well as find your best play on other maps when you click on them. I've also added some other features like pp simulation for FCs with different accs.
----
Anyway, it's definitely not "done" yet, but I think it's finally in a very usable state! I'd love to hear feedback from people on what they think of it, if any parts are confusing, etc. and ofc I'd greatly appreciate any bug reports.
So yeah - please give it a try and let me know what you think!
Observations
While building the tool and playing with it myself, I've observed some pretty cool patterns and interesting things about the embedding.
One of my favorites is "DTEZ Island", way out on its own on the far left side of the viz, which consists of only DTEZ scores. It's very isolated from other scores due to how niche the DTEZ playstyle is.
Another cool thing I've found is that there seem to be two distinct "paths" from ~500pp -> 1000pp: one for speed and one for aim.
If you look at the top right of the viz where all the elite plays are and set the color mode to "aim/speed ratio", it's clear that the right side has a lot of speed/stream maps like Sidetracked Day [Daydream] +DT while the left side has mostly aim-heavy maps like PADORU / PADORU [Gift] +DT.
The fact that those two paths are separated from each other seems to indicate that elite players end up specializing in either speed or aim once they reach that level, although there are certainly some players that break that mold.
The whole atlas also tends to be arranged from low difficulty to high difficulty along the horizontal axis, which makes a lot of sense given how it was created. It's notable that all of these patterns are emergent; they arise naturally from the data itself rather than being designed or engineered.
Besides difficulty, I'd say the other attribute that has the biggest impact at a global level is release year. At a local level, mod combos (DT vs nomod vs HR, etc.) tend to be very impactful.
Technical Details
First off, the whole project is entirely open source: https://github.com/ameobea/osu-embeddings
I used the decade's worth of data I've collected from my osu!track to create a big correlation matrix between beatmaps, encoding within it data about relationships between maps, mods, and more.
I'm happy to share any of the data I used to create this - just let me know.
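To give a rough idea of what building that kind of matrix can look like, here's a minimal sketch. It is not the code from the repo: it assumes "correlation" means how often two beatmap+mod combos show up together in the same player's top plays, and the `top_plays` data and the `cooccurrence_counts` helper are made up for illustration.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(top_plays_by_user):
    """Count how often two (beatmap_id, mods) keys share a user's top plays."""
    counts = Counter()
    for plays in top_plays_by_user.values():
        # De-duplicate and sort so each unordered pair has one canonical key.
        for a, b in combinations(sorted(set(plays)), 2):
            counts[(a, b)] += 1
    return counts


# Hypothetical input: username -> list of (beatmap_id, mods) from their top plays.
top_plays = {
    "alice": [(101, "DT"), (202, "HR"), (303, "")],
    "bob": [(101, "DT"), (202, "HR")],
    "carol": [(303, ""), (404, "DTEZ")],
}

counts = cooccurrence_counts(top_plays)
print(counts[((101, "DT"), (202, "HR"))])  # 2
```

Treating the same beatmap with different mods as separate keys is what lets something like DTEZ scores end up in their own region later on.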
After some pre-processing, I turn that correlation graph into an embedding using some Python libraries and then project it down into 2D with UMAP. The result is a big 2D map where each beatmap+mod is assigned a 2D coordinate. More similar beatmaps end up close to each other and vice versa.
I spent a good bit of effort tuning the various parameters of the embedding process to get an output that looks good, is easy to interpret (doesn't have 300 circles all stacked on top of each other), and still conveys the core information about the relationships between the beatmaps.
I then dump the whole embedding along with beatmap metadata into a binary file which is downloaded by the frontend.
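For anyone curious what a dump like that can look like, here's a small sketch using Python's `struct` module. The record layout (a count header followed by fixed-width little-endian records of beatmap ID, mod bits, x, y) is an assumption made up for this example, not the actual file format used by the site.

```python
import struct

# Hypothetical record layout: beatmap_id (u32), mod_bits (u32), x (f32), y (f32).
RECORD = struct.Struct("<IIff")


def pack_embedding(points):
    """Serialize points as a u32 count followed by fixed-width records."""
    out = bytearray(struct.pack("<I", len(points)))
    for beatmap_id, mod_bits, x, y in points:
        out += RECORD.pack(beatmap_id, mod_bits, x, y)
    return bytes(out)


def unpack_embedding(blob):
    """Inverse of pack_embedding; returns a list of (id, mods, x, y) tuples."""
    (count,) = struct.unpack_from("<I", blob, 0)
    return [RECORD.unpack_from(blob, 4 + i * RECORD.size) for i in range(count)]


points = [(101, 64, 1.5, -2.25), (202, 16, 0.0, 3.75)]
blob = pack_embedding(points)
print(len(blob))  # 4-byte header + 2 records * 16 bytes = 36
```

Fixed-width records like this are compact and can be read on the browser side with typed arrays, which is one reason a binary file can be a nicer fit than JSON for a large point cloud.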
This whole process is handled via a series of Python notebooks.
Speaking of the frontend, that's built with SvelteKit. The embedding visualization itself is built in a pretty low-level manner with hand-written WebGL shaders for the circles, manual input handlers, hit testing, coloring, and everything else.
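As an illustration of the hit testing part, here's the basic idea as a brute-force sketch. It's written in Python to keep it short, whereas the real frontend runs in the browser, and the `hit_test` helper and its tie-breaking rule (nearest center wins) are assumptions for this example.

```python
import math


def hit_test(circles, px, py):
    """Return the index of the circle containing (px, py) whose center is nearest, or None."""
    best_index, best_dist = None, math.inf
    for i, (cx, cy, radius) in enumerate(circles):
        dist = math.hypot(px - cx, py - cy)
        # Only circles that contain the pointer are candidates.
        if dist <= radius and dist < best_dist:
            best_index, best_dist = i, dist
    return best_index


# Hypothetical circles as (center_x, center_y, radius).
circles = [(0.0, 0.0, 1.0), (1.5, 0.0, 1.0), (10.0, 10.0, 0.5)]
print(hit_test(circles, 1.0, 0.0))  # 1 (both of the first two contain it; circle 1 is closer)
print(hit_test(circles, 5.0, 5.0))  # None
```

A linear scan like this is fine for a demo; with a large number of circles, some kind of spatial index (a grid or quadtree, for example) would likely be needed to keep hover and click handling responsive.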
I use the awesome Rust `rosu` libraries by /u/badewanne3 for several features like pp simulation, computing aim/speed ratio, star difficulties with mods, and stuff like that. They were indispensable for this project.
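To illustrate the "FC at a given acc" part of pp simulation: before a pp calculator can be called, a target accuracy has to be turned into concrete hit counts. Here's a minimal sketch of that step using the classic osu! standard accuracy formula. The `fc_hit_counts` helper is hypothetical, it assumes the play only contains 300s and 100s, and the actual pp calculation (which the `rosu` libraries handle) isn't shown.

```python
def accuracy(n300, n100, n50, nmiss):
    """Classic osu! standard accuracy from hit counts."""
    total = n300 + n100 + n50 + nmiss
    return (300 * n300 + 100 * n100 + 50 * n50) / (300 * total)


def fc_hit_counts(n_objects, target_acc):
    """Hit counts (n300, n100, n50, nmiss) for a miss-free play near target_acc.

    Assumes only 300s and 100s, so acc = 1 - 2 * n100 / (3 * n_objects).
    """
    n100 = round(1.5 * n_objects * (1.0 - target_acc))
    n100 = max(0, min(n_objects, n100))  # clamp to a valid count
    return n_objects - n100, n100, 0, 0


for acc in (1.0, 0.99, 0.95):
    counts = fc_hit_counts(1000, acc)
    print(acc, counts, round(accuracy(*counts), 4))
```

Since hit counts are integers, the achieved accuracy can differ slightly from the requested one on maps where the object count doesn't divide evenly.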
I'm happy to answer any questions about any of these pieces as well.