here's a quick fix for the network latency issue
for the person moving the mouse, there'll be no hitsounds (although i wish there were), because he ain't hitting anything.
for the person pressing the keys, his song would be delayed relative to the other person's by twice the average latency (so if his ping is 200ms, the song and mouse cursor would run 400ms behind the actual movement of the other guy).
should there be a temporary break in the network connection, have auto take over for as long as there are no incoming movement packets.
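a rough sketch of the delayed playback with auto fallback on the key player's side, in python. all names here (`playback_delay_ms`, `CursorBuffer`, `gap_ms`) are made up for illustration, and the 100ms "connection dropped" threshold is just an assumed value. it also assumes packets arrive in order and that the key player's song clock already runs behind by the delay.

```python
from bisect import bisect_right


def playback_delay_ms(avg_ping_ms):
    # the proposal: run the key player's song twice the average latency behind
    return 2 * avg_ping_ms


class CursorBuffer:
    """Holds remote cursor packets, keyed by song-relative time (hypothetical helper)."""

    def __init__(self, gap_ms=100):
        self.gap_ms = gap_ms   # assumed: no packet for this long means the link dropped
        self.times = []        # song-relative timestamps in ms, ascending
        self.positions = []    # (x, y) recorded at each timestamp

    def receive(self, song_ms, pos):
        # assumes in-order arrival; a real client would have to sort or drop stragglers
        self.times.append(song_ms)
        self.positions.append(pos)

    def cursor_at(self, song_ms, auto_pos):
        """Return (position, source) for the key player's current song time."""
        i = bisect_right(self.times, song_ms) - 1
        if i >= 0 and song_ms - self.times[i] <= self.gap_ms:
            return self.positions[i], "remote"
        # nothing recent enough from the other player: let auto drive the cursor
        return auto_pos, "auto"
```

so with a 200ms ping the key player's song position would be the mouse player's position minus `playback_delay_ms(200)`, and `cursor_at` is asked for that delayed position every frame.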
as for handling the timing of movement packets: to avoid depending on both clients "starting at the same time", each hit/cursor movement packet is timestamped with how long after the start of the song it was generated.
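the timestamping could look something like this. `MovePacket`, `SongClock` and `make_move_packet` are hypothetical names, not anything from the actual client, and the clock source is injectable only so the sketch can be checked without waiting on real time.

```python
import time
from dataclasses import dataclass


@dataclass(frozen=True)
class MovePacket:
    song_ms: int   # milliseconds since the sender started the song
    x: float
    y: float


class SongClock:
    """Measures time since the local start of the song (hypothetical helper)."""

    def __init__(self, now=time.monotonic):
        self._now = now      # clock source returning seconds
        self._start = None

    def start(self):
        self._start = self._now()

    def elapsed_ms(self):
        if self._start is None:
            raise RuntimeError("song has not started")
        return int(round((self._now() - self._start) * 1000))


def make_move_packet(clock, x, y):
    # the stamp is relative to the sender's own song start, so the two
    # clients never have to agree on a wall-clock start time
    return MovePacket(clock.elapsed_ms(), x, y)
```

the receiver then only compares `song_ms` against its own song position, which is why neither side needs a synchronised start.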
at the end of the song, the clients exchange and merge their respective components (cursor movement from one, key presses from the other), resimulate the game to generate the score, and display the scoreboard (live score updating isn't required, as you can't see the score/accuracy in relax/auto anyway).
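a toy version of the merge and resimulate step, to show the idea. the scoring rule here (a key press within `window_ms` of the note while the cursor is within `radius` of it) is a simplified stand-in, not the game's real hit logic, and every name and default value is an assumption.

```python
from bisect import bisect_right


def cursor_pos_at(cursor_track, t):
    """Latest recorded cursor position at or before song time t, or None."""
    times = [sample[0] for sample in cursor_track]
    i = bisect_right(times, t) - 1
    return cursor_track[i][1] if i >= 0 else None


def resimulate(notes, cursor_track, key_presses, window_ms=50, radius=30.0):
    """Replay both players' recordings against the notes.

    notes:        [(song_ms, (x, y))]
    cursor_track: [(song_ms, (x, y))], sorted by song_ms (mouse player)
    key_presses:  [song_ms], sorted (key player)
    """
    hits = 0
    used = set()   # each key press can score at most one note
    for note_ms, (nx, ny) in notes:
        for k, press_ms in enumerate(key_presses):
            if k in used or abs(press_ms - note_ms) > window_ms:
                continue
            pos = cursor_pos_at(cursor_track, press_ms)
            if pos is not None and (pos[0] - nx) ** 2 + (pos[1] - ny) ** 2 <= radius ** 2:
                hits += 1
                used.add(k)
                break
    return {"hit": hits, "miss": len(notes) - hits}
```

since both recordings carry song-relative timestamps, both clients can run the same replay on the same merged data and should end up showing the same scoreboard.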
players will only realise the delay if:
there is packet loss and the auto cursor hits a note that the actual cursor player missed, and they were counting the score,
or they have a method of communication with lower latency than their internet connection (a phone call, or sitting right next to each other; if they're using skype or something similar they won't notice).