You can break reading difficulty down into a visual component and a non-visual component.
The non-visual component can be summarized as a measure of variation. Between every pair of consecutive objects (and between every slider start and slider end) there is a minimum speed at which you must move your cursor to hit both. This speed can vary between adjacent pairs of notes, and the larger the difference, the harder the pattern is to read. The same is true for the angles between notes: if the angle between one pair of objects is substantially different from the angle between the next pair, the whole pattern becomes more complex to read.
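To make the "variation" idea concrete, here is a rough Python sketch. Everything in it is an assumption for illustration: objects are hypothetical (x, y, t) tuples sorted by time, the "minimum speed" is simplified to center-to-center distance over time (ignoring circle radius and sliders), and the function names are made up.

```python
import math

def min_speeds(objects):
    """Center-to-center speed needed between each consecutive pair.

    objects: list of (x, y, t) tuples, sorted by strictly increasing t (ms).
    """
    return [
        math.hypot(x1 - x0, y1 - y0) / (t1 - t0)
        for (x0, y0, t0), (x1, y1, t1) in zip(objects, objects[1:])
    ]

def directions(objects):
    """Direction of travel (radians) between each consecutive pair."""
    return [
        math.atan2(y1 - y0, x1 - x0)
        for (x0, y0, _), (x1, y1, _) in zip(objects, objects[1:])
    ]

def speed_variation(speeds):
    """Absolute change in required speed from one pair to the next."""
    return [abs(b - a) for a, b in zip(speeds, speeds[1:])]

def angle_variation(angles):
    """Absolute change in direction from one pair to the next, in [0, pi]."""
    changes = []
    for a, b in zip(angles, angles[1:]):
        # Wrap the difference into [-pi, pi) so a small turn is never
        # reported as a nearly full rotation.
        d = (b - a + math.pi) % (2 * math.pi) - math.pi
        changes.append(abs(d))
    return changes

# Hypothetical pattern: a steady move, a 90 degree turn, then a faster jump.
pattern = [(0, 0, 0), (100, 0, 100), (100, 100, 200), (100, 300, 300)]
speed_changes = speed_variation(min_speeds(pattern))
angle_changes = angle_variation(directions(pattern))
```

Large values in either list mark the spots where speed or direction changes abruptly, which is where this post argues a pattern gets harder to read.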
It's easy to understand if you think of the entire map as a sort of differentiable function: the function itself is just position over time, but you can differentiate it to get velocity, then again for acceleration, and so on. These derivatives can be aggregated to find the map's non-visual complexity.
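As a sketch of the derivative idea, the snippet below uses finite differences between object positions as a stand-in for real differentiation. The aggregation (mean magnitude of each derivative) is purely an assumption chosen for simplicity; the post doesn't say how the derivatives should be combined, and the function names are hypothetical.

```python
import math

def differentiate(times, values):
    """Finite-difference derivative of 2D samples.

    times: strictly increasing timestamps (ms).
    values: list of (x, y) samples, one per timestamp.
    Returns midpoint timestamps and the rate of change between samples.
    """
    mid_times, rates = [], []
    for i in range(len(values) - 1):
        dt = times[i + 1] - times[i]
        (x0, y0), (x1, y1) = values[i], values[i + 1]
        mid_times.append((times[i] + times[i + 1]) / 2)
        rates.append(((x1 - x0) / dt, (y1 - y0) / dt))
    return mid_times, rates

def derivative_profile(times, positions, max_order=2):
    """Mean magnitude of each derivative, from order 1 up to max_order.

    Index 0 is velocity, index 1 is acceleration, and so on.
    The mean is an assumed aggregation, not a known formula.
    """
    profile = []
    t, v = list(times), list(positions)
    for _ in range(max_order):
        if len(v) < 2:
            break  # not enough samples left to differentiate again
        t, v = differentiate(t, v)
        profile.append(sum(math.hypot(dx, dy) for dx, dy in v) / len(v))
    return profile

# A constant-speed straight line has velocity but no acceleration.
straight = derivative_profile(
    [0, 100, 200, 300], [(0, 0), (100, 0), (200, 0), (300, 0)]
)
# A 90 degree turn at the same speed picks up a nonzero acceleration term.
turn = derivative_profile([0, 100, 200], [(0, 0), (100, 0), (100, 100)])
```

How to weight the different orders against each other is left open here; that choice would need tuning against maps people actually find hard to read.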
Visual complexity is a measure of the visual obfuscation caused by overlap: overlapping graphics, obviously, but also things like overlapping cursor paths, overlapping patterns, etc. This is what makes EZ seem difficult; overlaps become more likely as the number of objects on-screen increases. However, the real difficulty isn't in the visual obfuscation itself, but rather in its effect on how much time you have to process each note, so visual complexity can come not only from the "noise" of low AR, but also from the sheer speed of high AR.
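One crude way to see why a longer on-screen time produces more overlap is to count pairs of circles that are visible at the same moment and also cover each other. This is only a sketch under assumptions: objects are hypothetical (x, y, t) tuples sorted by time, `preempt_ms` stands for how long an object is shown before its hit time, and only circle-on-circle overlap is counted (not cursor paths or patterns).

```python
import math

def overlapping_visible_pairs(objects, preempt_ms, radius):
    """Count pairs of circles that are on screen together and overlap.

    objects: list of (x, y, t) tuples, sorted by t (ms).
    preempt_ms: how long each object is visible before its hit time.
    radius: circle radius, in the same units as x and y.
    """
    count = 0
    for i, (x0, y0, t0) in enumerate(objects):
        for x1, y1, t1 in objects[i + 1:]:
            if t1 - t0 > preempt_ms:
                # The later object appears only after the earlier one is
                # hit; since objects are sorted, the rest do as well.
                break
            if math.hypot(x1 - x0, y1 - y0) < 2 * radius:
                count += 1
    return count

# Hypothetical objects that revisit the same area of the playfield.
objs = [(0, 0, 0), (30, 0, 400), (300, 0, 800), (10, 0, 1600)]
short_window = overlapping_visible_pairs(objs, preempt_ms=450, radius=32)
long_window = overlapping_visible_pairs(objs, preempt_ms=1800, radius=32)
```

With the same objects, the longer window yields more overlapping pairs than the shorter one, which matches the point about low AR. It says nothing about the high-AR side, where the problem is processing time rather than clutter.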