41 more posts left until the end…
First, there’s a pretty good chance that an attacker can connect to the SSL/TLS encrypted website in question and see what the HTTP response headers look like. Minus cookies, URL and POST data, an attacker can get a pretty accurate picture of what the HTTP response looks like. The attacker can also identify what sort of key exchange the user will be using with the site in question through a little enumeration. So the amount of data sent on the wire is smaller, and the data that is sent can be narrowed down to the few unknown components.
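To make the "few unknown components" idea concrete, here is a minimal sketch of the arithmetic. It assumes the attacker has already fetched the same page themselves and so knows roughly how many plaintext bytes the response contains without cookies or other user-specific data. The function name, the per-record overhead value and the record count are all hypothetical placeholders; real overhead depends on the negotiated cipher suite and would have to be measured.

```python
def estimate_unknown_bytes(observed_ciphertext_len, known_plaintext_len,
                           record_overhead=29, records=1):
    """Rough estimate of the bytes the attacker cannot predict.

    observed_ciphertext_len: bytes seen on the wire for the response
    known_plaintext_len:     bytes the attacker measured from their own fetch
    record_overhead:         ASSUMED per-record overhead (header, MAC, padding);
                             an illustrative value, not a property of any cipher
    records:                 ASSUMED number of records in the response
    """
    plaintext_estimate = observed_ciphertext_len - record_overhead * records
    # Never report a negative amount of unknown data.
    return max(plaintext_estimate - known_plaintext_len, 0)


# The attacker's own fetch was 800 bytes; the victim's response was 1029 on the wire.
unknown = estimate_unknown_bytes(1029, 800)
print(unknown)
```

Whatever is left over after subtracting the known parts is roughly the size of the cookies and other per-user data, which is the only part the attacker still has to reason about.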
So by forcing the user’s browser to pre-cache the content, an attacker can get down to just the pages they are interested in and a few GET requests that return 304 Not Modified responses. That’s a much smaller footprint for the unrelated data than it would be if it weren’t cached. Now, it may not always be a good idea to pre-cache. Sometimes the content will be hosted on other subdomains or domains, and therefore won’t create the same amount of chatter over the socket, because it isn’t pulling that content from the same IP. Other times it may be useful to detect that a user is on a certain page, because some of the content is very specific to that page and is a known size - alerting the attacker to the fact that the user being monitored is on the page in question.
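The size matching described here can be sketched in a few lines. This is only an illustration under stated assumptions: the paths and sizes in `known_sizes` are made up, the 400 byte cutoff for "this looks like a 304 Not Modified" is a guess, and the tolerance is arbitrary. In practice all of these would come from the attacker's own measurements of the site.

```python
def drop_revalidations(response_lengths, threshold=400):
    """Discard small responses that are probably 304 Not Modified chatter.

    threshold is an ASSUMED cutoff in bytes, not a measured value.
    """
    return [n for n in response_lengths if n >= threshold]


def match_page(observed_len, known_sizes, tolerance=16):
    """Return the known page whose size is closest to observed_len.

    known_sizes maps a (hypothetical) page path to its expected response size.
    Returns None if nothing falls within the tolerance.
    """
    candidates = [
        (abs(observed_len - size), name)
        for name, size in known_sizes.items()
        if abs(observed_len - size) <= tolerance
    ]
    return min(candidates)[1] if candidates else None


# Hypothetical fingerprints gathered by browsing the site beforehand.
known_sizes = {"/login": 5120, "/transfer": 18432}

observed = [210, 18440, 230, 198]          # lengths seen on the victim's socket
interesting = drop_revalidations(observed)  # only the large response survives
pages = [match_page(n, known_sizes) for n in interesting]
print(pages)
```

With the cached content reduced to tiny revalidation responses, the one large response stands out and can be compared against the known page sizes.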
In this way an attacker is really getting down to the exact parts of the data they are interested in. Obviously the earlier an attacker can do this the better - trying to cache after the fact doesn’t make a lot of sense, although, interestingly enough, an attacker using timing attacks may be able to tell where the user has been (Chris Evans did a good writeup on this a while back).
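As a rough illustration of the timing idea (this is a generic sketch, not a reproduction of the writeup mentioned above), the decision boils down to comparing one measured load time against calibration samples for resources known to be cached and known to be uncached. The function name and the sample timings are hypothetical, and a real measurement would be far noisier.

```python
from statistics import median


def looks_cached(probe_ms, cached_samples_ms, uncached_samples_ms):
    """Guess whether a resource was already in the cache from its load time.

    The threshold is simply the midpoint between the two calibration medians,
    which is an ASSUMED heuristic rather than a tuned classifier.
    """
    threshold = (median(cached_samples_ms) + median(uncached_samples_ms)) / 2
    return probe_ms < threshold


# Hypothetical calibration timings in milliseconds.
cached = [8, 10, 11]
uncached = [90, 120, 150]

print(looks_cached(12, cached, uncached))
print(looks_cached(100, cached, uncached))
```

A fast load suggests the user’s browser already had the resource, which in turn hints that the user has visited the page it belongs to.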