HTTPS was initially used to assure Internet users that the website and web server they are communicating with are indeed the ones they intend to reach, but its use was later extended to keeping users' communications, identities and browsing private.
But a group of researchers has shown that HTTPS is, unfortunately, a lousy privacy tool: anyone who can view, record and analyze visitors' traffic can identify, with 89 percent accuracy, the pages they have visited and the personal details they have shared.
The group, consisting of researchers from UC Berkeley and Intel Labs, captured visitors' traffic to ten popular healthcare (Mayo Clinic, Planned Parenthood, Kaiser Permanente), finance (Wells Fargo, Bank of America, Vanguard), legal services (ACLU, LegalZoom) and streaming video (Netflix, YouTube) websites.
“Our attack applies clustering techniques to identify patterns in traffic. We then use a Gaussian distribution to determine similarity to each cluster and map traffic samples into a fixed width representation compatible with a wide range of machine learning techniques. Due to similarity with the Bag-of-Words approach to document classification, we refer to our technique as Bag-of-Gaussians (BoG),” they explained in a whitepaper.
“This approach allows us to identify specific pages within a website, even when the pages have similar structures and shared resources.”
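The whitepaper's description suggests a pipeline of three steps: cluster traffic features, score each observation against a Gaussian per cluster, and sum those scores into a fixed-width vector, much as Bag-of-Words counts word occurrences. The sketch below illustrates that idea under loudly stated assumptions and is not the researchers' implementation. It assumes each page load is reduced to a list of signed packet sizes (negative for client to server, positive for server to client), clusters them with plain 1-D k-means, and gives each packet a soft "count" in every cluster in proportion to its Gaussian likelihood. The names fit_gaussians and bog_features, the parameter values and the traces are all hypothetical.

```python
import math
import random

def nearest(value, centers):
    """Index of the closest cluster center (1-D, absolute distance)."""
    return min(range(len(centers)), key=lambda i: abs(value - centers[i]))

def fit_gaussians(values, k, iters=50, seed=0, min_std=1.0):
    """Cluster packet sizes with 1-D k-means, then fit one Gaussian per cluster."""
    rng = random.Random(seed)
    centers = rng.sample(sorted(set(values)), k)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for v in values:
            groups[nearest(v, centers)].append(v)
        # An empty cluster keeps its previous center.
        centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
    # Final assignment against the converged centers.
    groups = [[] for _ in range(k)]
    for v in values:
        groups[nearest(v, centers)].append(v)
    gaussians = []
    for g, c in zip(groups, centers):
        mean = sum(g) / len(g) if g else c
        var = sum((v - mean) ** 2 for v in g) / len(g) if g else 0.0
        # Floor the spread so single-valued clusters do not become degenerate.
        gaussians.append((mean, max(math.sqrt(var), min_std)))
    return gaussians

def bog_features(trace, gaussians):
    """Map a variable-length trace to a fixed-width vector of soft cluster counts."""
    counts = [0.0] * len(gaussians)
    for x in trace:
        # Log-density of x under each Gaussian (the shared constant is dropped).
        log_p = [-0.5 * ((x - m) / s) ** 2 - math.log(s) for m, s in gaussians]
        top = max(log_p)
        # Subtracting the max before exponentiating avoids underflow.
        weights = [math.exp(lp - top) for lp in log_p]
        total = sum(weights)
        for i, w in enumerate(weights):
            counts[i] += w / total
    return counts

# Hypothetical training traces: signed packet sizes in bytes.
training = [
    [-300, 1500, 1500, 1500, 620],
    [-310, 1500, 1500, 900],
    [-290, -300, 1500, 1500, 1500, 1500, 200],
]
values = [size for trace in training for size in trace]
gaussians = fit_gaussians(values, k=4)
vec = bog_features([-305, 1500, 1500, 640], gaussians)
print([round(x, 2) for x in vec])
```

Whatever the length of a trace, the output always has one entry per cluster, and each packet contributes a total weight of 1 spread across the clusters. That fixed width is what lets such vectors be handed to an off-the-shelf classifier.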
Depending on which websites users visit, the consequences can be serious. The attack could reveal medical conditions they have, procedures they have undergone or are considering, legal problems they face and actions they might take, as well as the financial products they use and the videos they watch. All of this is information they may want to keep from anyone but themselves.
Who can leverage such an attack? Anyone who can access those websites and capture the victims' traffic. In practice, this means ISPs (whether or not they act on behalf of a government), employers monitoring their employees' online activity, and intelligence agencies.
Fortunately, the researchers have also devised several defense techniques that, if implemented, can drastically reduce the attack's accuracy. They also pointed out that other factors can affect the attack's effectiveness.
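The article does not describe the researchers' defenses. For illustration only, the sketch below shows padding, one common generic countermeasure against traffic analysis. It is not presented as the researchers' method. Packet sizes are rounded up to a multiple of a hypothetical bucket size so that pages with slightly different sizes look identical, at the cost of extra bytes. The names pad_trace and overhead and the 512-byte bucket are assumptions, and the sketch ignores real-world constraints such as MTU limits and TLS record framing.

```python
import math

def pad_trace(trace, bucket=512):
    """Round each packet's size up to a multiple of `bucket`, keeping its direction sign."""
    padded = []
    for size in trace:
        magnitude = math.ceil(abs(size) / bucket) * bucket
        padded.append(int(math.copysign(magnitude, size)))
    return padded

def overhead(original, padded):
    """Fraction of extra bytes the padding adds."""
    return sum(abs(p) for p in padded) / sum(abs(s) for s in original) - 1

a = [-300, 1490]
b = [-310, 1500]
print(pad_trace(a), pad_trace(b))  # both become [-512, 1536]
print(round(overhead(a, pad_trace(a)), 3))
```

Larger buckets hide more, but they also waste more bandwidth. That trade-off is why padding schemes are usually compared on both attack accuracy and byte overhead.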
“To date, all approaches have assumed that the victim browses the web in a single tab and that successive page loads can be easily delineated. Future work should investigate actual user practice in these areas and impact on analysis results. For example, while many users have multiple tabs open at the same time, it is unclear how much traffic a tab generates once a page is done loading. Additionally, we do not know how easily traffic from separate page loadings may be delineated given a contiguous stream of user traffic,” they noted.
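To see why delineating page loads is a real question, consider the simplest possible approach. It splits a continuous packet stream wherever the connection goes quiet for longer than some idle threshold. The sketch below is a hypothetical illustration, not something taken from the whitepaper. The function name, the 2-second threshold and the (timestamp, signed size) input format are all assumptions.

```python
def split_page_loads(packets, idle_gap=2.0):
    """Split time-ordered (timestamp_seconds, signed_size) pairs at idle gaps."""
    loads, current, last_t = [], [], None
    for t, size in packets:
        # A silence longer than idle_gap is treated as the start of a new page load.
        if last_t is not None and t - last_t > idle_gap:
            loads.append(current)
            current = []
        current.append(size)
        last_t = t
    if current:
        loads.append(current)
    return loads

stream = [(0.0, -300), (0.1, 1500), (0.2, 1500), (5.0, -310), (5.1, 900)]
print(split_page_loads(stream))  # [[-300, 1500, 1500], [-310, 900]]
```

A rule like this works only when page loads are separated by quiet periods. Background traffic from other open tabs, or a page loaded while another is still finishing, would merge or split the segments. This is exactly the uncertainty the researchers describe.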
“Lastly, our work assumes that the victim actually adheres to the link structure of the website. In practice, it may be possible to accommodate users who do not adhere to the link structure.”
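The article does not say how the link structure is used. One plausible way, sketched here purely as an assumption, is a Viterbi-style dynamic program. It picks the page sequence that maximizes the sum of per-load classifier log-probabilities plus a penalty for every step that does not follow a link. The penalty is a natural knob for users who jump around by typing URLs or using bookmarks. The page names, link graph, scores and the decode function are all hypothetical.

```python
import math

def decode(scores, links, penalty=math.log(1e-6), floor=1e-12):
    """Best page sequence given per-load classifier scores and a site link graph.

    scores: list of dicts, one per observed page load, mapping page -> probability.
    links:  dict mapping each page to the set of pages it links to.
    Steps that do not follow a link are allowed but cost `penalty` (a log value).
    """
    pages = list(links)

    def emit(t, p):
        return math.log(max(scores[t].get(p, 0.0), floor))

    def step(q, p):
        return 0.0 if p in links[q] else penalty

    best = {p: emit(0, p) for p in pages}
    back = []
    for t in range(1, len(scores)):
        new_best, pointers = {}, {}
        for p in pages:
            prev = max(pages, key=lambda q: best[q] + step(q, p))
            new_best[p] = best[prev] + step(prev, p) + emit(t, p)
            pointers[p] = prev
        best = new_best
        back.append(pointers)
    # Trace the highest-scoring path backwards.
    path = [max(pages, key=lambda p: best[p])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]

links = {
    "home": {"clinic", "billing"},
    "clinic": {"home", "appointments"},
    "billing": {"home"},
    "appointments": {"home"},
}
scores = [
    {"home": 0.9, "clinic": 0.05, "billing": 0.05},
    {"clinic": 0.8, "billing": 0.2},
    {"billing": 0.55, "appointments": 0.45},
]
print(decode(scores, links))  # ['home', 'clinic', 'appointments']
```

Taking the best per-load guess would pick "billing" for the third load. But "clinic" does not link to "billing", so the decoder prefers "appointments". Raising the penalty toward zero weakens this effect, which is one way a model could tolerate users who do not follow links.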