When we enter sensitive information – our names, passwords, payment card information, medical information, what have you – into websites, we do it with the expectation that it will be kept confidential and safe and will not be misused by the company running the site.
Most tech-savvy users know that there are many ways this kind of information can end up in the wrong hands: machines infected with keyloggers, traffic interception/man-in-the-middle attacks, sniffing of unencrypted traffic over unsecured networks, and so on.
But, until now, not many knew that some of this information can end up in the hands of analytics firms that record individual browsing sessions – some of which allow publishers to explicitly link those recordings to a user’s real identity.
Imperfect redaction and unsafe delivery
Steven Englehardt, Gunes Acar, and Arvind Narayanan from the Department of Computer Science, Princeton University, analyzed the scripts of seven companies that offer “session replay” services to website owners, and found that sensitive user data keeps getting leaked to them despite many precautions.
“[Session replay] scripts record your keystrokes, mouse movements, and scrolling behavior, along with the entire contents of the pages you visit, and send them to third-party servers. Unlike typical analytics services that provide aggregate statistics, these scripts are intended for the recording and playback of individual browsing sessions, as if someone is looking over your shoulder,” the researchers explained.
“The stated purpose of this data collection includes gathering insights into how users interact with websites and discovering broken or confusing pages. However the extent of data collected by these services far exceeds user expectations; text typed into forms is collected before the user submits the form, and precise mouse movements are saved, all without any visual indication to the user.”
It’s true that these companies offer both manual and automatic redaction tools that should be used by publishers to exclude sensitive information from session recordings, but the researchers have found that both preventative measures are deficient.
“Automated redaction is imperfect; fields are redacted by input element type or heuristics, which may not always match the implementation used by publishers. For example, FullStory redacts credit card fields with the ‘autocomplete’ attribute set to ‘cc-number’, but will collect any credit card numbers included in forms without this attribute,” they pointed out.
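To make the gap concrete, here is a minimal sketch of an attribute-based redaction heuristic of the kind the researchers describe. It is not FullStory's actual code: the rule sets, the `is_redacted` function and the dictionary representation of a form field are all assumptions made for illustration.

```python
from typing import Mapping

# Hypothetical rule set: redact password inputs and inputs explicitly
# labelled as card numbers through the autocomplete attribute.
REDACTED_TYPES = {"password"}
REDACTED_AUTOCOMPLETE = {"cc-number"}


def is_redacted(field: Mapping[str, str]) -> bool:
    """Return True if the heuristic would exclude this field from a recording."""
    input_type = field.get("type", "text").lower()
    autocomplete = field.get("autocomplete", "").lower()
    return input_type in REDACTED_TYPES or autocomplete in REDACTED_AUTOCOMPLETE


# Two fields that hold the same kind of data; only one carries the hint.
labelled = {"type": "text", "name": "card", "autocomplete": "cc-number"}
unlabelled = {"type": "text", "name": "card"}

print(is_redacted(labelled))
print(is_redacted(unlabelled))
```

The second field holds the same card number as the first, but because the publisher did not set the attribute, a rule like this one would let it through.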
The use of manual redaction tools also leaves much to be desired. “To effectively deploy these mitigations a publisher will need to actively audit every input element to determine if it contains personal data. This is complicated, error prone and costly, especially as a site or the underlying web application code changes over time.”
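The kind of audit the researchers have in mind could be partly scripted. The sketch below assumes a hypothetical convention in which the publisher marks fields to be excluded with a CSS class (`no-record` here, a name invented for this example), and lists every form field that lacks it. Real services define their own exclusion mechanisms, and a static scan like this would not see fields created at runtime by JavaScript.

```python
from html.parser import HTMLParser
from typing import List

REDACT_CLASS = "no-record"  # hypothetical marker class, not a real service's API
FORM_TAGS = ("input", "textarea", "select")


class InputAudit(HTMLParser):
    """Collects the names of form fields that lack the redaction marker."""

    def __init__(self) -> None:
        super().__init__()
        self.unmarked: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag not in FORM_TAGS:
            return
        # Attributes without a value arrive as None; normalise to "".
        attr_map = {key: (value or "") for key, value in attrs}
        if REDACT_CLASS not in attr_map.get("class", "").split():
            label = attr_map.get("name") or attr_map.get("id") or "<unnamed>"
            self.unmarked.append(label)


def find_unmarked_inputs(html: str) -> List[str]:
    """Return the fields in `html` that would still be recorded."""
    audit = InputAudit()
    audit.feed(html)
    audit.close()
    return audit.unmarked


page = (
    '<form><input name="email">'
    '<input name="ssn" class="no-record">'
    '<textarea name="notes"></textarea></form>'
)
print(find_unmarked_inputs(page))
```

A check like this would have to be rerun whenever a template changes, which is the ongoing cost the researchers point to.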
The result is that a great deal of sensitive information ultimately leaks to these companies. The problem is compounded because session recording companies also collect rendered page content, which often includes some of the data users have entered.
Finally, the researchers established that (potentially malicious) third parties could get their hands on these recordings and the data in them.
“Once a session recording is complete, publishers can review it using a dashboard provided by the recording service. The publisher dashboards for Yandex, Hotjar, and Smartlook all deliver playbacks within an HTTP page, even for recordings which take place on HTTPS pages,” they noted.
“This allows an active man-in-the-middle to [inject] a script into the playback page and extract all of the recording data. Worse yet, Yandex and Hotjar deliver the publisher page content over HTTP – data that was previously protected by HTTPS is now vulnerable to passive network surveillance.”
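The downgrade the researchers describe comes down to a comparison of URL schemes: the page was recorded over HTTPS, but its contents are later served over plain HTTP. A minimal sketch of that check follows. The function name and both URLs are made up for illustration and do not correspond to any real dashboard.

```python
from urllib.parse import urlsplit


def is_https_downgrade(recorded_url: str, playback_url: str) -> bool:
    """True if content recorded on an HTTPS page is replayed over a non-HTTPS URL."""
    recorded_scheme = urlsplit(recorded_url).scheme
    playback_scheme = urlsplit(playback_url).scheme
    return recorded_scheme == "https" and playback_scheme != "https"


# Hypothetical example URLs.
recorded = "https://shop.example/checkout"
playback = "http://dashboard.example/replay/123"

print(is_https_downgrade(recorded, playback))
```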
Which sites use these scripts?
The researchers identified 482 sites in the Alexa top 50,000 list using one or more of these scripts, but they say there may be more of them.
Among the sites they named as leaking data are pharmacy chain Walgreens, clothing retailer Bonobos, and tech company Lenovo.
Following the release of this research, Walgreens and Bonobos said that they have stopped sharing data with FullStory for the time being, while they investigate the claims.
How can users prevent their data being inadvertently collected?
Setting the Do Not Track (DNT) flag in their browsers won’t help users.
When the researchers went public with these results last week, they noted that “two commonly used ad-blocking lists EasyList and EasyPrivacy do not block FullStory, Smartlook, or UserReplay scripts,” but do block Yandex, Hotjar, ClickTale and SessionCam.
Since then, the EasyPrivacy filter list has been updated to block those three as well.