You should know that most websites share your in-site search queries with third parties
If you are using a website’s internal search function, chances are good that your search terms are being leaked to third parties in some form, researchers with NortonLifeLock have found.
Of the top 1 million sites, they tested 512,701 that had internal site search, and discovered that on 81.3% of them, search terms are not kept “private”. What’s more, most of those sites’ privacy policies do not explicitly say that these search terms will be shared with (i.e., leaked to) third parties.
The research
By using a headless browser and finding a way to interact with sites’ search component (where present), the researchers crawled the top 1 million sites and searched for a specific term (“jellybeans”), then captured all web traffic after the search to see where the search terms were sent.
In each instance, they analyzed the URL, the Referer Request Header, and the payload, and found that 81.3% of these websites were leaking search terms to third parties either via the URL (71%), the Referer Header (75.8%), the payload (21.2%), or via more than one vector.
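To make the three vectors concrete, here is a minimal sketch of how a single captured request could be checked for the probe term. It is not the researchers' code: the function names, the `.example` hosts, and the simple host-suffix test for "third party" are assumptions for illustration (a real crawler would more likely compare registrable domains and handle encodings beyond URL escaping).

```python
from urllib.parse import urlsplit, unquote_plus

SEARCH_TERM = "jellybeans"  # the probe term used in the study


def is_third_party(request_url: str, site_host: str) -> bool:
    """Assumption: a host is first-party if it equals the site host or is a subdomain of it."""
    host = (urlsplit(request_url).hostname or "").lower()
    site_host = site_host.lower()
    return not (host == site_host or host.endswith("." + site_host))


def leak_vectors(request_url, referer, body, site_host, term=SEARCH_TERM):
    """Return the set of vectors ("url", "referer", "payload") that carry the term."""
    if not is_third_party(request_url, site_host):
        return set()  # first-party traffic is not counted as a leak
    term = term.lower()
    vectors = set()
    # URL-decode each field so that percent-encoded terms are still found.
    if term in unquote_plus(request_url).lower():
        vectors.add("url")
    if referer and term in unquote_plus(referer).lower():
        vectors.add("referer")
    if body and term in unquote_plus(body).lower():
        vectors.add("payload")
    return vectors
```

A request to a hypothetical `tracker.example` that repeats the query in its own URL and also arrives with the search page as its Referer would be reported under both vectors, which corresponds to the "more than one vector" case above.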
Then they crawled for privacy policies on those websites, collected and analyzed them, and found that only 13% of privacy policies mentioned the handling of user search terms explicitly, while 75% of them mentioned the sharing of “user information” with third parties using generic wording.
It’s true that few people read privacy policies and terms of service before using websites. Still, I believe that although many people know that Google searches are not private, they expect that the information they search for on, for example, healthcare or adult sites is somehow kept between them and the site’s owner.
“A recent study focusing on a tracking visualization tool did find that a majority of users did not want to have their search activity tracked, while a previous study found that lay people had simpler mental models than technical people – their models omitting concepts such as Internet levels and entities (suggesting that a very large number of users does not realize that their search queries are shared with third parties),” Daniel Kats, David Luz Silva, and Johann Roturier pointed out.
Possible mitigations
For many users around the world, having digital privacy is a matter of life and death.
“Users may use these search boxes to type in highly personal terms expressing racial identity, sexual or religious preferences, and medical conditions,” the researchers noted, and pointed out that prior research has shown how easy it is to de-anonymize users based on their search terms.
Some browsers have a default Referrer-Policy that prevents referrer-based leakage, and some implement tracking protection tools to flag sites that try to downgrade it and prevent the action, they noted.
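As a rough illustration of why the policy matters, the sketch below models what ends up in the Referer header under three Referrer-Policy values: `strict-origin-when-cross-origin` (the default in current major browsers), `no-referrer`, and `unsafe-url`. The function name and `.example` URLs are hypothetical, and the model is simplified: it assumes URLs without embedded credentials and compares origins as plain strings.

```python
from urllib.parse import urlsplit


def _origin(url: str) -> str:
    parts = urlsplit(url)
    return f"{parts.scheme}://{parts.netloc}"


def referer_sent(page_url, request_url, policy="strict-origin-when-cross-origin"):
    """Return the Referer value a browser would send, or None if it sends none."""
    page, req = urlsplit(page_url), urlsplit(request_url)
    full = page._replace(fragment="").geturl()  # fragments are never sent
    if policy == "no-referrer":
        return None
    if policy == "unsafe-url":
        return full  # full URL, query string included, to every destination
    if policy == "strict-origin-when-cross-origin":
        if _origin(page_url) == _origin(request_url):
            return full  # same-origin requests still get the full URL
        if page.scheme == "https" and req.scheme != "https":
            return None  # nothing is sent on an HTTPS-to-HTTP downgrade
        return _origin(page_url) + "/"  # cross-origin: origin only, no query
    raise ValueError(f"policy not modelled in this sketch: {policy}")
```

With the search page at `https://shop.example/search?q=jellybeans`, a cross-origin request under the default policy would carry only `https://shop.example/`, whereas a site that downgrades to `unsafe-url` would expose the full URL, search term included.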
There are other ways to prevent third-party leakage via the various vectors, but most of these protections are not easy to implement or can be bypassed. For example, site owners can place all search components in isolated iframes, which would allow the browser’s Same Origin Policy to protect the search terms against all kinds of leakage.
The researchers said that they developed a browser extension that warns users when a site leaks search terms to third parties and leaves it to them to decide whether to continue, but have yet to share a link to it.
UPDATE (September 10, 2022, 03:20 a.m. ET):
According to the Norton Labs team, the extension is currently research-only: it has to be built from source, and it’s part of the artifacts they submitted with their research paper.