As companies across the globe increasingly come under siege by bad actors and must constantly remain vigilant against data breaches, Text IQ’s AI has proven to be fundamentally superior at recognizing social security numbers, health records, account numbers and other sensitive data – all while requiring substantially less human review and expense.
The finding is significant because in the event of a data breach, which could easily encompass thousands or millions of documents, companies must quickly and completely determine the type of personal information compromised. Using AI solutions with poor recall means missing too many instances of vital and relevant data, while poor precision forces humans to spend countless hours checking for false positives.
“We’re thrilled that Text IQ’s AI solution for identifying personal information not only outperformed the top cloud providers, but did so in a real-life data breach scenario,” said Apoorv Agarwal, Text IQ CEO and co-founder. “Creating a system that searches great volumes of unstructured data sources while requiring a minimum of human review is a tough technical hurdle to overcome. But Text IQ’s talented staff of engineers succeeded in designing and building a solution demonstrably better than those offered by some of the world’s largest technology companies.”
As part of Text IQ’s continuous effort to improve AI solutions, the company compared its performance against some of the biggest players in the sector: AWS (Macie and Comprehend), Google (Cloud Data Loss Prevention) and Microsoft Azure (Text Analytics). The real-life dataset, provided by a willing client eager to help support this type of research, included 12,287 documents that had undergone AI and human review and were known to include personal information. The evaluation measured the percentage of relevant documents returned from the search process by Text IQ’s Brain and the APIs from the three cloud providers.
The test results were evaluated by F-score, a measure of a model’s accuracy on a dataset that combines precision and recall into a single number. Text IQ, with an F-score of 0.65, was 30% more effective in returning the relevant personal information than the next best solutions, Microsoft Azure and AWS Comprehend, with F-scores of 0.50 and 0.49 respectively. Text IQ was 50% better than AWS Macie (0.43) and more than 3x better than Google (0.14).
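For readers who want to check the arithmetic, the following is a minimal sketch of how precision, recall and an F-score relate, and how the relative comparisons follow from the reported scores. It assumes the balanced F1 variant (beta = 1), which the announcement does not specify; the function names are illustrative and are not part of any Text IQ or cloud provider API. Only the F-scores in the `reported` dictionary come from the announcement.

```python
def precision(true_pos: int, false_pos: int) -> float:
    """Share of flagged documents that really contain personal information."""
    flagged = true_pos + false_pos
    return true_pos / flagged if flagged else 0.0


def recall(true_pos: int, false_neg: int) -> float:
    """Share of documents with personal information that were actually flagged."""
    relevant = true_pos + false_neg
    return true_pos / relevant if relevant else 0.0


def f_score(p: float, r: float, beta: float = 1.0) -> float:
    """Weighted harmonic mean of precision and recall (beta=1 is the F1 score).

    Assumption: the evaluation used beta=1; the source only says "F-score".
    """
    b2 = beta * beta
    denom = b2 * p + r
    return (1 + b2) * p * r / denom if denom else 0.0


def relative_gain(score: float, baseline: float) -> float:
    """Fractional improvement of score over baseline (0.30 means 30% better)."""
    return score / baseline - 1.0


# F-scores as reported in the announcement.
reported = {
    "Text IQ": 0.65,
    "Microsoft Azure": 0.50,
    "AWS Comprehend": 0.49,
    "AWS Macie": 0.43,
    "Google": 0.14,
}

if __name__ == "__main__":
    ours = reported["Text IQ"]
    for name, score in reported.items():
        if name != "Text IQ":
            print(f"{name}: Text IQ is {relative_gain(ours, score):.0%} higher")
```

Because the F-score is a harmonic mean, a tool cannot score well by excelling at only one of precision or recall; a low value on either side pulls the combined score down.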
With respect to precision, Text IQ performed even better, delivering results that ranged from 55% to 83% better than the APIs from the three cloud providers. A higher precision score results in far fewer false positives that require human review to weed out.
“Text IQ’s results in finding PI in this large dataset were more accurate, by far, than the next best solution,” commented Richard Lutkus, partner at Seyfarth Shaw, the law firm that oversaw the test. “The quality of results resulted in a reduced need for human review and saved the client, and its insurer, six figures.”
The problem with many solutions on the market is that they solve for broad, generic cases, casting a wide net. Because these tools were built for general purposes, they over-capture, and the data they return is imprecise. In contrast, Text IQ created a tool specifically to solve this kind of identification problem while reducing human input and costs.
Examples of personal information found by Text IQ but not discovered by the cloud provider solutions include a list of doctors, their dates of birth and DEA controlled substance prescribing license numbers; termination notices to employees that include social security numbers; COVID-19 lab test results with patient names and dates of birth; employee resumes with name and contact information; and lists of patients and their telephone numbers.
“People have a reasonable expectation that the companies they do business with will do everything they can to protect their privacy,” said Omar Haroun, Text IQ co-founder and COO. “Unfortunately, when a breach occurs, even companies with the best intentions don’t have the technology to do a fast and accurate search for sensitive information. With our demonstrably better approach, companies get a more accurate result, faster and at less cost.”