With BlindBox, you can use Large Language Models without any intermediary or model owner seeing the data sent to the models. This type of solution is critical today, as the newfound ease-of-use of generative AI (GPT4, MidJourney, GitHub Copilot…) is already revolutionizing the tech industry.
It’s fun (and somewhat existentially terrifying) to use ChatGPT. The tool is so easy to use that you can feel both amazed by the future possibilities and worried about how many unpredictable consequences will result from those changes.
In just a few months, ChatGPT, along with other Large Language Models (LLMs) and generative AI tools like Whisper, MidJourney, GPT4, Bard, AutoGPT, GitHub Copilot, and many others, have already become assistants for our most tedious tasks and are making their way into the heart of companies.
But if anyone can use them… anyone can freely share their data with them. And that, as it turns out, is already becoming a cybersecurity nightmare for companies.
The big difference between the GPT4 model (the better-performing version of the model powering ChatGPT) and the Google Search Bar is that GPT4 requires much more data from companies and other institutions to leverage its full capabilities. Depending on what you want to use it for, you might need to send it the code of software under development, a classified government report, or a customer database containing personal data.
Mithril Security has raised €1.2 million to develop a solution to this problem: BlindBox, an open-source tool to protect data privacy when using LLMs. The pre-seed round was led by CyberImpact (a French fund dedicated to cybersecurity), Polytechnique Venture (a French fund for deep tech), and ITFarm (a Japanese fund based in Silicon Valley that invested in Zoom). The UC Berkeley incubator also invested when Mithril Security joined their program.
“We aim to create a reference solution for the security of data analyzed by third-party systems, similar to what HTTPS has done for the web. We want to address a global customer base and strengthen our commercial presence in both North America and Europe – which is why our CEO Daniel Huynh left a few months ago for California. We are also preparing a second round of financing for the end of the year, so we can have the means to achieve this ambitious goal,” said Raphaël Millet, COO of Mithril Security.
Focusing on ease of use
BlindBox’s goal is to protect both the user of the model and the model itself, ensuring that no one can see the data sent to the LLM or the LLM’s code base in the clear. Our technology ensures that users can protect their confidential data and that companies can retain their intellectual property and comply with regulations.
BlindBox is a privacy-preserving deployment solution for SaaS applications (LLM providers, for example) that preserves the confidentiality of end users’ data, even from the software provider itself. To enforce that privacy, we deploy those applications with a technology often referred to as confidential computing.
Confidential computing is a cybersecurity technology that guarantees confidentiality by using runtime encryption, isolation, and integrity and authenticity checks to create highly secure environments in which applications can run.
How does it work?
Solutions on the market already encrypt data well when it is at rest (when it is stored) and when it is in transit. But when the data is in use (for example, when the LLM processes it to generate a prediction), it must, at some point, be in the clear for the program to analyze it. If the operations are performed locally, the level of security may be deemed acceptable, as the data is not exposed to an external software vendor.
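The gap at the “data in use” stage can be sketched with a toy example (the cipher below is deliberately simplistic and not real cryptography): the data is unreadable while it travels to the provider, but the provider must recover the plaintext before the model can process it.

```python
import hashlib

def keystream(key: bytes, length: int) -> bytes:
    # Toy keystream derived from the key; for illustration only.
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(4, "big")).digest()
        counter += 1
    return out[:length]

def xor_cipher(data: bytes, key: bytes) -> bytes:
    # XOR with the keystream; applying it twice restores the input.
    ks = keystream(key, len(data))
    return bytes(a ^ b for a, b in zip(data, ks))

key = b"shared-session-key"
prompt = b"confidential company report"

# Client side: the prompt is encrypted before leaving the machine
# (data in transit) and is unreadable on the wire.
ciphertext = xor_cipher(prompt, key)
assert ciphertext != prompt

# Server side: the provider must decrypt to run the model (data in
# use) -- at this point the plaintext is visible to whoever controls
# the host, which is exactly the exposure confidential computing closes.
plaintext_on_server = xor_cipher(ciphertext, key)
assert plaintext_on_server == prompt
```

The point of the sketch: encryption at rest and in transit is a solved problem; it is the unavoidable decryption step before processing that confidential computing moves inside a hardware-protected environment.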
But LLM applications are usually hosted by a third party in the cloud, which means this data will be handled by a player that must be trusted. Solving this problem is difficult because the technologies capable of doing so are complex, expensive to deploy on-premises, and not yet widespread or well-known.
To guarantee privacy with BlindBox, Mithril Security uses Confidential Computing’s Trusted Execution Environments (TEEs). They are highly isolated computing environments where data and applications can reside and operate. Data sent to them is only decrypted inside the TEE. Even if hackers or malicious individuals were to gain access to the host machine on which the TEE is running, they could not access or read the data inside of it.
Enclave security is based on isolation, encryption, and remote attestation, which lets users verify that these guarantees are genuinely in place before sending any data.
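The attestation step can be sketched as follows. This is an illustrative simulation, not BlindBox’s actual API: in a real TEE the measurement is produced and signed by the hardware, whereas here it is just a hash of the code the enclave claims to run. The client compares that reported measurement against a known-good value before releasing any data.

```python
import hashlib
import hmac

# Known-good measurement of the enclave code the client expects
# (hypothetical value for this sketch).
EXPECTED_MEASUREMENT = hashlib.sha256(b"blindbox-enclave-v1").hexdigest()

def get_attestation_report(enclave_code: bytes) -> dict:
    # In a real TEE this report is generated and signed by the CPU;
    # here we only simulate the measurement as a hash of the loaded code.
    return {"measurement": hashlib.sha256(enclave_code).hexdigest()}

def verify_and_send(data: bytes, report: dict) -> bool:
    # Constant-time comparison of the reported measurement against
    # the expected one.
    if not hmac.compare_digest(report["measurement"], EXPECTED_MEASUREMENT):
        return False  # refuse to send data to an unverified enclave
    # ... in a real flow, establish an encrypted channel that
    # terminates inside the TEE, then send `data` over it ...
    return True

genuine = get_attestation_report(b"blindbox-enclave-v1")
tampered = get_attestation_report(b"backdoored-enclave")

assert verify_and_send(b"secret prompt", genuine) is True
assert verify_and_send(b"secret prompt", tampered) is False
```

The design point is that trust is rooted in a verifiable measurement of the code rather than in the cloud operator: if the enclave’s code has been modified, the measurement no longer matches and the client never sends its data.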
“I became interested in artificial intelligence in 2016 when AlphaGo’s victory completely revolutionized the way AI was perceived. I was then exposed to privacy technologies when I worked at Microsoft in 2020, and I knew I had found my calling: securing data access by AIs. ChatGPT and other generative AIs are creating a new moment of fascination with AI, providing access to an incredibly easy-to-use collection of tools,” said Daniel Huynh, CEO of Mithril Security.
“The problem is that many sectors that handle highly sensitive data, such as medical, banking, legal, and research, cannot use these tools. The risk of information leakage is too great and regulations often prevent them from doing so, and rightly so. This is why we created Mithril Security with my partners Raphaël Millet and Mehdi Bessaa. We wanted to build a solution that would allow owners of sensitive data to be able to use AI while guaranteeing the confidentiality of their data,” added Huynh.