Dig Security announced enhancements to the Dig Data Security Platform, including new capabilities to secure Large Language Model (LLM) architectures.
Dig’s data security posture management (DSPM) solution now lets customers train and deploy LLMs while upholding data security, privacy, and compliance. Customers keep visibility and control over the data passed to AI models and can prevent inadvertent data exposure during model training or deployment.
As LLMs become more mainstream, many enterprises are exploring use cases that feed the company’s proprietary data to a large language model rather than relying on general-purpose chatbots. Despite the benefits, this poses security and compliance risks if the information used for fine-tuning, training, or embedding contains sensitive data.
“The use of generative AI remains the biggest tech story of 2023. Despite the value, training and deploying LLMs and generative AI also introduces new security hazards, especially when these tools are given access to enterprise data,” said Dan Benjamin, CEO, Dig Security. “We are proud to provide capabilities that allow enterprises to innovate securely, to train and deploy LLMs while maintaining data security, privacy, and compliance.”
Beyond the risk of sensitive data exposure, remediation is costly: once contaminated data has gone into a model, the only real recourse is deleting the model and training a new one. This makes the cost of error exceedingly high.
To help enterprises secure their LLM architectures, Dig added a range of cloud data security capabilities to the Dig Data Security Platform. With Dig, customers can:
- Monitor the data going into a model: Dig’s DSPM scans every database and bucket in a company’s cloud accounts, detects and classifies sensitive data (PII, PCI, etc.), and shows which users and roles can access the data. This can quickly reveal whether sensitive data is being used to train, fine-tune, or inform the responses of AI models. Security teams can then earmark models that are at higher risk of leaking sensitive information.
- Detect data-related AI risk before a model is trained: After AI models are trained, they are essentially black boxes; there is no surefire way to retrieve data from a model’s training corpus. This makes it nearly impossible to detect sensitive data that has already gone into a model, or to ‘fix’ a model after the sensitive data is already in it. Dig’s real-time data detection and response (DDR) allows users to address the problem swiftly by identifying data flows that can result in downstream model risk – such as PII being moved into a bucket used for model training.
- Map all AI actors with access to sensitive data: Dig’s data access governance capabilities can highlight AI models that have API access to organizational data stores, and which types of sensitive data this gives them access to.
- Identify shadow data and shadow models running on unmanaged cloud infrastructure: Dig’s agentless solution covers the entire cloud environment – including databases running on unmanaged VMs. Dig alerts security teams to sensitive data stored or moved into these databases. Dig will also detect when a VM is used to deploy an AI model or a vector database, which can store embeddings.
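The idea of catching sensitive data before it reaches a training set can be illustrated with a minimal Python sketch. This is not Dig's implementation: the regular expressions, the `Finding` type, and the `scan_records` and `split_for_training` helpers are all hypothetical names chosen for illustration, and real classifiers rely on far more robust detection than a few patterns.

```python
import re
from dataclasses import dataclass

# Illustrative patterns only (assumption): real classification needs
# validation, context, and many more data types than shown here.
PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "card_number": re.compile(r"\b(?:\d[ -]?){13,16}\b"),
}


@dataclass(frozen=True)
class Finding:
    """One sensitive-data match: which record, and which label matched."""
    record_index: int
    label: str


def scan_records(records):
    """Return a Finding for every (record, pattern) pair that matches."""
    findings = []
    for index, text in enumerate(records):
        for label, pattern in PATTERNS.items():
            if pattern.search(text):
                findings.append(Finding(index, label))
    return findings


def split_for_training(records):
    """Separate records with no findings from records that need review."""
    flagged = {finding.record_index for finding in scan_records(records)}
    clean = [r for i, r in enumerate(records) if i not in flagged]
    quarantined = [r for i, r in enumerate(records) if i in flagged]
    return clean, quarantined


if __name__ == "__main__":
    sample = [
        "Quarterly revenue grew 12 percent.",
        "Contact jane.doe@example.com for access.",
        "SSN on file: 123-45-6789",
    ]
    clean, quarantined = split_for_training(sample)
    print(f"{len(clean)} clean, {len(quarantined)} quarantined")
```

Quarantining flagged records before training, rather than filtering model output afterward, reflects the point made above: once sensitive data is in a model, it is very hard to detect or remove.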
These advancements come on the heels of Dig adding optical character recognition (OCR) capabilities to the Dig Data Security Platform. The platform combines DSPM, data loss prevention (DLP), and DDR capabilities in a single product.
Dig gives enterprise cloud and security teams immediate insights through an agentless, cloud-native solution with a short setup time, zero maintenance, and comprehensive, automated response at scale.