Daniel Fallmann, CEO, Mindbreeze

July 13, 2023

Combatting data governance risks of public generative AI tools

When companies use public generative AI tools, the models may be refined on the input data the company provides. From a data security standpoint, unauthorized use of sensitive data or the accidental exposure of proprietary information can lead to reputational damage, legal consequences, and loss of competitive advantage.

LLM benefits

Companies must carefully assess the data security measures implemented by AI tool providers to ensure robust safeguards are in place. When processing sensitive company information, they should also evaluate other tools, such as insight engines or cognitive search.

But there is more to the story and many benefits in taking another approach to LLMs. Models baked in and integrated with existing knowledge management systems can offer a secure way to utilize innovative content-generating features. Integration into existing platforms enhances the benefits of LLMs and permits companies to unlock the full potential of their enterprise data in a transparent, protective, insightful, relevant, and seamless manner.

The general benefits of LLMs

Answering queries is a widespread use case we see with LLMs. LLMs can answer questions posed by humans in a human-like manner in natural language. Models can also be used worldwide because they possess language translation features that can translate texts from one language to another. Summarization is also a core benefit as it takes any long text-based content and quickly provides users with a helpful overview. Summarization can be used for any text-based content, whether it is complex tax codes, a pamphlet on a prescription drug, legal documents, or even a college history lecture.

And then, of course, there is text generation, where the models generate natural language texts based on the user’s input. “Write a paragraph about the advancements in AI over the past decade” or “Explain neural networks in a few sentences” are just two of the infinite examples. In business settings, models can generate natural language texts in many applications, including chatbots, virtual assistants in customer service, and content creation for product teams and merchandisers.

Enhancing the benefits by integrating LLMs into knowledge management platforms

In enterprise use cases, it’s not worth risking the errors of publicly available LLMs and the spread of misinformation. A big challenge for companies is ensuring that an LLM generates accurate and trustworthy content. One way to combat this risk is to connect the model to a reliable data source indexed by a knowledge management platform.

Merging pre-trained LLMs into the core of these platforms and relying on open standards like Open Neural Network Exchange (ONNX) allows companies to use any model, whether it is one they built themselves or a model from communities like Hugging Face. These standards unlock further value from LLMs and show how integration delivers several benefits for fundamental business objectives, such as:

Access to relevant answers

Integration enables users to obtain answers or sentences derived from enterprise data relevant to their queries. While publicly available generative AI tools permit natural language querying, world wide web data is not always applicable to the use case. Knowledge management solutions connect data from various data sources and business applications to consolidate the data into a central knowledge base.

When it comes to querying about a customer or details of a business document, this is the only way to retrieve answers based on specific company entities. Additionally, delta crawling (i.e., crawling for new data only) ensures that the model’s data is always up to date, so users aren’t receiving obsolete information. Generating an answer means nothing unless the answer is helpful and assists with a worker’s task in real time.
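To make the idea of delta crawling concrete, here is a minimal sketch in Python. It assumes a content hash is enough to tell whether a document is new or has changed since the last crawl; the names `fingerprint`, `delta_crawl`, `source`, and `index` are hypothetical and do not come from any particular product.

```python
import hashlib


def fingerprint(text: str) -> str:
    """Return a stable content hash used to detect new or changed documents."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def delta_crawl(source: dict[str, str], index: dict[str, str]) -> list[str]:
    """Update `index` in place and return the IDs that were (re)indexed.

    `source` maps document ID -> current text (hypothetical data source).
    `index` maps document ID -> fingerprint stored by the previous crawl.
    """
    changed = []
    for doc_id, text in source.items():
        digest = fingerprint(text)
        if index.get(doc_id) != digest:  # new, or modified since last crawl
            index[doc_id] = digest
            changed.append(doc_id)
    # Forget documents that no longer exist at the source.
    for doc_id in list(index):
        if doc_id not in source:
            del index[doc_id]
    return changed


index: dict[str, str] = {}
first = delta_crawl({"a": "v1", "b": "v1"}, index)              # everything is new
second = delta_crawl({"a": "v1", "b": "v2", "c": "v1"}, index)  # only "b" and "c"
```

On the second pass, the unchanged document "a" is skipped, which is what keeps an incremental crawl cheap compared with re-indexing everything.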

Validation and source information

Users can validate the answers anytime because the response’s content includes source information. ChatGPT and other publicly available models, like Google Bard, do not cite where their outputs came from. So, how do you know if the content came from a reliable source versus an opinionated blog or insignificant public forum? Adding the source allows users to open the corresponding document or file and view all the details to confirm accuracy and gain further insight into their query.
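As a rough illustration of source information traveling with an answer, the sketch below returns the matched text together with the documents it came from. Simple word overlap stands in for a real relevance model, and `Passage`, `Answer`, and `answer_with_sources` are hypothetical names chosen for this example.

```python
from dataclasses import dataclass


@dataclass
class Passage:
    source: str  # e.g. a file name or document title
    text: str


@dataclass
class Answer:
    text: str
    sources: list[str]


def answer_with_sources(query: str, passages: list[Passage], top_k: int = 2) -> Answer:
    """Pick the passages sharing the most words with the query and cite them."""
    terms = set(query.lower().split())
    scored = [(len(terms & set(p.text.lower().split())), p) for p in passages]
    # Keep only passages with at least one shared word, best match first.
    ranked = sorted((s for s in scored if s[0] > 0), key=lambda s: s[0], reverse=True)
    best = [p for _, p in ranked[:top_k]]
    return Answer(text=" ".join(p.text for p in best), sources=[p.source for p in best])


passages = [
    Passage("handbook.pdf", "Refunds are issued within 14 days"),
    Passage("blog.html", "Our team enjoyed the summer party"),
]
result = answer_with_sources("when are refunds issued", passages)
```

Because `result.sources` lists only the documents that contributed to the answer, a user can open them and check the details.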

Answers in context

Knowledge management platforms permit answers to be provided in the user’s context, consistently accounting for data permissions from the data source with full compliance – ensuring users receive understandable information from sources they have access rights to and personalized towards their role and responsibilities.
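One way to picture permission-aware answers is to filter candidate documents by the user's access rights before anything is passed to the model. The sketch below assumes group-based permissions; `Document`, `allowed_groups`, and `visible_documents` are hypothetical names, not a specific platform's API.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    title: str
    allowed_groups: set[str] = field(default_factory=set)


def visible_documents(user_groups: set[str], documents: list[Document]) -> list[Document]:
    """Keep only documents that share at least one group with the user."""
    return [d for d in documents if d.allowed_groups & user_groups]


documents = [
    Document("Salary bands 2023", {"hr"}),
    Document("Product roadmap", {"engineering", "sales"}),
    Document("Office map", {"hr", "engineering", "sales"}),
]
# A sales employee never sees the HR-only document, so it cannot leak into an answer.
sales_view = [d.title for d in visible_documents({"sales"}, documents)]
```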

Local integration

Operating LLMs inside the core technology of a knowledge management system lets the model run locally on the system, eliminating the need for external calls and help requests.

Semantic similarity

By leveraging large language models and semantic similarity, the integration supports multilingual inputs and similarity recognition between different words, sentences, or documents without requiring the manual maintenance of ontologies, thesauri, or synonyms.
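Semantic similarity is commonly measured as the cosine of the angle between embedding vectors. The following sketch uses made-up three-dimensional vectors purely for illustration; in practice the vectors would come from a language model and have hundreds of dimensions.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical embeddings, invented for this example only.
embeddings = {
    "invoice": [0.9, 0.1, 0.0],
    "Rechnung": [0.85, 0.15, 0.05],  # German for "invoice"
    "holiday": [0.0, 0.2, 0.95],
}

same_meaning = cosine_similarity(embeddings["invoice"], embeddings["Rechnung"])
different_meaning = cosine_similarity(embeddings["invoice"], embeddings["holiday"])
```

With vectors like these, the two words for "invoice" score close to 1 even though they share no characters, while the unrelated word scores near 0. No synonym list or thesaurus is involved.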

Fighting back against data hallucination

Data hallucination is one of the most common and harmful risks of public generative AI, and it is one that integration addresses. Data hallucination occurs when a generative AI model produces false answers and presents them as fact. The risk is exacerbated because the inaccurate information is hidden behind AI systems that are advertised as, and considered, trustworthy.

The main culprit is the ever-increasing volume of data, which leads to poor data cleansing and an overload of information and makes it harder to identify false positives. Source validation, relevancy models, and high-quality training data help enterprises fight the data hallucination battle. Relying on flawed information to make business decisions is where reputational risk, wasted resources, and poor strategy come onto the scene, making it important to use a trusted platform for your language model deployment.

While public generative AI platforms present an overwhelming risk to businesses, integrating LLMs with secure platforms of existing enterprise data realizes the shared dream of CISOs and end-users: a secure, trustworthy way to use models that does not risk exposure through public platforms. LLM advances that reduce direct operational costs, such as more powerful computers, simplified training mechanisms, and more energy-efficient models, are still in development; in the meantime, integration presents a real-time solution to unlock greater potential in the modern landscape.
