A vulnerability in the redis-py open-source library was at the root of last week’s ChatGPT data leak, OpenAI has confirmed.
Not only were some ChatGPT users able to see what other users had been using the AI chatbot for, but some subscribers' personal and billing information was exposed as well.
How did the ChatGPT data leak happen?
ChatGPT suffered an outage on March 20, followed by problems making conversation history accessible to users.
But the underlying problem turned out to be more serious:
“During a nine-hour window on March 20, 2023, another ChatGPT user may have inadvertently seen your billing information when clicking on their own ‘Manage Subscription’ page,” OpenAI told affected ChatGPT Plus subscribers (1.2% of those active during that window) via email.
“The billing information another user might have seen consisted of your first and last name, billing address, credit card type, credit card expiration date, and the last four digits of your credit card. The information did not include your full credit card number, and we have no evidence that any customer information was viewed by more than one other ChatGPT user.”
Regarding the leaked chat history, the good news is that only the titles of other users' conversations were visible.
OpenAI's internal investigation traced the problem to a bug in redis-py, the open-source Redis client library for Python.
As the company explains, it uses Redis to cache user information on its servers, Redis Cluster to distribute this load across multiple Redis instances, and the redis-py library to interface with Redis from its Python server, which runs with Asyncio.
“The library maintains a shared pool of connections between the server and the cluster, and recycles a connection to be used for another request once done. When using Asyncio, requests and responses with redis-py behave as two queues: the caller pushes a request onto the incoming queue, and will pop a response from the outgoing queue, and then return the connection to the pool. If a request is canceled after the request is pushed onto the incoming queue, but before the response popped from the outgoing queue, we see our bug: the connection thus becomes corrupted and the next response that’s dequeued for an unrelated request can receive data left behind in the connection,” they noted.
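To make the failure mode concrete, here is a minimal simulation, assuming a toy pooled connection whose replies arrive in order on a single FIFO. It does not use redis-py, and the names `FakeConnection`, `FakePool` and `discard_on_cancel` are hypothetical. A request for "alice" is cancelled after it was sent but before its reply was read; if the pool then recycles that connection, the next caller ("bob") pops alice's leftover reply. Discarding the connection on cancellation is one straightforward remedy, shown here as an assumption rather than a description of the actual redis-py patch.

```python
import asyncio


class FakeConnection:
    """Hypothetical stand-in for one pooled connection: replies arrive in FIFO order."""

    def __init__(self) -> None:
        self._replies = asyncio.Queue()

    def send(self, user_id: str) -> None:
        # Simulated server: the reply lands on this connection 10 ms after the request.
        loop = asyncio.get_running_loop()
        loop.call_later(0.01, self._replies.put_nowait, f"data for {user_id}")

    async def read_reply(self) -> str:
        return await self._replies.get()


class FakePool:
    """Hypothetical connection pool that recycles idle connections."""

    def __init__(self, discard_on_cancel: bool) -> None:
        self._idle: list = []
        self._discard_on_cancel = discard_on_cancel

    async def request(self, user_id: str) -> str:
        conn = self._idle.pop() if self._idle else FakeConnection()
        try:
            conn.send(user_id)               # push the request
            reply = await conn.read_reply()  # pop the response
        except asyncio.CancelledError:
            if not self._discard_on_cancel:
                # BUG: the unread reply is still pending on this connection.
                self._idle.append(conn)
            raise
        self._idle.append(conn)              # healthy connection goes back to the pool
        return reply


async def demo(discard_on_cancel: bool) -> str:
    pool = FakePool(discard_on_cancel)
    alice = asyncio.create_task(pool.request("alice"))
    await asyncio.sleep(0)  # let alice's task send its request and start waiting
    alice.cancel()          # cancelled after the send, before the reply is read
    try:
        await alice
    except asyncio.CancelledError:
        pass
    # Bob gets whatever connection the pool hands out next.
    return await pool.request("bob")


if __name__ == "__main__":
    print(asyncio.run(demo(discard_on_cancel=False)))  # bob receives alice's data
    print(asyncio.run(demo(discard_on_cancel=True)))   # bob receives his own data
```

Because the connection's replies are strictly ordered, any reply left unread by a cancelled caller is handed to the next caller; throwing that connection away instead of recycling it keeps the stale reply from reaching anyone.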
Unfortunately, on Monday, March 20, OpenAI had made a change to its server that caused a spike in Redis request cancellations, giving each connection a small probability of returning bad data.
Fixing the problem
The bug has since been patched, and OpenAI has added redundant checks to ensure that the data returned by its Redis cache matches the user requesting it. The company then combed through its logs to confirm that the unwanted behavior had stopped and to identify affected users.
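OpenAI has not published the code for these checks, so the following is only a sketch of the general idea, assuming each cache entry stores its owner's user ID next to the cached data. The names `encode_entry`, `decode_entry_for`, `OwnershipMismatch` and `get_billing` are hypothetical. On a mismatch, the cached value is refused and the data is reloaded from the source of truth.

```python
import json
from typing import Callable, Optional


class OwnershipMismatch(Exception):
    """Hypothetical error: a cached entry belongs to a different user."""


def encode_entry(user_id: str, payload: dict) -> str:
    # Store the owner's ID alongside the cached data (assumed cache format).
    return json.dumps({"owner": user_id, "payload": payload})


def decode_entry_for(user_id: str, raw: str) -> dict:
    entry = json.loads(raw)
    if entry.get("owner") != user_id:
        # Never serve another user's data, even if the cache returned it.
        raise OwnershipMismatch(
            f"entry owned by {entry.get('owner')!r}, requested by {user_id!r}"
        )
    return entry["payload"]


def get_billing(
    user_id: str,
    raw_from_cache: Optional[str],
    load_from_db: Callable[[str], dict],
) -> dict:
    if raw_from_cache is not None:
        try:
            return decode_entry_for(user_id, raw_from_cache)
        except OwnershipMismatch:
            pass  # treat as a corrupted cache hit; a real system would also log it
    return load_from_db(user_id)
```

The check costs one comparison per read, and it turns a silent data leak into a cache miss.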
Finally, they say, they’ve improved the robustness and scale of their Redis cluster to reduce the likelihood of connection errors at extreme load – a wise course of action given ChatGPT’s huge popularity.
The AI chatbot is estimated to have reached 100 million monthly active users in January 2023, a mere two months after its launch.
It’s popular with both consumers and businesses, though businesses should put ChatGPT and OpenAI through the same third-party risk management process as any other application.