Last year, Samsung employees shared confidential company data with ChatGPT. That data may now be part of the model’s training data for good. Oops! Bet Samsung wishes they had an Acceptable Use of AI Policy back then!
This “Oops!” moment is something that a lot of companies, large and small, have likely experienced in the last year without even realizing it. Why? Because employees are using Large Language Models (LLMs, also referred to as generative AI) to speed up their work and complete tasks faster, whether or not their employers have sanctioned it. The reason is simple: the rewards for using LLMs are too great for most employees to ignore. Generative AI is a tremendous time-saver for a wide variety of tasks and will boost the productivity of a good team. Unfortunately, there is a considerable, inherent data security risk that comes with technology that expressly learns by absorbing user input. While vendors of these tools have made great strides over the last year in giving users greater control over their data, it’s still on every company to take measures to protect itself from this risk. And simply saying “Don’t use AI” is not the best answer!
Risks an Acceptable Use of AI Policy Helps Mitigate
Danger 1: Employees Leak Confidential Data into the Public LLM Dataset
The major cloud-hosted AI providers (OpenAI, Anthropic) may train their models on user input, depending on the plan and settings in use. If confidential data makes its way into a model’s training set, you have failed to maintain adequate control over that information: the AI can learn it and inadvertently share it with another user.
In the Samsung incident mentioned earlier, Mashable describes three specific instances where this occurred:
“An employee shared source code to check for errors.”
“An employee shared code … and requested code optimization.”
“A third shared a recording of a meeting to convert into notes for a presentation.”
Take note of the third task especially: transcribing audio is a rote, time-intensive task. Chances are, that employee had far more important things to be doing than transcribing a meeting. That’s a great use of AI… if only it could have been done without sacrificing confidentiality.
Good news! It can easily be done; there has been a lot of progress towards maintaining business data confidentiality in the last year.
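As one illustration of that transcription example (my own sketch, not anything from the Samsung story or a specific vendor recommendation): a meeting recording can be transcribed by an open-source speech-to-text model running entirely on company hardware, so the audio never touches a cloud AI service at all. The snippet below assumes the open-source openai-whisper Python package (plus ffmpeg) and a hypothetical meeting.mp3 file.

```python
# Minimal sketch: transcribe a meeting recording locally so the audio and its
# contents never leave the machine. Assumes the open-source "openai-whisper"
# package (pip install openai-whisper) plus ffmpeg, and a hypothetical file
# named meeting.mp3; all of these are illustrative choices, not endorsements.
import whisper

model = whisper.load_model("base")        # small model; runs on an ordinary laptop
result = model.transcribe("meeting.mp3")  # processing happens entirely on local hardware

with open("meeting_notes.txt", "w", encoding="utf-8") as f:
    f.write(result["text"])               # plain-text transcript to build the presentation from
```

Local tools are only one option, though; the cloud-hosted services themselves now offer plans and settings that keep your input out of model training, as covered next.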
Not all AI services will use user input to train their models. It depends on the specific vendor, the pricing tier the user is on, and settings they have chosen.
OpenAI uses all ChatGPT input from individual plan users to train their models by default. This includes confidential or private data that an employee might put into the system during the course of their workday.
However, ChatGPT has business plans: the ChatGPT Team and ChatGPT Enterprise tiers keep input out of model training by default. Plus, individual plan users can now opt out of allowing their input to be used for training through the data control settings.
(For full info, read the OpenAI Individual ToS and Enterprise Data Privacy.)
(Image: OpenAI ChatGPT pricing tiers)
Meanwhile, Anthropic makes a distinction even on its free and individual plans: it does not train on data that is not publicly available. Per their terms: “Our use of Materials. We may use Materials to provide, maintain, and improve the Services and to develop other products and services. We will not train our models on any Materials that are not publicly available, except in two circumstances:
1. If you provide Feedback to us (through the Services or otherwise) regarding any Materials, we may use that Feedback in accordance with Section 5 (Feedback).
2. If your Materials are flagged for trust and safety review, we may use or analyze those Materials to improve our ability to detect and enforce Acceptable Use Policy violations, including training models for use by our trust and safety team, consistent with Anthropic’s safety mission.”
This makes Anthropic a better choice for data privacy for individual users. Anthropic does not train on input from paid plans, either.
Of course, there are options beyond OpenAI and Anthropic; they are covered here simply as two of the largest services available for easy, public use. Read the terms of any model before you begin using it.
Danger 2: Chat History Leaked/Stolen in Data Breach
Even if you’re using an LLM service and plan that does not use your input to train its models, the service is still saving your data (including chat history) so that you can return to the same thread later and so the AI can remember what you’ve told it.
That data is stored on the cloud service’s servers and is vulnerable to a data breach, whether it comes from an attack directed at the vendor or a phishing attack against your own account. The software can also suffer its own bugs, like last year when a ChatGPT bug leaked portions of chat history to other users.
It’s important to be sure you trust the vendor to properly store and protect the data you give it, and to take measures to protect your own account like you would any other!
The information governance practices of these vendors must be considered as well. They are young companies, and their practices may not be as mature as you would like!
Danger 3: Incorrect Information/Output
“ChatGPT can make mistakes. Check important info.”
“Claude can make mistakes. Please double-check responses.”
By the big AI companies’ own admission, their models are imperfect. Even Google couldn’t keep its AI from producing factual errors in its own marketing materials.
Incorrect information could affect any incidental task your employees use AI for, but it is especially relevant to research and writing tasks. One incorrect fact from the platform could throw off all of your assumptions, or lead you to make an embarrassing error online!
Additionally, the models are trained to be so helpful to humans that they tend to act like yes-men. If a user attempts to correct the AI while completing a task, it will frequently defer to the human’s judgment, even if the human is wrong!
Danger 4: Cybersecurity Happy Talk!
There’s a specific sort of incorrect output that directly affects cybersecurity tasks: Happy Talk!
LLMs are trained on huge amounts of data scraped from the Internet. Most companies, when posting publicly about their cybersecurity posture, paint a very rosy picture. Because the models learn from that material, the odds increase that their cybersecurity-related output will be overly “happy” and not critical enough.
One of the ways I’ve tested Claude’s capability is by asking the AI to assess cybersecurity documentation provided by big vendors. Claude regularly failed to recognize glaring flaws such as exceptions (negative marks) on a SOC 2 report, and would generally provide a much more positive assessment of the company’s cybersecurity posture than most CISOs would.
Note: It seems that newer versions of Claude are steadily improving in this regard. It is possible that, as training gets better, this will become less of a problem.
The Solution: An Acceptable Use of AI Policy
Controls must be put in place to protect a company from the information security threat LLMs pose. Thankfully, the solution is simple: Every company can and should introduce an Acceptable Use of AI Policy.
This can be a standalone policy, or included as part of your general Acceptable Use Policy.
In your Acceptable Use of AI Policy, you should clearly define the following:
1. Which generative AI tools employees are allowed to use.
If your organization is large enough to have a business plan with one of the vendors, it’s easy to designate that tool. If you don’t have such a plan, you can still instruct employees to use whichever set of tools you trust.
2. What data is, and is not allowed to be entered into generative AI tools.
Here are a few questions to get you started in this line of thinking: Are employees allowed to input customer data into the models? What about internal documents? Or data from vendors? Is confidential customer data off-limits, but public customer data okay? This section of the policy is critical for maintaining control of your company’s data.
3. Instructions on how to treat output generated by AI tools.
As previously discussed, generative AI models can be wrong. Remind employees that they are responsible for the quality of their deliverables, and provide instructions on how they are expected to fact-check AI output. The clearer you are here, the easier it will be for them to meet your expectations.
Have the Acceptable Use of AI Conversation with Your Employees
With your new Acceptable Use of AI Policy in hand, it’s time to ensure your organization is on board. This takes additional work, too. Here’s what we suggest:
Hold a meeting with all relevant employees to announce the new policy and cover its provisions.
Distribute the policy and have them sign off that they understand it.
Meet with managers one-to-one and ensure they understand how they are expected to hold their direct reports to the policy.
Generative AI is a game-changing technology.
Companies that master its usage will be more efficient and surpass competitors who don’t bother. Individuals who learn to use AI as a force multiplier will be much more productive and have better career prospects as the technology becomes more and more important.
By understanding its risks, defining acceptable usage, and instilling good AI habits in your organization, you can ensure that you and your team will benefit from the technology for years to come.
Want to get great cybersecurity content delivered to your inbox? Click here to sign up for our monthly newsletter, Tales from the Click.