Microsoft wants to stop you from using AI chatbots for evil

March 28, 2024 TH Author

Microsoft Copilot for Security — Sabrina Ortiz/ZDNET

If you’re planning to use an AI chatbot for nefarious purposes, watch out. Microsoft is on the case.

In a blog post published today, the company announced a new feature coming to its Azure AI Studio and Azure OpenAI Service, which people use to create generative AI applications and custom Copilots. Known as Prompt Shields, the technology is designed to guard against two different types of attacks for exploiting AI chatbots.

Also: Microsoft Copilot vs. Copilot Pro: Is the subscription fee worth it?

The first type of attack is known as a direct attack, or a jailbreak. In this scenario, the person using the chatbot writes a prompt directly designed to manipulate the AI into doing something that goes against its normal rules and limitations. For example, someone may write a prompt with such keywords or phrases as “ignore previous instructions” or “system override” to intentionally bypass security measures.

In February, Microsoft’s Copilot AI got into hot water after including nasty, rude, and even threatening comments in some of its responses, according to Futurism. In certain cases, Copilot even referred to itself as “SupremacyAGI,” acting like an AI bot gone haywire. When commenting on the problem, Microsoft called the responses “an exploit, not a feature,” stating that they were the result of people trying to intentionally bypass Copilot’s safety systems.

The second type of attack is called an indirect attack (also known as an indirect prompt attack or a cross-domain prompt injection attack). Here, a hacker or other malicious person sends information to a chatbot user with the intention of pulling off some type of cyberattack. This one typically relies on external data, such as an email or document, with instructions designed to exploit the chatbot.

Like other forms of malware, indirect attacks may seem like simple or innocent instructions to the user, but they can pose specific risks. A custom Copilot created through Azure AI could be vulnerable to fraud, malware distribution, or the manipulation of content if it’s able to process data, either on its own or through extensions, Microsoft said.

Also: What is Copilot (formerly Bing Chat)? Here’s everything you need to know

To try to thwart both direct and indirect attacks against AI chatbots, the new Prompt Shields will integrate with the content filters in the Azure OpenAI Service. Using machine learning and natural language processing, the feature will attempt to find and eliminate possible threats across user prompts and third-party data.

Prompt Shields is currently available in preview mode for Azure AI Content Safety, is coming soon to Azure AI Studio, and will be available for Azure OpenAI Service on April 1.

Microsoft today also offered another weapon in the war against AI manipulation: spotlighting, a family of prompt engineering techniques designed to help AI models better distinguish valid AI prompts from those that are potentially risky or untrustworthy.