Cloudflare wants to put a firewall in front of your LLM

Cloudflare has tweaked its web application firewall (WAF) to add protections for applications using large language models.

The service, dubbed “Firewall for AI,” is available to the cloud and security provider’s Application Security Advanced enterprise customers. At launch, it includes two capabilities: Advanced Rate Limiting and Sensitive Data Detection.

Advanced Rate Limiting allows the customer to create a policy that sets a maximum rate of requests performed by an individual IP address or API key during a session. Doing so helps prevent distributed denial of service (DDoS) attacks against the model, as well as other floods that would overwhelm the LLM and disrupt its ability to handle legitimate requests.
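Cloudflare hasn’t published the mechanics behind the feature, but the underlying idea is a familiar one. Here is a minimal sliding-window sketch in Python; the `RateLimiter` class, the per-key limit, and the window length are illustrative assumptions, not Cloudflare’s implementation:

```python
import time
from collections import defaultdict, deque

class RateLimiter:
    """Sliding-window rate limiter keyed by client identity (IP or API key)."""

    def __init__(self, max_requests: int, window_seconds: float):
        self.max_requests = max_requests
        self.window = window_seconds
        self.hits: dict[str, deque] = defaultdict(deque)

    def allow(self, key: str) -> bool:
        now = time.monotonic()
        q = self.hits[key]
        # Drop timestamps that have aged out of the window.
        while q and now - q[0] > self.window:
            q.popleft()
        if len(q) >= self.max_requests:
            return False  # over the limit; a WAF would answer 429 here
        q.append(now)
        return True

# Hypothetical policy: at most 20 prompts per minute per API key.
limiter = RateLimiter(max_requests=20, window_seconds=60)
if not limiter.allow("api-key-abc123"):
    print("429 Too Many Requests")
```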

The second feature, Sensitive Data Detection, aims to stop LLMs leaking confidential data in their responses. Customers can set WAF rules that scan responses for financial information like credit card numbers, and secrets such as API keys, to ensure these sensitive details don’t end up in an LLM’s output.
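Again, a rough sketch of the technique rather than Cloudflare’s actual rule set: the toy patterns below (a loose card-number regex and the well-known AKIA prefix of AWS access key IDs) show the shape of response scanning, while real managed rules are considerably more robust:

```python
import re

# Illustrative patterns only; production detection rules are far more robust.
PATTERNS = {
    "credit_card": re.compile(r"\b(?:\d[ -]?){13,16}\b"),
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
}

def scan_response(text: str) -> list[str]:
    """Return the categories of sensitive data found in an LLM response."""
    return [name for name, rx in PATTERNS.items() if rx.search(text)]

matches = scan_response("Sure! Your card 4111 1111 1111 1111 is on file.")
if matches:
    print(f"Blocking response: matched {matches}")  # ['credit_card']
```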

Sadly, there’s not yet a firewall rule to prevent chatty models from emitting bad or made-up info.

In the future, customers will be able to “create their own custom fingerprints” and tailor what information the models can – and cannot – disclose, according to Daniele Molteni, Cloudflare’s group product manager, who announced the Firewall for AI on Monday.

Customers can find both Advanced Rate Limiting and Sensitive Data Detection in the Cloudflare dashboard’s WAF section.

In coming months, Cloudflare plans to test a beta version of a prompt validation feature. This will help prevent prompt injection attacks, in which users design prompts to jump the guardrails intended to stop LLMs creating inappropriate or illegal content.

The feature – currently under development – will analyze every prompt and assign it a score indicating how likely it is to be an attack on the LLM. It will also tag the prompt based on predefined categories. The score ranges from 1 to 99, indicating the likelihood of a prompt injection attack, with lower numbers signalling likely malicious intent.

Customers can then create a WAF rule to block or allow requests based on their score, and they can combine this score with other metrics – a bot or attack score, for example – to determine whether the prompt reaches the LLM.
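The exact rule syntax hasn’t been published, but the decision logic reduces to simple thresholding. In this sketch the thresholds and the `should_block` helper are assumptions for illustration; the 1-to-99 ranges, with lower meaning more suspicious, follow the scoring described above and Cloudflare’s existing bot scores:

```python
from dataclasses import dataclass

@dataclass
class RequestScores:
    injection: int  # 1-99; lower means more likely a prompt injection
    bot: int        # 1-99; lower means more likely automated traffic

# Hypothetical thresholds; a real deployment would tune these per application.
INJECTION_THRESHOLD = 20
BOT_THRESHOLD = 30

def should_block(scores: RequestScores) -> bool:
    """Block the prompt before it reaches the LLM if either signal
    falls below its threshold, mimicking a combined WAF rule."""
    return scores.injection < INJECTION_THRESHOLD or scores.bot < BOT_THRESHOLD

print(should_block(RequestScores(injection=12, bot=85)))  # True: likely injection
print(should_block(RequestScores(injection=70, bot=88)))  # False: passed through
```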

It will also allow customers to block prompts on specific topics – those the tool deems offensive, or sensitive subjects such as religion, sex, or politics, we’re told.

The firewall can be deployed in front of any LLM, Molteni told The Register. This includes the well-known, public LLM tools like OpenAI’s GPT and Anthropic’s Claude, or private LLMs designed for in-house use. Models sold to customers as part of a product or service are also covered.

According to Molteni, “Firewall for AI can be deployed in front of any model, whether they are hosted on Cloudflare Workers AI or on other platforms or hosting providers,” the only proviso being that “the request [and] response is proxied through Cloudflare.”

Cloudflare’s focus on AI security follows a series of LLM missteps and security snafus. As tech giants push to embed LLMs in many of their products and services, the results have often included errors and fabrications – and sometimes even potentially vulnerable code.

To address these emerging security issues, some developers are adopting an approach to security tailored to AI (Cloudflare calls its framework Defensive AI), while Google and others have expanded their bug bounty programs to include AI products and LLM attacks.

Considering how both the technology itself and the strategies to protect it against attacks are still in their infancy, there’s sure to be a lot more noise and hype around AI and LLM security – especially as we approach the season in which big vendors stage their flagship conventions. Expect a lot of talk on this at RSA and Black Hat. ®
