Enhancing DLP With Natural Language Understanding for Better Email Security

March 16, 2022 TH Author

Email security startup Armorblox’s new Advanced Data Loss Prevention service highlights how the power of artificial intelligence can be harnessed to protect enterprise communications such as email.

One of the most intriguing areas of artificial intelligence research focuses on how machines can work with natural language – the language used by humans – instead of constructed (programming) languages like Java, C, or Rust. Natural language processing focuses on machines being able to take in language as input and transform it into a standard structure in order to derive information. Natural language understanding – which is what Armorblox incorporated into its platform – refers to interpreting the language and identifying context, intent, and sentiment being expressed. For example, NLP will take the sentence, “Please crack the windows, the car is getting hot,” as a request to literally crack the windows, while NLU will infer the request is actually about opening the window.

Natural language models are fairly mature and are already being used in various security use cases, especially in detection and prevention, says Will Lin, Managing Director at Forgepoint Capital. NLP/NLU is especially well-suited to help defenders figure out what they have in the corporate environment.

“Organizations and their customers create so much data, NLP/NLU is invaluable in helping a company understand where a company’s riskiest data is, how it is flowing throughout the organization and in building controls to prevent misuse,” Lin says.

NLU in Corporate Email

NLU is well-suited for scanning enterprise email to detect and filter out spam and other malicious content, as each message contains all the context needed to infer malicious intent. The metadata of the message, such as the IP address and domain it was sent from and who it was sent to, combined with the content in the body and attachments provide signals to assess whether a message is good or not, says Anand Raghavan, co-founder and CPO of Armorblox. And thanks to the fact that Google and Microsoft already scan email messages to filter out spam and potential phishing messages in their email services, people are much more comfortable with the idea of machines reading their messages to filter out bad ones.

“When was the last time you looked at your spam box in Gmail?” Raghavan asks, noting that people trust that malicious messages have been removed.

Beyond spam, NLU could be useful at scale for parsing email messages used in business-email-compromise (BEC) scams, says Fernando Montenegro, senior principal analyst at Omdia. Email-based phishing attacks account for 90% of data breaches, so security teams are looking at ways to stop those messages from ever reaching the user.

While security awareness training is valuable, it is not realistic to rely on training alone to get to 100% security. Human beings have authority bias, where they are more inclined to believe something is true if it comes from someone in a position of power or authority, such as a boss. Machines are less susceptible and would be able to see other signs that the message instructing the recipient to initiate a wire transfer to a new destination may not actually be coming from the CEO.

Even with multiple trainings, there is always going to be that small subset of users who will click on the link in an email, or think a fraudulent message is actually legitimate. Raghavan cites a recent report by insurance provider AIG, which found that the most common cybersecurity-related claim was for BEC scams.

“You can’t train that last 14% to not click,” Raghavan says, which is why technology is necessary to make sure those messages aren’t even in the inbox for the user to see.

Another variation involves attacks where the email address of a known supplier or vendor is compromised in order to send the company an invoice. As far as the recipient is concerned, this is a known and legitimate contact, and it is not uncommon that payment instructions will change. The recipient will pay the invoice, not knowing that the funds are going somewhere else. There is not much that training alone can do to detect this kind of fraudulent message. It will be difficult for technology to identify these messages without NLU, Raghavan says.

NLU in DLP

Armorblox’s new Advanced Data Loss Prevention service uses NLU to protect organizations against accidental and malicious leaks of sensitive data, Raghavan says. Armorblox analyzes email content and attachments to identify examples of sensitive information leaving the enterprise via email channels.

Compared to legacy DLP, using NLU reduces false positives by a factor of 10, Armorblox claims.

DLP is pretty straightforward, as it looks for key information that may be sent to unauthorized recipients. Traditionally, DLP relies on static rules and regular expressions (regex), such as looking for specific keywords (such as the codename of a top-secret project) or looking for strings of nine digits (which would look like Social Security numbers).

However, static rules aren’t enough, as some things may be missed. If the sender is careful not to use the codename, legacy DLP won’t detect that message. It is inefficient and time-consuming to expect the security team to keep coming up with rules to catch every possible combination. The rules may also be so broad that messages without sensitive content get flagged. If the DLP is configured to flag every message containing nine-digit strings, then every message with a Zoom meeting link gets flagged, Raghavan notes.
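To make the limits of static rules concrete, here is a minimal Python sketch of a legacy-style check. It is not Armorblox's or any vendor's implementation; the codename "Project Falcon", the sample messages, and the function name `legacy_dlp_flags` are all hypothetical, chosen only to illustrate the keyword and nine-digit rules described above.

```python
import re
from typing import List

# Hypothetical static rules, for illustration only.
KEYWORDS = ("project falcon",)  # assumed codename of a secret project
# Exactly nine consecutive digits, not part of a longer run of digits.
NINE_DIGITS = re.compile(r"(?<!\d)\d{9}(?!\d)")


def legacy_dlp_flags(message: str) -> List[str]:
    """Return the names of the static rules that the message triggers."""
    hits = []
    lowered = message.lower()
    if any(keyword in lowered for keyword in KEYWORDS):
        hits.append("keyword")
    if NINE_DIGITS.search(message):
        hits.append("nine_digits")
    return hits


# A leak that names the codename, a leak that avoids it, and a harmless invite.
leak = "Sending the Project Falcon roadmap to my personal account."
careful_leak = "Sending the roadmap for the thing we discussed to my personal account."
meeting = "Join us at https://example.zoom.us/j/123456789 at 3pm."

print(legacy_dlp_flags(leak))          # ['keyword']
print(legacy_dlp_flags(careful_leak))  # []
print(legacy_dlp_flags(meeting))       # ['nine_digits']
```

The second message slips through because it never uses the codename, and the third is flagged only because the meeting ID happens to be nine digits long. Neither rule looks at what the message is actually about, which is the gap an NLU-based approach aims to close.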

Understanding the content of the messages is key, which is why NLU is a natural fit for DLP, Raghavan says. Using NLU also means the DLP engine doesn’t need to be manually updated with newer rules. Policies are constantly updated as the engine learns from the messages that come in.

NLU in Cybersecurity

Raghavan says Armorblox is looking at expanding beyond email to look at other types of corporate messaging platforms, such as Slack. However, NLU – and NLP – also have possibilities outside of email and communications. Classifying data objects at cloud scale is a natural use case that powers many incident response and compliance workflows, Lin says. Two of Forgepoint Capital’s portfolio companies – Symmetry Systems and DeepSee – are applying NLP models to help build classifiers and knowledge graphs.

NLU can also be used to parse vulnerability descriptions in disclosure or bug reports and potentially help optimize operations to be better at interpreting requests, Montenegro says.

Both Lin and Montenegro stress that there is a lot of work that needs to be done before NLU (and NLP) becomes commonplace in cybersecurity.

“That said NLU is HARD. Stray too far off use cases and I think things break down,” Montenegro says.
