The Register

Cisco used AI to write security incident reports, with mixed results

You’ll need a lot of detailed prompts to get solid output – and even then it may have errors and typos

Cisco tested AI’s ability to write an accurate report on a tabletop security incident response exercise, and found that while the tech can save time, many risks remain.

The networking giant revealed its results in a Thursday blog post (https://blogs.cisco.com/security/ai-generated-reporting-lessons-learned-from-talos-incident-response) by Nate Pors, a senior incident commander in the Cisco Talos Incident Response team.

Pors opened by observing that when used to generate long-form technical content, large language models can deliver “significant inaccuracies, unusual conclusions, and inconsistent writing styles.”

LLMs make those mistakes because they’re essentially fancy autocomplete systems that make educated guesses. Pors wrote that the nature of LLMs therefore sees them mess up in four ways.

Pors also described techniques that reduce those errors. One involves giving an LLM “granular, single-task instructions” that focus on “a specific, small portion of the report.” Doing so means “risk of hallucination or cross-contamination between sections is significantly reduced.” Telling an LLM which sources to use also helps. So does setting rules about the style and format of output.
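For readers who want to picture what that looks like in practice, here is a minimal Python sketch of one prompt per report section, each naming its permitted sources and style rules. It is an illustration only, not Cisco's tooling: the `SectionTask` structure, the `build_prompt` helper, the file names, and the prompt wording are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SectionTask:
    """One small, single-purpose drafting task (hypothetical structure)."""
    section: str        # the one report section to draft
    sources: tuple      # names of the only notes the model may draw on
    style_rules: tuple  # formatting and tone constraints for the output


def build_prompt(task: SectionTask) -> str:
    """Render a granular prompt that covers exactly one section."""
    sources = "\n".join(f"- {name}" for name in task.sources)
    rules = "\n".join(f"- {rule}" for rule in task.style_rules)
    return (
        f"Draft ONLY the '{task.section}' section of the report.\n"
        f"Use ONLY these sources:\n{sources}\n"
        f"Follow these style rules:\n{rules}\n"
        "If the sources do not cover something, say so instead of guessing."
    )


# Illustrative tasks; the section and file names are made up.
tasks = [
    SectionTask("Executive Summary", ("exercise_notes.txt",),
                ("Under 150 words", "Past tense")),
    SectionTask("Recommendations", ("exercise_notes.txt", "findings.txt"),
                ("Numbered list", "One action per item")),
]
prompts = [build_prompt(task) for task in tasks]
```

Each prompt would then be sent to the model on its own, so that a section's draft only ever sees the sources listed for it.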

Using those techniques, Cisco says the time required to draft an incident report based on a tabletop exercise fell by 50 percent.

“A blind test of the sample report in our quality assurance process showed no noticeable drop in overall writing quality,” Pors wrote. “The peer reviewer, professional editor, and management reviewer all made complimentary comments about the report while unaware that it was AI-generated. The peer reviewer commented that the incidence of typos and grammatical errors was far lower than in the average report.”

But the Talos team also found “editing multiple sample reports within a single session resulted in cross-contamination of content from one report’s source material to another, even if the notes used to generate the first report were deleted from the project’s reference documents.”

The researchers therefore recommend starting a new session, and re-entering prompts, for each new incident report.
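A rough Python sketch of that advice follows: every report gets a brand-new session object, and the standing prompts are re-entered each time. The `Session` class is a hypothetical stand-in for a chat client and only records what it is sent; the blog post does not describe Cisco's actual setup at this level of detail.

```python
from dataclasses import dataclass, field


@dataclass
class Session:
    """Hypothetical stand-in for one LLM chat session and its history."""
    messages: list = field(default_factory=list)

    def send(self, text: str) -> None:
        # A real client would call a model here; this sketch only records context.
        self.messages.append(text)


def draft_report(notes: str, prompts: list) -> Session:
    """Start a new session, then re-enter every prompt for this report."""
    session = Session()  # fresh context: nothing carried over from earlier reports
    session.send(notes)
    for prompt in prompts:
        session.send(prompt)
    return session


standing_prompts = ["Draft the summary section.",
                    "Draft the recommendations section."]
first = draft_report("notes for exercise A", standing_prompts)
second = draft_report("notes for exercise B", standing_prompts)
```

Because `second` is built from scratch, nothing from the first exercise's notes sits in its history, which is the isolation the Talos team found it could not get by deleting reference documents within a single session.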

They also developed a spelling-and-grammar-checking prompt that “hallucinated numerous grammar issues … failed to identify actual issues,” had a success rate below 50 percent, and “would behave inconsistently, sometimes catching issues and sometimes overlooking them.”

“It is currently unsuitable for production use,” Pors concluded.

Pors said Cisco concluded that its approach “could be adapted to any cybersecurity reporting use case with standardized inputs and predictable outputs,” but also warned authors must “take ownership of every word of the final report.”

“While testing, we found that the LLMs generated recommendations that were duplicative, irrelevant, or not actionable. If this were used in a production environment without manual checks, it could result in poor-quality recommendations in a final report.”

Those problems arose when reporting on a tabletop exercise, a far simpler task than investigating an incident that requires analyzing log files from multiple systems. ®
