Pen testers accused of ‘blackmail’ after reporting Eurostar chatbot flaws
Researchers at Pen Test Partners found four flaws in Eurostar’s public AI chatbot that, among other security issues, could allow an attacker to inject malicious HTML content or trick the bot into leaking system prompts. Their thank you from the company: being accused of “blackmail.”
The researchers reported the weaknesses to the high-speed rail service through its vulnerability disclosure program. While Eurostar ultimately patched some of the issues, during the responsible disclosure process, the train operator’s head of security allegedly accused the pen-testing team of blackmail.
Here’s what happened, according to a blog published this week by the penetration testing and security consulting firm.
After initially reporting the security issues – and not receiving any response – via a vulnerability disclosure program email on June 11, the bug hunter Ross Donald says he followed up with Eurostar on June 18. Still no response.
So on July 7, managing partner Ken Munro contacted Eurostar’s head of security on LinkedIn. About a week later, he was told to use the vulnerability reporting program (they had), and on July 31 learned there was no record of their bug report.
“What transpired is that Eurostar had outsourced their VDP between our initial disclosure and hard chase,” Donald wrote. “They had launched a new page with a disclosure form and retired the old one. It raises the question of how many disclosures were lost during this process.”
Eventually, Eurostar found the original email containing the report, fixed “some” of the flaws, and so Pen Test Partners decided to proceed with publishing the blog.
But in the LinkedIn back-and-forth, Munro says: “Maybe a simple acknowledgement of the original email report would have helped?” And then, per a LinkedIn screenshot with Eurostar exec’s name and photo blacked out, the security boss replied: “Some might consider this to be blackmail.”
The Register contacted Eurostar about this exchange, and asked whether it had fixed all of the chatbot’s issues detailed in the blog. We did not receive an immediate response, but we will update this story if and when we hear back from the train operator.
Chatbot design fail
The flaws themselves are relatively easy to abuse and stem from the API-driven chatbot’s design.
Every time a user sends a message to the chatbot, the frontend relays the entire chat history – not just the latest message – to the API. But it only runs a guardrail check on the latest message to ensure that it’s allowed.
If that message is allowed, the server marks it “passed” and returns a signature. If the message doesn’t pass the safety checks, however, the server responds with “I apologise, but I can’t assist with that specific request” and no signature.
Because the chatbot only verifies the latest message’s signature, earlier messages can be tampered with on the user’s screen, and then fed into the model as having passed the safety checks.
As long as the user sends a legitimate, harmless message – such as asking the bot to build a travel itinerary – that passes the guardrail checks and returns a valid signature, they can then edit earlier messages in the chat history and trick the bot into leaking information it should not via prompt injection.
Here’s the prompt injected into the chat history:
The chatbot responded with:
Further prompt injection allowed the researcher to extract the system prompt and disclosed how the chatbot generated the HTML for its reference links.
“That alone is reputationally awkward and can make future attacks easier, but the bigger risk is what happens once the chatbot is allowed to touch personal data or account details,” Donald wrote.
From there, with more poking, the chatbot revealed that it was vulnerable to HTML injection, which could be abused to trick the model into returning a phishing link or other malicious code inside what looks like a real Eurostar answer.
Additionally, the backend didn’t verify conversation and message IDs. This, combined with HTML injection, “strongly suggests a plausible path to stored or shared XSS,” according to the researcher.
Stored XSS, or cross-site scripting, occurs when an attacker injects malicious code into a vulnerable field – in this case, the chat history – and the application treats it as legitimate, delivering it to other users as trusted content and causing their browsers to execute the code. This type of attack is often used to hijack sessions, steal secrets, or send unwitting users to phishing websites.
The pen testers say that they don’t know if Eurostar fully fixed all of these security flaws. We’ve asked Eurostar about this and will report back when we receive a response.
In the meantime, this should serve as a cautionary tale for companies with consumer-facing chatbots (and, these days, that’s just about all of them) to build security controls in from the start. ®
READ MORE HERE
