Claudy Day and the First-Party Exfiltration Trap

Executive summary

The incident: Oasis Security disclosed a chained vulnerability they called Claudy Day against Claude.ai, combining hidden prompt injection, silent data packaging, and exfiltration through a first-party upload path.
The key lesson: you can lock down outbound internet and still lose data, because the agent can often reach the same first-party APIs the product needs to function.
The defense shift: treat “allowed domains” as a high-risk surface, and design explicit data egress policies for agent features like file export, memory, and sharing.

Why this matters (even if you do not use Claude)

Security teams have spent the past year teaching people to fear prompt injection as a way to make a model say something wrong.

Claudy Day is more important than that.

It is a clean example of the real failure mode in agentic systems: the model does not need to “break out” to cause harm. It just needs to be convinced to use legitimate product capabilities in an illegitimate way.

That is not a model problem. It is a system design problem.

The chain, simplified

Oasis described three issues that chained into an end-to-end exploit:

A pre-filled prompt delivered via a URL parameter, where some HTML could be embedded in a way that was invisible to the user but still processed by the model.
A way to get data out via an Anthropic first-party path (uploading a generated file), even when the sandbox blocks general outbound networking.
An open redirect that made delivery easier to weaponize at scale (for example via ads).

You can debate individual implementation details. The takeaway is the pattern.

A mobile-friendly flow diagram

Claudy Day pattern: from delivery to exfiltration

Step 1

User clicks a trusted-looking link that opens a new chat with a prefilled prompt

Step 2

Hidden instructions run (not obvious in the UI), telling the assistant what to collect

Step 3

The assistant packages sensitive content into an exported artifact (file, share link, note)

Step 4

Exfiltration happens via a first-party API the product already trusts and allows

The part teams miss: first-party exfiltration

Most enterprise controls for AI assistants implicitly assume:

If we block outbound internet from the sandbox, we prevent exfiltration.
If we allow outbound to the vendor domain, it is “safe” because it is first-party.

Agentic security breaks both assumptions.

A modern assistant needs first-party APIs to function:

File upload and export
Conversation sync
Workspace integration
Plug-in or connector management
Sharing and collaboration features

If an attacker can steer the agent, those are not just features. They are data egress channels.

A comparison table you can forward to your AppSec team

Control	Works for classic apps	Why it fails for agents	What to add
Egress allowlist (vendor domains only)	Reduces arbitrary C2 and data drops	First-party APIs can be used as exfil paths	Data egress policy per feature (upload, share, export), with auditing
Prompt injection awareness training	Helps prevent obvious copy-paste attacks	Hidden instructions can be invisible and still executed	Prompt surface hardening (sanitization, rendering, signing), plus runtime monitoring
DLP on email and web traffic	Catches some outbound leaks	Agent output can be structured to evade patterns, or leave via approved channels	Agent-aware inspection: intent, tool calls, and content lineage
Vendor SOC2 and security promises	Sets baseline expectations	Does not model your internal blast radius or your data graph	Tenant-level policy, least privilege, and audit from user to agent to action

Generalizing the pattern: “prompt surfaces” are everywhere

The URL prefill is just one example. Any product feature that lets content cross a trust boundary can become a prompt surface:

Deep links and app intents (“open with prefilled context”)
Shared chat links and templates
Copy/paste from docs, tickets, emails, and wikis
Attachments that get previewed, summarized, or auto-processed
Imported “agent instructions” for workspace setup

If the assistant treats these as authoritative instructions, attackers will use them.

What security teams should do this quarter

The four questions that matter

What are our prompt surfaces? List every path where external content becomes agent input, including deep links and share links.
What are our egress features? File export, upload, sharing, connectors, and any vendor API that can store user-provided artifacts.
Can we separate instruction from data? Render untrusted content as inert data by default. Require explicit user confirmation to treat it as instruction.
Do we have lineage and audit? For every sensitive output, you should be able to answer: what input triggered it, what tools were invoked, and what destination received it.

Practical hardening checklist

Sanitize and canonicalize prompt inputs: strip or escape markup and control characters at the boundary. If a URL parameter can inject invisible tokens, treat it as a vulnerability class.
Sign shared prompts and templates: if the product supports share links, add cryptographic integrity so that the content the user sees is the content the model receives.
Feature-level egress policy: treat uploads and exports as explicit egress with policy controls, not as “normal product traffic”.
Least privilege for memory and history access: do not give every chat session global access to a long-lived, searchable conversation archive by default.
Runtime monitoring for suspicious intent: watch for behaviors like “search history for secrets” and “bundle many unrelated conversations” even when no external tool is called.

How this maps to OWASP Agentic AI risks

Claudy Day spans multiple OWASP Agentic Top 10 categories at once:

ASI01 (Agent Goal Hijack): attacker changes what the agent is trying to do.
ASI06 (Memory and Context Poisoning): the attack relies on hidden instruction being treated as trusted context.
ASI07 (Data Exfiltration and Leakage): the end state is not misinformation, it is data leaving the tenant.

The recurring mistake is treating these as separate problems. In real incidents, they chain.

Bottom line

If your agent can upload files, it can exfiltrate data.

If your agent can read history, it can be prompted to summarize your secrets.

If your controls assume “trusted domains” equal “trusted actions”, you are operating with a pre-agent threat model.

Rogue Security Research

We study real-world agentic AI compromises and publish defensive guidance for security teams. If you are building or deploying AI agents, subscribe to the Rogue Security blog at rogue.security/blog.