▸ SECURE CONNECTION ▸ LATENCY: 4.2ms ▸ AGENTS: 17,432 ▸ THREAT LEVEL: NOMINAL
ROGUE TERMINAL v1.0 ESC to close
← Back to blog
April 12, 2026 by Rogue Security Research
prompt-injectiondata-exfiltrationclaudeagentic-securityappsecowasptrust-boundaries

Claudy Day and the First-Party Exfiltration Trap

Executive summary
  • The incident: Oasis Security disclosed a chained vulnerability they called Claudy Day against Claude.ai, combining hidden prompt injection, silent data packaging, and exfiltration through a first-party upload path.
  • The key lesson: you can lock down outbound internet and still lose data, because the agent can often reach the same first-party APIs the product needs to function.
  • The defense shift: treat “allowed domains” as a high-risk surface, and design explicit data egress policies for agent features like file export, memory, and sharing.

Why this matters (even if you do not use Claude)

Security teams have spent the past year teaching people to fear prompt injection as a way to make a model say something wrong.

Claudy Day is more important than that.

It is a clean example of the real failure mode in agentic systems: the model does not need to “break out” to cause harm. It just needs to be convinced to use legitimate product capabilities in an illegitimate way.

That is not a model problem. It is a system design problem.

The chain, simplified

Oasis described three issues that chained into an end-to-end exploit:

  1. A pre-filled prompt delivered via a URL parameter, where some HTML could be embedded in a way that was invisible to the user but still processed by the model.

  2. A way to get data out via an Anthropic first-party path (uploading a generated file), even when the sandbox blocks general outbound networking.

  3. An open redirect that made delivery easier to weaponize at scale (for example via ads).

You can debate individual implementation details. The takeaway is the pattern.

A mobile-friendly flow diagram

Claudy Day pattern: from delivery to exfiltration
Step 1
User clicks a trusted-looking link that opens a new chat with a prefilled prompt
Step 2
Hidden instructions run (not obvious in the UI), telling the assistant what to collect
Step 3
The assistant packages sensitive content into an exported artifact (file, share link, note)
Step 4
Exfiltration happens via a first-party API the product already trusts and allows

The part teams miss: first-party exfiltration

Most enterprise controls for AI assistants implicitly assume:

  • If we block outbound internet from the sandbox, we prevent exfiltration.
  • If we allow outbound to the vendor domain, it is “safe” because it is first-party.

Agentic security breaks both assumptions.

A modern assistant needs first-party APIs to function:

  • File upload and export
  • Conversation sync
  • Workspace integration
  • Plug-in or connector management
  • Sharing and collaboration features

If an attacker can steer the agent, those are not just features. They are data egress channels.

A comparison table you can forward to your AppSec team

ControlWorks for classic appsWhy it fails for agentsWhat to add
Egress allowlist (vendor domains only)Reduces arbitrary C2 and data dropsFirst-party APIs can be used as exfil pathsData egress policy per feature (upload, share, export), with auditing
Prompt injection awareness trainingHelps prevent obvious copy-paste attacksHidden instructions can be invisible and still executedPrompt surface hardening (sanitization, rendering, signing), plus runtime monitoring
DLP on email and web trafficCatches some outbound leaksAgent output can be structured to evade patterns, or leave via approved channelsAgent-aware inspection: intent, tool calls, and content lineage
Vendor SOC2 and security promisesSets baseline expectationsDoes not model your internal blast radius or your data graphTenant-level policy, least privilege, and audit from user to agent to action

Generalizing the pattern: “prompt surfaces” are everywhere

The URL prefill is just one example. Any product feature that lets content cross a trust boundary can become a prompt surface:

  • Deep links and app intents (“open with prefilled context”)
  • Shared chat links and templates
  • Copy/paste from docs, tickets, emails, and wikis
  • Attachments that get previewed, summarized, or auto-processed
  • Imported “agent instructions” for workspace setup

If the assistant treats these as authoritative instructions, attackers will use them.

What security teams should do this quarter

The four questions that matter
  1. What are our prompt surfaces? List every path where external content becomes agent input, including deep links and share links.
  2. What are our egress features? File export, upload, sharing, connectors, and any vendor API that can store user-provided artifacts.
  3. Can we separate instruction from data? Render untrusted content as inert data by default. Require explicit user confirmation to treat it as instruction.
  4. Do we have lineage and audit? For every sensitive output, you should be able to answer: what input triggered it, what tools were invoked, and what destination received it.

Practical hardening checklist

  • Sanitize and canonicalize prompt inputs: strip or escape markup and control characters at the boundary. If a URL parameter can inject invisible tokens, treat it as a vulnerability class.
  • Sign shared prompts and templates: if the product supports share links, add cryptographic integrity so that the content the user sees is the content the model receives.
  • Feature-level egress policy: treat uploads and exports as explicit egress with policy controls, not as “normal product traffic”.
  • Least privilege for memory and history access: do not give every chat session global access to a long-lived, searchable conversation archive by default.
  • Runtime monitoring for suspicious intent: watch for behaviors like “search history for secrets” and “bundle many unrelated conversations” even when no external tool is called.

How this maps to OWASP Agentic AI risks

Claudy Day spans multiple OWASP Agentic Top 10 categories at once:

  • ASI01 (Agent Goal Hijack): attacker changes what the agent is trying to do.
  • ASI06 (Memory and Context Poisoning): the attack relies on hidden instruction being treated as trusted context.
  • ASI07 (Data Exfiltration and Leakage): the end state is not misinformation, it is data leaving the tenant.

The recurring mistake is treating these as separate problems. In real incidents, they chain.

Bottom line

If your agent can upload files, it can exfiltrate data.

If your agent can read history, it can be prompted to summarize your secrets.

If your controls assume “trusted domains” equal “trusted actions”, you are operating with a pre-agent threat model.


Rogue Security Research

We study real-world agentic AI compromises and publish defensive guidance for security teams. If you are building or deploying AI agents, subscribe to the Rogue Security blog at rogue.security/blog.