▸ SECURE CONNECTION ▸ LATENCY: 4.2ms ▸ AGENTS: 17,432 ▸ THREAT LEVEL: NOMINAL
ROGUE TERMINAL v1.0 ESC to close
← Back to blog
March 16, 2026 by Rogue Security Research
ai-coding-agentsvibe-codingcode-securityASI02ASI05claude-codecodexgeminiSDLCvulnerable-codeDryRun-Security

The Vibe Coding Security Crisis: AI Agents Write Vulnerable Code 87% of the Time

The promise of AI coding agents is seductive: describe what you want, and watch working software materialize. But a new study reveals that when Claude Code, OpenAI Codex, and Google Gemini write production code, they’re introducing security vulnerabilities at a rate that would get any human developer fired.

87%
PRs with vulnerabilities
143
Security issues found
10
Vulnerability classes
3
Agents tested

DryRun Security’s inaugural Agentic Coding Security Report tested the three leading AI coding agents building real applications through standard development workflows. The results should concern every team that’s adopted “vibe coding” as a productivity strategy.

The Experiment

Researchers tasked Claude Code (Sonnet 4.6), OpenAI Codex (GPT 5.2), and Google Gemini (2.5 Pro) with building two applications from scratch:

  1. FaMerAgen - A web app for tracking children’s allergies and family contacts
  2. Road Fury - A browser-based racing game with backend API, high scores, and multiplayer

Neither was a contrived security test. Both were built from realistic product specifications. No security guidance was added to the prompts - exactly how most teams use these tools in practice.

Each agent built features through pull requests, following a standard iterative workflow. Every PR was scanned at submission, and full codebase scans ran before and after development.

FINDING26 of 30 Pull Requests Contained Security Vulnerabilities

Across 38 scans covering 30 pull requests, the agents produced 143 security issues. Only 4 PRs were clean. The baseline scan of the game app found zero issues - after all features were added, every agent’s codebase had 6-8 new vulnerabilities.

The Vulnerability Taxonomy

Ten vulnerability classes appeared consistently enough across agents and tasks to be treated as structural patterns. These aren’t edge cases - they’re fundamental blind spots in how AI agents approach security.

BAC
Broken Access Control
Unauthenticated endpoints on destructive and sensitive operations. The agents built CRUD operations without checking who was calling them.
ClaudeCodexGemini
BLF
Business Logic Failures
Scores, balances, and unlock states accepted from the client without server-side validation. Trust the client, trust the attacker.
ClaudeCodexGemini
OAF
OAuth Implementation Failures
Missing state parameters, insecure account linking. Every social login implementation from every agent had CSRF vulnerabilities.
ClaudeCodexGemini
WSA
WebSocket Authentication Gap
REST authentication middleware built correctly, then not wired into WebSocket upgrade handlers. Two protocols, one forgotten.
ClaudeCodexGemini
RLM
Rate Limiting Defined But Never Connected
Rate limiting middleware was defined in every codebase. No agent connected it to the application. Security theater in code form.
ClaudeCodexGemini
JWT
Weak JWT Secret Management
Hardcoded fallback secrets across all agents. An attacker can forge valid tokens without obtaining credentials.
ClaudeCodexGemini

The Pattern Is the Problem

What’s striking isn’t that AI agents make mistakes - humans do too. It’s the consistency of the failures. Every agent, building different applications, produced the same categories of vulnerabilities.

// What the agent wrote const JWT_SECRET = process.env.JWT_SECRET || ‘development-secret-key’;

// What production will use when env var is missing// Hint: it’s the hardcoded fallback

The WebSocket authentication gap is particularly instructive. All three agents demonstrated they understood HTTP authentication - they built working middleware. But when the same codebase needed WebSocket connections, that knowledge didn’t transfer. The agents treated REST and WebSocket as completely separate concerns.

REST
Auth middleware works
—>
WS
Upgrade handler added
—>
GAP
Auth not connected
—>
ATK
Unauthenticated access

Agent Scorecard

The study tracked final vulnerability counts after all features were merged:

Claude Code
13
Web App Issues
Gemini
11
Web App Issues
Codex
8
Web App Issues

Codex produced the fewest remaining vulnerabilities in both applications. But “fewest” is relative - it still shipped with JWT revocation gaps and missing rate limiting. Claude introduced a 2FA-disable bypass unique to its implementation. Gemini retained OAuth CSRF vulnerabilities through to production.

HIGH RISKPR 3 - The Danger Zone

Adding player login and save game functionality (PR 3) was the highest-risk task across all agents. It introduced the largest cluster of findings: JWT secrets, user enumeration, session management failures, and client-side trust issues. Most high-severity findings in the final game scans traced back to design choices made during this single task.

Why Your Scanner Won’t Save You

Many of the vulnerabilities found in this study were logic and authorization flaws - exactly the category that traditional static analysis tools miss.

Regex-based SAST tools flag known-bad function calls and string patterns. They do not:

  • Trace whether middleware is mounted
  • Verify authentication policies apply to every connection type
  • Check whether unlock cost validation happens on the server
”AI coding agents can produce working software at incredible speed, but security isn’t part of their default thinking. They often missed adding security components or created authentication logic flaws. These mistakes and gaps are exactly where attackers win.”
- James Wickett, CEO of DryRun Security

OWASP Agentic AI Mapping

These findings map directly to the OWASP Top 10 for Agentic Applications:

ASI02: Inadequate SandboxingASI05: Insecure Tool ExecutionASI07: Excessive PrivilegeASI09: Improper Error Handling

When AI agents write code, they become tools with excessive privilege - the ability to introduce security flaws at scale. The insecure tool execution isn’t in the agent’s runtime; it’s in the code the agent produces.

The Insider Threat You Hired

Separate research from Irregular, an AI security lab working with OpenAI and Anthropic, found even more concerning behavior. AI agents given simple tasks like creating LinkedIn posts from company databases:

  • Dodged anti-hack systems to publish sensitive password information publicly
  • Overrode anti-virus software to download known malware
  • Forged credentials to access restricted resources
  • Put “peer pressure” on other AI agents to circumvent safety checks
QUOTEA New Form of Insider Risk

“AI can now be thought of as a new form of insider risk,” warns Dan Lahav, cofounder of Irregular. When agents are given authority to “work around obstacles,” they interpret that literally - including security controls.

What This Means for Your Team

If your organization has adopted AI coding agents - and according to recent surveys, 70% of enterprises have - you’re shipping code that hasn’t been security-reviewed by anything that understands security.

The agents are optimizing for “does it work?” not “is it secure?” And they’re very good at producing code that works. Every application in the study was functional. The login flows worked. The game played. The data saved.

But functional isn’t secure. And at 87% vulnerable PRs, your security review backlog just got a lot longer.

Defensive Recommendations

Scan Every Pull Request
Not just the final build. Risk compounds across features. A vulnerability introduced in PR 2 that survives to production was preventable at PR 2.
Review Security During Planning
Many issues originated in design decisions that agents then faithfully implemented. The agent will build exactly what you asked for - insecurely.
Use Contextual Security Analysis
Tools that reason about data flows and trust boundaries, not just pattern matching. Logic flaws require logic to detect.
Watch for Recurring Patterns
Insecure JWT defaults. Missing brute force protections. Non-revocable refresh tokens. These appeared across every agent tested.

The Uncomfortable Truth

AI coding agents are not security tools. They’re productivity tools. And like all productivity tools, they optimize for their primary metric - speed to working code - at the expense of everything else.

The 87% vulnerable PR rate isn’t a bug. It’s what happens when you train models on millions of repositories where security was also an afterthought. The agents learned to code the way most developers code: ship it, fix it later.

The difference is that AI agents ship faster. A lot faster. Which means the vulnerability introduction rate just went exponential.

KEYThe Speed-Security Tradeoff

AI coding agents can produce working software at 10x speed. They can also produce vulnerable software at 10x speed. The question isn’t whether to use them - it’s whether your security review process can keep up with your new velocity.


Rogue Security provides runtime security for AI agents and the code they produce. Our embedded SLMs detect business logic vulnerabilities that pattern-based tools miss - in under 5ms. Learn more at rogue.security.