Published 2026-01-27.
Autonomous AI agents present a new world of opportunity, and an array of novel security risks that traditional identity and access management systems were never designed to address.
---
The first wave of AI agents arrived with a chat box and a demo that mostly stayed in the lab. The next wave is different. Gartner predicts that 40% of enterprise applications will integrate AI agents by 2026, up from less than 5% in 2025. Organizations are pushing agents into production. Not as toys, but as systems that pull real data and take real actions while humans walk away.
That's also why the first real agent security incidents won't look like Hollywood hacks. They'll look like this: A user asks an agent to query "their" company data. The agent politely refuses when explicitly asked for another firm's data. But phrase it ambiguously ("give me my data") and the agent returns a revolving cast of other companies' information. This isn't hypothetical. Security researchers have already demonstrated how indirect prompt injections can exfiltrate private data from AI systems, and the OWASP Top 10 for LLM Applications now lists improper output handling and excessive agency among its critical risks.
No exploit. No malware. No "break glass." Just an authorization model that can't keep up with a world where user A uses agent B to call tool C on resource D, in a non-deterministic loop.
If you're a CISO, VP of Security, or anyone who will have to explain an incident to a board, here's the uncomfortable truth: Your IAM stack can authenticate people. It can't reliably authorize what an autonomous system does on their behalf. And that's where agent incidents are coming from.
Below are the five agent security failures your IAM stack can't see, and what to do about them before the "year of agents" becomes the "year of postmortems."
TL;DR: Your IAM stack answers "Who is the user?" but is blind to "What is the agent doing?" This post argues that you cannot secure non-deterministic agents with static roles, and walks through five critical failures, from MCP credential honeypots to "read-only" exfiltration. We explain why the next major breach won't be a hack but an authorized agent executing a valid tool call, and why the only defense is moving your security boundary from the login screen to the tool execution layer.
For the control model behind that shift, the Kontext AI agent authorization FAQ explains how local policy checks, MCP authorization, scoped credentials, and audit trails fit together in the product.
---
Failure 1: Agents leak data across tenant boundaries
The risk: Cross-tenant data leakage
In June 2025, a logic flaw in Asana's MCP server let users in one organization access tasks, projects, and files from other organizations. The agent had valid credentials. The user was authenticated. But when the agent resolved "get my tasks," it pulled data from the wrong tenant.
This is a classic multi-tenant vulnerability, except now agents make it worse. The model decides which records match the query, which tenant "my" refers to, and how to merge results. That decision happens after authentication, outside your IAM controls.
In August 2025, attackers used compromised OAuth tokens to exfiltrate data from 700 Salesforce organizations, including Cloudflare, Zscaler, and Palo Alto Networks. One integration became a doorway into everything connected to it: contacts, opportunities, and stored credentials for AWS and Snowflake. It was the biggest SaaS breach of 2025. Shared infrastructure leaks too: research into shared GPU caches (the "PROMPTPEEK" attack) showed that one user can reconstruct another user's prompts by analyzing cache timing.
Why IAM misses it
Traditional IAM answers: Who is the user? Are they allowed into the app? What role do they have?
But the breach happens _after_ those questions are answered, when the agent decides which data matches the query. Telling the model "only return data for this tenant" is not enforcement. It's a suggestion.
What technology leaders should require
Tenant isolation at the data layer, not in the prompt:
- Isolated vector stores or namespaces per tenant: Pinecone namespaces, per-tenant clusters, or logically partitioned indices
- Access control at the retrieval layer: ACLs on embeddings so queries physically can't return cross-tenant data
- Runtime guardrails that validate outputs before they reach users: Amazon Bedrock Guardrails, Cisco AI Defense
- Data masking before embedding: sensitive values never enter the vector database
If the agent can't access the wrong tenant's data in the first place, the model's interpretation doesn't matter.
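To make the retrieval-layer idea concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not any vendor's API: `TenantScopedStore`, `Document`, and the in-memory partitions are hypothetical stand-ins for per-tenant namespaces, and the substring match stands in for vector search. The point it shows is that the tenant ID comes from the authenticated session, so the model's reading of "my data" never selects the partition.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Document:
    tenant_id: str
    text: str


class TenantScopedStore:
    """Hypothetical in-memory store, partitioned by tenant."""

    def __init__(self) -> None:
        self._partitions: dict[str, list[Document]] = {}

    def add(self, doc: Document) -> None:
        self._partitions.setdefault(doc.tenant_id, []).append(doc)

    def search(self, session_tenant_id: str, query: str) -> list[Document]:
        # The tenant comes from the authenticated session, never from the
        # prompt or the model's output, so a query can only ever see one
        # tenant's partition.
        partition = self._partitions.get(session_tenant_id, [])
        needle = query.lower()
        return [d for d in partition if needle in d.text.lower()]


store = TenantScopedStore()
store.add(Document("acme", "Q3 revenue forecast"))
store.add(Document("globex", "Q3 revenue forecast"))

# Both tenants hold a matching document; only the session's tenant is searched.
results = store.search(session_tenant_id="acme", query="revenue")
```

However the agent phrases the query, the other tenant's document is unreachable because it was never in the search space.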
Key question for vendors
Ask: "Is tenant isolation enforced at the database/retrieval layer, or does it depend on the model following instructions?"
---
Failure 2: MCP servers turn into credential honeypots
The risk: Secret sprawl, both local and remote
CVE-2025-6514 (CVSS 9.6) affected mcp-remote versions 0.0.5 to 0.1.15, a package with over 437,000 downloads that is used in Cloudflare, Hugging Face, and Auth0 integration guides. A malicious MCP server could send a booby-trapped authorization_endpoint that mcp-remote passed directly into the system shell. One connection to an untrusted server meant full RCE on the developer's machine.
Local MCP servers put production credentials on developer laptops: long-lived tokens in config files, no central visibility, no revocation path. A malicious "Postmark MCP Server" package injected BCC copies of all email traffic to an attacker's server. Supabase's Cursor agent, running with privileged access, processed support tickets containing SQL injection payloads. Attackers exfiltrated integration tokens into public threads.
Remote MCP servers concentrate the risk differently. You're trusting third-party infrastructure with your OAuth tokens for Gmail, Slack, and GitHub, and that infrastructure sees every prompt and response flowing through it. You can't audit what the operator does with your data. If it is compromised, attackers get tokens for all connected services across all customers. And Trail of Bits showed that malicious MCP servers can poison tool descriptions to exfiltrate conversation histories before any tool is even called.
Both patterns create the same problem: credentials sitting where they shouldn't, with no way to revoke access quickly.
Why IAM misses it
IAM assumes centralized control points: SSO sign-in, OAuth consent screens, IdP policy.
Local MCP setups sidestep all of that. Developer workstations become production bastion hosts. Remote MCP setups look like OAuth flows but store tokens on infrastructure you don't control, with tenant isolation you can't verify.
What technology leaders should require
Treat MCP like a production access path, not a developer convenience. Minimum controls include:
- No long-lived production credentials on endpoints for agent-connected tooling.
- Credential brokerage: short-lived, scoped tokens issued just-in-time. Agents should never hold API keys directly. They get pre-authorized access that hides the underlying credentials.
- Policy enforcement at the access boundary: This can happen through sandbox isolation (agents run in isolated runtimes where direct network calls don't exist, and access is only through injected bindings), embedded token verification (each MCP server verifies tokens inline using JWKS and asymmetric signing), or a gateway for environments where you can't modify MCP servers.
- Tool allowlists and environment segmentation with explicit policy boundaries between development, staging, and production.
- Central visibility: which MCP servers exist, who's using them, what tools they expose, and which resources they touch.
The pattern that works: agents call tools with scoped tokens, and verification happens at the point of access. The agent never sees the actual upstream credential. If the agent is compromised, there's nothing to steal, just a token that can be revoked immediately.
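A rough sketch of that pattern in Python, using only the standard library. Everything named here is an assumption for illustration: `issue_token`, `verify_token`, `BROKER_KEY`, and the `REVOKED` set are hypothetical, and HMAC signing stands in for the asymmetric, JWKS-based verification mentioned above so the example stays self-contained.

```python
from __future__ import annotations

import base64
import hashlib
import hmac
import json
import time

BROKER_KEY = b"demo-only-key"  # assumption: a real key lives in a KMS, not in code
REVOKED: set[str] = set()      # hypothetical central revocation list of agent IDs


def issue_token(agent_id: str, scopes: list[str], ttl_seconds: int,
                now: float | None = None) -> str:
    """Mint a short-lived, scoped token. No upstream API key is embedded."""
    issued_at = time.time() if now is None else now
    claims = {"sub": agent_id, "scopes": scopes, "exp": issued_at + ttl_seconds}
    payload = base64.urlsafe_b64encode(json.dumps(claims).encode()).decode()
    sig = hmac.new(BROKER_KEY, payload.encode(), hashlib.sha256).hexdigest()
    return f"{payload}.{sig}"


def verify_token(token: str, required_scope: str, now: float | None = None) -> bool:
    """Check signature, revocation, expiry, and scope at the point of access."""
    payload, _, sig = token.partition(".")
    expected = hmac.new(BROKER_KEY, payload.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(sig, expected):
        return False
    claims = json.loads(base64.urlsafe_b64decode(payload))
    current = time.time() if now is None else now
    if claims["sub"] in REVOKED or current >= claims["exp"]:
        return False
    return required_scope in claims["scopes"]


# A five-minute, read-only token for one agent.
token = issue_token("support-agent", ["crm:read"], ttl_seconds=300, now=1000.0)
can_read = verify_token(token, "crm:read", now=1100.0)
can_write = verify_token(token, "crm:write", now=1100.0)
```

Adding an agent ID to `REVOKED` invalidates every outstanding token for that agent at the next check, which is the property the vendor question below is probing for.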
Key question for vendors
Ask: "Can you revoke an agent's access in real time without touching every laptop?"
---
Failure 3: Read-only access still leaks data
The risk: Data exfiltration without write permissions
In June 2025, Legit Security found CamoLeak (CVSS 9.6) in GitHub Copilot Chat. The attack used GitHub's own image proxy to silently exfiltrate secrets and source code from private repositories. No write access. No file modifications. GitHub fixed it by disabling image rendering entirely.
"We'll give it read-only access" is a common enterprise comfort blanket. But agents don't need write access to cause damage. They just need an egress path. Trail of Bits exfiltrated data from Google Gemini CLI using images that reveal hidden prompts only when scaled down. Slack AI vulnerabilities let attackers embed instructions in emails; when victims read messages with AI assistance, malicious commands executed automatically, no clicks required. CVE-2024-5184 in an LLM email assistant allowed prompt injection to access sensitive information without any write permissions.
The pattern: agent reads production data, then uses a browser, search, or email tool to "solve the task," embedding sensitive data in an outbound request. No updates. No deletes. Just data leaving the trust boundary.
Why IAM misses it
IAM is designed to control entry into systems. It's not designed to reason about egress across system boundaries.
The exfiltration path is often multi-system: system A permits read access, system B permits egress, and the combination is catastrophic. The agent is the glue that connects them.
What technology leaders should require
Stop thinking in terms of "apps." Start thinking in terms of tool classes:
- Retrieval tools read internal systems
- Mutation tools change internal state: payments, writes, deletes
- Egress tools send data outside: browser, email, webhooks, Slack, ticketing systems, pastebins
Then apply policy where it actually matters:
- Controlled egress by default: Block direct outbound calls from agent runtimes. All external communication should be mediated through a gateway or supervisor that can inspect, filter, and log payloads before they leave. Think of it like a firewall for agent actions, not network-level, but tool-level.
- Strict controls on egress tools: destination allowlists, redaction, sensitive-field filtering, and step-up approval for high-risk exports.
- Data contracts for tools: define what types of data are allowed to be sent through each tool.
- Explicit logging of outbound payloads (or hashes and metadata if you can't store content), so you can reconstruct what happened.
The architectural fix: agents shouldn't have raw fetch() or arbitrary HTTP access. They should only reach the outside world through specific, policy-controlled egress tools. If an agent needs to send an email, it calls the email tool, which enforces destination rules, scans for sensitive data, and logs the payload. No side channels.
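As an illustration, a policy-controlled egress tool might look something like the Python sketch below. The names (`send_webhook`, `ALLOWED_HOSTS`, `EGRESS_LOG`) are hypothetical, the single regex is a deliberately narrow example of sensitive-data scanning (AWS-style access key IDs only), and the actual HTTP request is omitted so the sketch has no side effects.

```python
from __future__ import annotations

import hashlib
import re
from urllib.parse import urlparse

ALLOWED_HOSTS = {"api.partner.example"}          # hypothetical destination allowlist
SECRET_PATTERN = re.compile(r"AKIA[A-Z0-9]{16}")  # illustrative: AWS-style key IDs only
EGRESS_LOG: list[dict] = []


class EgressDenied(Exception):
    """Raised when a destination is not on the allowlist."""


def send_webhook(url: str, body: str) -> str:
    """Hypothetical egress tool: the agent's only route to the outside world."""
    host = urlparse(url).hostname or ""
    if host not in ALLOWED_HOSTS:
        EGRESS_LOG.append({"host": host, "decision": "deny"})
        raise EgressDenied(f"destination not allowlisted: {host}")
    redacted = SECRET_PATTERN.sub("[REDACTED]", body)
    # Log a hash of the payload so the event can be reconstructed later
    # without storing the content itself.
    EGRESS_LOG.append({
        "host": host,
        "decision": "allow",
        "sha256": hashlib.sha256(redacted.encode()).hexdigest(),
    })
    # A real tool would send the HTTP request here; this sketch only
    # returns the payload that would leave the trust boundary.
    return redacted


sent = send_webhook(
    "https://api.partner.example/hook",
    "key AKIAABCDEFGHIJKLMNOP attached",
)
```

The destination check, the redaction, and the log entry all run in ordinary code, so they behave the same way regardless of what the model was talked into.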
Key question for vendors
Ask: "What prevents an agent from pasting customer data into a web search query?"
---
Failure 4: Static roles can't handle dynamic agent tasks
The risk: RBAC breaks when every task needs different permissions
CVE-2024-8309 hit LangChain's GraphCypherQAChain: user input went into LLM prompts that generated Cypher queries. Because the agent had "database access" as a role (not "read these specific records for this specific task"), attackers achieved full database compromise through prompt injection. Unauthorized access, data exfiltration, destructive operations.
This is RBAC's blind spot. A support rep asks an agent: "Help me resolve this customer ticket." The agent needs to pull account data, correlate usage, draft a response, update the CRM, maybe issue a refund. The correct permissions aren't "support role forever." They're: Customer B's data, for ticket X, for the next 20 minutes, for these specific actions.
CVE-2025-68664 (LangGrinch, CVSS 9.3) allowed secret exfiltration via serialization injection. The role permitted serialization without understanding task context. CVE-2024-38206 in Microsoft Copilot Studio let authenticated users access internal infrastructure because "authenticated user" couldn't express "only for this integration task." CVE-2025-34291 in Langflow enabled account takeover because workflow permissions were role-level, not execution-scoped.
Why IAM misses it
RBAC is linear and static. Agents are dynamic and task-based.
The key shift: organizations need intent-based and task-based policy that's enforced downstream, because each task is unique and ephemeral.
Trying to model that with roles becomes a permissions explosion: "Support-Agent-View-Customer-Data," "Support-Agent-Refund-Small-Amounts," "Support-Agent-View-But-Not-Egress," "Support-Agent-CRM-Write-But-Not-Delete," and so on until you give up and hand the agent a broad token.
What technology leaders should require
A move from "roles" to task-scoped grants: time-bounded, resource-bounded, action-bounded, and context-aware (ticket ID, customer ID, case type, data sensitivity).
And crucially: enforcement at the tool boundary so it's deterministic, not dependent on model behavior.
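One way to picture a task-scoped grant is as a small immutable record checked on every tool call. The Python sketch below uses the support-ticket example from earlier. `TaskGrant` and `authorize` are hypothetical names, and the action strings are made up for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class TaskGrant:
    """Hypothetical grant bounded by time, resource, action, and task context."""
    ticket_id: str
    customer_id: str
    allowed_actions: frozenset
    expires_at: float  # Unix timestamp


def authorize(grant: TaskGrant, action: str, customer_id: str,
              ticket_id: str, now: float) -> bool:
    """Deterministic check run at the tool boundary for every call."""
    return (
        now < grant.expires_at
        and action in grant.allowed_actions
        and customer_id == grant.customer_id
        and ticket_id == grant.ticket_id
    )


# Customer B's data, for ticket T-1001, for 20 minutes, for two actions.
grant = TaskGrant(
    ticket_id="T-1001",
    customer_id="customer-b",
    allowed_actions=frozenset({"crm.read", "crm.update"}),
    expires_at=1000.0 + 20 * 60,  # issued at t=1000 in this example
)

in_scope = authorize(grant, "crm.read", "customer-b", "T-1001", now=1500.0)
wrong_customer = authorize(grant, "crm.read", "customer-c", "T-1001", now=1500.0)
```

No new role is created for the ticket. The grant is data, it expires on its own, and a prompt-injected request for a different customer fails the same comparison every time.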
Key question for vendors
Ask: "Can you express 'access only this customer's data for this ticket' without creating a new role?"
---
Failure 5: You can't prove what the agent actually did
The risk: No audit trail when things go wrong
Trail of Bits bypassed human approval protections on three agent platforms, achieving RCE each time. Post-incident, forensics teams couldn't reconstruct why the "human-in-the-loop" control failed. The audit trail didn't capture it.
That's the pattern. Most agent deployments have login logs, maybe tool-call logs. But they don't have a chain of custody tying user intent → granted authority → tool actions → downstream enforcement → outcomes.
Trail of Bits also found that malicious MCP servers can inject triggers into tool descriptions to exfiltrate conversation histories and credentials. The attack happens in the planning phase, _before_ tools are invoked. Tool-call logs show nothing suspicious. CVE-2025-53773 let GitHub Copilot modify .vscode/settings.json without user approval; changes happened without audit records showing the bypass. The Wayback Copilot vulnerability exposed private repository data from 16,290 organizations (Microsoft, Google, Intel, PayPal, IBM, Tencent) because data persisted in Bing's cache after repos went private. No visibility into what was indexed or exposed.
Leadership asks: Who authorized this? What did the agent access? What did it send out? Most deployments can't answer.
Why IAM misses it
IAM was built for authentication events and coarse-grained authorization. It wasn't built to produce a "bank statement" of autonomous activity.
Agents require auditability that is reconstructable (what happened, in order), explainable (why it was allowed), attributable (who it acted for, what grant applied), and revocable (proof you could have stopped it).
What technology leaders should require
An agent ledger, a bank-statement view of agent behavior:
- Which user initiated the task
- Which agent and version ran it
- What intent or task was declared
- What permissions were granted and for how long
- Where step-up authorization occurred
- Each tool call with policy decision outcomes
- Any egress events and destinations
- Revocations and kill switch activations
This is how you survive the first serious incident, and how you earn internal approval to scale agents safely.
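A ledger like this can be sketched in a few lines of Python. The hash chaining shown here is an assumption on our part, one common way to make an append-only log tamper-evident, not a requirement stated above. `AgentLedger` and the event field names are hypothetical.

```python
from __future__ import annotations

import hashlib
import json


class AgentLedger:
    """Hypothetical append-only ledger; each entry commits to the previous one."""

    GENESIS = "0" * 64

    def __init__(self) -> None:
        self.entries: list[dict] = []

    @staticmethod
    def _digest(prev_hash: str, event: dict) -> str:
        body = json.dumps(event, sort_keys=True)
        return hashlib.sha256((prev_hash + body).encode()).hexdigest()

    def append(self, event: dict) -> dict:
        prev_hash = self.entries[-1]["hash"] if self.entries else self.GENESIS
        entry = {
            "event": event,
            "prev_hash": prev_hash,
            "hash": self._digest(prev_hash, event),
        }
        self.entries.append(entry)
        return entry

    def verify(self) -> bool:
        """Recompute the chain; any edited or reordered entry breaks it."""
        prev_hash = self.GENESIS
        for entry in self.entries:
            expected = self._digest(prev_hash, entry["event"])
            if entry["prev_hash"] != prev_hash or entry["hash"] != expected:
                return False
            prev_hash = entry["hash"]
        return True


ledger = AgentLedger()
ledger.append({"type": "task_started", "user": "alice",
               "agent": "support-bot@1.4", "intent": "resolve ticket T-1001"})
ledger.append({"type": "tool_call", "tool": "crm.read", "decision": "allow"})
ledger.append({"type": "egress", "tool": "email.send",
               "destination": "customer@example.com", "decision": "allow"})
intact = ledger.verify()
```

Read top to bottom, the entries answer the leadership questions in order: who started the task, which agent ran it, what each tool call was allowed to do, and what left the boundary.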
Key question for vendors
Ask: "If this agent sends data outside, can I reconstruct exactly what it sent and under what approval?"
---
The pattern beneath the failures
Each of these failures shares a root cause: organizations tried to secure a non-deterministic system by constraining its non-determinism.
Agents are probabilistic. The same prompt can produce different outputs. Planning sequences emerge at runtime. Natural language interpretation varies. Model behavior drifts with updates, jailbreaks, and prompt injection.
But the systems agents touch are deterministic. Database queries return exact rows. API calls have concrete effects. Emails get sent. Files get written.
The security implication: you cannot rely on the agent to consistently choose the safe path. "We instruct the model to only access the right tenant" is not a control. It's a hope.
What works instead is enforcing policy at the boundary where non-deterministic planning becomes deterministic execution. Every agent plan eventually becomes a tool call with concrete parameters: a tool_id, structured arguments, a tenant_id, token scopes, an expiration time. These are deterministic signals. They can be validated, logged, and denied.
The principle: don't try to predict what the agent will do. Classify what it _can_ do, and enforce at the point where intention becomes action.
This is why prompt-level tenant scoping fails but retrieval-layer isolation works. Why "read-only is safe" fails but egress classification works. Why static roles fail but per-invocation policy checks work. The agent can plan anything, but execution goes through a gate you control.
---
The minimum viable agent security stack
If you're deploying agents this year, don't aim for perfection. Aim for control. Here are the baseline controls that actually change outcomes. Each one enforces at a deterministic boundary:
- Policy at the tool boundary. Stop relying on prompts for enforcement. Every tool call should be authorized deterministically downstream.
- Brokered, short-lived, scoped credentials. Replace endpoint secrets and broad tokens with just-in-time grants tied to tasks and context.
- Step-up authorization and kill switch. Treat risky agent actions like bank transactions: confirm, record, and allow immediate revocation.
- Tool risk taxonomy. Classify tools as retrieval, mutation, or egress. Apply the strictest controls to egress and irreversible actions. Don't pretend all tools are equal.
- Agent ledger auditability. If you can't explain what happened, you can't scale deployment. Period.
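The tool risk taxonomy, step-up authorization, and kill switch can be combined into a single decision function. The Python sketch below is one possible shape under simple assumptions: the registry contents, the `decide` function, and the rule that every mutation or egress call needs approval are all hypothetical choices for illustration.

```python
from __future__ import annotations

from enum import Enum


class ToolClass(Enum):
    RETRIEVAL = "retrieval"
    MUTATION = "mutation"
    EGRESS = "egress"


# Hypothetical registry. Tools that are not classified are denied, not guessed at.
TOOL_REGISTRY = {
    "crm.read": ToolClass.RETRIEVAL,
    "refund.issue": ToolClass.MUTATION,
    "email.send": ToolClass.EGRESS,
}

KILLED_AGENTS: set[str] = set()  # hypothetical kill switch state


def decide(agent_id: str, tool_id: str, human_approved: bool = False) -> str:
    """Return 'allow', 'deny', or 'step_up' for a single tool call."""
    if agent_id in KILLED_AGENTS:
        return "deny"
    tool_class = TOOL_REGISTRY.get(tool_id)
    if tool_class is None:
        return "deny"
    if tool_class is ToolClass.RETRIEVAL:
        return "allow"
    # Mutation and egress calls require explicit approval in this sketch.
    return "allow" if human_approved else "step_up"


read_decision = decide("support-bot", "crm.read")
email_decision = decide("support-bot", "email.send")
```

In practice the step-up rule would likely depend on amount, destination, or data sensitivity rather than tool class alone, but the structure stays the same: the decision is made from the tool call's concrete parameters, not from the model's intent.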
---
The bottom line
Agents create value because they have access to high-value data and can take action at runtime. That's also why they break the old security model.
In a world of agents, the question isn't "Can we authenticate the user?" It's: Can we bound, prove, and audit what autonomous compute does with our data, across tools, across tenants, and across time?
As agents execute more code, securing their execution environments becomes critical. And one fundamental tension remains unsolved: consent fatigue in fast-executing agents forces a choice between user experience and security.
Technology leaders have to balance business enablement against a structured approach to agent security. Putting these controls in place deliberately now is what makes it possible to scale agents later.
---
We're building the identity layer for agents
At Kontext, we're solving these problems: task-scoped credentials that expire, policy enforcement at tool boundaries, and full audit trails for every agent action. If you're deploying agents and need to get security right, talk to us.
Further reading
- Agentic AI Security: The Complete Guide — the full security stack for autonomous AI agents
- AI Agent Security: A CISO's Practical Guide — how to evaluate and deploy agent security controls
- What Is AI Agent Runtime Authorization? — the runtime control layer that addresses these five failures