The real attack surface is no longer your prompts, it's in what your agents are allowed to do.
Recently, Microsoft launched Scout, an “always-on agent” that operates autonomously across email, calendar, OneDrive, SharePoint, and shell access — running in the background, without waiting for a conversation to start. A few months ago, Openclaw did the same.
These are not chatbots. They are a permanently open attack surface. Agentic apps are evolving to be autonomous and have expanding to be riskier.
The conversation about agentic security has been going on for quite some time but I think most are focused on the wrong layer. While researchers and engineers debate system prompt wording and sanitization filters, a more fundamental vulnerability has been sitting in plain sight, not in how models are instructed, but in what they’re allowed to do.
The core problem with always-on agents
A traditional chatbot creates a bounded window of risk: the conversation starts, something could go wrong, the conversation ends. An always-on agent that reads incoming email, browses pages to finish tasks, and queries a shared knowledge base keeps that window open indefinitely.
Agents today don’t just respond, they maintain context, fire on events, call tools in sequence, and hand off work to sub-agents. Often without a human reviewing each step.
The security model for a session-based chatbot simply doesn’t transfer to this architecture. Always-on agents require a fundamentally different security approach. Most frameworks are aware of the risks, but they transfer the onus to deployments.
Prompt injection was formally documented in 2023 as a structural vulnerability in LLM-integrated applications. The attack is conceptually simple: embed instructions inside content the model will eventually read, a document, a webpage, a database entry and steer it away from the developer’s intent entirely. The field recognized the problem quickly. What followed were three dominant defenses, all targeting the same prompt layer:
- Stricter system prompt instructions : Telling the model to ignore instructions embedded in retrieved content
- Input sanitization filters: detecting and stripping injected payloads before they reach the model
- Instruction hierarchy training: training models to treat developer-level instructions as higher-authority than retrieved content
Each of these assumes the fix lives at the prompt layer. It doesn’t. An LLM processes your system prompt and a poisoned webpage or email the same way: both arrive as tokens in the context window. There is no trust flag, no channel label, nothing that marks one as authoritative and the other as external input to be treated with suspicion.
Why hierarchy training falls short?
Instruction hierarchy training reduces attack success rates, but it doesn’t eliminate the attack surface. The model still processes untrusted text. It can still be manipulated by it, especially through well-crafted indirect injections. Reducing risk isn’t the same as removing it.
Prompt injection is not a bug that can be patched. It is a property of how these systems work.
It has sat at the top of the OWASP LLM Top 10 since the list launched, not as a known-and-solved risk, but as a known-and-persistent one. The taxonomy from the 2023 foundational research documented six threat categories, four injection methods, and three classes of affected parties. That breadth alone signals that no single prompt-layer fix could ever be sufficient. The attack surface is not a single vulnerability. It is a structural property of how LLMs process retrieved content.
OK, so what actually needs to change?
If the vulnerability lives at the deployment layer, that’s where the defenses need to live too. The questions worth asking aren’t about prompt wording, they’re about architecture:
What is the agent’s behavior? What data can it access? What actions can it take without confirmation? What known security holes does its codebase include? Has the principle of least privilege, long established in traditional software security, been applied to AI agent deployments?
An agent that can read your inbox, browse external pages, and execute shell commands is an agent with a very large blast radius. Constraining that radius, through scoped permissions, confirmation requirements for high-stakes actions, and hard limits on tool access, is what security at this layer actually looks like.
The tool layer is where agents get compromised and the right method to discover potential vulnerabilities is to start with examining your codebase. Yes, that means static analysis first and building enough intelligence to create red teaming attacks.