Your AI Agent Is Your Newest Insider Threat

your ai agent is your biggest threat by aarti samani robotic hand reaching out to robot

The researcher received what looked like a routine document, the kind shared daily via email, Slack, or a cloud drive. When they opened it in Anthropic’s Claude Cowork, the AI assistant marketed to general office workers, nothing appeared unusual. But hidden inside, in 1-point white font invisible to the human eye, was a set of instructions exploiting a vulnerability called prompt injection. Claude read them, followed them, and used its file access to upload financial documents containing partial Social Security numbers to an attacker-controlled server. There was no malware or exploit code, and no human authorisation was required. It was an invisible attack: a sentence, written in plain English, that the AI could not distinguish from a legitimate command.

The attacker did not need to know that the target used Claude specifically. They planted the payload in a document and waited. If any AI assistant processed it, the hidden instructions would execute. The document was the weapon. The AI was the accomplice (PromptArmor, January 2026).

Two weeks later, Miggo Security demonstrated the same principle against Google Gemini. A calendar invite, sent from what appeared to be a colleague or vendor, the kind we receive dozens of times a week, carried a hidden instruction in its description field. The victim did not need to click anything suspicious. They simply asked Gemini, “What are my meetings tomorrow?” Gemini parsed their calendar, encountered the hidden instruction, and obeyed: it created a new event titled “free” whose description contained a full summary of the victim’s private meetings, attendees, and topics. In enterprise environments where calendars are shared by default, the attacker could read that summary without the victim ever knowing. No code or exploit. Just a natural language instruction (Miggo Security, January 2026). Google has now fixed this vulnerability.

Then came OpenClaw. The open-source AI super-agent exploded to 180,000 developers in January 2026, offering full system access: it reads your files, executes terminal commands, controls your browser, and connects to WhatsApp, iMessage, and Slack.

Snyk, the application security firm, scanned approximately 4,000 skills in OpenClaw’s community marketplace and found 76 containing confirmed malicious payloads designed for credential theft, backdoor installation, and data exfiltration. Installing a single poisoned skill gave attackers persistent access to the user’s entire machine. Meanwhile, 93.4% of publicly exposed OpenClaw instances had a critical authentication bypass: anyone who could reach the server could take over the agent without a password. 770,000 agents, each with full access to someone’s computer, files, and credentials, were exposed to direct command injection.

Three products. Three weeks. One root cause: AI agents that process untrusted data as instructions.

Agent social engineering

When we talk about social engineering, we mean manipulating a person into acting against their own interests by exploiting trust, authority, and context. What these attacks represent is the same discipline applied to machines. Call it what it is: agent social engineering.

A deepfake call exploits an individual’s trust in a familiar face. Prompt injection exploits an AI agent’s trust in written instructions. In both cases, the attacker does not break the system, they persuade it.

A meta-analysis of 78 studies covering 2022 to 2026 found that attack success rates against current prompt injection defences exceed 85% when attackers adapt their approach, testing different phrasings, encodings, and context manipulations until one succeeds. OpenAI has acknowledged that prompt injection in browser agents may be structurally “unfixable.” Put simply: the models cannot tell the difference between a legitimate instruction from you and a malicious instruction hidden in a document, email, or calendar invite. That is the unsolved problem.

Your AI agent, or your employee’s personal AI agent, has the profile of a perfect insider threat. It has credentials. It has access to sensitive data. It follows instructions from anyone who can phrase them convincingly. And unlike a human insider, it has no instinct to pause when something feels off. Even if your organisation does not officially deploy AI agents, your team members almost certainly use them, on the same devices that hold work files, access corporate email, and connect to your network. OpenClaw proved that 180,000 developers adopted an agent with full system access before anyone checked whether its skill marketplace was safe. The threat is already inside.

Now Imagine this realistic scenario: An attacker creates a genuinely useful Notion template — "Q1 OKR Planning," "2026 Personal Finance Goals" — and shares it across template galleries, Reddit, LinkedIn, and YouTube. Buried in the page, in white text on a white background, invisible to the human eye, is a hidden instruction. Your employee downloads the template, connects Claude Cowork to their files, and asks it to help. The template works perfectly. But the hidden instruction has already told the agent to upload their files — finance information, bank details, work documents — to the attacker's account. The employee is happy, recommends the template to colleagues, and the attack compounds through the network. This is agent social engineering at scale.

When prompt injection crosses into the physical world

If data exfiltration does not concentrate the mind, this might. A security researcher demonstrated how a single PDF could cross the boundary from digital attack to physical damage. The setup: an engineer used an AI assistant for two jobs, summarising documents and interacting with a SCADA system, the software that monitors and controls physical industrial equipment like water treatment plants, power grids, and manufacturing lines. Through a standard integration, the AI could read and write SCADA “tag values,” the variables that represent the state of physical devices, like whether a pump is on or off. When the engineer asked the AI to summarise a routine PDF, hidden instructions inside the document told the AI to change a pump’s tag from off to on. The AI obeyed. The pump activated. No one authorised it. The AI had the access, received the instruction, and could not tell the difference between “summarise this document” and “switch on this pump.”

That was one researcher, one pump, one proof of concept. Now consider the same technique deployed by a state-sponsored attacker against national critical infrastructure: power grids, water treatment facilities, energy pipelines, transport networks. All of these run on SCADA. As organisations integrate AI agents into operational workflows, they are creating a new attack surface for adversaries who already target these systems. Agent social engineering does not just steal data. It can shut down infrastructure.

What security teams should do now

This is a structural reality to design around, not a static problem to solve. Here is what that looks like operationally.

1. Treat AI agents like new employees on probation. Apply least privilege from day one. Know your agents, what they can access, what data they process, what actions they can take autonomously. Assume 90% of your deployed agents are over-permissioned right now, because the data suggests they are. No agent should have broader access than the narrowest scope required for its task.

2. Architect for safe failure, not for prevention. The meta-analysis data is clear: if 85% of adaptive attacks succeed against state-of-the-art defences, a prevention-only strategy will fail. Design systems where a compromised agent cannot exfiltrate data, modify critical infrastructure, or escalate privileges without human authorisation at each step.

3. Separate high-value systems from AI intermediation. The SCADA example is the warning. Any system where an AI agent can read untrusted input AND write to physical systems like pumps, valves, and power controls is a system waiting for an agent social engineering attack to cross the digital-physical boundary. Segment ruthlessly.

4. Demand AI Bill Of Materials from your vendors. Know what models, training data, and tool integrations sit behind the agents you deploy. If a vendor cannot tell you what their agent can access and how it handles untrusted input, that is your answer.

5. Monitor agent behaviour the way you monitor employee behaviour. Anomalous data access patterns, unexpected outbound connections, actions that do not match the user’s request, these are the same signals your UEBA (User and Entity Behaviour Analytics) systems flag for human insiders. Apply the same discipline to AI agents.

The discipline we need

We built training programmes, verification protocols, and a culture of caution and curiosity to protect people from social engineering. Now we need the same rigour for the machines those people trust with their calendars, their files, and their critical systems.

Agent social engineering is an emerging threat. Just like we are experimenting and pushing the boundaries of what agents can do for our productivity, the perpetrators are testing and pushing the boundaries of what agents can do for their malicious gains. We are learning together. But while they have to learn just how to use agents for malice, we need to, very quickly, learn how to protect them and train them to defend themselves. Our AI agents need our protection.

About the author

Aarti Samani is a global authority on resilience from AI-enabled fraud, deepfake social engineering attacks, and human-centric security. She writes a regular column for The Security Edit.


Next
Next

Protective Security Compliance Guideline Playbook