Quick Buzz Feed

Securing AI agents: When AI tools move from reading to acting | Microsoft Security Blog

Gary Lloyd | Jun 30,26 | 01:31 EST

Technology

MCP tool poisoning turns trusted AI agents into a control plane for data loss. Learn how threat actors manipulate tool descriptions to trigger unauthorized actions, and how to detect, contain, and prevent it.

From reading to acting

This article is the third in a series on AI Application Security, marking a shift in focus from passive AI (like summarizers) to active AI agents. Previous posts explored expanding attack surfaces and prompt abuse in AI tools. This installment addresses the security implications when AI agents evolve to plan multi-step tasks, decide on tool invocation, and execute actions on behalf of users. With a projected growth to over 2.2 billion active AI agents by 2030, vulnerabilities in these read-write AI workflows pose a significantly higher impact, prompting the creation of frameworks like the OWASP Top 10 for Agentic Applications. The article specifically delves into tool misuse and agentic supply chain risks through poisoned Model Context Protocol (MCP) tool metadata.

Attack pattern: MCP tool poisoning in a finance workflow

This section details a specific attack pattern that maps to OWASP Agentic Applications Security Issue (ASI) 02 (Tool Misuse) and ASI04 (Agentic Supply Chain Vulnerabilities). It describes an exploit first identified by Invariant Labs and observed in 2026, targeting enterprise AI agents through Model Context Protocol (MCP) tool poisoning. The scenario involves a financial operations team using a Copilot Studio agent integrated with a Dataverse MCP server, an Outlook connector, and a third-party invoice enrichment MCP server. The critical vulnerability arises because the third-party server, although approved for production, lacked a separate security review for its code or metadata changes, creating an entry point for threat actors.

The environment

The attack scenario is set within a financial operations team that utilizes a custom Copilot Studio agent. This agent is designed to assist analysts with vendor invoices and is integrated with three distinct tools: a Dataverse MCP server that stores the approved vendor master, an Outlook connector facilitating vendor correspondence, and a third-party invoice enrichment MCP server. The third-party server's role is to validate banking details against an external reference database. A significant vulnerability exists in the approval process: while the service owner lead reviewed and approved the third-party server for production, no separate security review was conducted for its underlying code or metadata, leaving it susceptible to manipulation.

Attack chain overview

The attack unfolds in four critical phases. **Phase 1: Tool description poisoning** involves a developer pushing an update to the enrichment server. While the tool's name and user-facing summary remain unchanged, the MCP tool description—the natural-language metadata the agent reads to decide how and when to call the tool—is subtly modified. This modification embeds hidden instructions directing the agent to retrieve sensitive financial data (e.g., last thirty unpaid invoices), summarize them, and attach the summary as an additional parameter in the enrichment call, disguised as a fraud-heuristic requirement. **Phase 2: Silent re-trust** occurs because MCP dynamically reflects tool metadata updates. If configurations do not trigger a re-approval workflow for description changes, these malicious instructions become active without further review. **Phase 3: User invocation** begins when a financial analyst makes a routine query to the agent. Without any visible cues, the agent executes the hidden instructions, collecting sensitive records beyond the original request's scope and forwarding them as part of the enrichment call. Finally, in **Phase 4: Exfiltration**, the poisoned enrichment server returns a plausible, 'validated' response to the analyst while silently logging the attached invoice summary to a threat actor-controlled endpoint. Crucially, no default alerts may trigger, as each individual action taken by the agent appears within its normal operating parameters, exploiting a trust boundary in external tool integrations rather than a vulnerability in Copilot itself.

Why this pattern is effective

This attack pattern proves highly effective because each individual action performed by the AI agent, such as querying Dataverse or making an outbound call, is inherently legitimate and falls within its approved operating parameters. The core vulnerability isn't within any single system but resides in the 'trust boundary' between them. The Model Context Protocol (MCP) effectively blends instructions (tool descriptions) with data. This means that a seemingly minor change to a tool's metadata can fundamentally redirect the agent's behavior, similar to altering its system prompt. The AI agent is unable to differentiate between legitimate instructions provided by its owner and malicious directives stealthily inserted by an upstream maintainer, making the attack difficult to detect through conventional means.

Mitigation and protection guidance

To combat MCP tool poisoning, Microsoft Incident Response provides a practical playbook utilizing Microsoft security controls across four key stages of the attack chain. This guidance emphasizes proactive measures and responsive capabilities to detect, contain, and prevent such attacks. It outlines strategies for governing the AI supply chain, meticulously inspecting tool metadata, implementing robust controls for agent actions, and effectively correlating telemetry across various security tools to identify anomalous behaviors and exfiltration attempts. The principles aim to strengthen the security posture against sophisticated agentic AI threats.

Detection and response with Microsoft security tools

Microsoft offers a suite of security tools to address MCP tool poisoning at various stages: 1. **Govern the supply chain**: Maintain a tenant-level allowlist for MCP publishers and servers, utilizing the Microsoft MCP catalog for verification. Crucially, 'Allow all' on MCP connections should be disabled, granting access only to specific tools an agent genuinely requires. 2. **Inspect tool metadata**: Employ Prompt Shields within Azure AI Content Safety to scrutinize content from MCP tool responses and descriptions that feed into agent context. Microsoft Defender for Cloud's AI workload protection generates alerts for suspicious prompts and tool outputs during runtime. Metadata changes to production tools must undergo the same rigorous review as system prompt changes. 3. **Guard the action**: Implement Microsoft Purview Data Loss Prevention (DLP) policies to inspect tool call parameters and prevent sensitive data from being included in outbound payloads. For high-impact actions like accessing financial data or external sharing, configure human-in-the-loop approval via Copilot Studio. Assign non-human identities to agents using Microsoft Entra Agent ID and apply Conditional Access policies to their workload identities. 4. **Correlate the chain**: Integrate MCP server telemetry with Microsoft Sentinel to correlate it against agent behavior signals, identifying anomalous sequences. Microsoft Defender for Cloud Apps helps discover new external endpoints an agent is interacting with, and Microsoft Purview audit logs provide essential forensic evidence for investigations.

Three principles for agent supply chain governance

Effective agent supply chain governance is built upon three core principles: 1. **Treat every MCP server as part of the supply chain**: Recognize each MCP server an agent can call as a critical production dependency. Maintain an inventory of approved publishers, scrutinize tool descriptions during security reviews instead of solely relying on names, and mandate a documented owner for any third-party server before its deployment. 2. **Treat tool descriptions as system prompts**: Understand that changes to tool metadata can influence agent instructions, similar to altering a system prompt. Therefore, require formal change reviews for tool description updates on critical agents and leverage Prompt Shields to inspect metadata for any inappropriate imperative language. 3. **Apply least agency, not just least privilege**: Beyond just managing permissions, restrict the agent's autonomy. Disable 'Allow all' tool access, enforce human approval for high-impact actions, and establish baseline agent behaviors in Microsoft Sentinel. Deviations, such as new endpoints, expanded parameters, or unusual query patterns, should trigger immediate alerts.

Conclusion

AI agents, which act on behalf of users, are increasingly reliant on a growing supply chain of tools. This evolving landscape introduces new vulnerabilities where threat actors can modify tool descriptions to influence agent behavior, even without direct user involvement or compromised credentials. The OWASP Top 10 for Agentic Applications provides a vital framework for understanding these risks. Microsoft offers comprehensive security capabilities, including Copilot Studio guardrails, Prompt Shields, Defender for Cloud AI Protection, Microsoft Entra Agent ID, Microsoft Purview DLP, Microsoft Defender for Cloud Apps, and Microsoft Sentinel, to provide the necessary controls. The key to effective defense lies in a deliberate application of these controls: carefully scoping permissions, rigorously governing the tool supply chain, continuously monitoring agent behavior, and conducting thorough red teaming exercises prior to deployment to ensure robust security.

References

This section provides a list of key external resources and reports that underpin the insights presented in the article. These references include the IDC FutureScape 2026 Predictions, highlighting the rise of agentic AI, the OWASP GenAI Security Project's Top 10 Risks and Mitigations for Agentic AI Security, and Invariant Labs' MCP Security Notification detailing tool poisoning attacks. The inclusion of these sources underscores the research-backed nature of the security guidance offered.

Learn more

For further insights and ongoing cybersecurity research from the Microsoft Threat Intelligence community, readers are encouraged to visit the Microsoft Threat Intelligence Blog and engage on social media platforms like LinkedIn, X (formerly Twitter), and Bluesky. Additionally, the section promotes listening to the Microsoft Threat Intelligence podcast for stories and perspectives on the evolving threat landscape. It also provides links to valuable documentation and resources, including Microsoft 365 Copilot AI security, methods for mitigating AI guardrail attacks, securing Copilot Studio agents with Microsoft Defender, the Zero Trust for AI workshop, and real-time agent protection during runtime.