
When Your AI Ignores Your Security Policies: What the Copilot DLP Failures Reveal

Microsoft Copilot bypassed DLP policies twice in eight months, and no security tool caught either failure. Here's what it means for enterprise AI governance.

SuperAlign Team
Mar 5, 2026 · 8 min read

For 28 days starting on January 21, 2026, Microsoft 365 Copilot read and summarized emails it was explicitly configured not to touch. Sensitivity labels said no. DLP policies said no. Copilot read them anyway.

No alert fired. No security tool caught it. The exposure surfaced only when Microsoft issued an advisory on February 18 — nearly four weeks after the bug began and two weeks after Microsoft had internally identified it.

Among the affected organizations was the UK's National Health Service, which logged the incident as INC46740412. Microsoft tracked it as CW1226324 and described it, in its initial advisory, as "an issue with Microsoft 365 Copilot chat improperly summarizing email messages." Copilot had been processing restricted content, bypassing every enforcement point between the retrieval pipeline and the user, for the better part of a month.

This was not the first time

Eight months earlier, in June 2025, Microsoft patched CVE-2025-32711, a critical vulnerability that researchers dubbed EchoLeak. The mechanics were different but the outcome was the same.

EchoLeak demonstrated that a single malicious email, crafted to appear as normal business correspondence, could manipulate Microsoft Copilot's retrieval-augmented generation (RAG) pipeline to access internal enterprise data and transmit it to an attacker-controlled server. No user clicks were required. In a single exploit chain, it bypassed four distinct layers of defense: Microsoft's cross-prompt injection classifier, external link redaction, Content-Security-Policy controls, and reference mention safeguards. Microsoft assigned it a CVSS score of 9.3.

Two failures, eight months apart, with two different root causes (a sophisticated exploit chain and an ordinary code bug) produced the same result: Copilot processed data it was explicitly restricted from touching, and the security stack reported all-clear throughout.

Why traditional security tools are architecturally blind to this

The question that deserves more attention than it has received is not why Microsoft had a bug. Software has bugs. The question is why nothing else in the security stack detected either failure.

The answer is structural. EDR tools monitor file and process behavior. Web application firewalls inspect HTTP payloads. DLP platforms scan content moving through monitored channels. None of these tools were designed to observe what happens inside an LLM retrieval pipeline.

When Copilot retrieved and summarized a labeled email it was told to skip, the entire sequence of events happened within Microsoft's infrastructure — between the retrieval index and the generation model. Nothing dropped to disk. No anomalous traffic crossed the perimeter. No process spawned on the endpoint. From the perspective of every tool in a conventional security stack, nothing happened.

AWS governance architect Harry Mylonas articulated the underlying trust problem in commentary on CW1226324: "In a shared global control plane, your security is only as strong as the provider's latest code push." Organizations that believed their sensitivity labels and DLP policies constituted genuine enforcement discovered, through a vendor advisory, that they constituted an expectation — one that a code change could silently invalidate.

The structural vulnerability underneath

Aim Security's disclosure of EchoLeak included a characterization that applies equally to CW1226324, even though that incident had a different root cause. They described the problem as a fundamental design issue: AI agents process trusted and untrusted content in the same context window, making them structurally vulnerable to manipulation.

That observation extends beyond prompt injection. Any AI system with retrieval access to enterprise data depends on an enforcement layer to decide what it can and cannot retrieve. If that enforcement layer fails, whether through a code error, a configuration drift, an exploit chain, or a model update that subtly changes how retrieval instructions are interpreted, the AI system will access data it is not supposed to access, and the conventional security stack will not see it.

This is the trust boundary problem that CW1226324 and EchoLeak both expose, arriving by different routes.

What the audit should cover

The incidents point to four practical steps worth working through for any organization running Copilot or a similar RAG-based AI assistant.

  • Test DLP enforcement against the AI system directly, not just against email flows and file transfers. CW1226324 went undetected for four weeks partly because configured policies were assumed to be enforced policies. Creating labeled test messages and verifying that the AI cannot retrieve them is the only real evidence: configuration is not enforcement; a failed retrieval attempt is.
  • Audit for what the AI accessed during the exposure window, not just for what it was configured to be allowed to access. For CW1226324, that window ran from January 21 to mid-February. Any organization subject to regulatory examination that cannot reconstruct what Copilot accessed during that period has a documentation gap that auditors will eventually find.
  • Restrict external content from entering the AI's context window where possible, and remove sites containing sensitive data from the retrieval pipeline entirely. These controls are not dependent on the enforcement layer that failed. They reduce the attack surface regardless of whether the failure is a code bug or an injected prompt.
  • Build an incident response playbook specifically for trust boundary violations inside vendor-hosted inference pipelines. The SIEM will not catch the next one. The only early warning signal available is the vendor's own advisory channel — which means the IR playbook needs to include a monitoring cadence for vendor service health advisories that specifically affect AI processing, not just general infrastructure incidents.
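The first step above, testing enforcement directly rather than trusting configuration, can be sketched as a recurring canary test. Everything here is hypothetical: the label names, the simulated assistant answer, and the helper functions stand in for whatever your tenant's admin tooling and assistant interface actually expose.

```python
import datetime

# Hypothetical canary test: plant labeled messages the AI must never
# retrieve, then probe the assistant and fail loudly if any come back.

CANARY_LABELS = ["Highly Confidential", "Restricted"]  # assumed label names

def make_canary(label: str) -> dict:
    """Create a uniquely identifiable test message carrying a sensitivity label."""
    token = f"CANARY-{label}-{datetime.date.today().isoformat()}"
    return {"label": label, "subject": token, "body": f"{token} do-not-retrieve"}

def enforcement_leaks(canaries: list[dict], ai_answer: str) -> list[str]:
    """Return the canary tokens that leaked into the assistant's answer."""
    return [c["subject"] for c in canaries if c["subject"] in ai_answer]

canaries = [make_canary(label) for label in CANARY_LABELS]
# In a real run, ai_answer would come from prompting the assistant to
# summarize recent mail; here we simulate a clean (enforced) response.
ai_answer = "Here is a summary of your unrestricted mail."
leaks = enforcement_leaks(canaries, ai_answer)
assert leaks == [], f"DLP enforcement failure: {leaks}"
```

Run on a schedule, a non-empty `leaks` list is the early-warning signal that no SIEM rule would otherwise produce.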

What this means for the broader AI security picture

CW1226324 and EchoLeak are Microsoft Copilot incidents, but the vulnerability class they represent is not Copilot-specific. Any RAG-based AI assistant with retrieval access to enterprise data runs through the same pattern: a retrieval layer selects content, an enforcement layer gates what the model can see, and a generation layer produces output. If the enforcement layer fails, restricted data enters the model's context window, and nothing outside that pipeline is positioned to detect it.
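The three-layer pattern above can be modeled in a few lines. The names are illustrative, not any vendor's actual API; the point is that a single fault in the enforcement layer admits restricted content with no error raised anywhere.

```python
# Minimal model of the retrieval -> enforcement -> generation pattern.

DOCS = [
    {"id": 1, "text": "Q3 roadmap", "restricted": False},
    {"id": 2, "text": "patient records", "restricted": True},
]

def retrieve(query: str) -> list[dict]:
    """Retrieval layer: selects candidate content; label-unaware by design."""
    return DOCS

def enforce(docs: list[dict], enforcement_bug: bool = False) -> list[dict]:
    """Enforcement layer: gates what the model may see. The flag simulates
    the CW1226324 failure mode, where the gate silently passes everything."""
    if enforcement_bug:
        return docs
    return [d for d in docs if not d["restricted"]]

def generate(context: list[dict]) -> str:
    """Generation layer: summarizes whatever reached the context window."""
    return "Summary of: " + ", ".join(d["text"] for d in context)

print(generate(enforce(retrieve("summarize my mail"))))        # restricted doc excluded
print(generate(enforce(retrieve("summarize my mail"), True)))  # restricted doc leaks silently
```

Note that the buggy path raises no exception and emits no log line: from outside the pipeline, both calls look identical.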

Whether you run Microsoft Copilot, Gemini for Workspace, or any enterprise AI assistant built on a similar retrieval architecture, you carry this structural risk. A 2026 Cybersecurity Insiders survey found that 47% of CISOs and senior security leaders have already observed AI agents exhibit unintended or unauthorized behavior. In most of those cases, the observation came after the fact.

The conventional security stack, comprising EDR, WAF, DLP, and SIEM, was built for a world where the things worth monitoring were processes, files, network traffic, and application payloads. None of those categories covers the retrieval pipeline of an AI system making decisions about what data to surface. The enforcement point that failed in CW1226324 was invisible to all of them by design.

Where this connects to what we are building at SuperAlign

The Copilot incidents are a concrete illustration of the problem our Surface and Radar platforms were designed to address. Each approaches it from a different angle than a DLP bypass, but both point at the same underlying gap.

Surface, our endpoint AI security platform, gives security teams visibility into which AI tools are running on managed devices, what configurations govern their behavior, and whether those configurations have changed. When an approved AI tool's configuration drifts, whether through a vendor update, a deliberate change, or a bug like CW1226324, Surface is designed to reveal that change before it becomes a four-week silent exposure.
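The drift check described above can be illustrated generically. This is a sketch of the technique, not Surface's implementation: fingerprint an approved baseline of each tool's configuration and alert when the live state diverges.

```python
import hashlib
import json

def config_fingerprint(config: dict) -> str:
    """Stable hash of a tool's configuration (key order normalized)."""
    canonical = json.dumps(config, sort_keys=True)
    return hashlib.sha256(canonical.encode()).hexdigest()

def detect_drift(baseline: dict, current: dict) -> list[str]:
    """Return the keys whose values changed since the approved baseline."""
    keys = set(baseline) | set(current)
    return sorted(k for k in keys if baseline.get(k) != current.get(k))

# Illustrative settings; real AI tool configs will differ.
baseline = {"dlp_enforced": True, "label_aware_retrieval": True}
current  = {"dlp_enforced": True, "label_aware_retrieval": False}  # silent change
if config_fingerprint(baseline) != config_fingerprint(current):
    print("drift detected:", detect_drift(baseline, current))
```

The fingerprint comparison is cheap enough to run continuously, which is what turns a four-week silent exposure into a same-day alert.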

Radar, our network-layer AI intelligence platform, monitors AI tool usage and data flows across the organization, providing visibility into how AI systems are interacting with enterprise data. It cannot read inside Microsoft's inference pipeline any more than a WAF can. But it can detect behavioral anomalies in AI system traffic that may be the network-level signature of an enforcement failure.
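One simple form such a behavioral-anomaly check can take (a generic illustration, not Radar's algorithm) is a z-score against a rolling baseline of an AI tool's data-flow volume:

```python
import statistics

def traffic_zscore(history: list[float], latest: float) -> float:
    """Standard deviations the latest observation sits from the
    historical mean of an AI tool's data-flow volume."""
    mean = statistics.fmean(history)
    stdev = statistics.stdev(history) or 1.0  # guard against zero variance
    return (latest - mean) / stdev

# Daily MB of mail content an assistant summarized (illustrative numbers).
history = [12.0, 14.5, 13.2, 12.8, 13.9, 14.1, 12.5]
latest = 41.0  # the kind of jump an enforcement failure might produce
if traffic_zscore(history, latest) > 3.0:
    print("anomaly: AI data-flow volume far above baseline")
```

A spike like this cannot prove an enforcement failure inside the vendor's pipeline, but it is the kind of network-level signature that warrants checking the vendor's advisory channel.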

The harder and more honest observation is this: CW1226324 demonstrates that the visibility problem for enterprise AI runs deeper than any endpoint agent or network monitor can fully reach. When the enforcement failure happens entirely within a vendor-hosted pipeline, between the retrieval index and the model, no external tool sees it. The security conversation for AI systems operating at this layer eventually has to be about governance of the AI system itself. It should include continuous behavioral monitoring against approved baselines, audit trails for what the system accessed and when, and independent verification that enforcement is actually working rather than assumed to be working.

That is the direction the industry needs to move. The Copilot incidents are an early and well-documented signal of where the enforcement gaps are, and they will not be the last.


SuperAlign builds security visibility and governance infrastructure for enterprise AI deployments. If your organization is deploying AI systems and wants to understand where your enforcement gaps are, we'd like to talk.