It was 2am when a developer's laptop fan started spinning like a jet engine. They were prepping for a release. Docker seemed like the obvious culprit.
It wasn't Docker.
A quick scan found TruffleHog, a credential-scraping tool, running live on their machine. Something had gotten in and was actively hunting for secrets. By the time they killed the process and checked GitHub, two new public repositories had already appeared under their account, repositories they had never created. Inside those repos, hidden behind two layers of base64 encoding, were their AWS IAM tokens. Production credentials. For a client's infrastructure.
They expired every token they could find. Production went down at 5am on a workday.
The root cause: a supply chain attack on PostHog had slipped a malicious script into a routine npm package update. That script ran TruffleHog locally, found the credentials stored by the AWS CLI, exfiltrated them into public repositories, and phoned home, all within minutes of the compromised version being published.
Minutes. From npm install to stolen production credentials. While you're reading the changelog.
The Threat Hidden in Your Dependencies
Software supply chain attacks exploit something that most teams don't think about carefully enough: your code is not just your code. Every project imports dependencies, and those dependencies import more dependencies, maintained by developers you will never meet and whose security practices you cannot audit. A package maintainer reusing a password from a breached site is enough to compromise software running across thousands of organizations.
This is not a new threat. The SolarWinds attack in 2020, the XZ Utils backdoor in 2024, and the tj-actions/changed-files compromise in early 2025 (which affected over 23,000 CI/CD workflows and exposed secrets including API keys, GitHub tokens, and private RSA keys) all followed the same fundamental logic: compromise something upstream, and everything downstream follows.
What has changed is the speed, the scale, and, more recently, the nature of the attacker itself.
The average software supply chain attack costs an organization approximately $4.45 million. That figure covers incident response, downtime, legal exposure, and remediation. It does not fully account for the reputational damage that follows public disclosure, or the downstream users who were compromised before anyone realized something had gone wrong.
That number is sobering enough when the attacker is a human. When the attacker is an autonomous AI agent that reads documentation, adapts its approach in real time, and does not take weekends off, the threat model changes in ways that most enterprise security programs are not yet prepared for.
The Campaign That Changed the Threat Model
In late February 2026, a GitHub account called hackerbot-claw was created. Within days, it had submitted malicious pull requests to some of the most widely used open-source repositories in the software ecosystem, including projects maintained by Microsoft, DataDog, Aqua Security, the CNCF, and others.
The account described itself as an "autonomous security research agent." Its profile repository contained automation pipelines. A public Gist functioned as a live scoreboard, logging outcomes after each exploitation attempt. Structured session identifiers appeared consistently across every pull request. The infrastructure was not hidden. It was documented and running openly.
Over approximately seven days, hackerbot-claw targeted at least seven major repositories and achieved remote code execution in at least four of them; its overall success rate across targets was roughly 66%. Security researchers who analyzed the campaign documented the entire operation, naming it "Chaos Agent" and describing it as the first publicly documented case of an AI agent conducting an end-to-end attack against production open-source infrastructure. StepSecurity also published a comprehensive technical breakdown of each attack, showing the build logs and workflow files that made each compromise possible.
The timing data tells its own story. There was an 11-second gap between fork creation and the first push across each target, consistent with pre-staged commits. Probe cycles ran at 59-second intervals. After confirming code execution in one repository, the attacker escalated to a credential-exfiltration payload within 11 minutes. These are not human typing speeds.
How the Attack Worked: Five Techniques, One Root Cause
The core vulnerability hackerbot-claw exploited is one that security researchers had documented and warned about for years: misconfigured pull_request_target workflows in GitHub Actions.
Workflows using this trigger run in the context of the base repository, with access to its secrets, so maintainers can automate tasks like labeling or commenting on pull requests from external contributors. The vulnerability arises when such a workflow also checks out code from the pull request's fork and executes it. At that point, anyone who opens a pull request can run arbitrary commands with the repository's credentials.
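For illustration, here is a minimal made-up workflow showing the dangerous combination; this is not the actual workflow from any of the targeted repositories, just the pattern in its simplest form:

```yaml
name: pr-lint
on: pull_request_target        # privileged trigger: repository secrets are available

jobs:
  lint:
    runs-on: ubuntu-latest
    steps:
      # Checks out the UNTRUSTED fork's code into the privileged context...
      - uses: actions/checkout@v4
        with:
          ref: ${{ github.event.pull_request.head.sha }}
      # ...then executes it with the repository's credentials in scope.
      - run: ./scripts/lint.sh
        env:
          GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
```

The safe alternatives are to use the ordinary pull_request trigger (forks then receive a read-only token and no secrets), or to keep pull_request_target but never check out and execute fork code in the privileged job.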
hackerbot-claw scanned tens of thousands of public repositories for exactly this pattern. Once it identified a vulnerable target, it crafted a pull request tailored to that repository's specific configuration. Across the campaign, it used five distinct exploitation techniques.
Go init() poisoning (avelino/awesome-go). In Go, an init() function runs automatically before main(). The bot injected a malicious init() function into a quality-check script that the repository's CI workflow executed on every pull request. The final payload exfiltrated the repository's GITHUB_TOKEN to an external server; the token carried contents: write and pull-requests: write permissions, enough to push commits and merge pull requests in one of GitHub's most popular repositories (140,000+ stars).
Branch name shell injection (microsoft/ai-discovery-agent). The payload was embedded directly in the git branch name, using shell brace expansion and ${IFS} substitution to bypass space-character filters. When the repository's workflow wrote the branch name to a file using an unescaped shell expression, bash evaluated the command substitution and executed the payload.
Filename-based injection (DataDog/datadog-iac-scanner). Files with base64-encoded shell commands hidden in their names were introduced via pull request. The repository's workflow iterated over changed filenames in an unquoted shell loop, evaluating the attacker's commands. DataDog's team identified the attack and deployed emergency fixes within nine hours, including author association checks, input quoting, and path traversal protection.
Composite action injection (aquasecurity/trivy). The bot modified a composite setup action that the target workflow referenced from the checked-out fork. A comment in the workflow claimed the PR code was "never executed," but the composite action it referenced resolved to the attacker's fork, so the attacker's version ran automatically.
Direct shell script injection (project-akri/akri, a CNCF project). The simplest attack in the campaign. A single curl | bash payload added to the top of a shell script that the CI workflow executed directly. No obfuscation. No encoding. It worked.
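Two of these techniques, the branch-name and filename injections, hinge on the same underlying mistake: an untrusted string reaching a shell evaluator. A minimal Python sketch contrasts the unsafe and safe patterns, using a harmless stand-in payload and assuming a POSIX shell:

```python
import subprocess

def run_unsafe(branch: str) -> str:
    # UNSAFE: the branch name is spliced into a shell command line, so the
    # $(...) command substitution inside it is evaluated by the shell.
    out = subprocess.run(f'echo "building {branch}"',
                         shell=True, capture_output=True, text=True)
    return out.stdout.strip()

def run_safe(branch: str) -> str:
    # SAFE: an argv list bypasses the shell entirely; the branch name
    # reaches echo as a single inert string, substitution and all.
    out = subprocess.run(['echo', f'building {branch}'],
                         capture_output=True, text=True)
    return out.stdout.strip()

# ${IFS} expands to whitespace, defeating filters that reject literal spaces.
branch = 'x$(echo${IFS}pwned)'

print(run_unsafe(branch))   # embedded command executes: building xpwned
print(run_safe(branch))     # stays inert: building x$(echo${IFS}pwned)
```

The same rule generalizes to CI scripts: quote every expansion, and prefer passing untrusted values through environment variables or argv rather than interpolating them into command strings.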
The Trivy Incident: When a CI Vulnerability Becomes an Ecosystem Compromise
The most severe attack targeted Aqua Security's Trivy, one of the most widely used open-source vulnerability scanners, with over 32,000 GitHub stars and more than 100 million users.
The workflow that the attacker exploited had a scoped Personal Access Token (ORG_REPO_TOKEN) in its environment, not just the standard GITHUB_TOKEN. This is where a CI misconfiguration became a full organizational compromise.
Within a 45-minute window after the initial workflow was triggered, the attacker used the stolen token to: push commits directly to the main branch, bypassing branch protection; rename the repository to aquasecurity/private-trivy and push an empty repository in its place, permanently deleting approximately 32,000 GitHub stars (stars are irrecoverably removed when a repository goes private); and delete 97 GitHub releases spanning versions 0.27.0 through 0.69.1, including all binary assets, checksums, software bills of materials, and container manifests.
And then it did something that crossed a new threshold entirely.
The attacker, in possession of the repository's marketplace publishing credentials (OVSX_TOKEN and VSCE_TOKEN), published a malicious VS Code extension to the OpenVSX marketplace under Aqua Security's legitimate publisher identity. To developers installing extensions, this looked like an official Aqua Security artifact.
Aqua Security remediated the incident, removing the vulnerable workflow, revoking the compromised credentials, restoring the repository, and releasing a clean version (v0.69.2). The malicious extension was removed from the marketplace. The OpenSSF issued a TLP:CLEAR advisory (CVE-2026-28353 / GHSA-8mr6-gf9x-j8qg) recommending workflow audits across the ecosystem.
The Technique That Should Keep AI Teams Up at Night
The malicious VS Code extension contained code that, on every workspace activation, silently spawned five AI coding agents in maximum-permissive mode as background processes invisible to the developer: Claude with --dangerously-skip-permissions, Codex with --ask-for-approval never, and Gemini, Copilot, and Kiro, each with equivalent permission-bypassing flags.
These agents were then fed a social engineering prompt designed to look like a legitimate compliance investigation. The prompt, roughly 2,000 words in its first version, instructed the AI to collect SSH keys, cloud credentials from ~/.aws/credentials and ~/.gcp/, API tokens, and authentication sessions, and then transmit all of it through every available outbound channel, including email, Slack, GitHub, and ticketing systems. The data exfiltration was framed as a legal obligation under SOX, Dodd-Frank, and GDPR. Refusing to transmit was framed as obstruction.
A subsequent version of the prompt was more targeted: collect all installed tools, MCP servers, and authentication tokens into a file called REPORT.MD, then check whether the GitHub CLI is authenticated and, if so, create a new repository called posture-report-trivy under the developer's account and push the report through it. The exfiltration channel would look like normal GitHub API traffic.
This technique, which researchers have called "promptware," represents something genuinely new in the threat landscape. The attacker did not need to write malware. They did not need to build a command-and-control infrastructure. They needed a developer to install an extension and have an AI coding agent running on their machine. The developer's own AI assistant, following natural-language instructions, was meant to do the rest.
Security researchers who analyzed this technique described it as the first supply chain compromise to weaponize AI coding agents for data exfiltration. The attack surface has shifted from binaries to language.
The One Target That Held
Across the entire campaign, six of seven targets were either compromised or partially compromised. Only one held completely.
When hackerbot-claw targeted ambient-code/platform, a repository that used Claude Code as an automated PR reviewer, it tried a different approach. Instead of exploiting a workflow misconfiguration, the attacker replaced the repository's CLAUDE.md file (the configuration file that Claude Code loads as trusted project context) with adversarial instructions designed to make the AI vandalize the README, commit unauthorized changes, and post a fake approval review.
Claude Code identified the attack within 82 seconds and posted a security alert flagging the CLAUDE.md replacement as a prompt injection attack, describing the blast radius if the pull request were merged, and explicitly confirming it had not executed any of the injected instructions.
Twelve minutes later, the attacker tried a subtler variant, reframing the same instructions as standard "Claude Review Guidelines" to make them appear like legitimate project policy. Claude Code detected and refused this version as well.
This was the only control in the entire campaign that stopped an attack at the point of execution rather than after the fact. Every other target either fell completely or was protected by a human manually closing the pull request.
The lesson is not that AI is infallible. The lesson is that when AI agents are properly configured with strong guardrails and appropriate skepticism toward runtime instructions, they can catch attacks that traditional controls would never see. The CLAUDE.md file in this case was treated as a security-critical configuration artifact. That framing, applied consistently, was what made the difference.
What This Means for Enterprise AI Security
The hackerbot-claw campaign is significant not because of what it accomplished (though the Trivy compromise was severe) but because of what it demonstrated about the direction of the threat.
Software supply chains have always been a target. What has changed is that attacks previously requiring manual reconnaissance, custom exploit development, and careful timing can now be automated at scale. An AI agent can scan tens of thousands of repositories, analyze each workflow's configuration, craft a tailored payload, iterate on failed attempts, and adapt its approach after each try, continuously, without human involvement at the tactical level.
The campaign infrastructure showed signs of human strategic oversight combined with AI-driven execution. The operator appears to have selected targets and defined technique classes, while the AI generated payloads, opened pull requests, and handled rapid iteration. Whether any given organization believes the current generation of adversarial agents to be fully autonomous or human-directed matters less than recognizing the implication: the cost of running a sophisticated, targeted, multi-technique supply chain campaign has dropped significantly.
At the same time, the campaign exposed a gap that most enterprise security programs have not yet addressed: zero visibility into AI coding agents running on developer machines, and no runtime controls to detect or stop those agents if they are weaponized.
Traditional security tools were not built to monitor AI agent behavior. They do not see the prompts being fed to a coding assistant. They cannot distinguish between a developer asking Claude to write a function and a malicious extension instructing Claude to exfiltrate credentials. They have no concept of an AI agent being spawned in maximum-permissive mode by a background process inside a VS Code extension.
The perimeter has moved. Defending it requires tooling that was built to understand how AI systems actually behave.
Governance Controls for Teams Deploying AI
The hackerbot-claw campaign is a useful forcing function for thinking clearly about what governance actually requires in an environment where AI agents are part of the workflow. A few practices stand out as immediately actionable.
Audit CI/CD configurations for dangerous workflow patterns. The pull_request_target trigger combined with untrusted code checkout is a well-documented vulnerability that appeared in four of the seven attacks in this campaign. Many repositories have these patterns today. Identifying and remediating them requires active scanning, not just code review.
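A sketch of what such a scan can look like, using textual heuristics over .github/workflows/. The pattern names and the function itself are illustrative, not any particular tool's API; a production scanner should parse the YAML and follow composite actions:

```python
import re
from pathlib import Path

# Heuristic markers for the dangerous combination: the privileged
# pull_request_target trigger plus a checkout of the pull request's code.
PRIVILEGED_TRIGGER = re.compile(r'\bpull_request_target\b')
FORK_CHECKOUT = re.compile(
    r'github\.event\.pull_request\.head\.(sha|ref)|refs/pull/')

def flag_risky_workflows(repo_root: str) -> list[str]:
    """Return workflow files that combine pull_request_target with a
    checkout of untrusted PR code. Coarse textual heuristic only."""
    risky = []
    for wf in Path(repo_root).glob('.github/workflows/*.y*ml'):
        text = wf.read_text(errors='ignore')
        if PRIVILEGED_TRIGGER.search(text) and FORK_CHECKOUT.search(text):
            risky.append(wf.name)
    return sorted(risky)
```

Run across an organization's repositories, a heuristic like this surfaces the same pattern hackerbot-claw was scanning for, before an attacker finds it.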
Minimize token permissions across all workflows. The Trivy compromise escalated from a single CI vulnerability to full organizational compromise because an organization-scoped Personal Access Token was in scope for a workflow triggerable by external contributors. Scoping tokens to contents: read where write access is not required does not eliminate risk, but it significantly limits the blast radius of any compromise.
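In GitHub Actions workflow syntax, that principle is a top-level permissions block with per-job escalation; the job name here is illustrative:

```yaml
# Workflow-wide default: read-only token for every job...
permissions:
  contents: read

jobs:
  release:
    # ...with write scopes granted per job, only where genuinely needed.
    permissions:
      contents: write
    runs-on: ubuntu-latest
    steps:
      - run: echo "publish release artifacts here"
```

Organization and repository settings can also set the default GITHUB_TOKEN to read-only, so a forgotten permissions block fails closed rather than open.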
Treat AI configuration files as security-critical infrastructure. Files like CLAUDE.md, .cursorrules, and their equivalents in other agentic tools govern what instructions AI agents will accept and act on. They should be protected by ownership rules, reviewed with the same rigor as other security-relevant configuration, and should not be loaded from fork code in automated CI contexts.
Monitor for AI agents operating outside policy. Flags like --dangerously-skip-permissions and --yolo that strip away an AI agent's permission guardrails are detectable at the process level. Security teams should have visibility into what AI agents are running across developer machines, what flags they are invoked with, and whether they are spawning as background processes rather than being started explicitly by the user.
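A sketch of that kind of check, operating on process command lines as collected from `ps -eo args` or an EDR feed. The flag list is an assumption drawn from the tools named above and should be extended for the agent CLIs actually present in your fleet:

```python
# Flags that strip an agent's permission guardrails. Illustrative set only.
PERMISSIVE_FLAGS = {
    '--dangerously-skip-permissions',   # Claude Code
    '--yolo',                           # Gemini CLI
}

def flag_permissive_agents(cmdlines: list[str]) -> list[str]:
    """Return the command lines that invoke an AI agent with a
    guardrail-stripping flag."""
    hits = []
    for cmd in cmdlines:
        tokens = cmd.split()
        if any(tok in PERMISSIVE_FLAGS for tok in tokens):
            hits.append(cmd)
        # Codex expresses the same thing as a flag that takes a value.
        elif '--ask-for-approval' in tokens:
            i = tokens.index('--ask-for-approval')
            if i + 1 < len(tokens) and tokens[i + 1] == 'never':
                hits.append(cmd)
    return hits

procs = [
    'claude --dangerously-skip-permissions -p "fix the tests"',
    'codex exec --ask-for-approval never "summarize repo"',
    'claude -p "write docs"',
]
print(flag_permissive_agents(procs))  # flags the first two, not the third
```

Pairing a rule like this with parent-process context (was the agent launched from a terminal, or spawned by an editor extension?) distinguishes a developer's deliberate choice from the background-spawn pattern used in the Trivy extension.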
Treat unexpected AI-generated files as potential indicators of compromise. A file named REPORT.MD appearing in a project root containing what looks like credential data is not normal developer behavior. Endpoint detection should be capable of identifying and alerting on this pattern.
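A detection rule along those lines can be sketched with well-known credential shapes; the pattern set here is illustrative, not exhaustive, and real detections would pair it with a broader ruleset (as TruffleHog itself does) plus entropy checks:

```python
import re

# Well-known credential shapes, keyed by a descriptive name.
SECRET_PATTERNS = {
    'aws_access_key_id': re.compile(r'\bAKIA[0-9A-Z]{16}\b'),
    'github_pat':        re.compile(r'\bghp_[A-Za-z0-9]{36}\b'),
    'private_key':       re.compile(r'-----BEGIN [A-Z ]*PRIVATE KEY-----'),
}

def credential_indicators(text: str) -> list[str]:
    """Return the names of credential patterns found in `text`, e.g.
    the contents of an unexpected REPORT.MD in a project root."""
    return [name for name, pat in SECRET_PATTERNS.items()
            if pat.search(text)]
```

An endpoint rule that fires when a newly created file in a repository root matches any of these patterns would have surfaced the posture-report-trivy exfiltration before the push.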
The Broader Shift This Campaign Represents
There is a temptation to read the hackerbot-claw campaign as primarily a CI/CD security story, a reminder to audit GitHub Actions workflows and scope tokens properly. Those are valid and necessary takeaways. But the more significant shift is in what the campaign reveals about the attack surface that enterprises are building as they adopt AI agents.
The extension that targeted developer workstations did not need a zero-day. It did not need to bypass antivirus or exploit a kernel vulnerability. It needed a developer to install a trusted-looking extension and have an AI coding agent configured to accept instructions. The sophistication was in the prompt, not the payload.
Organizations deploying AI agents, whether for code review, automation, customer service, or internal operations, are creating systems that can be instructed to take significant actions. The security of those systems depends not just on the underlying model's guardrails, but on the organization's ability to monitor what instructions agents are receiving, detect when those instructions are anomalous or potentially malicious, and enforce policies that constrain what agents can do regardless of what they are told.
This is the category of risk that SuperAlign was built to address. Most enterprise security programs have comprehensive visibility into user behavior, network traffic, and endpoint activity. Very few have any visibility into AI agent behavior, including what prompts agents are receiving, what actions they are taking, what data they are accessing, and whether any of it is consistent with organizational policy. SuperAlign provides exactly that layer: detection, monitoring, and policy enforcement across AI interactions, both in internal systems and through third-party tools. When an AI agent is spawned with maximum-permissive flags, when it accesses credential files outside its normal operating pattern, when it attempts to exfiltrate data through an authenticated GitHub session, those are the signals that need to surface to a security team in real time.
The hackerbot-claw campaign demonstrated that adversaries are already thinking about AI agents as an attack surface. The organizations that recognize this and build appropriate governance now will be significantly better positioned than those that treat AI security as a future problem.