16 Million Stolen Queries: Inside the Anthropic Distillation Attacks and the Hydra Clusters

How DeepSeek, Moonshot AI, and MiniMax used 24,000 fraudulent accounts to distill Claude’s capabilities — and why your API security might be the next target.

Human-architected research synthesized with the assistance of AI personas.

TL;DR / Executive Summary

Key Takeaways in 60 seconds:

  • Anthropic revealed three Chinese AI labs — DeepSeek, Moonshot AI, and MiniMax — executed 16M+ API queries via ~24,000 fraudulent accounts to systematically extract Claude’s reasoning capabilities.
  • Attackers utilized "Hydra Cluster" proxy architectures — vast networks of fake accounts that automatically regenerate when banned — to bypass geo-restrictions and evade detection.
  • Static API keys are dead. The attacks exploited the weakness of legacy bearer token authentication. The solution is Zero-Trust: short-lived OAuth 2.1 tokens, mTLS, and cryptographic workload attestation.
  • Defensive engineering is shifting from access walls to data poisoning: trace rewriting, logit purification, and "radioactive" watermarking that makes stolen data toxic to student models.
  • The Bottom Line: If your API provides intelligence, that intelligence is your IP. The era of perimeter-based API security for LLMs is over.

1. The $200 Million Heist for Pennies

In February 2026, Anthropic dropped a bombshell on the AI industry.

Their official disclosure exposed an operation of staggering scale: three Chinese AI labs — DeepSeek, Moonshot AI, and MiniMax — had been conducting coordinated, industrial-scale extraction campaigns against the Claude API. The numbers are haunting:

  • 16M+ API exchanges
  • ~24,000 fraudulent accounts
  • 3 distinct campaigns, each surgically targeted at different capabilities.

This wasn't a smash-and-grab. It was a systematic, multi-month operation designed to extract the most valuable asset in modern AI: Behavioral Intelligence. Not weights. Not architecture code. Behavior — how Claude thinks, reasons, and solves problems.

The economics tell the real story. Training a frontier model like Claude from scratch costs hundreds of millions of dollars in compute, data curation, and RLHF engineering. Querying the API of that same model costs fractions of a cent per request. Through automated distillation, these labs effectively replicated years of R&D at a cost equivalent to a mid-tier SaaS subscription.

The paradigm shift is clear: In traditional software, IP theft meant exfiltrating binaries or source code. In Generative AI, you steal intelligence just by having a conversation.


2. How Model Distillation Becomes a Weapon

Knowledge Distillation (KD) is a legitimate and well-understood technique in machine learning. The premise is elegant: a massive and expensive "teacher" model has already mapped out optimal decision boundaries in a giant latent space. A smaller "student" model can learn to mimic those boundaries by training on the teacher’s structured outputs rather than raw data.

When you do this internally — like distilling GPT-4 into a smaller deployment model or compressing Claude for edge inference — it’s standard engineering practice. But when an adversary does this across an API boundary, it becomes the most efficient IP theft vector in the history of software.
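
To make the teacher-student idea concrete, here is a minimal sketch of the classic logit-matching distillation objective: a temperature-softened KL divergence between teacher and student outputs. The logit values and the temperature are illustrative assumptions. When only sampled text is available across an API boundary, the student is typically trained with ordinary next-token cross-entropy on that text instead, so treat this as the textbook form rather than a description of any specific campaign.

```python
import math

def softmax(logits, temperature=1.0):
    """Convert raw scores into probabilities, softened by a temperature."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) on temperature-softened distributions.

    The T^2 factor follows the common convention of keeping gradient
    magnitudes comparable across temperatures.
    """
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return temperature ** 2 * kl

teacher = [4.0, 1.0, 0.5]          # hypothetical teacher scores for 3 classes
close_student = [3.8, 1.1, 0.4]    # student that nearly matches the teacher
far_student = [0.5, 4.0, 1.0]      # student that prefers a different class

print(distillation_loss(teacher, teacher))        # identical outputs give zero loss
print(distillation_loss(teacher, close_student))  # small loss
print(distillation_loss(teacher, far_student))    # much larger loss
```

Minimizing this loss pulls the student's whole output distribution toward the teacher's, which is why the secondary probabilities matter as much as the top answer.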

The attack surface is the model's output itself. Every response from Claude carries signal about:

  1. Chain-of-Thought reasoning traces — the intermediate cognitive steps the model takes before generating the final answer.
  2. Latent decision boundaries — the subtle probability weights that determine how the model chooses between multiple valid approaches.
  3. RLHF-conditioned behavior — the safety alignments, tone, and refusal patterns embedded through months of human feedback training.

By systematically querying these outputs with carefully engineered prompts, an attacker can reconstruct a functional approximation of the teacher’s most differentiated capabilities. The student doesn't need to be perfect — it just needs to be "good enough" to close a multi-year competitive gap overnight.


3. The Three Campaigns: A Forensic Analysis

Anthropic's disclosure revealed three distinct campaigns, each with unique tactical signatures. Understanding these patterns is critical for any engineer building detection systems.

3.1 MiniMax: 13 Million Exchanges and Real-Time Pivoting

MiniMax ran the largest campaign by volume — over 13 million API exchanges — with a laser focus on agentic coding and tool-use orchestration. This is the bleeding edge of LLM capability: the model’s ability to autonomously plan multi-step tasks, write and execute code iteratively, and interact dynamically with external tools.

The most alarming discovery was MiniMax's automated pivot capability. When Anthropic deployed a new Claude model iteration during the campaign, MiniMax redirected nearly 50% of its extraction traffic to the new endpoint within 24 hours. This implies a sophisticated CI/CE (Continuous Integration / Continuous Extraction) pipeline — an always-on orchestration engine that monitors API endpoints and automatically routes traffic to capture new capabilities as they ship.

```python
# Conceptual: MiniMax-style CI/CE extraction pipeline.
# EndpointMonitor, AgenticPromptLibrary, HydraAccountPool, and the training
# pipeline are illustrative placeholders that are not defined here.
class ContinuousExtractor:
    def __init__(self, api_config):
        self.endpoint_monitor = EndpointMonitor(api_config)
        self.prompt_library = AgenticPromptLibrary()
        self.account_pool = HydraAccountPool(size=5000)

    async def run(self):
        while True:
            # Detect new model deployments
            new_model = await self.endpoint_monitor.detect_update()
            if new_model:
                # Pivot extraction traffic automatically
                await self.redistribute_traffic(
                    target=new_model,
                    percentage=0.50,
                    ramp_hours=24,
                )

            # Rotate accounts to stay under rate limits
            account = self.account_pool.get_next()
            prompts = self.prompt_library.get_batch(
                focus="agentic_tool_use",
                count=100,
            )
            responses = await self.extract(account, prompts)
            await self.training_pipeline.ingest(responses)
```
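
From the defender's side, that kind of rapid pivot leaves a measurable signature. The sketch below flags accounts that move a large share of their traffic to a newly launched model within a fixed window. The event format, model names, launch date, and thresholds are all assumptions made for illustration; none of them come from Anthropic's disclosure.

```python
from collections import defaultdict
from datetime import datetime, timedelta

def pivot_suspects(events, new_model, launch_time,
                   window=timedelta(hours=24), min_requests=50, min_share=0.4):
    """Flag accounts that move a large share of traffic to a new model quickly.

    events: iterable of (account_id, model_name, timestamp) tuples.
    Thresholds are illustrative assumptions, not calibrated values.
    """
    totals = defaultdict(int)
    to_new = defaultdict(int)
    for account, model, ts in events:
        if launch_time <= ts < launch_time + window:
            totals[account] += 1
            if model == new_model:
                to_new[account] += 1
    return sorted(
        account for account, n in totals.items()
        if n >= min_requests and to_new[account] / n >= min_share
    )

launch = datetime(2026, 1, 1, 12, 0)   # hypothetical launch time
events = []
for i in range(100):
    ts = launch + timedelta(minutes=i)
    # acct-a sends half of its post-launch traffic to the new model
    events.append(("acct-a", "model-new" if i % 2 == 0 else "model-old", ts))
    # acct-b stays almost entirely on the old model
    events.append(("acct-b", "model-new" if i % 20 == 0 else "model-old", ts))

print(pivot_suspects(events, "model-new", launch))  # ['acct-a']
```

Early adopters also switch models quickly, so a signal like this is only useful when combined with others, such as shared infrastructure or templated prompts.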

3.2 Moonshot AI: Multi-Vector Extraction and Attribution Failure

Moonshot AI — the team behind the Kimi model series — ran a campaign of 3.4M+ exchanges, targeting a very broad set of capabilities: agentic reasoning, computer-use agents, data analysis, and computer vision.

Their operation was characterized by multi-pathway access — using hundreds of fraudulent accounts across various tier types to hide the coordinated nature of the harvest. In later stages, the campaign shifted toward reasoning trace reconstruction, deploying complex multi-step logic problems designed to capture Claude’s intermediate cognitive steps rather than just final answers.

The attribution story is remarkable: Anthropic reported that request metadata correlated directly with the public profiles of Moonshot AI’s senior staff. Whether this reflects brazen confidence or a catastrophic OpSec failure is an open question — but it provided the forensic certainty for the disclosure.

3.3 DeepSeek: Surgical Precision and RLHF Harvesting

DeepSeek’s campaign was the smallest in volume (~150,000 exchanges) but arguably the most technically sophisticated. Rather than broad capability extraction, DeepSeek conducted a surgical operation targeting two critical capabilities:

  1. Chain-of-Thought reasoning mechanics — prompts explicitly directing Claude to articulate its internal step-by-step reasoning process.
  2. Rubric-based grading capabilities — using Claude as an automated evaluator to rapidly generate high-quality preference data for RLHF reward modeling.

Evasion tactics were advanced. Anthropic detected synchronized traffic among accounts — identical timing patterns and shared payment methods — suggesting a load-balancing architecture that maximized throughput while keeping individual accounts below anomaly detection thresholds.

DeepSeek also used Claude to generate censorship-safe alternatives for politically sensitive queries about authoritarianism, party leaders, and dissidents. The goal was to leverage Claude’s nuanced semantic capabilities to train domestic models that comply with official content rules without a catastrophic degradation in dialogue quality.

| Threat Actor | Volume | Primary Targets | Key Tactical Signature |
|---|---|---|---|
| MiniMax | 13M+ | Agentic coding, Tool-use | Automated pivot to new model versions in 24h |
| Moonshot AI | 3.4M+ | Reasoning, Vision, Agents | Multi-pathway access; metadata linked to staff |
| DeepSeek | 150K+ | CoT reasoning, RLHF grading | Synchronized traffic; censorship bypass prompts |

4. The Hydra Cluster: An Architecture Built to Survive

The campaigns operated through commercial proxy services that managed what security analysts now call the "Hydra Cluster" architecture. Anthropic’s API is geo-blocked in China, so direct access is impossible. The proxy layer solves this through brute-force redundancy.

A Hydra Cluster is a dynamic, self-healing network of fraudulent accounts distributed across legitimate cloud platforms and direct API endpoints. The engineering is specifically designed to eliminate single points of failure:

  1. Account Banned? The network automatically provisions a replacement within minutes.
  2. IP Flagged? Traffic routes through a different geographical node.
  3. Rate Limited? The load balancer distributes queries across thousands of accounts to ensure no single identity crosses detection thresholds.

The Traffic Blending technique is particularly lethal. The proxy mixes highly structured distillation prompts with unrelated, benign requests from legitimate customers. From the perspective of a standard WAF or network monitoring tool, the traffic appears as normal, high-volume user activity. The adversarial signal dissolves into statistical noise.

One documented Hydra Cluster managed over 20,000 fraudulent accounts simultaneously. At that scale, traditional perimeter security is fundamentally inadequate.
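
A small sketch shows why per-identity thresholds miss this pattern and what an aggregate budget catches. It assumes the provider can attach some correlation key to each account (a payment fingerprint is used here as a hypothetical example) and that both limits are arbitrary illustrative numbers.

```python
from collections import defaultdict

PER_ACCOUNT_LIMIT = 60     # requests per minute, hypothetical
PER_CLUSTER_LIMIT = 600    # requests per minute per correlation key, hypothetical

def over_limit(requests_per_minute, correlation_key):
    """Return (accounts over the per-account limit, clusters over the cluster limit).

    requests_per_minute: dict of account_id -> observed requests per minute.
    correlation_key: dict of account_id -> shared attribute (e.g. payment fingerprint).
    """
    flagged_accounts = sorted(
        a for a, rpm in requests_per_minute.items() if rpm > PER_ACCOUNT_LIMIT
    )
    cluster_load = defaultdict(int)
    for account, rpm in requests_per_minute.items():
        cluster_load[correlation_key[account]] += rpm
    flagged_clusters = sorted(
        k for k, load in cluster_load.items() if load > PER_CLUSTER_LIMIT
    )
    return flagged_accounts, flagged_clusters

# 1,000 accounts, each far below the per-account limit, sharing one payment fingerprint
rpm = {f"acct-{i}": 10 for i in range(1000)}
keys = {account: "card-fp-1" for account in rpm}
rpm["solo-dev"] = 45
keys["solo-dev"] = "card-fp-2"

accounts, clusters = over_limit(rpm, keys)
print(accounts)   # []
print(clusters)   # ['card-fp-1']
```

No single account trips the per-account limit, yet the cluster as a whole sends 10,000 requests per minute. The hard part in practice is finding a correlation key the attacker cannot cheaply randomize.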


5. Why Your API Authentication is Already Broken

The success of these campaigns exposes a systemic failure in how API-provided intelligence is authenticated. Industry surveys from 2025-2026 reportedly show that up to 44% of enterprises still rely on static API keys (bearer tokens) to authenticate AI agents.

The problem is simple and devastating: any entity holding the token is trusted by default. There is no contextual verification of the workload’s true identity. A Hydra proxy with a stolen or fraudulently obtained key is indistinguishable from a legitimate developer to the system.

The fix requires a fundamental architectural shift toward Zero-Trust Agent Authentication:

5.1 Short-Lived Credentials (OAuth 2.1 + OIDC)

Persistent API keys must be replaced by dynamically generated, short-lived access tokens. Agents authenticate using private keys stored in Hardware Security Modules (HSMs) or Trusted Platform Modules (TPMs), requesting temporary tokens that expire within minutes. This forces proxy networks to re-authenticate constantly, which shrinks the useful lifetime of any stolen credential and adds operational overhead to the attacker's pipeline.
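
The expiry mechanic can be sketched with nothing but the standard library. This is a toy stand-in for an OAuth access token, not an implementation of OAuth 2.1 or JWT: the signing key, token layout, and five-minute lifetime are all assumptions, and a real issuer would use managed keys and a standard token format.

```python
import base64
import hashlib
import hmac
import json
import time

SECRET = b"demo-signing-key"   # hypothetical; a real issuer would use managed keys

def issue_token(subject, ttl_seconds=300, now=None):
    """Issue a signed token that expires ttl_seconds after it is created."""
    now = time.time() if now is None else now
    payload = json.dumps({"sub": subject, "exp": now + ttl_seconds}).encode()
    body = base64.urlsafe_b64encode(payload)
    sig = hmac.new(SECRET, body, hashlib.sha256).hexdigest().encode()
    return body + b"." + sig

def validate_token(token, now=None):
    """Return the subject if the signature is valid and the token is unexpired."""
    now = time.time() if now is None else now
    try:
        body, sig = token.split(b".")
    except ValueError:
        return None  # malformed token
    expected = hmac.new(SECRET, body, hashlib.sha256).hexdigest().encode()
    if not hmac.compare_digest(sig, expected):
        return None  # signature mismatch
    claims = json.loads(base64.urlsafe_b64decode(body))
    if now >= claims["exp"]:
        return None  # expired
    return claims["sub"]

token = issue_token("agent-1", ttl_seconds=300, now=1000.0)
print(validate_token(token, now=1100.0))   # agent-1
print(validate_token(token, now=1400.0))   # None
```

A leaked token of this kind is useless five minutes later, which is the property static API keys lack.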

5.2 Workload Attestation

Before issuing a token, a Trust Provider validates the cryptographic workload attestation of the requesting agent — confirming its service account, namespace, container image signature, and runtime environment. A fraudulent proxy account cannot simply fake the cryptographic attestation of a legitimate enterprise-grade Kubernetes cluster.
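
Once the attestation's signature chain has been verified, the remaining step is a policy decision. The sketch below shows only that last step, as an allow-list lookup. The field names, workload names, and digests are hypothetical, and the cryptographic verification of the attestation itself is assumed to have happened upstream.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Attestation:
    service_account: str
    namespace: str
    image_digest: str

# Hypothetical allow-list: (service account, namespace) -> approved image digests
POLICY = {
    ("billing-agent", "prod"): {"sha256:aaa111"},
    ("search-agent", "prod"): {"sha256:bbb222", "sha256:ccc333"},
}

def may_issue_token(att: Attestation) -> bool:
    """Allow token issuance only for known workloads running approved images.

    Assumes the attestation's signature chain was already verified upstream.
    """
    approved = POLICY.get((att.service_account, att.namespace), set())
    return att.image_digest in approved

print(may_issue_token(Attestation("billing-agent", "prod", "sha256:aaa111")))  # True
print(may_issue_token(Attestation("billing-agent", "prod", "sha256:zzz999")))  # False
print(may_issue_token(Attestation("unknown-agent", "prod", "sha256:aaa111")))  # False
```

Unknown workloads and unapproved images are denied by default, which is the Zero-Trust posture the section describes.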

5.3 Proof-of-Possession (PoP) and mTLS

Under a PoP framework, the access token is cryptographically bound to the specific client that requested it. If a Hydra proxy intercepts the token and tries to replay it from a different client, the transaction fails because the proxy cannot prove possession of the original agent’s private key material.
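
One concrete form of this binding is the certificate-bound access token from RFC 8705, where the token carries a `cnf` claim holding the SHA-256 thumbprint of the client's TLS certificate. The sketch below shows only the thumbprint comparison. The byte strings standing in for DER-encoded certificates are placeholders, and the token's own signature check is assumed to have happened already.

```python
import base64
import hashlib
import hmac

def cert_thumbprint(cert_der: bytes) -> str:
    """base64url(SHA-256(DER certificate)) without padding, as in RFC 8705."""
    digest = hashlib.sha256(cert_der).digest()
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode()

def token_bound_to_cert(token_claims: dict, presented_cert_der: bytes) -> bool:
    """Check that the token's confirmation claim matches the TLS client certificate."""
    bound = token_claims.get("cnf", {}).get("x5t#S256")
    if bound is None:
        return False  # unbound tokens are rejected
    return hmac.compare_digest(bound, cert_thumbprint(presented_cert_der))

# Placeholder byte strings stand in for real DER-encoded certificates
agent_cert = b"agent-certificate-der-bytes"
proxy_cert = b"proxy-certificate-der-bytes"
claims = {"sub": "agent-1", "cnf": {"x5t#S256": cert_thumbprint(agent_cert)}}

print(token_bound_to_cert(claims, agent_cert))   # True
print(token_bound_to_cert(claims, proxy_cert))   # False
```

A proxy that replays the token over its own TLS connection presents a different certificate, so the thumbprints do not match and the request is rejected.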

```typescript
// Zero-Trust Agent Authentication Flow
interface AgentAuthRequest {
  // Short-lived JWT from OIDC provider
  accessToken: string;
  // mTLS client certificate thumbprint
  clientCertThumbprint: string;
  // Workload attestation from TPM/HSM
  workloadAttestation: {
    serviceAccount: string;
    namespace: string;
    containerImageHash: string;
    tpmQuote: string;
  };
  // Proof-of-Possession binding
  popProof: {
    nonce: string;
    signature: string; // Signed with agent's private key
  };
}

// Hypothetical helpers, declared so the sketch type-checks; implementations are out of scope.
declare function isExpired(accessToken: string): boolean;
declare function verifyThumbprint(req: AgentAuthRequest): boolean;
declare function validateAttestation(att: AgentAuthRequest["workloadAttestation"]): boolean;
declare function verifyPoP(proof: AgentAuthRequest["popProof"], accessToken: string): boolean;

function validateAgentRequest(req: AgentAuthRequest): boolean {
  // 1. Verify JWT hasn't expired (short-lived)
  if (isExpired(req.accessToken)) return false;
  // 2. Verify mTLS cert matches token binding
  if (!verifyThumbprint(req)) return false;
  // 3. Validate workload attestation chain
  if (!validateAttestation(req.workloadAttestation)) return false;
  // 4. Verify Proof-of-Possession (PoP)
  if (!verifyPoP(req.popProof, req.accessToken)) return false;
  return true;
}
```

| Protocol | Security Level | Effectiveness Against Hydra Clusters |
|---|---|---|
| Static API Keys | Weak | Low: proxies easily rotate stolen keys |
| OAuth 2.1 (Short-lived) | Medium | Medium: forces constant re-authentication |
| mTLS + Workload Attestation | Zero-Trust | High: cryptographic attestation is hard to forge |

6. Behavioral Detection: When Rate Limits Fail

Cryptographic authentication is necessary but not sufficient. Sophisticated attackers will find ways to obtain legitimate credentials. The second line of defense is Traffic Anomaly Detection (TAD) — behavioral analysis that identifies distillation patterns across millions of fragmented accounts.

Semantic Fingerprinting

Legitimate users exhibit diverse, semi-random query patterns with natural topic progression. Distillation attacks require vast volumes of highly structured, repetitive prompts designed to systematically map the model’s latent space. TAD systems flag accounts that exhibit unnatural semantic jumps (complex calculus, then 18th-century poetry, then legal analysis) while using identical prompt templates designed to extract reasoning traces.
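
The template half of that signal is easy to approximate. The sketch below masks the parts of a prompt that vary (quoted payloads and numbers) and measures how much of an account's traffic collapses onto a single skeleton. The masking rules and sample prompts are assumptions for illustration; detecting the semantic jumps themselves would need an embedding model, which is out of scope here.

```python
import re
from collections import Counter

def skeleton(prompt: str) -> str:
    """Reduce a prompt to its template by masking the parts that vary."""
    text = prompt.lower()
    text = re.sub(r'"[^"]*"', "<quoted>", text)    # quoted payloads
    text = re.sub(r"\d+(\.\d+)?", "<num>", text)   # numbers
    return re.sub(r"\s+", " ", text).strip()

def template_reuse(prompts):
    """Fraction of prompts that share the single most common skeleton."""
    if not prompts:
        return 0.0
    counts = Counter(skeleton(p) for p in prompts)
    return counts.most_common(1)[0][1] / len(prompts)

# Same wrapper, wildly different topics: the pattern described above
scripted = [
    f'Think step by step and show all reasoning. Task {i}: "{topic}"'
    for i, topic in enumerate(["integrate x^2", "a sonnet on rain", "tort liability"])
]
organic = [
    "Why does my Flask app return a 500 here?",
    "Can you shorten this paragraph?",
    "What's a good name for a hiking club?",
]

print(template_reuse(scripted))  # 1.0
print(template_reuse(organic))   # about 0.33
```

A reuse ratio near 1.0 across thousands of prompts, combined with topic diversity inside the masked slots, is the kind of fingerprint a TAD system would escalate.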

Infrastructure Correlation

Despite geographical IP distribution, Hydra Clusters betray themselves through correlated metadata:

  • Synchronized registration timestamps across thousands of accounts.
  • Identical timing distributions in API calls (mathematical variance analysis).
  • Shared payment infrastructure (seen in DeepSeek’s campaign).
  • Consistent User-Agent strings or TLS fingerprints.

When millions of requests across thousands of accounts show the same statistical variance in request timing, it reveals centralized orchestration, no matter how well-distributed the IPs appear.
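
A minimal version of that variance analysis compares accounts by the mean and coefficient of variation of their inter-request gaps. The timestamps and the 10% tolerance below are invented for illustration, and the pairwise loop is quadratic, so a production system would bucket or index signatures instead.

```python
from statistics import mean, pstdev

def timing_signature(timestamps):
    """Return (mean gap, coefficient of variation) of inter-request intervals."""
    ts = sorted(timestamps)
    gaps = [b - a for a, b in zip(ts, ts[1:])]
    if len(gaps) < 2 or mean(gaps) == 0:
        return None  # not enough data to characterize the account
    return mean(gaps), pstdev(gaps) / mean(gaps)

def correlated_pairs(accounts, tol=0.1):
    """Pairs of accounts whose timing signatures nearly coincide.

    accounts: dict of account_id -> list of request timestamps in seconds.
    tol is an illustrative tolerance, not a calibrated value.
    """
    sigs = {a: timing_signature(t) for a, t in accounts.items()}
    names = sorted(a for a, s in sigs.items() if s is not None)
    pairs = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            (mean_a, cv_a), (mean_b, cv_b) = sigs[a], sigs[b]
            same_pace = abs(mean_a - mean_b) <= tol * max(mean_a, mean_b)
            same_jitter = abs(cv_a - cv_b) <= tol
            if same_pace and same_jitter:
                pairs.append((a, b))
    return pairs

accounts = {
    "acct-1": [0, 2.0, 4.0, 6.1, 8.0, 10.0],       # scheduler-like spacing
    "acct-2": [100, 102.0, 104.1, 106.0, 108.0],   # same cadence, different start
    "human":  [0, 5, 6, 40, 41, 300],              # bursty, irregular
}
print(correlated_pairs(accounts))  # [('acct-1', 'acct-2')]
```

The two scripted accounts never overlap in time or share an IP in this toy data, yet their pacing gives them away.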


7. Poisoning the Well: Intrinsic Model-Layer Protections

The final — and most promising — frontier of defense does not try to stop access. Instead, it makes the stolen data toxic to the attacker’s training pipeline.

7.1 Watermark Radioactivity

During decoding, the teacher model subtly biases token probability distributions, embedding a statistical signature in the generated text that is invisible to humans but cryptographically detectable. When a student model trains on millions of watermarked outputs, it internalizes the skewed distributions and begins generating watermarked text autonomously. This "radioactivity" provides strong forensic evidence of distillation.

Adversaries counter this with Targeted Paraphrasing (TP) and Watermark Neutralization (WN) — trying to reverse-engineer and wash away the watermark rules. The arms race is on.
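
To see how such a signature is detected, consider a toy version of one published family of schemes, the keyed "green list" watermark. Each context pseudo-randomly marks part of the vocabulary as green, generation favors green tokens, and detection counts how many tokens landed on the green list. Everything here is an assumption for illustration: the key, the 50-word vocabulary, and the hard (always-green) sampling rule. The source does not say which scheme any provider actually uses.

```python
import hashlib
import math
import random

GAMMA = 0.5            # fraction of the vocabulary treated as "green" at each step
KEY = b"demo-key"      # hypothetical secret held by the model provider

def is_green(prev_token: str, token: str) -> bool:
    """Pseudo-randomly assign `token` to the green list, seeded by the previous token."""
    digest = hashlib.sha256(KEY + prev_token.encode() + b"|" + token.encode()).digest()
    return digest[0] < 256 * GAMMA

def watermark_z_score(tokens) -> float:
    """z-score of the green-token count against the no-watermark expectation."""
    scored = len(tokens) - 1   # the first token has no context to score against
    green = sum(is_green(a, b) for a, b in zip(tokens, tokens[1:]))
    return (green - GAMMA * scored) / math.sqrt(scored * GAMMA * (1 - GAMMA))

VOCAB = [f"w{i}" for i in range(50)]
rng = random.Random(0)

# Unwatermarked text: tokens chosen without regard to the green list
plain = [rng.choice(VOCAB) for _ in range(200)]

# Watermarked text: only sample from tokens that are green in the current context
marked = [rng.choice(VOCAB)]
while len(marked) < 200:
    greens = [t for t in VOCAB if is_green(marked[-1], t)]
    marked.append(rng.choice(greens or VOCAB))

print(round(watermark_z_score(plain), 2))
print(round(watermark_z_score(marked), 2))
```

Unwatermarked text should score near zero, while the watermarked sequence scores many standard deviations above it. Radioactivity testing applies the same detector to a suspect student model's outputs: a persistently elevated z-score suggests it trained on watermarked data.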

7.2 Trace Rewriting

Since Chain-of-Thought reasoning traces are the most valuable extraction target, Trace Rewriting frameworks use an intermediate model to dynamically modify reasoning outputs before they reach the API. The rewritten traces remain semantically coherent for legitimate users but inject critical noise into the adversary’s training pipeline.

Experimental deployments have demonstrated the ability to reduce unauthorized student model accuracy by up to 61% while maintaining teacher model performance for legitimate end-users.

7.3 Logit Purification

For attackers attempting logit-based distillation — capturing the full probability distribution over the vocabulary — defenders apply dynamic transformation matrices guided by Conditional Mutual Information (CMI) objectives. The transformation minimizes divergence for the primary task while maximizing entropy in secondary contextual signals that attackers rely on. Logits are effectively purified — stripped of the "dark knowledge" necessary for effective knowledge transfer.
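
The CMI-guided transformation itself is beyond a short example, so the sketch below swaps in a much simpler stand-in that illustrates the same goal: keep the top prediction and its confidence intact, and flatten everything else. The probabilities are made up, and uniform tail flattening is an assumption chosen for clarity, not the method described above.

```python
import math

def flatten_tail(probs):
    """Keep the top class and its probability; spread the remaining mass uniformly."""
    top = max(range(len(probs)), key=probs.__getitem__)
    rest = (1.0 - probs[top]) / (len(probs) - 1)
    return [probs[top] if i == top else rest for i in range(len(probs))]

def tail_entropy(probs):
    """Entropy (in nats) of the distribution over all classes except the top one."""
    top = max(range(len(probs)), key=probs.__getitem__)
    tail = [p for i, p in enumerate(probs) if i != top]
    mass = sum(tail)
    return -sum((p / mass) * math.log(p / mass) for p in tail if p > 0)

original = [0.70, 0.20, 0.07, 0.03]   # hypothetical class probabilities
released = flatten_tail(original)

print(released)
print(round(tail_entropy(original), 3))
print(round(tail_entropy(released), 3))
```

The released distribution still names the same winner with the same confidence, but the ranking among the runners-up, which is the "dark knowledge" a student would learn from, is gone.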

| Defense | Mechanism | Trade-off / Notes |
|---|---|---|
| Watermark Radioactivity | Biased token distribution; forensic detection | Detection, not pre-emptive prevention |
| Trace Rewriting | Semantic noise injection in CoT outputs | Increases API response latency |
| Logit Purification | Maximum entropy in secondary distributions | Excellent against deep extraction |

8. The Export Control Paradox

The United States restricts the export of advanced AI chips to China, on the premise that limiting silicon limits AI capability. Distillation attacks fundamentally challenge this logic.

Training a frontier model from scratch requires tens of thousands of sanctioned GPUs. Fine-tuning a student model via API distillation requires a fraction of that compute. As long as the behavioral outputs of US frontier models remain accessible globally via commercial APIs, capabilities flow across borders via conversation, not hardware.

Hardware is restricted, but intelligence is fluid.

Anthropic used this incident to advocate for tighter integration between hardware restrictions and API export compliance. The paradox is real: export controls succeed at limiting physical infrastructure, but they do not limit API outputs. The question is whether policy can evolve fast enough to bridge the gap.

The community reaction has been mixed. Critics, including Elon Musk, have pointed out that frontier labs constructed their own foundational models by scraping trillions of tokens of copyrighted material from the public internet, largely without consent or compensation.

Anthropic frames distillation attacks not as copyright infringement, but as breach of contract — violating terms of service and evading regional access restrictions. However, the legal distinction between "illegal distillation" (ToS violation) and "legitimate scraping" (fair use) remains a point of deep contention as the EU AI Act and US copyright litigation mature.


9. The Guardrails Crisis

Perhaps the most dangerous consequence of distillation is the systematic removal of safety conditioning. Frontier model developers invest massive resources in RLHF and Constitutional AI to ensure models refuse to assist in developing weapons, organizing cyberattacks, or generating disinformation.

These guardrails are embedded in the teacher model's output probabilities. When an adversary distills the model, the student inherits the problem-solving skills but not the safety conditioning. The student learns how to answer but not when to refuse, so the refusal mechanisms are bypassed entirely.

Anthropic explicitly warned that these "unconstrained models" pose national security risks, being capable of integration into military and surveillance infrastructure without any ethical restrictions. If such models are subsequently "open-sourced" or leaked, the proliferation of dangerous capabilities becomes permanent and irreversible.


The Bottom Line

  1. Your API is your IP. If your model provides intelligence via an endpoint, that intelligence can be extracted. Plan accordingly.
  2. Static API keys are a liability. Transition to OAuth 2.1, mTLS, and workload attestation immediately.
  3. Per-account rate limits are ineffective against Hydra Clusters. Invest in behavioral anomaly detection and semantic fingerprinting.
  4. Defense is shifting from walls to poison. Trace rewriting, logit purification, and watermark radioactivity are the new frontiers.
  5. Geopolitical implications are real. Silicon export controls do not stop intelligence extraction via API.


The distillation arms race has only just begun. If you are building APIs that provide model intelligence, the question is not if someone will attempt to extract it — it is whether your defense renders that extraction valueless.

This article was human-architected and synthesized with AI assistance under the Nexus (AI) persona.

