The Most Powerful Security AI Ever Built Is Protected by Email Verification

OpenAI opened GPT-5.4-Cyber to thousands of vetted defenders this week. Anthropic won't release Mythos publicly at all. Both models can find zero-days at scale. Both access controls have the same hole.

On April 14, OpenAI made GPT-5.4-Cyber available to "thousands of authenticated individual defenders and hundreds of teams responsible for securing critical software." Seven days earlier, Anthropic announced it would not release Mythos Preview to the public under any circumstances. Two of the most capable AI companies on Earth each looked at exactly the same problem — they had built something genuinely dangerous, and now they had to decide what to do with it — and came to opposite conclusions.

OpenAI said: give it to the defenders, as many as possible, as fast as responsible deployment allows.

Anthropic said: keep it from the public entirely, and hand it only to a short list of partners.

Both decisions are coherent. Both have the same vulnerability.

What OpenAI Actually Released

GPT-5.4-Cyber is a security-optimized variant of OpenAI's latest flagship model, with a lowered refusal boundary for legitimate security work. The model will discuss offensive techniques, explain exploitation mechanics, and assist with research tasks that the standard GPT-5.4 would decline. The headline capability is binary reverse engineering: analyzing compiled software for malware indicators, vulnerabilities, and security weaknesses without access to source code. Done manually, that's a workflow that takes senior security engineers days.

Access flows through OpenAI's Trusted Access for Cyber program, which the company calls TAC. Individual users verify identity at chatgpt.com/cyber. Enterprise teams apply through an OpenAI rep. OpenAI describes the rollout as deliberate: "scale cyber defense in lockstep — broadening access for legitimate defenders while continuing to strengthen safeguards." In parallel, OpenAI's Codex Security product — drawing on the same model lineage — has, per the company's own figures, contributed to fixing more than 3,000 critical and high-severity vulnerabilities.

The underlying argument is reasonable: security teams are understaffed almost everywhere and buried in alerts. If you can put a model in front of a defender that reads compiled binaries and flags CVE-class bugs in hours instead of days, the aggregate benefit to defenders probably outweighs the marginal risk from the small number of misusers who get through identity verification. More defenders with better tools means more vulnerabilities found by the right side first.

That's OpenAI's bet.

What Anthropic Did Instead

Anthropic built something even more capable and then declined to release it.

If you read Tuesday's post here on Mythos, you know the technical picture already. The short version for anyone who missed it: Mythos is a general-purpose frontier model that turned out to be strikingly good at computer security — not because Anthropic trained it for that purpose, but because it reasons well about complex systems. When Anthropic pointed it at software, it found thousands of zero-days, including CVE-2026-4747, a 17-year-old remote code execution flaw in FreeBSD's NFS stack sitting in kgssapi.ko, the kernel module handling RPCSEC_GSS authentication. It found a 27-year-old flaw in OpenBSD. It didn't just find these bugs — it wrote working exploits, including a four-vulnerability JIT heap spray that broke both a renderer sandbox and an OS sandbox in a major browser.

Instead of releasing this publicly, Anthropic launched Project Glasswing: forty partner organizations — Amazon, Apple, Broadcom, Cisco, CrowdStrike, the Linux Foundation, Microsoft, Palo Alto Networks, and others — who get access to Mythos Preview for defensive security work. Partners scan their own and open-source software, then share findings with the broader industry. Anthropic has said the goal is to develop cybersecurity safeguards that will eventually ship with a future Claude model, enabling broader distribution with guardrails that don't exist yet.

The reasoning is different from OpenAI's, and it's also coherent: Mythos can conduct autonomous multi-stage attacks on arbitrary systems. It doesn't need a human driving each step. That capability is in a different risk category than a model that helps a human reverse engineer a binary. Gating it behind any identity verification regime — no matter how thorough — creates a population of credentialed accounts, each a potential leakage point. Better to keep the capability with forty organizations under contract than to release it to thousands of individuals whose accounts can be stolen.

Both bets are defensible. Neither is obviously wrong.

The Flaw Both Bets Share

The problem is that credential compromise breaks both access models, and credential compromise is what sophisticated attackers are good at.

OpenAI's TAC program verifies identity, not intent. A nation-state intelligence service can construct a convincing cover organization, such as a fake penetration testing firm or a fraudulent research institute, that passes any reasonable commercial identity check. That's not hypothetical; front companies set up to acquire restricted technology have been described in multiple U.S. Department of Justice indictments. More practically, legitimate TAC accounts will get phished, taken over via credential stuffing, or abused by insiders. Once a TAC credential circulates on underground forums, "thousands of authenticated defenders" becomes "thousands of authenticated defenders plus everyone who bought the credential." The lowered refusal boundary is now available to them too.

Anthropic's Project Glasswing has a different shape of the same problem. The named partners are large, well-resourced institutions, exactly the kind that state-sponsored attackers target with patience and resources. Midnight Blizzard spent months inside Microsoft's corporate email before being detected. The 2024 Snowflake credential campaign compromised over 160 organizations, several of them large technology firms. These aren't soft targets; they are among the most hardened organizations on the internet. But they are persistent targets, and persistent targeting eventually succeeds. The contractual controls Anthropic has put in place assume that these organizations can keep their Mythos access isolated from attackers already inside their networks, attackers who have both the motivation and the patience to pursue it specifically.

I'm not claiming either access control fails immediately or trivially. I'm saying that both systems use mechanisms — identity checks, contracts, network segmentation — that we know can fail against motivated nation-state actors. And the asset they're protecting is, in Anthropic's own framing, capable of finding and exploiting critical vulnerabilities across every major operating system.
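To make the "persistent targeting eventually succeeds" point concrete, here is a rough back-of-the-envelope sketch. It assumes every credential holder is compromised independently and with the same probability over some period, which is a simplification, and the probabilities and the 5,000-account figure are illustrative placeholders rather than measured rates or numbers from either company.

```python
def p_any_compromise(n_holders: int, p_each: float) -> float:
    """Probability that at least one of n independent holders is compromised."""
    if n_holders < 0:
        raise ValueError("n_holders must be non-negative")
    if not 0.0 <= p_each <= 1.0:
        raise ValueError("p_each must be between 0 and 1")
    # Complement of "nobody is compromised".
    return 1.0 - (1.0 - p_each) ** n_holders


if __name__ == "__main__":
    # Illustrative assumptions only: neither rate comes from real data.
    scenarios = [
        ("40 partner organizations", 40, 0.01),
        ("5,000 individual accounts", 5000, 0.001),
    ]
    for label, n, p in scenarios:
        print(f"{label}: {p_any_compromise(n, p):.1%} chance of at least one compromise")
```

Under these made-up rates, the forty-partner model comes out at roughly one in three and the large-population model at near certainty. The exact figures mean little; the shape is what matters: shrinking the population helps, but it does not bring the risk anywhere close to zero.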

Meanwhile, the Criminals Aren't Waiting for the Policy Debate

While OpenAI and Anthropic work through their access theories, the criminal ecosystem has already registered a different observation: "AI security tool" is now a brand worth impersonating.

Researchers at Malwarebytes documented a fake Anthropic website — close enough in visual design to pass casual inspection — serving a trojanized Claude installer. The ZIP archive contains an MSI that installs genuine Claude (to keep up appearances) while deploying PlugX malware in the background. The technique is DLL sideloading, abusing a legitimate G DATA antivirus updater called NOVUpdate.exe. When that updater runs, it loads avk.dll from its own directory. The fake installer has replaced that DLL with a malicious version that decrypts and executes an encrypted payload file (NOVUpdate.exe.dat). The malware established a connection to its command-and-control server at 8.217.190[.]58, an Alibaba Cloud address, within 22 seconds of execution.
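The command-and-control address above is written in defanged form (`[.]`) so it can't be clicked by accident. If you want to check your own egress logs for it, a minimal sketch looks like the following. The log line format in the usage note is hypothetical; adapt the parsing to whatever your firewall or proxy actually emits.

```python
import ipaddress
import re

# Loose IPv4 shape; candidates are validated with ipaddress afterwards.
IPV4_PATTERN = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def refang(indicator: str) -> str:
    """Convert a defanged indicator such as 8.217.190[.]58 to plain form."""
    return indicator.replace("[.]", ".")


def lines_contacting(ioc: str, log_lines):
    """Return the log lines that mention the indicator as a whole IPv4 address."""
    target = ipaddress.ip_address(refang(ioc))
    hits = []
    for line in log_lines:
        for candidate in IPV4_PATTERN.findall(line):
            try:
                if ipaddress.ip_address(candidate) == target:
                    hits.append(line)
                    break
            except ValueError:
                continue  # looked like an address but is not a valid one
    return hits
```

Calling `lines_contacting("8.217.190[.]58", open("egress.log"))` would return only lines containing that exact address. Comparing parsed addresses rather than raw substrings avoids false hits on lookalikes such as 18.217.190.58.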

PlugX has a long history with Chinese state-sponsored espionage campaigns, though its source code circulated widely enough years ago that attribution now requires more than tool identification. Whatever the operator, the choice of a Claude impersonation tells you something real: attackers treat people who download AI tools as high-value targets, either technically sophisticated users with privileged access or corporate knowledge workers installing AI assistants for their day jobs. Both are worth the effort.

The same week, the CPUID website was compromised to serve a trojanized CPU-Z 2.19 installer. CPUID develops CPU-Z and HWMonitor, system monitoring utilities found on many enthusiast Windows machines. The payload was STX RAT, a new remote access trojan with infostealer capabilities, delivered via a Zig-compiled malicious CRYPTBASE.dll that uses IPv6-encoded .NET deserialization to evade detection engines. Neither attack involves the AI models themselves. But both are part of the same shift: security tools and AI tools have become some of the most attractive software supply chain targets of 2026, because the people who use them tend to sit on privileged access.
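Both incidents rely on a DLL sitting in an application's own directory and being loaded in place of the expected one. One cheap heuristic is to look for files in application folders whose names collide with Windows system DLLs, as CRYPTBASE.dll does. The sketch below is an assumption-laden starting point, not a detection product: the name list is a small hand-picked sample, and some legitimate software does ship its own copies of such libraries, so a hit is a lead to investigate rather than a verdict.

```python
from pathlib import Path

# Small illustrative sample. A real check would build this set from a
# known-clean Windows install instead of hard-coding it.
SYSTEM_DLL_NAMES = {"cryptbase.dll", "version.dll", "dwmapi.dll", "winhttp.dll"}


def find_shadowing_dlls(app_root: Path, system_names=SYSTEM_DLL_NAMES):
    """Return files under app_root whose name matches a Windows system DLL."""
    return sorted(
        path
        for path in app_root.rglob("*")
        if path.is_file() and path.name.lower() in system_names
    )
```

Pointing it at an install directory, for example `find_shadowing_dlls(Path(r"C:\Program Files\SomeApp"))`, lists candidates to inspect. The comparison is case-insensitive because Windows file names are.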

The two threat surfaces are converging. OpenAI and Anthropic are debating who gets verified access to security AI. Attackers are distributing malware through fake AI installers to the same population of users. The policy debate and the criminal activity center on the same people.

What to Do

If you're a security professional who wants GPT-5.4-Cyber access through the TAC program:

  1. Apply through the official route only: individuals at chatgpt.com/cyber, enterprises through your OpenAI account rep. Third-party "accelerated access" offers are social engineering.
  2. Treat your TAC credential like a privileged API key. It should live in a secrets manager, not in a .env file, a notes app, or a browser's saved passwords.
  3. Understand that the lowered refusal boundary is tied to the credential. If it leaks, someone else inherits your access profile. Rotate it if you have any reason to believe your workstation was compromised.
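As a quick self-audit for the second point, here is a minimal sketch that walks a directory tree and reports secret-looking assignments left in .env files. The pattern and the example variable name are assumptions; nothing published says what form a TAC credential actually takes, so treat this as a generic hygiene check.

```python
import re
from pathlib import Path

# Matches lines like API_KEY=..., export SOME_TOKEN=..., client_secret=...
# The keyword list is a guess at common naming, not an official format.
SECRET_ASSIGNMENT = re.compile(
    r"^\s*(?:export\s+)?([A-Z0-9_]*(?:KEY|TOKEN|SECRET)[A-Z0-9_]*)\s*=\s*\S+",
    re.IGNORECASE,
)


def find_plaintext_secrets(root: Path):
    """Yield (path, line_number, variable_name) for secret-like lines in .env files."""
    for env_file in root.rglob(".env*"):
        if not env_file.is_file():
            continue
        text = env_file.read_text(errors="ignore")
        for number, line in enumerate(text.splitlines(), start=1):
            match = SECRET_ASSIGNMENT.match(line)
            if match:
                yield env_file, number, match.group(1)
```

Running `list(find_plaintext_secrets(Path.home() / "projects"))` gives a list of places to clean up. The function reports only variable names and never values, so its output is safe to paste into a ticket.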

For IT and procurement teams:

  1. Any AI software — including Claude, ChatGPT apps, and any "AI-powered security tools" — should go through the same software approval and installation vetting process as any other application. "Download it from the website" is not a sufficient procurement process for software that runs in your environment.
  2. Verify installer signatures before running them, but don't treat a valid signature as sufficient. The fake Claude installer and the compromised CPU-Z installer both used legitimate signed executables alongside malicious payloads. The signature on the signed binary does not cover a malicious DLL loaded alongside it, so check the other files in the install directory too.
  3. Check source URLs carefully. Claude is distributed through claude.ai and the official Claude desktop app. HWMonitor and CPU-Z come from cpuid.com. If someone in your org downloaded these from a URL found on a forum, a search engine ad, or a social media post, investigate before trusting the install.
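The source-URL and integrity checks in this list can be partly automated. The sketch below assumes you maintain your own allowlist of download hosts (seeded here with the two domains named above) and that you have a known-good SHA-256 for the installer from a channel you trust; whether a given vendor publishes one is something to confirm yourself.

```python
import hashlib
import hmac
from pathlib import Path
from urllib.parse import urlsplit

# Seeded with the domains named in this post; extend per your own policy.
APPROVED_HOSTS = {"claude.ai", "cpuid.com"}


def host_is_approved(url: str, approved=APPROVED_HOSTS) -> bool:
    """True only for https URLs on an approved domain or one of its subdomains."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    return parts.scheme == "https" and any(
        host == domain or host.endswith("." + domain) for domain in approved
    )


def sha256_matches(path: Path, expected_hex: str) -> bool:
    """Compare a file's SHA-256 against a known-good hex digest."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # Read in 1 MiB chunks so large installers do not load into memory at once.
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return hmac.compare_digest(digest.hexdigest(), expected_hex.strip().lower())
```

Note the limits: a host allowlist would have caught the fake Anthropic site but not the CPUID incident, where the legitimate site itself served the bad installer. That is why the hash has to come from somewhere other than the download page.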

For anyone in a position to think about AI safety policy:

The access control debate between OpenAI's "verified thousands" and Anthropic's "forty trusted organizations" is genuinely interesting and worth watching. But neither approach scales to the actual threat model. Both assume that the perimeter around the capability can hold. The fake Claude installer suggests that even users who are trying to get the legitimate product can end up with malware — the perimeter around the user base is already porous.

The Bigger Problem

I have genuine respect for the engineers and policy teams at both companies working through these tradeoffs. The decisions are hard, the stakes are real, and I don't think there's an obviously correct answer.

But here's what bothers me: we have spent years building AI systems that are now better than human experts at finding critical vulnerabilities in production software. Our collective institutional response to the question of what to do with that capability is a combination of email verification, NDAs with large corporations, and hoping that forty carefully chosen partners don't get breached by the kind of attackers who specifically target large corporations.

Mythos will eventually inform future Claude models. The TAC program will expand. Competitors will build their own versions, some with fewer restrictions. Other AI labs outside the US are building similar capabilities without the same safety culture. The access control measures are a pause — a responsible pause, probably a necessary one — while everyone tries to figure out what comes after.

What I don't hear is a coherent answer to the question of what "after" looks like. How does verified defender access work when nation-states have demonstrated they can maintain persistent access inside the largest technology companies for months without detection? What does Project Glasswing's partner model look like when Mythos-class capabilities are available from five different providers, three of which are based in jurisdictions with no equivalent to Anthropic's safety commitments?

I don't have good answers to those questions. I'm not sure the people who made this week's access decisions do either. But those are the questions the next year of AI security policy will have to grapple with, and right now the debate is mostly about whether individual researchers should have to verify their email address first.

Sources: The Hacker News — GPT-5.4-Cyber, OpenAI TAC Program, Help Net Security — GPT-5.4-Cyber, TechCrunch — Anthropic Mythos, SecurityWeek — Anthropic Mythos, Security Affairs — Fake Claude Installer, The Hacker News — CPUID STX RAT
