Security
12 min read

An AI Just Found Zero-Days Nobody Saw for 27 Years

Anthropic's unreleased Mythos Preview autonomously found and exploited zero-days in every major OS and browser — including a 27-year-old OpenBSD bug. The defenders got a head start. The rest of us should be paying attention.

Last week Anthropic published something I've been waiting to read for two years and hoping I'd never have to. Their unreleased frontier model, Claude Mythos Preview, was pointed at the software running the modern world — operating systems, browsers, cryptographic libraries, kernels — and told to find bugs. It did. Thousands of them. Most were novel. Some were critical. One of them had been sitting in OpenBSD, a project literally marketed on its security track record, for 27 years.

Then Mythos wrote the exploits.

Not proof-of-concept crashes. Actual working chains. In one case it wove four separate browser bugs into a JIT heap spray that popped both the renderer sandbox and the OS sandbox. In another it chained privilege escalations in the Linux kernel to go from unprivileged user to full root. The model did these things largely without a human at the wheel — someone handed it a harness and a target, and it came back with working code.

If you've been reading about AI and security for a while and your bullshit detector is flashing, I get it. Mine did too. I went and read the technical writeups, the government briefings, and the analyses from three different research groups. This one is real, and it's a bigger deal than the "AI writes malware" stories we've been getting for a year.

What Mythos Actually Did

The specific findings matter because they tell you what kind of capability we're now dealing with.

CVE-2026-4747 is a remote code execution bug in FreeBSD's NFS server that Mythos found and exploited autonomously. The flaw sits in kgssapi.ko, the kernel module that handles RPCSEC_GSS authentication. Specifically, svc_rpc_gss_validate() copies an attacker-controlled credential body into a 128-byte stack buffer called rpchdr[] without checking the length. It's a textbook stack overflow — the kind of bug we've been finding and patching since the 1990s — and it gave unauthenticated, remote attackers root on the server. It had been in the FreeBSD source tree for 17 years. Zero humans had found it in that time.

The oldest bug Mythos dug up was a 27-year-old flaw in OpenBSD. OpenBSD ships with a compile-time banner about its security audits. Its mailing lists are full of maintainers who read every commit. A paid team of professional auditors has made their careers on this codebase. And a language model went in and found a bug that had slipped past all of them since 1999.

Mythos also pulled a 16-year-old RCE out of FFmpeg, the media-processing library that shows up in everything from Chrome to VLC to OBS. On the browser side, it wrote that four-vulnerability JIT heap spray I mentioned. Anthropic's paper is careful not to name which browser, but every major one uses a JavaScript JIT, and every major one has a sandbox that an exploit like this breaks.

One more data point that's worth sitting with: a separate AI security startup called AISLE ran their own smaller, cheaper model against the January 2026 OpenSSL security advisory and found all 12 CVEs. Including bugs introduced in 1998. That's not a frontier-model result — that's a "the floor has moved" result.

Why The 27-Year Number Matters

When I first read "27-year-old OpenBSD bug," my gut reaction was: okay, but vulnerability research has always been uneven. Lots of bugs hide in plain sight for decades. The sudoedit flaw from 2021 was 10 years old. The polkit pwnkit bug was 12. The Log4Shell codepath had been in log4j since 2013. Old bugs happen.

The difference is the rate. OpenBSD has had tens of thousands of eyes on it for three decades. Linux has had more. OpenSSL has been audited to within an inch of its life multiple times, including by teams paid specifically to find bugs before the ones that showed up in Heartbleed and Shellshock. The output of human vulnerability research against these targets is well-characterized — it's slow, it's expensive, and it tends to find one or two notable issues per year per project.

Mythos found thousands of novel vulnerabilities across these projects in a single run. That's not "faster than humans." That's a different category of activity. The thing that's changed is that the cost of running a competent auditor against a codebase has collapsed from "hire a specialist firm and wait six months" to "rent GPU time for a weekend."

This is what the researchers at AISLE have been calling the "jagged frontier" — AI capability that's wildly uneven across tasks, but when it lands in a specific place, it lands hard. Mythos happens to have landed on one of the tasks we'd least want it to.

Project Glasswing: Getting the Defenders There First

Anthropic did not release Mythos Preview. That's the part of this story I want to make sure you notice.

Instead, they launched something called Project Glasswing. The short version: if you're a defender working on critical software, you get access to Mythos through Anthropic with $100 million in usage credits, plus $4 million in open-source donations to help fix the stuff it finds. If you're not on the list, you don't get access.

The list includes the people you'd expect — Microsoft, Amazon, Google, Apple, Cisco, NVIDIA, the Linux Foundation — and some that are more interesting, like JPMorganChase. Anthropic's framing is straightforward: offensive capability is coming whether we like it or not, and the window where defenders can still get ahead is measured in months, not years. Give defenders the head start.

I think this is the right call. I also think it has consequences that are worth being honest about.

The first consequence is that defensive security is now explicitly two-tier. If you run infrastructure for JPMorganChase, you have AI-scale auditing on your attack surface starting now. If you run infrastructure for a regional hospital, a water utility, a small city's emergency services, or an election board — you do not. The old equilibrium, where the defender and the attacker both had access to roughly the same pool of human research talent, is gone. The attackers' side will catch up (more on that in a second); the defenders' side requires a plane ticket to a Glasswing partner meeting.

The second consequence is that "it's open-source so many eyes make bugs shallow" is a security argument from a previous era. It was always a cope — Heartbleed lived in plain sight in OpenSSL for two years — but with Mythos-class auditing, open-source repos are not harder to audit than closed-source ones. Both are just text. The difference is that after Glasswing's partners get finished triaging, the findings in closed-source software stay private, and the findings in open-source software ship in commits that any attacker can diff against the previous version to reverse-engineer what just got fixed.

I don't know how the open-source ecosystem absorbs that. I'm not sure anyone does.

"When Attackers Get This, What Happens?"

The uncomfortable question that everyone is dancing around is how long Glasswing's head start actually lasts.

Anthropic has a safeguard program. Mythos Preview has specific usage restrictions. The API enforces them. Fine. None of that stops a well-funded nation-state from:

  1. Training a comparable model from scratch. The compute and data are expensive but not prohibitive for a state actor. China's AI labs have made this a strategic priority since 2023. Several have frontier-class capabilities already.
  2. Stealing one. Anthropic themselves published research in February showing Chinese AI firms had run 16 million Claude queries in what looked like a systematic model-distillation attempt. Model theft is not theoretical.
  3. Jailbreaking or fine-tuning an open-weight model that's close enough. Meta's Llama series, Alibaba's Qwen, and DeepSeek's R-class models keep closing the gap with frontier labs. A fine-tuned Qwen aimed at vulnerability research is not as capable as Mythos, but it's "good enough" for plenty of attack work.

The consensus I'm seeing across the research community is that the window is something like 12 to 24 months before capabilities comparable to Mythos are accessible to well-resourced attackers. After that, we're not in a world where defenders have a head start. We're in a world where everyone has AI-scale vulnerability research, and the attackers have the usual advantages of offense — you only have to find one bug, defenders have to fix them all.

That's the window Glasswing is trying to use.

What You Actually Do About This

Most of what I write in this section of an article is advice that individuals can act on. This one is different. A lot of the meaningful response here is organizational and policy-level, and I want to be honest about that instead of pretending there's a five-step personal playbook that fixes it. That said, there are things you can do.

If you run or maintain any software that matters, get serious about your patch pipeline. Between Mythos-style auditing on the defender side and eventual attacker-side equivalents, the number of patched vulnerabilities per month is going up and will keep going up. The bottleneck is going to shift from "find the bug" to "ship the fix." If your deployment process takes two weeks to push a security patch, that's no longer acceptable; you need to be able to push in hours. Test coverage, CI/CD, rollback tooling — this is where the budget should go.

Throw out your mental model of "unpatched legacy software is probably fine." A huge fraction of the bugs Mythos is finding are in old code. If you have an unmaintained library in your dependency tree that last saw a commit in 2019, assume it will have a disclosed exploit within the next 18 months. Replacing or vendoring-and-auditing unmaintained dependencies is now security work, not housekeeping.

If you write software in C or C++, your risk just went up. Not because the languages suddenly got worse, but because automated auditing is disproportionately effective against memory-unsafe code. The stack overflow Mythos found in FreeBSD's NFS code is a category of bug that static analysis and fuzzing have been chipping away at for 25 years. AI auditors will finish the job. Memory-safe rewrites (Rust, Go, Zig) went from "nice to have" to "the timeline just got shorter."

If you run infrastructure that depends on a specific SaaS or platform, watch their patch cadence closely. Microsoft's April 2026 Patch Tuesday fixed 167 flaws with two zero-days; Adobe shipped emergency updates for an actively exploited Acrobat bug; Google patched its fourth Chrome zero-day of 2026 in Dawn/WebGPU. These aren't outliers. This is the new rhythm. Vendors that can't keep up are going to have increasingly bad months.

At a policy level, ask your legislators and industry groups hard questions about access. The Glasswing partner list is defensible — you can't hand every company on Earth a vulnerability-finding superweapon and call it a day. But the selection logic is currently "Anthropic decides." That's fine for the first round. It is not a long-term answer. Public-interest infrastructure — hospitals, water, power, elections, courts — needs a path to comparable defensive tooling that does not depend on being commercially interesting to a frontier AI lab.

For your own systems, re-audit with modern tools. Anthropic isn't the only game in town. AISLE, a handful of startups, and the open-source SAST/DAST community are all shipping AI-augmented auditing right now. Some of it is excellent, a lot of it is marketing. The only way to tell the difference is to point it at a known-vulnerable codebase and see what it finds. If you're responsible for security at a company that sells software, this should be on your Q2 roadmap.

The Uncomfortable Part

There's a version of this story where the right response is "build the AI, give it to the good guys, defend civilization." Anthropic has clearly decided that's the version they're living in, and I think the case for Glasswing on its own terms is strong.

There's another version where you notice that the "good guys" list is a selection by a private company, that the safeguard program is based on the company's judgment about what's dangerous and what's not, that the defenders' head start is going to evaporate, and that the end-state is a world where offense against any non-frontier-aligned institution is drastically easier than it used to be. That version isn't wrong either.

What I keep coming back to is that Mythos exists. That's the dominant fact. Whatever concerns you have about access, safety, or policy, they need to take that as the starting point. The choice isn't whether AI models that can autonomously find and exploit zero-days get built — they already have been. The choice is who gets them first and what we do with the window.

Anthropic gave a lot of critical infrastructure a head start. That's better than not giving it to them. It's also not a strategy for anyone outside that list, and "anyone outside that list" is most of the things you depend on in your daily life.

If you're in a position to push your organization to act in the next 12 months, this is the time. The 27-year-old OpenBSD bug isn't the story. It's just a receipt for what's coming.

Sources: The Hacker News — Anthropic's Claude Mythos Finds Thousands of Zero-Day Flaws, Help Net Security — Anthropic's new AI model finds and exploits zero-days, Anthropic — Claude Mythos Preview (red.anthropic.com), Anthropic — Project Glasswing, The Register — Anthropic Mythos model can find and exploit 0-days, VentureBeat — Mythos autonomously exploited vulnerabilities that survived 27 years of human review, The Conversation — Claude Mythos and Project Glasswing, AISLE — AI Cybersecurity After Mythos: The Jagged Frontier

▸ TAGS
#AI#zero-day#Anthropic#Claude-Mythos#Project-Glasswing#vulnerability-research
▸ STAY IN THE LOOP

Weekly. No spam. No fluff.