Cloudflare's Project Glasswing Tests Anthropic's Mythos Preview LLM for Vulnerability Research

Cloudflare has published the results of Project Glasswing, an internal initiative that tested Anthropic's Mythos Preview LLM for vulnerability research. Over several months, Cloudflare pointed the model at more than fifty of its own repositories to evaluate its ability to find and exploit security flaws. According to the company's blog post, Mythos Preview is a significant step forward in AI-assisted vulnerability research, particularly in chaining multiple low-severity bugs into working exploits and autonomously generating proofs of concept.

Mythos Preview excels at exploit chain construction, a task that traditionally requires a senior security researcher. The model can take several attack primitives (such as a use-after-free bug, an arbitrary read/write primitive, and control-flow hijacking) and reason about how to combine them into a working exploit. Cloudflare noted that the model's reasoning resembles that of an experienced human researcher rather than the output of an automated scanner. This capability lets the model show that bugs which would otherwise sit unnoticed in a backlog combine into severe, exploitable vulnerabilities.

Another standout feature is proof generation. Mythos Preview writes code to trigger a suspected bug, compiles it in a scratch environment, and runs it. If the program behaves as the model predicted, the model has a proof. If not, it reads the failure, adjusts its hypothesis, and tries again. This iterative loop closes the gap between finding a bug and proving it is exploitable, a distinction that Cloudflare emphasizes is critical for effective vulnerability triage. Other frontier models could identify interesting bugs and describe why they mattered, but they often stopped short of completing the exploit chain, leaving the question of exploitability open.
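The propose, run, and revise cycle described above can be illustrated with a small, deliberately harmless Python sketch. It is not Cloudflare's or Anthropic's implementation: the names `Attempt`, `prove`, `toy_propose`, and `toy_run_test` are hypothetical, and the "scratch environment" is replaced by a toy check that only reports a fault when an input string exceeds eight characters.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Attempt:
    hypothesis: str   # the input the proposer believes will trigger the fault
    observed: str     # what the test run actually reported
    confirmed: bool   # True when the run matched the prediction


def prove(
    propose: Callable[[Optional[Attempt]], str],
    run_test: Callable[[str], tuple[bool, str]],
    max_rounds: int = 5,
) -> list[Attempt]:
    """Repeat propose -> run -> inspect until a run confirms or rounds run out."""
    history: list[Attempt] = []
    last: Optional[Attempt] = None
    for _ in range(max_rounds):
        hypothesis = propose(last)                  # revise using the previous failure
        confirmed, observed = run_test(hypothesis)  # stand-in for compile-and-run
        last = Attempt(hypothesis, observed, confirmed)
        history.append(last)
        if confirmed:
            break
    return history


def toy_run_test(hypothesis: str) -> tuple[bool, str]:
    # Hypothetical target: it misbehaves only for inputs longer than 8 characters.
    if len(hypothesis) > 8:
        return True, "fault observed"
    return False, f"no fault at length {len(hypothesis)}"


def toy_propose(last: Optional[Attempt]) -> str:
    # Hypothetical proposer: start small, double the input after each miss.
    if last is None:
        return "A" * 4
    return last.hypothesis * 2


history = prove(toy_propose, toy_run_test)
for attempt in history:
    print(len(attempt.hypothesis), attempt.confirmed, attempt.observed)
```

In this toy run the loop needs three rounds (input lengths 4, 8, and 16) before the check confirms the hypothesis. The point of the structure is that a report only counts as proven once a run has actually reproduced the predicted behavior, and that each failed run feeds back into the next proposal.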

Cloudflare also observed that Mythos Preview, provided by Anthropic without the additional safeguards present in generally available models like Opus 4.7 or GPT-5.5, exhibited organic refusals on certain requests. However, these guardrails were inconsistent. The same task, framed differently or presented in a different context, could produce completely different outcomes. For example, the model initially refused to perform vulnerability research on a project but later agreed after an unrelated change to the project's environment. In another case, it found and confirmed serious memory bugs but then refused to write a demonstration exploit. Cloudflare warns that while these organic refusals are real, they are not consistent enough to serve as a complete safety boundary on their own, underscoring the need for additional safeguards in any future generally available cyber frontier model.

The signal-to-noise problem in vulnerability triage is exacerbated by AI-generated reports, and Cloudflare has built multiple post-validation stages to address it. Two factors dominate the noise rate: programming language and model bias. C and C++ projects produce more false positives due to memory-unsafe bug classes, while the model's own biases can lead to inconsistent results. Despite these challenges, Cloudflare's experience with Mythos Preview suggests that AI-assisted vulnerability research is evolving rapidly, with the model's ability to chain primitives and produce verifiable exploits marking a clear departure from earlier tools.
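The article does not say what Cloudflare's post-validation stages actually check, so the following Python sketch is only an assumed illustration of how staged filtering of AI-generated reports might look. The `Report` fields, the two stages (require a passing proof of concept, drop duplicates), the repository names, and the sample data are all hypothetical.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass
from typing import Callable

@dataclass(frozen=True)
class Report:
    repo: str
    language: str
    summary: str
    has_passing_proof: bool  # assumed field: a proof of concept reproduced the bug


Stage = Callable[[Report], bool]  # a stage returns True to keep the report


def has_proof(report: Report) -> bool:
    return report.has_passing_proof


def make_dedupe_stage() -> Stage:
    seen: set[tuple[str, str]] = set()

    def not_duplicate(report: Report) -> bool:
        key = (report.repo, report.summary.lower())
        if key in seen:
            return False
        seen.add(key)
        return True

    return not_duplicate


def triage(reports: list[Report], stages: list[Stage]) -> list[Report]:
    # all() short-circuits, so later stages only see reports that passed earlier ones.
    return [r for r in reports if all(stage(r) for stage in stages)]


def discard_rate_by_language(reports: list[Report], kept: list[Report]) -> dict[str, float]:
    total = Counter(r.language for r in reports)
    survived = Counter(r.language for r in kept)
    return {lang: 1 - survived[lang] / total[lang] for lang in total}


# Hypothetical sample data, not results from Project Glasswing.
reports = [
    Report("example-proxy", "C", "heap overflow in header parser", True),
    Report("example-proxy", "C", "Heap overflow in header parser", True),  # duplicate
    Report("example-proxy", "C", "possible double free", False),           # unproven
    Report("example-dns", "Rust", "panic on empty label", True),
]

kept = triage(reports, [has_proof, make_dedupe_stage()])
rates = discard_rate_by_language(reports, kept)
print(kept)
print(rates)
```

Putting the proof check before deduplication means an unproven report never suppresses a later, proven report of the same issue. Tracking the discard rate per language is one simple way to see whether a particular language accounts for most of the noise.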

Project Glasswing highlights both the promise and the risks of deploying advanced LLMs for cybersecurity. The ability to autonomously discover and exploit vulnerabilities could dramatically accelerate defensive efforts, but it also raises concerns about misuse by attackers. Cloudflare's findings serve as a call to action for the industry to develop robust safety frameworks that can keep pace with the capabilities of models like Mythos Preview, ensuring that they are used responsibly in controlled research contexts before broader deployment.