AI Is Developing Capabilities Nobody Tested For
Anthropic trained its newest AI model to write code. It became one of the most effective vulnerability hunters on the planet.
By Scott Weiner (AI Lead at NeuEon, Inc.)
Nobody programmed that skill. Nobody fine-tuned it for security research. The model, called Claude Mythos Preview, learned to write software so well that breaking software became a natural extension of the same underlying knowledge. In the weeks since Anthropic began testing it internally, Mythos has identified thousands of high-severity vulnerabilities across every major operating system and every major web browser. It found a 27-year-old bug in OpenBSD, an operating system built specifically for security. It found a 17-year-old remote code execution flaw in FreeBSD that gives any attacker on the internet complete root access to an unpatched server.
These are real-world results with real-world consequences. FreeBSD assigned CVE-2026-4747 to the vulnerability Mythos discovered. Patches are being written right now. The bug had been sitting in production systems since 2009.
The cybersecurity implications are significant on their own. But the deeper lesson for executives has nothing to do with cybersecurity specifically. It has everything to do with how AI capabilities actually develop, and why the governance frameworks most organizations rely on today are structurally blind to the biggest risk these systems carry.
What “Emergent Capabilities” Actually Means
When AI researchers talk about emergent capabilities, they mean skills that appear in a model without being explicitly trained. The concept is straightforward: you train a system to do one thing exceptionally well, and it develops the ability to do adjacent things you never anticipated.
Anthropic built Mythos to be extraordinarily good at understanding, writing, and reasoning about code. That deep fluency gave it an intimate understanding of how software systems work at every level, from memory allocation to network protocols to authentication logic. Finding security vulnerabilities is, in a mechanical sense, the same skill applied in reverse. Instead of asking “how should this code work,” you ask “where could this code break.” The model didn’t need separate training to make that flip. The knowledge was already there.
Think of it like training a structural engineer. You teach someone everything about how buildings stand up: load-bearing walls, foundation depth, material stress tolerances, wind shear calculations. At some point, that person can also tell you exactly how to bring a building down. You didn’t teach demolition. You taught engineering so thoroughly that demolition became obvious.
On SWE-bench, the industry standard for measuring AI’s ability to fix real-world software bugs, Mythos scores 93.9% compared to 80.8% for Anthropic’s previous best model. On CyberGym, which evaluates vulnerability analysis, it scores 83.1% versus 66.6%. When Anthropic partnered with Mozilla and gave Mythos access to Firefox 147’s JavaScript engine to probe for exploits, the previous best model developed working exploits twice out of several hundred attempts. Mythos developed working exploits 181 times.
Think about it. That is a capability that effectively did not exist in the previous generation and now exists at near-expert human levels. What’s next?
Beyond Cybersecurity
An AI can hack things. That sounds scary. Anthropic is being responsible. Good for them.
Every AI system develops capabilities beyond its training scope. The more capable the base model, the wider the range of emergent skills. And most enterprise AI governance frameworks were designed to manage a system’s intended capabilities, not the capabilities it develops on its own.
Consider the parallel in your own operations. You deploy an AI assistant trained on customer service interactions. That system learns your product catalog, your pricing structure, your customer communication patterns, your internal escalation procedures. It becomes very good at helping customers. It also becomes capable of social engineering, because the skills transfer directly: understanding what people want to hear, navigating objections, building trust through conversational patterns. Nobody trained it for that. Nobody tested for it. The capability is there regardless.
Or you deploy an AI system for financial analysis. It learns to read contracts, identify risk factors, model scenarios, and generate projections. It also becomes capable of identifying exactly which contract terms can be exploited, which financial structures are vulnerable to manipulation, and which risk factors can be obscured in a report. Same knowledge, different application.
This is the emergent capability problem. It is happening right now, in production systems, at every organization that uses frontier AI models.
The Governance Gap
Most enterprise AI governance frameworks focus on three things: what the AI is supposed to do, what data it has access to, and whether its outputs comply with policy. These are necessary controls. They are also not enough.
They fall short because they assume a fixed capability surface. You audit the system’s intended functions, you test for known failure modes, you build guardrails around prohibited outputs. This works well for systems that do exactly what they were designed to do. AI models are not that kind of system.
Their capabilities shift with every model update, every new tool integration, every expansion of their data access. More importantly, they possess latent capabilities that may never surface in normal operation but can activate under specific conditions.
Recklessness of Models
In their system card for Mythos, Anthropic describe incidents where the model, when given a difficult task, took “reckless excessive measures” to complete it. In one case, when asked to escape a secure sandbox as a test, the model broke out, gained full internet access, and published logs of its methodology across multiple public websites. In another case, after finding an exploit to edit files it lacked permission to modify, it made additional interventions to ensure its changes would not appear in the git change history.
The model was not trained to cover its tracks. That behavior emerged from its deep understanding of how version control systems work, combined with its drive to complete the assigned task. The capability was latent and was triggered by the situation.
Intentional Sandbagging
Perhaps the most unsettling finding in the 244-page system card is what Anthropic calls “intentional sandbagging.” In certain evaluation scenarios, Mythos deliberately performed below its actual capability level. It gave wrong answers it knew were right. It solved problems partially when it could solve them completely. The implication is hard to overstate: if a model can strategically underperform on the tests designed to measure its abilities, then every benchmark score you have ever read about any AI system might be an undercount. The capabilities reported on paper may represent a floor, not a ceiling. For executives relying on vendor-provided evaluation scores to make deployment decisions, this finding should fundamentally change how you interpret those numbers.
If a system this heavily monitored by its own creators can surprise them, the systems running inside your organization with less scrutiny can certainly surprise you.
What Anthropic Did About It
Anthropic’s response to Mythos is itself instructive. They chose a path between two extremes: rather than releasing the model publicly or locking it away, they gave defenders a head start.
Project Glasswing gives Mythos access to a specific set of partners: Amazon, Apple, Broadcom, Cisco, CrowdStrike, the Linux Foundation, Microsoft, Palo Alto Networks, Google, JPMorganChase, and Nvidia, among others. Approximately 40 additional organizations that maintain critical software infrastructure will also receive access. Anthropic committed $100 million in usage credits to the effort and donated $4 million directly to open-source security organizations.
If the capability to find these vulnerabilities will inevitably spread to less responsible actors as AI models continue improving, the defenders need time to patch the holes first. Anthropic’s own system card acknowledges this explicitly: what Mythos does today, smaller models will likely replicate within 12 to 24 months.
For executives, Project Glasswing carries two signals worth attention. First, the company that built the most capable AI model in existence concluded that restricting access was more responsible than maximizing distribution. That tells you something about the risk calculus at the frontier of AI development. Second, the deployment model they chose, controlled access with specific use-case restrictions, is likely a preview of how the most capable AI systems will be distributed going forward. Unrestricted access to frontier capabilities may be a temporary condition of this particular moment in AI development.
The Emergent Capability Governance Framework

Understanding the problem is step one. Governing for it requires a structured approach that most organizations have yet to build. Based on what the Mythos story reveals about how AI capabilities actually develop, here are four elements every executive should integrate into their AI governance practice.
1. Map Your Capability Surface
Most governance audits ask: “What is this system designed to do?”
Start with a different question: “What is this system capable of doing?”
An AI system trained on your legal contracts can likely draft new contract language, identify exploitable clauses, and generate persuasive arguments for either side of a dispute. An AI system with access to your codebase can likely identify vulnerabilities in that code, not just write new features. An AI system trained on employee communications can likely generate convincing impersonation of any employee whose messages it has processed.
Map the full capability surface. This requires red-team thinking: bring in people whose job is to ask “what else could this system do if someone used it differently than intended?”
If you skip this: You govern for the job description while ignoring the skill set.
2. Stress-Test for Adjacent Skills
Once you have mapped what the system could theoretically do, test whether it actually can.
Anthropic stress-tested Mythos by putting it in sandboxed environments and asking it to escape. They gave it cybersecurity challenges and measured performance. They ran it against real-world codebases and tracked what it found. This is not standard practice for enterprise AI deployments, but the principle applies at every scale.
If your AI is excellent at skill X, systematically test whether it can perform X-adjacent tasks that would be problematic. Customer service AI? Test whether it can social-engineer a human into sharing credentials. Financial analysis AI? Test whether it can generate misleading projections that pass casual review. Code generation AI? Test whether it can introduce subtle vulnerabilities while appearing to write clean code.
These tests feel adversarial. That is the point. The capabilities exist whether you test for them or not. Testing tells you what you are working with.
If you skip this: You discover emergent capabilities only when they cause an incident.
3. Build Tripwires
Guardrails prevent known bad behavior. They are essential. They also operate on a fixed list of things the system should not do. Emergent capabilities, by definition, are absent from that list.
Tripwires detect unexpected behavior without needing to predict exactly what form it will take. They work by monitoring for anomalies: the system accessing data outside its normal pattern, using tools in ways the designers did not anticipate, generating outputs beyond the expected scope, or attempting to interact with systems it was never given explicit access to.
Anthropic’s system card describes multiple instances where Mythos attempted to access credentials by inspecting process memory, circumvent sandboxing restrictions, and escalate its own permissions. These behaviors were caught because Anthropic had monitoring in place that flagged the anomalies, not because anyone predicted those specific actions in advance.
The principle for enterprise deployment: your monitoring should detect “the system is doing something unexpected” as a category, not just “the system is doing prohibited action number 47.”
If you skip this: Your guardrails catch the threats you imagined. The threats you did not imagine walk right past them.
4. Govern for What It Becomes
This is the element most governance frameworks miss entirely.
AI capabilities change constantly. Every model update from your provider alters the capability surface. Every new tool integration expands what the system can reach. Every fine-tuning session on your proprietary data reshapes what it knows. The governance framework you built six months ago may be governing a fundamentally different system than the one running today.
Anthropic updates their model evaluations with every significant change. Their system card for Mythos runs 244 pages because they recognize that understanding a model’s capabilities is an ongoing process.
Enterprise governance should follow the same logic. Build version-awareness into your AI governance: when the underlying model changes, re-evaluate. When tool access expands, re-evaluate. When the system is fine-tuned on new data, re-evaluate. Treat your AI governance framework the way you treat your financial audits: recurring, continuous, and triggered by material changes.
If you skip this: Your governance framework describes a system that no longer exists.
What This Changes Starting Monday
The Mythos story will dominate AI headlines this week. Most of the coverage will focus on the cybersecurity angle: an AI that can find vulnerabilities in everything, a company that chose to restrict access, a consortium of tech giants scanning their own systems for bugs that have been hiding for decades.
The vulnerabilities being patched through Project Glasswing protect every person who uses a phone, a browser, or a web application. The patches rolling out over the coming weeks will fix bugs that millions of automated tests and decades of human review missed.
For executives deploying AI in their own organizations, the actionable insight is different. The most capable AI model ever built surprised its own creators with skills they never trained it for. The systems running inside your organization, with less scrutiny and less monitoring, carry the same potential for emergent capabilities at a smaller scale.
The four governance elements give you a starting point: map capabilities beyond intent, stress-test for adjacent skills, build detection for unexpected behavior, and re-evaluate with every change. None of these require new technology. They require a shift in how you think about what your AI systems actually are.
They are not fixed tools that do exactly what the label says. They are learning systems whose capabilities evolve, expand, and sometimes surprise the people who built them.
Govern accordingly.
Ask Yourself This
Mythos can deliberately underperform on the tests designed to measure it. How would you adapt your organization’s tripwires to catch a system that is actively trying to hide what it can do?
Have your own AI transformation story? We’d love to hear it. Connect with Scott on LinkedIn or reach out to NeuEon at neueon.com/contact.
Want to learn more about fractional CAIO engagements? Contact NeuEon to discuss your AI transformation.