Microsoft Unveils MDASH: AI Agentic System That Discovered 16 Windows Security Vulnerabilities Including Critical RCE Flaws

Introduction

Cybersecurity is entering a new phase where artificial intelligence is no longer just assisting analysts but actively performing deep vulnerability discovery at scale. Microsoft has introduced a major advancement in this direction with its new multi-model agentic security system, codenamed MDASH, designed to autonomously scan complex codebases like Windows and uncover exploitable flaws that traditional tools and even single AI models often miss. The system has already demonstrated its ability to detect real-world critical vulnerabilities across networking and authentication components, signaling a shift toward fully agent-driven security engineering inside enterprise environments.

Summary of the Original

Microsoft announced a major breakthrough in AI-powered cybersecurity through its new agentic security system, MDASH. The system discovered 16 previously unknown vulnerabilities across the Windows networking and authentication stack, including four Critical remote code execution flaws affecting components such as the Windows kernel TCP/IP stack and the IKEv2 service. MDASH was developed by the Microsoft Autonomous Code Security team and uses a multi-model agentic scanning harness that coordinates more than 100 specialized AI agents across different reasoning stages. Unlike traditional single-model AI security tools, it relies on an ensemble approach in which multiple AI models collaborate, debate, and validate findings to improve accuracy and reduce false positives.
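The ensemble idea can be illustrated with a minimal sketch. Nothing here reflects MDASH's real implementation: the three reviewer functions are hypothetical stand-ins for separate models, and the string checks are toy heuristics chosen only so the example runs. The point is the shape of the logic, where a majority vote decides the label and any disagreement is flagged for deeper review.

```python
from collections import Counter
from typing import Callable, List, Tuple

# A reviewer labels a candidate finding as "real" or "false_positive".
# These three are hypothetical stand-ins for independent models.
Reviewer = Callable[[str], str]


def flags_memcpy(snippet: str) -> str:
    return "real" if "memcpy" in snippet else "false_positive"


def flags_missing_check(snippet: str) -> str:
    # Toy heuristic: no visible length comparison means "real".
    return "real" if "len <" not in snippet else "false_positive"


def flags_free(snippet: str) -> str:
    return "real" if "free(" in snippet else "false_positive"


def ensemble_verdict(snippet: str, reviewers: List[Reviewer]) -> Tuple[str, bool]:
    """Return (majority label, needs_escalation)."""
    votes = Counter(reviewer(snippet) for reviewer in reviewers)
    label, count = votes.most_common(1)[0]
    # Any dissenting vote marks the finding for deeper analysis.
    needs_escalation = count != len(reviewers)
    return label, needs_escalation


REVIEWERS = [flags_memcpy, flags_missing_check, flags_free]
agreed = ensemble_verdict("memcpy(dst, src, n); free(src);", REVIEWERS)
disputed = ensemble_verdict("if (len < cap) memcpy(dst, src, len);", REVIEWERS)
```

In this toy run the first snippet gets a unanimous vote, while the second splits two to one and is flagged for escalation rather than silently dropped.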

In internal testing, MDASH achieved perfect detection results by identifying all 21 deliberately injected vulnerabilities in a test driver with zero false positives. It also reached 96% recall on five years of Microsoft Security Response Center historical cases in CLFS.sys and 100% recall in tcpip.sys, showing strong consistency in real-world security analysis. On the public CyberGym benchmark containing 1,507 real vulnerability tasks, MDASH achieved a leading score of 88.45%, outperforming competing systems by a significant margin.
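For readers less familiar with these metrics, the following sketch shows how recall and precision are computed. The 21-of-21 figures come from the article; the 24-of-25 example is a hypothetical count chosen only to show what a 96% recall looks like, since the article does not state the underlying case counts.

```python
def recall(true_positives: int, false_negatives: int) -> float:
    """Share of the real vulnerabilities that were found."""
    total = true_positives + false_negatives
    return true_positives / total if total else 0.0


def precision(true_positives: int, false_positives: int) -> float:
    """Share of the reported findings that were real."""
    reported = true_positives + false_positives
    return true_positives / reported if reported else 0.0


# Test-driver result from the article: 21 of 21 found, no false positives.
driver_recall = recall(true_positives=21, false_negatives=0)
driver_precision = precision(true_positives=21, false_positives=0)

# Hypothetical counts (not from the article): 24 of 25 cases found.
example_recall = recall(true_positives=24, false_negatives=1)

print(f"driver recall={driver_recall:.2%}, precision={driver_precision:.2%}")
print(f"hypothetical recall={example_recall:.2%}")
```

Recall says nothing about false alarms and precision says nothing about misses, which is why the test-driver result is reported with both numbers.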

The system operates through a structured pipeline consisting of preparation, scanning, validation, deduplication, and proof stages. Each stage is handled by specialized AI agents, ensuring that vulnerabilities are not only detected but also verified through reasoning and exploit simulation. Microsoft emphasized that MDASH’s strength lies not in any single model but in the orchestration of many agents working together across different roles.
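The five stages named above can be sketched as a chain of small functions. This is an illustrative toy, not Microsoft's design: the `Finding` type, the two duplicate scanners, and the string-matching rules are all assumptions made so the example is runnable.

```python
from dataclasses import dataclass, replace
from typing import Dict, List


@dataclass(frozen=True)
class Finding:
    file: str
    line: int
    kind: str
    proof: str = ""


Prepared = Dict[str, List[str]]


def prepare(sources: Dict[str, str]) -> Prepared:
    """Preparation: split each file into lines for the scanners."""
    return {name: text.splitlines() for name, text in sources.items()}


def scan(prepared: Prepared) -> List[Finding]:
    """Scanning: two toy scanners apply the same rule, so they report duplicates."""
    findings = []
    for _scanner in ("scanner_a", "scanner_b"):
        for name, lines in prepared.items():
            for number, text in enumerate(lines, start=1):
                if "strcpy(" in text:
                    findings.append(Finding(name, number, "unchecked-copy"))
    return findings


def validate(findings: List[Finding], prepared: Prepared) -> List[Finding]:
    """Validation: drop copies of string literals, treated as benign in this toy."""
    return [f for f in findings if '"' not in prepared[f.file][f.line - 1]]


def deduplicate(findings: List[Finding]) -> List[Finding]:
    """Deduplication: frozen dataclasses are hashable, so dict keys remove repeats."""
    return list(dict.fromkeys(findings))


def prove(findings: List[Finding]) -> List[Finding]:
    """Proof: attach a placeholder reproduction note to each surviving finding."""
    return [replace(f, proof=f"reproduce via {f.file}:{f.line}") for f in findings]


def run_pipeline(sources: Dict[str, str]) -> List[Finding]:
    prepared = prepare(sources)
    return prove(deduplicate(validate(scan(prepared), prepared)))


results = run_pipeline({"driver.c": 'strcpy(dst, src);\nstrcpy(buf, "ok");\nreturn 0;'})
```

Keeping each stage as a separate function means any one of them can be replaced, for example with a stricter validator, without touching the others.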

The article also explains that the test driver was a private Windows driver called StorageDrive, seeded with 21 vulnerabilities. The system detected all of them, suggesting it can handle code it has not seen before. Following this, it was applied to the Windows TCP/IP networking stack, where it discovered multiple critical vulnerabilities affecting kernel-level networking functions, IPv6 handling, IPsec, DNS, and authentication services.

Among the 16 vulnerabilities disclosed in the latest Patch Tuesday release, several allowed remote code execution without authentication, particularly in tcpip.sys and ikeext.dll. Others included denial-of-service conditions, memory corruption issues, and privilege escalation risks. The vulnerabilities were considered highly impactful due to their reachability from network-facing components and their potential exploitation in enterprise environments.

The two highlighted cases were a race-condition use-after-free in tcpip.sys, triggered by IPv4 packets carrying the Strict Source and Record Route (SSRR) option, and a double-free in the IKEv2 service caused by improper memory ownership handling during fragmented packet reassembly. Both show how cross-file logic and concurrency conditions produce weaknesses that are difficult for single-model systems to detect.
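To make the double-free pattern concrete, here is a toy model of unclear buffer ownership. It is not the actual IKEv2 code, whose details the article does not give. `ToyHeap`, `parse_fragment`, and both `reassemble_*` functions are hypothetical names, and Python is used only to simulate manual memory management with integer handles.

```python
class ToyHeap:
    """Tracks live allocations and raises on a second free of the same handle."""

    def __init__(self) -> None:
        self._next_handle = 1
        self._live = set()

    def alloc(self) -> int:
        handle = self._next_handle
        self._next_handle += 1
        self._live.add(handle)
        return handle

    def free(self, handle: int) -> None:
        if handle not in self._live:
            raise RuntimeError(f"double free of handle {handle}")
        self._live.remove(handle)

    def live_count(self) -> int:
        return len(self._live)


def parse_fragment(heap: ToyHeap, fragment: int, valid: bool) -> bool:
    """Helper that takes ownership on failure: frees the fragment, returns False."""
    if not valid:
        heap.free(fragment)
        return False
    return True


def reassemble_buggy(heap: ToyHeap, valid: bool = False) -> None:
    fragment = heap.alloc()
    parse_fragment(heap, fragment, valid)
    # Bug: the caller frees unconditionally, so the error path frees twice.
    heap.free(fragment)


def reassemble_fixed(heap: ToyHeap, valid: bool = False) -> None:
    fragment = heap.alloc()
    if parse_fragment(heap, fragment, valid):
        # The caller still owns the buffer only when parsing succeeded.
        heap.free(fragment)
```

The buggy version behaves correctly whenever the fragment is valid, so the flaw only appears on the error path. When the helper and its caller live in different files, that kind of mismatch is easy to miss in review.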

Microsoft also reported that MDASH’s architecture allows seamless integration of plugins and domain-specific knowledge, such as kernel rules and filesystem invariants, improving its ability to reason about specialized code behavior. The system is designed to remain model-agnostic, allowing future AI improvements to be integrated without rebuilding the pipeline.
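A plugin registry combined with a model-agnostic interface might look roughly like the sketch below. Every name in it (`Model`, `register`, `kernel_rules`, `analyze`, and the two stand-in models) is hypothetical, since the article does not describe MDASH's actual plugin API.

```python
from typing import Callable, Dict, List, Protocol


class Model(Protocol):
    """Any object with a complete() method can serve as the backing model."""

    def complete(self, prompt: str) -> str: ...


class EchoModel:
    def complete(self, prompt: str) -> str:
        return f"[echo] {prompt}"


class UpperModel:
    def complete(self, prompt: str) -> str:
        return prompt.upper()


# A plugin maps a code snippet to a list of domain-specific hints.
Plugin = Callable[[str], List[str]]
PLUGINS: Dict[str, Plugin] = {}


def register(name: str) -> Callable[[Plugin], Plugin]:
    """Decorator that adds a plugin to the registry under the given name."""
    def decorator(plugin: Plugin) -> Plugin:
        PLUGINS[name] = plugin
        return plugin
    return decorator


@register("kernel_rules")
def kernel_rules(snippet: str) -> List[str]:
    # Toy rule: remind the model about lock release when a lock is taken.
    if "AcquireLock" in snippet:
        return ["check lock is released on every path"]
    return []


def analyze(snippet: str, model: Model) -> str:
    """Build a prompt from the snippet plus plugin hints, then query the model."""
    hints = [hint for plugin in PLUGINS.values() for hint in plugin(snippet)]
    prompt = "\n".join([snippet] + [f"hint: {hint}" for hint in hints])
    return model.complete(prompt)


echo_result = analyze("AcquireLock(x);", EchoModel())
upper_result = analyze("AcquireLock(x);", UpperModel())
```

Because `analyze` depends only on the `complete` method, swapping `EchoModel` for `UpperModel` needs no change to the plugins or the prompt-building code.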

The article concludes by positioning MDASH as a production-grade advancement in AI-driven vulnerability discovery, marking a transition from experimental research to large-scale enterprise cybersecurity deployment. It also highlights that customers can join a private preview program to test the system.

What Undercode Says:

MDASH represents a structural shift in cybersecurity automation rather than just an incremental improvement in scanning tools.
The key innovation is not the AI models themselves but the orchestration layer that coordinates reasoning across more than 100 specialized agents.
This reflects a broader industry trend where system design is becoming more important than raw model capability.
Traditional vulnerability scanners rely heavily on pattern matching and static heuristics, which struggle with cross-file and concurrency-based bugs.
MDASH instead introduces staged reasoning, where detection, validation, and exploitation are separated into different cognitive roles.
This separation mirrors how human security teams operate, with analysts, reviewers, and exploit developers working independently.
The ensemble model approach reduces false positives by forcing disagreement resolution between multiple AI systems.
That disagreement is not treated as noise but as a signal for deeper analysis.
One of the most significant implications is the ability to reason across large proprietary codebases like Windows.
These environments are historically difficult because they are not part of public training datasets.
The reported 96% and 100% recall rates suggest that MDASH can reconstruct known vulnerability patterns reliably.
However, recall metrics alone do not guarantee future discovery of novel exploit classes.
The real value lies in its ability to chain reasoning across memory safety, concurrency, and protocol logic.
The tcpip.sys vulnerabilities show that network stack complexity remains a high-risk attack surface.
Kernel-level bugs are especially dangerous due to privilege level and system-wide impact.
The IKEv2 double-free case highlights how subtle memory ownership errors persist in mature codebases.
Even well-reviewed systems can still contain lifecycle inconsistencies across multiple files.
MDASH’s plugin system is critical because it injects domain knowledge that general models lack.
Without such plugins, AI systems would struggle with OS-specific semantics like IRP handling or kernel locks.
The CyberGym benchmark result suggests that MDASH is not limited to Microsoft’s internal code.
Still, benchmark performance does not always translate directly into real-world exploitation success rates.
The architecture’s model-agnostic design is strategically important for long-term sustainability.
It prevents vendor lock-in to a single AI model generation.
This makes MDASH adaptable as foundation models evolve rapidly.
However, the complexity of maintaining 100+ agents introduces operational overhead and tuning challenges.
False negatives remain a concern, especially in edge-case race conditions.
The system’s reliance on staged validation helps mitigate but not eliminate this risk.
Security engineering is shifting toward probabilistic discovery rather than deterministic scanning.
This raises new questions about auditability and reproducibility of AI-generated findings.
Enterprises adopting similar systems will need strong verification pipelines that feed confirmed findings into a regular patch release process such as Patch Tuesday.
Overall, MDASH reflects a future where vulnerability discovery becomes semi-autonomous but still human-governed.

Fact Checker Results

✔ Microsoft has publicly invested heavily in AI-assisted security research initiatives.
✔ Multi-agent AI systems are increasingly used in vulnerability research and code analysis.
❌ Specific CVE outcomes and benchmark scores cannot be independently verified from this text alone.

Prediction

MDASH-like systems will likely become standard in enterprise security pipelines within the next few years, especially for large-scale operating systems and cloud platforms.
Future iterations will likely integrate real-time exploit simulation and automated patch generation, reducing the time between discovery and mitigation.
However, attackers will also adopt similar multi-agent AI frameworks, leading to an escalation in automated vulnerability discovery on both defensive and offensive sides.

References:

Reported By: www.microsoft.com