AI Integration Creates High-Risk Vulnerabilities Fixed Slower Than Average, Cobalt Report Finds
A new report from Cobalt highlights that AI and LLM features in products introduce high-risk vulnerabilities that are resolved at a significantly slower pace than other security issues.

Companies are rapidly integrating artificial intelligence (AI) and large language model (LLM) features into their products, but that innovation carries a substantial security cost. A new report from Cobalt, the AI and Pentesting Pulse Report 2026, finds that vulnerabilities stemming from these AI integrations are rated high-risk more often and fixed more slowly than conventional software flaws.
The report, which draws on five years of penetration testing data and a survey of 455 security leaders and practitioners, found that AI applications add a new layer of weaknesses on top of existing ones. A web application that incorporates an LLM, for instance, remains susceptible to traditional attacks such as SQL injection and cross-site scripting, while also becoming exposed to new threats such as prompt injection, insecure output handling, and model-level denial-of-service attacks. As a result, the high-risk rate for AI and LLM penetration tests is 2.7 times higher than for other system types, a gap that has persisted for two years. Roughly one in three AI findings earns a high-risk label, compared with about one in eight for other systems.
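As a quick consistency check, the rounded proportions quoted in the report line up with the 2.7 times figure: dividing a one-in-three high-risk rate by a one-in-eight rate gives about 2.7. The unrounded rates are not given in this article, so this only confirms that the published figures agree with each other.

```latex
\frac{1/3}{1/8} = \frac{8}{3} \approx 2.67 \approx 2.7
```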
Finding these vulnerabilities is only part of the problem; fixing them is harder still. AI and LLM pentests have the lowest resolution rate of any asset class Cobalt tested, just 38.4% in 2026, which leaves nearly two out of every three serious findings open and exploitable. The rate has nearly doubled over the past year, a sign of progress, but it still lags well behind categories such as APIs and web applications, where most critical issues are resolved.
Several factors slow remediation. A primary issue is the shortage of professionals with expertise in both security and AI systems. In addition, when a flaw lies within the AI model itself, the fix often depends on the model vendor, which introduces an external dependency. Because most AI projects are still at an early stage, their security processes are still maturing, which lengthens resolution times. The median time to close an AI-related finding has nearly doubled, suggesting that teams are tackling more complex issues that require extensive investigation.
"Shadow AI," the use of unapproved AI tools by employees, has emerged as the most common cause of AI security incidents, accounting for 44% of confirmed incidents. Sensitive data is often exposed when employees use these tools without organizational oversight. Traditional asset inventory methods are ineffective against shadow AI because it operates at the application layer and bypasses conventional network monitoring. To detect unauthorized AI usage, Joe Brinkley, Director of Offensive Security Research and Community at Cobalt, says organizations need to monitor data, traffic, and endpoints, shifting their focus from infrastructure to data behavior and telemetry.
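To make the telemetry-based approach more concrete, here is a minimal sketch that scans outbound connection logs for traffic to hosted AI services that are not on an approved list, then totals the bytes each user sent. It is not Cobalt's method or any product's API. The `EgressRecord` schema, the function names, and the domain list are all hypothetical; the domain set is illustrative and deliberately incomplete, and a real deployment would draw on proxy, DNS, or endpoint telemetry instead.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class EgressRecord:
    """One outbound connection from a proxy or endpoint log (hypothetical schema)."""
    user: str
    dest_host: str
    bytes_out: int


# Illustrative, deliberately incomplete set of hosted AI API endpoints (assumption).
AI_SERVICE_DOMAINS = {
    "api.openai.com",
    "api.anthropic.com",
    "generativelanguage.googleapis.com",
}


def matches_domain(host: str, domain: str) -> bool:
    """True if host is the domain itself or one of its subdomains."""
    host = host.lower().rstrip(".")
    return host == domain or host.endswith("." + domain)


def flag_shadow_ai(
    records: Iterable[EgressRecord],
    ai_domains: set[str],
    approved_domains: set[str],
) -> list[EgressRecord]:
    """Return records that reach a known AI service not on the approved list."""
    flagged = []
    for r in records:
        is_ai = any(matches_domain(r.dest_host, d) for d in ai_domains)
        is_approved = any(matches_domain(r.dest_host, d) for d in approved_domains)
        if is_ai and not is_approved:
            flagged.append(r)
    return flagged


def bytes_sent_per_user(records: Iterable[EgressRecord]) -> dict[str, int]:
    """Sum outbound bytes per user, a rough proxy for how much data left the org."""
    totals: dict[str, int] = defaultdict(int)
    for r in records:
        totals[r.user] += r.bytes_out
    return dict(totals)


if __name__ == "__main__":
    logs = [
        EgressRecord("alice", "api.openai.com", 52_000),
        EgressRecord("bob", "intranet.example.com", 1_000),
        EgressRecord("carol", "api.anthropic.com", 8_000),
    ]
    hits = flag_shadow_ai(logs, AI_SERVICE_DOMAINS, approved_domains={"api.anthropic.com"})
    print(bytes_sent_per_user(hits))
```

Matching on subdomains rather than exact hostnames catches regional endpoints, and summing bytes per user reflects the shift toward watching data behavior rather than just inventorying infrastructure.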
Organizations are also re-evaluating their reliance on fully automated testing. Enthusiasm for letting automated tools handle all testing has waned: the share of teams comfortable with that approach dropped from nearly a third to just 9%. The shift is driven by experience, with 78% of teams reporting that automated scanners frequently miss critical vulnerabilities. The preferred approach is now a hybrid model, in which automation handles routine checks on lower-risk systems and human experts focus on critical assets.
Security leaders and practitioners also see remediation deadlines very differently. While a majority of leaders report meeting their service level agreements (SLAs), only about one in seven practitioners agrees. The gap often stems from differing metrics: leadership dashboards may show compliance scores, while engineers work through backlogs of alerts. Organizations that have closed this gap focus on reachability and exploit validation, filtering out theoretical risks and feeding validated findings directly into developer workflows; they are 4.5 times more likely to meet their SLAs.
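A minimal sketch of that triage step follows: findings that are not both reachable and exploit-validated are treated as theoretical and dropped, and the rest are ordered by severity and shaped into developer-facing tickets. The `Finding` fields, the severity ranking, and the ticket format are hypothetical and not taken from the report or any specific tool.

```python
from dataclasses import dataclass

# Lower number = more urgent; unknown severities sort last.
SEVERITY_RANK = {"critical": 0, "high": 1, "medium": 2, "low": 3}


@dataclass(frozen=True)
class Finding:
    id: str
    severity: str
    reachable: bool          # hypothetical flag: vulnerable path reachable from an exposed entry point
    exploit_validated: bool  # hypothetical flag: exploit confirmed by a tester or tool


def triage(findings: list[Finding]) -> list[Finding]:
    """Drop theoretical findings and order the remaining ones by severity."""
    actionable = [f for f in findings if f.reachable and f.exploit_validated]
    return sorted(actionable, key=lambda f: SEVERITY_RANK.get(f.severity, len(SEVERITY_RANK)))


def to_ticket(f: Finding) -> dict[str, str]:
    """Shape a validated finding as a developer-facing ticket (hypothetical fields)."""
    return {"title": f"[{f.severity.upper()}] {f.id}", "label": "security-validated"}


if __name__ == "__main__":
    backlog = [
        Finding("F-1", "medium", reachable=True, exploit_validated=True),
        Finding("F-2", "critical", reachable=False, exploit_validated=False),
        Finding("F-3", "high", reachable=True, exploit_validated=True),
    ]
    for f in triage(backlog):
        print(to_ticket(f))
```

Note that the unvalidated "critical" finding is dropped while a validated "medium" one survives; the idea is that engineers see a shorter queue of issues that are confirmed to matter, which is what lets leadership and practitioner metrics converge.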
The findings underscore a growing security debt associated with the rapid adoption of AI. As AI capabilities become more integrated into software products, addressing the unique and persistent vulnerabilities they introduce will be crucial for maintaining robust cybersecurity postures.