LMDeploy CVE-2026-33626 Flaw Exploited Within 13 Hours of Disclosure

A high-severity security flaw in LMDeploy, an open-source toolkit for compressing, deploying, and serving large language models (LLMs), has come under active exploitation in the wild less than 13 hours after its public disclosure.

The vulnerability, tracked as CVE-2026-33626 (CVSS score: 7.5), relates to a Server-Side Request Forgery (SSRF) vulnerability that could be exploited to access sensitive data. According to an advisory published by the project maintainers, the flaw resides in the load_image() function in lmdeploy/vl/utils.py, which fetches arbitrary URLs without validating internal or private IP addresses. This allows attackers to access cloud metadata services, internal networks, and other sensitive resources.

The shortcoming affects all versions of the toolkit (0.12.0 and prior) with vision language support. Orca Security researcher Igor Stepansky has been credited with discovering and reporting the bug. Successful exploitation could permit an attacker to steal cloud credentials, reach internal services not exposed to the internet, port scan internal networks, and create lateral movement opportunities.

Cloud security firm Sysdig, in an analysis published this week, said it detected the first LMDeploy exploitation attempt against its honeypot systems within 12 hours and 31 minutes of the vulnerability being published on GitHub. The exploitation attempt originated from the IP address 103.116.72[.]119. The attacker did not simply validate the bug and move on. Instead, over a single eight-minute session, they used the vision-language image loader as a generic HTTP SSRF primitive to port-scan the internal network behind the model server: AWS Instance Metadata Service (IMDS), Redis, MySQL, a secondary HTTP administrative interface, and an out-of-band (OOB) DNS exfiltration endpoint.

The actions undertaken by the adversary, detected on Apr 22, 2026, at 03:35 a.m. UTC, unfolded over 10 distinct requests across three phases. The requests switched between vision language models (VLMs) such as internlm-xcomposer2 and OpenGVLab/InternVL2-8B to likely avoid raising suspicion. The phases included targeting AWS IMDS and Redis instances on the server, testing egress with an out-of-band DNS callback to requestrepo[.]com to confirm the SSRF vulnerability could reach arbitrary external hosts, followed by enumerating the API surface, and finally port scanning the loopback interface.

The findings are yet another reminder of how threat actors are closely watching new vulnerability disclosures and exploiting them before downstream users can apply the fixes, even in cases where no proof-of-concept (PoC) exploits exist at the time of the attack. Sysdig noted that CVE-2026-33626 fits a pattern observed repeatedly in the AI-infrastructure space over the past six months: critical vulnerabilities in inference servers, model gateways, and agent orchestration tools are being weaponized within hours of advisory publication, regardless of the size or extent of their install base. Generative AI is accelerating this collapse, as an advisory as specific as GHSA-6w67-hwm5-92mq, which includes the affected file, parameter name, root-cause explanation, and sample vulnerable code, is effectively an input prompt for any commercial LLM to generate a potential exploit.

This incident underscores the urgent need for organizations using LMDeploy to immediately apply patches and restrict network access to model servers. The rapid exploitation timeline—under 13 hours—leaves virtually no window for manual patching, emphasizing the importance of automated vulnerability management and network segmentation for AI infrastructure.