Critical "Bleeding Llama" Vulnerability Exposes Ollama Servers to Memory Leaks

A critical out-of-bounds read vulnerability in the Ollama framework, tracked as CVE-2026-7482, allows remote, unauthenticated attackers to leak sensitive process memory from affected servers. Researchers at Cyera, who discovered the flaw and codenamed it "Bleeding Llama," estimate that more than 300,000 servers worldwide may be exposed (The Hacker News).

The vulnerability resides in the GGUF model loader in Ollama versions prior to 0.17.1. Specifically, the flaw occurs during the quantization process in fs/ggml/gguf.go and server/quantization.go. By sending a specially crafted GGUF file with inflated tensor offsets and sizes to the /api/create endpoint, an attacker can force the server to read past its allocated heap buffer. The read is possible because the framework uses Go's unsafe package, which bypasses the language's standard memory safety guarantees (The Hacker News).
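The class of bug described here, a loader trusting offsets and sizes declared in a file header, can be illustrated with a minimal bounds check. The sketch below is not Ollama's code or its actual patch; TensorInfo, validate_tensor_bounds, and read_tensor are hypothetical names, and the header layout is simplified for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TensorInfo:
    """Hypothetical, simplified stand-in for one tensor entry in a model file header."""
    name: str
    offset: int  # byte offset of the tensor data within the data section
    size: int    # byte length the header claims for this tensor


def validate_tensor_bounds(tensors, data_len):
    """Raise ValueError if any tensor claims bytes outside the data section."""
    for t in tensors:
        if t.offset < 0 or t.size < 0:
            raise ValueError(f"{t.name}: negative offset or size")
        # Python integers cannot overflow; in Go or C the addition below
        # would also need an explicit overflow check.
        if t.offset + t.size > data_len:
            raise ValueError(
                f"{t.name}: range {t.offset}+{t.size} exceeds data length {data_len}"
            )


def read_tensor(data, t):
    """Return the tensor's bytes only after its declared range has been validated."""
    validate_tensor_bounds([t], len(data))
    return data[t.offset:t.offset + t.size]
```

A header whose declared range fits inside the buffer is read normally, while one with an inflated size is rejected before any bytes are touched. In a memory-safe language the slice would fail or truncate anyway; the check matters most where raw pointers are used.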

An attacker can exploit this in a three-step chain: first, uploading the malicious GGUF file via an HTTP POST request; second, triggering the out-of-bounds read through the /api/create endpoint; and third, exfiltrating the leaked heap data, which may include API keys, environment variables, system prompts, and user conversation data, by using the /api/push endpoint to send the resulting artifact to an attacker-controlled registry (The Hacker News).

The potential impact is significant because Ollama is frequently integrated with development tools such as Claude Code, so sensitive tool outputs and proprietary code held in the heap could be exposed. Cyera researcher Dor Attias warned that attackers could gain deep insight into an organization's AI inference operations, including customer contracts and internal credentials (The Hacker News).

To mitigate this risk, users are urged to update to version 0.17.1 or later. Because Ollama's REST API does not provide built-in authentication, experts also recommend isolating instances behind firewalls, auditing for internet exposure, and deploying an authentication proxy or API gateway to restrict access (The Hacker News).
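As a rough illustration of the audit step, the sketch below asks an Ollama instance for its version over the unauthenticated GET /api/version endpoint and compares the result with 0.17.1, the fixed release named above. The function names and the version-parsing rules are assumptions made for this example; it should only be pointed at hosts you are authorized to test.

```python
import json
import urllib.request

PATCHED = (0, 17, 1)  # first fixed release, per the article


def parse_version(text):
    """Turn '0.17.1' or 'v0.17.1-rc2' into a tuple of ints, dropping any suffix."""
    core = text.strip().lstrip("v").split("-", 1)[0].split("+", 1)[0]
    return tuple(int(part) for part in core.split("."))


def is_patched(version_text):
    """True if the version is at or above 0.17.1 (pre-release suffixes are ignored)."""
    return parse_version(version_text) >= PATCHED


def audit(base_url, timeout=5.0):
    """Return (reachable_without_auth, version, patched) for one endpoint."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/version", timeout=timeout) as resp:
            version = json.load(resp)["version"]
    except (OSError, ValueError, KeyError):
        # Unreachable, blocked by a proxy, or not answering like an Ollama server.
        return (False, None, None)
    return (True, version, is_patched(version))
```

For example, audit("http://127.0.0.1:11434") checks a local instance on the default port. A reachable result from outside the local network suggests the API is exposed without authentication, whatever version it reports.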

Separately, researchers at Striga have disclosed two additional unpatched vulnerabilities in Ollama's Windows update mechanism. These flaws, which allow for persistent code execution, were published after a 90-day disclosure period elapsed without a fix. The Windows client, which defaults to listening on 127.0.0.1:11434, remains susceptible to these chained issues (The Hacker News).

The discovery of these vulnerabilities highlights the growing attack surface of local LLM frameworks as they see wider adoption in enterprise environments. With over 171,000 stars on GitHub, Ollama's popularity makes it an attractive target for attackers looking to harvest sensitive data from AI-integrated workflows (The Hacker News).