Critical 'Bleeding Llama' Vulnerability Exposes 300,000 Ollama Deployments

A critical security vulnerability in Ollama, an open-source engine for running large language models (LLMs), is currently exposing approximately 300,000 deployments to potential information theft. The flaw, identified as a heap out-of-bounds read, allows unauthenticated remote attackers to access sensitive data residing in the application's memory SecurityWeek.

The vulnerability, tracked as CVE-2026-7482 and dubbed "Bleeding Llama," carries a critical CVSS score of 9.3 SecurityWeek. It resides within the GGUF model loader component. An attacker can exploit this by providing a malicious GGUF file that specifies a tensor offset and size exceeding the actual length of the file. When the loader processes this malformed file, it reads past the allocated heap buffer, inadvertently capturing sensitive information stored in memory, such as API keys, tokens, environment variables, prompts, and user messages SecurityWeek.

The exploitation process is notably straightforward, requiring only three unauthenticated API calls to execute. After triggering the out-of-bounds read, an attacker can leverage Ollama’s built-in model push functionality to exfiltrate the captured heap data to a server under their control SecurityWeek. Because Ollama deployments often default to listening on all network interfaces without requiring authentication, any instance exposed to the public internet is considered immediately vulnerable SecurityWeek.

The potential impact of this vulnerability is significant, as successful exploitation could lead to the exposure of proprietary development code, sensitive employee interactions, and data containing personally identifiable information (PII) or protected health information (PHI) SecurityWeek. Cyera, the security firm that discovered the flaw, warns that any deployment lacking an authentication proxy or firewall is at high risk SecurityWeek.

Ollama has addressed the issue in version 0.17.1, and administrators are urged to update their instances immediately SecurityWeek. Beyond patching, security teams are advised to restrict network access to Ollama servers, implement authentication proxies, and perform network segmentation to prevent unauthorized access. Organizations should also conduct audits of their internet-facing instances and treat any exposed server as potentially compromised, including the environment variables and data that have passed through it SecurityWeek.

This incident highlights the growing security challenges associated with the rapid adoption of self-hosted AI infrastructure. As organizations increasingly deploy LLM inference engines to handle sensitive internal data, the default configurations of these tools—often optimized for ease of use rather than hardened security—create significant attack surfaces. The "Bleeding Llama" vulnerability serves as a reminder that AI-specific components require the same rigorous security scrutiny and network-level protections as traditional enterprise software.