VYPR
High severity · NVD Advisory · Published Aug 21, 2025 · Updated Aug 21, 2025

vLLM API endpoints vulnerable to Denial of Service Attacks

CVE-2025-48956

Description

vLLM is an inference and serving engine for large language models (LLMs). In versions from 0.1.0 up to, but not including, 0.10.1.1, a Denial of Service (DoS) condition can be triggered by sending a single HTTP GET request with an extremely large header to any HTTP endpoint. The server buffers the oversized header in memory, leading to memory exhaustion and potentially a crash or unresponsiveness. The attack requires no authentication, so any remote user can exploit it. The vulnerability is fixed in version 0.10.1.1.
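For illustration only, the shape of the request described above can be sketched as follows. The header size is scaled down to 64 KB here; an actual attack would use a value large enough to exhaust server memory (the fix caps header buffering at 4 MB):

```python
# Illustrative sketch of an oversized-header request; the value is kept
# small here, whereas a real attack would use a vastly larger one.
def build_oversized_request(header_bytes: int) -> bytes:
    """Build a raw HTTP/1.1 GET request carrying one very large header."""
    return (
        b"GET / HTTP/1.1\r\n"
        b"Host: target\r\n"
        b"X-Oversized: " + b"A" * header_bytes + b"\r\n"
        b"\r\n"
    )

request = build_oversized_request(64 * 1024)  # 64 KB for demonstration
```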

Affected packages

Versions sourced from the GitHub Security Advisory.

Package        Affected versions       Patched versions
vllm (PyPI)    >= 0.1.0, < 0.10.1.1    0.10.1.1
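To check whether an installed vLLM falls inside the affected range, a minimal comparison helper (hypothetical, not part of vLLM; production code should prefer a proper version library such as `packaging`) could look like:

```python
def parse_version(v: str) -> tuple:
    """Parse a dotted version string into a tuple of ints.

    Simplified sketch: non-numeric parts (pre-release tags etc.) are ignored.
    """
    return tuple(int(p) for p in v.split(".") if p.isdigit())

def is_patched(installed: str, fixed: str = "0.10.1.1") -> bool:
    """True if `installed` is at or above the fixed release."""
    return parse_version(installed) >= parse_version(fixed)
```

For example, `is_patched("0.10.1")` is False, while `is_patched("0.10.1.1")` is True.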

Affected products

1

Patches

1
d8b736f913a5

Limit HTTP header count and size (#23267)

https://github.com/vllm-project/vllm · Russell Bryant · Aug 20, 2025 · via GHSA
4 files changed · +41 −0
  • vllm/entrypoints/constants.py · +10 −0 · added
    @@ -0,0 +1,10 @@
    +# SPDX-License-Identifier: Apache-2.0
    +# SPDX-FileCopyrightText: Copyright contributors to the vLLM project
    +"""
    +Shared constants for vLLM entrypoints.
    +"""
    +
    +# HTTP header limits for h11 parser
    +# These constants help mitigate header abuse attacks
    +H11_MAX_INCOMPLETE_EVENT_SIZE_DEFAULT = 4194304  # 4 MB
    +H11_MAX_HEADER_COUNT_DEFAULT = 256
    
  • vllm/entrypoints/launcher.py · +21 −0 · modified
    @@ -14,6 +14,8 @@
     from vllm.engine.async_llm_engine import AsyncEngineDeadError
     from vllm.engine.multiprocessing import MQEngineDeadError
     from vllm.engine.protocol import EngineClient
    +from vllm.entrypoints.constants import (H11_MAX_HEADER_COUNT_DEFAULT,
    +                                        H11_MAX_INCOMPLETE_EVENT_SIZE_DEFAULT)
     from vllm.entrypoints.ssl import SSLCertRefresher
     from vllm.logger import init_logger
     from vllm.utils import find_process_using_port
    @@ -26,6 +28,11 @@ async def serve_http(app: FastAPI,
                          sock: Optional[socket.socket],
                          enable_ssl_refresh: bool = False,
                          **uvicorn_kwargs: Any):
    +    """
    +    Start a FastAPI app using Uvicorn, with support for custom Uvicorn config
    +    options.  Supports http header limits via h11_max_incomplete_event_size and
    +    h11_max_header_count.
    +    """
         logger.info("Available routes are:")
         for route in app.routes:
             methods = getattr(route, "methods", None)
    @@ -36,7 +43,21 @@ async def serve_http(app: FastAPI,
     
             logger.info("Route: %s, Methods: %s", path, ', '.join(methods))
     
    +    # Extract header limit options if present
    +    h11_max_incomplete_event_size = uvicorn_kwargs.pop(
    +        "h11_max_incomplete_event_size", None)
    +    h11_max_header_count = uvicorn_kwargs.pop("h11_max_header_count", None)
    +
    +    # Set safe defaults if not provided
    +    if h11_max_incomplete_event_size is None:
    +        h11_max_incomplete_event_size = H11_MAX_INCOMPLETE_EVENT_SIZE_DEFAULT
    +    if h11_max_header_count is None:
    +        h11_max_header_count = H11_MAX_HEADER_COUNT_DEFAULT
    +
         config = uvicorn.Config(app, **uvicorn_kwargs)
    +    # Set header limits
    +    config.h11_max_incomplete_event_size = h11_max_incomplete_event_size
    +    config.h11_max_header_count = h11_max_header_count
         config.load()
         server = uvicorn.Server(config)
         _add_shutdown_handlers(app, server)
    
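The option handling added to serve_http above follows a pop-with-default pattern that can be exercised standalone. The sketch below re-creates that logic for illustration; it mirrors the names in the patch but is not vLLM's actual module:

```python
# Standalone illustration of the option handling added to serve_http.
H11_MAX_INCOMPLETE_EVENT_SIZE_DEFAULT = 4 * 1024 * 1024  # 4 MB
H11_MAX_HEADER_COUNT_DEFAULT = 256

def extract_h11_limits(uvicorn_kwargs: dict) -> tuple:
    """Pop the h11 header-limit options, falling back to safe defaults.

    The patch pops these keys before building uvicorn.Config and then
    assigns them as attributes on the Config after construction.
    """
    size = uvicorn_kwargs.pop("h11_max_incomplete_event_size", None)
    count = uvicorn_kwargs.pop("h11_max_header_count", None)
    if size is None:
        size = H11_MAX_INCOMPLETE_EVENT_SIZE_DEFAULT
    if count is None:
        count = H11_MAX_HEADER_COUNT_DEFAULT
    return size, count
```

Popping the keys first matters: any remaining entries in `uvicorn_kwargs` are forwarded to `uvicorn.Config` unchanged.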
  • vllm/entrypoints/openai/api_server.py · +2 −0 · modified
    @@ -1894,6 +1894,8 @@ async def run_server_worker(listen_address,
                 ssl_certfile=args.ssl_certfile,
                 ssl_ca_certs=args.ssl_ca_certs,
                 ssl_cert_reqs=args.ssl_cert_reqs,
    +            h11_max_incomplete_event_size=args.h11_max_incomplete_event_size,
    +            h11_max_header_count=args.h11_max_header_count,
                 **uvicorn_kwargs,
             )
     
    
  • vllm/entrypoints/openai/cli_args.py · +8 −0 · modified
    @@ -20,6 +20,8 @@
     from vllm.engine.arg_utils import AsyncEngineArgs, optional_type
     from vllm.entrypoints.chat_utils import (ChatTemplateContentFormatOption,
                                              validate_chat_template)
    +from vllm.entrypoints.constants import (H11_MAX_HEADER_COUNT_DEFAULT,
    +                                        H11_MAX_INCOMPLETE_EVENT_SIZE_DEFAULT)
     from vllm.entrypoints.openai.serving_models import LoRAModulePath
     from vllm.entrypoints.openai.tool_parsers import ToolParserManager
     from vllm.logger import init_logger
    @@ -172,6 +174,12 @@ class FrontendArgs:
         enable_log_outputs: bool = False
         """If set to True, enable logging of model outputs (generations) 
         in addition to the input logging that is enabled by default."""
    +    h11_max_incomplete_event_size: int = H11_MAX_INCOMPLETE_EVENT_SIZE_DEFAULT
    +    """Maximum size (bytes) of an incomplete HTTP event (header or body) for
    +    h11 parser. Helps mitigate header abuse. Default: 4194304 (4 MB)."""
    +    h11_max_header_count: int = H11_MAX_HEADER_COUNT_DEFAULT
    +    """Maximum number of HTTP headers allowed in a request for h11 parser.
    +    Helps mitigate header abuse. Default: 256."""
     
         @staticmethod
         def add_cli_args(parser: FlexibleArgumentParser) -> FlexibleArgumentParser:
    
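Conceptually, the two limits configured above amount to a pair of checks applied while a request's header block is being parsed. The following is a simplified standalone re-creation of what those limits enforce, not h11's or uvicorn's actual implementation:

```python
# Simplified illustration of the checks the two configured limits correspond
# to; not the real h11 parser code.
H11_MAX_INCOMPLETE_EVENT_SIZE_DEFAULT = 4_194_304  # 4 MB
H11_MAX_HEADER_COUNT_DEFAULT = 256

def check_header_limits(raw_headers: bytes,
                        max_size: int = H11_MAX_INCOMPLETE_EVENT_SIZE_DEFAULT,
                        max_count: int = H11_MAX_HEADER_COUNT_DEFAULT) -> None:
    """Reject a raw header block that exceeds the size or count limit."""
    if len(raw_headers) > max_size:
        raise ValueError(f"header block exceeds {max_size} bytes")
    lines = [line for line in raw_headers.split(b"\r\n") if line]
    if len(lines) > max_count:
        raise ValueError(f"more than {max_count} header lines")
```

With these bounds in place, an oversized or header-flooded request is rejected early instead of being buffered without limit.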
