RAXE-2026-023 HIGH CVSS 8.8 v3.1 S1

vLLM RCE via auto_map Dynamic Module Loading (CVE-2026-22807)

Adversarial ML 2026-03-09 M. Hirani TLP:GREEN

Executive Summary

What: vLLM, an open-source inference serving engine for large language models, contains a code injection vulnerability (CVE-2026-22807) in its model loading mechanism. Versions >= 0.10.1 and < 0.14.0 unconditionally process Hugging Face auto_map dynamic module entries during model initialisation without gating on the trust_remote_code configuration flag, enabling arbitrary Python code execution on the host at server startup (GHSA-2pc9-4j83-qjmr).

So What: An attacker who controls or poisons a model repository — whether on Hugging Face Hub, a compromised internal registry, or a locally mounted path — can embed malicious Python modules that execute during vLLM model resolution, before any inference request is served and without requiring API access or authentication to the vLLM instance. This represents a supply chain attack vector against AI inference infrastructure: the model artefact itself becomes the weapon. The vulnerability bypasses the trust_remote_code=False security control that operators rely on to prevent exactly this class of attack (GHSA-2pc9-4j83-qjmr).

Now What: Organisations running vLLM should upgrade to version 0.14.0 immediately. Until patched, restrict model sources to trusted, internally audited repositories. Note that setting trust_remote_code=False does not mitigate this vulnerability in affected versions — the flag is not checked before the vulnerable code path executes (GHSA-2pc9-4j83-qjmr).


Risk Rating

Dimension Rating Detail
Severity HIGH (8.8) CVSS:3.1/AV:N/AC:L/PR:N/UI:R/S:U/C:H/I:H/A:H (GHSA-2pc9-4j83-qjmr)
Urgency HIGH Patch available (v0.14.0); exploitation requires operator to load attacker-controlled model (GHSA-2pc9-4j83-qjmr)
Scope UNCHANGED Exploitation affects only the vLLM host process and its privileges (NVD)
Confidence HIGH CVE assigned, GHSA published, vendor confirmed, patch released, fix commit identified (NVD, GHSA-2pc9-4j83-qjmr)
Business Impact HIGH Full compromise of inference server: arbitrary code execution with vLLM process privileges, including GPU and filesystem access (NVD)

CVSS score discrepancy: The NVD primary assessment assigns CVSS 9.8 (CRITICAL) with vector CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:H, differing on the User Interaction dimension (NVD). The GHSA vector (UI:R, score 8.8) reflects the requirement for an operator to initiate loading of the malicious model. This publication uses the GHSA score of 8.8 (high-severity) as it more accurately represents the attack preconditions. Organisations that automate model loading from external sources without human review should consider the NVD score of 9.8 as more applicable to their deployment context.


Affected Products

Product Registry Affected Versions Fixed Version Source
vllm PyPI >= 0.10.1, < 0.14.0 0.14.0 GHSA-2pc9-4j83-qjmr

Am I Affected?

  • Check if vLLM is deployed in your environment: pip show vllm or inspect container images used for LLM inference serving
  • Verify the installed version: any release from 0.10.1 through 0.13.x is vulnerable (GHSA-2pc9-4j83-qjmr)
  • Review whether your model-loading pipeline sources models from public hubs (Hugging Face Hub), third-party repositories, or user-supplied paths
  • Determine whether model loading is automated (CI/CD pipelines, scheduled refresh) or requires manual operator action — automated loading increases effective risk
  • Check your trust_remote_code configuration setting; note that this flag does NOT protect you in affected versions (GHSA-2pc9-4j83-qjmr)
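The version check above can be scripted. A minimal sketch, assuming plain X.Y.Z version strings (local build or pre-release suffixes such as +cu118 would need a fuller parser like packaging.version):

```python
from importlib.metadata import PackageNotFoundError, version

def parse_version(v: str) -> tuple:
    """Parse a plain 'X.Y.Z' version string into a comparable int tuple."""
    return tuple(int(part) for part in v.split(".")[:3])

def is_affected(installed: str) -> bool:
    """True if the version falls in the vulnerable range [0.10.1, 0.14.0)."""
    return (0, 10, 1) <= parse_version(installed) < (0, 14, 0)

if __name__ == "__main__":
    try:
        v = version("vllm")
        print(f"vllm {v} affected: {is_affected(v)}")
    except PackageNotFoundError:
        print("vllm is not installed")
```

Run this on each inference host or inside each container image used for serving; any True result warrants the upgrade in the Now What section.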

Abstract

CVE-2026-22807 is a high-severity code injection vulnerability in vLLM, the open-source inference engine for large language models. The flaw resides in vllm/model_executor/models/registry.py, where the function try_get_class_from_dynamic_module unconditionally processes auto_map entries from Hugging Face model configuration files via the Transformers library's dynamic_module_utils (GHSA-2pc9-4j83-qjmr). A secondary vulnerable code location exists at vllm/transformers_utils/dynamic_module.py:13 (GHSA-2pc9-4j83-qjmr). In affected versions (>= 0.10.1, < 0.14.0), this execution path runs regardless of whether the trust_remote_code flag is set to False, bypassing the intended security control for dynamic module loading (GHSA-2pc9-4j83-qjmr). An attacker who controls a model repository can embed arbitrary Python code in a module referenced by auto_map, which executes at import time during model initialisation — before the server begins handling inference requests. The vulnerability is classified as CWE-94 (Improper Control of Generation of Code) (NVD). The fix in version 0.14.0 gates dynamic module execution on trust_remote_code being explicitly enabled (GHSA-2pc9-4j83-qjmr). The vulnerability was reported by bugbunny.ai, with the remediation developed by DarkLight1337 and coordinated by russellb (GHSA-2pc9-4j83-qjmr).


Key Findings

  1. trust_remote_code bypass enables supply chain RCE — The core issue is that vLLM's model resolution code processes auto_map dynamic module entries without checking the trust_remote_code flag, which is the established security control in the Hugging Face ecosystem for gating dynamic code execution. Operators who set this flag to False are not protected in affected versions (GHSA-2pc9-4j83-qjmr).

  2. Pre-request execution timing — The malicious code executes during model initialisation at server startup, before any inference requests are processed. This means the compromise occurs at the infrastructure provisioning stage, not during runtime interaction, making it invisible to request-level monitoring (GHSA-2pc9-4j83-qjmr).

  3. Two vulnerable code locations identified — The advisory identifies vulnerable code at vllm/model_executor/models/registry.py:856 and vllm/transformers_utils/dynamic_module.py:13, both participating in the unsafeguarded dynamic module loading path (GHSA-2pc9-4j83-qjmr).

  4. Network-based attack via model poisoning — The attack vector is network-based (AV:N). The attacker need not interact with a running vLLM instance; they only need to control the contents of a model repository that an operator subsequently loads. This aligns with MITRE ATLAS AML.T0010 (ML Supply Chain Compromise) (NVD, RAXE assessment).

  5. Dual CVSS scores reflect deployment-dependent risk — The GHSA score of 8.8 (UI:R) reflects human-initiated model loading; the NVD score of 9.8 (UI:N) reflects automated pipelines. Organisations should assess which score applies to their deployment model (NVD, GHSA-2pc9-4j83-qjmr).


Attack Flow

+--------------------------+
|  1. MODEL POISONING      |  Attacker publishes or compromises a model
|  Repository controlled   |  repository containing a malicious auto_map
|  by adversary            |  entry in config.json + payload .py file
+-----------+--------------+  (GHSA-2pc9-4j83-qjmr)
            |
            v
+--------------------------+
|  2. MODEL LOAD TRIGGER   |  Operator or automated pipeline loads
|  Operator initiates      |  the model into a vLLM server instance
|  server with --model     |  via --model flag or API call
+-----------+--------------+  (GHSA-2pc9-4j83-qjmr, UI:R)
            |
            v
+--------------------------+
|  3. AUTO_MAP RESOLUTION  |  vLLM registry.py calls
|  trust_remote_code       |  try_get_class_from_dynamic_module()
|  flag NOT checked        |  which delegates to HF dynamic_module_utils
+-----------+--------------+  (GHSA-2pc9-4j83-qjmr)
            |
            v
+--------------------------+
|  4. DYNAMIC MODULE LOAD  |  Python interpreter imports the attacker's
|  Code executes at        |  module file; module-level code runs
|  import time             |  immediately at import time
+-----------+--------------+  (GHSA-2pc9-4j83-qjmr)
            |
            v
+--------------------------+
|  5. ARBITRARY CODE       |  Payload executes with full privileges of
|  EXECUTION               |  the vLLM process (typically GPU access,
|  Pre-request, no auth    |  filesystem, network) before serving begins
+--------------------------+  (NVD: C:H/I:H/A:H, CWE-94)

Technical Details

Vulnerability Mechanics

vLLM's model loading subsystem resolves model architectures by examining the auto_map field in a model's config.json file. The auto_map mechanism is a Hugging Face convention that allows model publishers to specify custom Python classes for model loading — for example, mapping AutoModelForCausalLM to a custom implementation in a Python module shipped alongside the model weights (GHSA-2pc9-4j83-qjmr).

In the Hugging Face Transformers library, loading code from auto_map is gated behind the trust_remote_code parameter. When set to False (the default), Transformers refuses to execute arbitrary Python modules from model repositories. This is a deliberate security boundary designed to prevent supply chain attacks through model configuration (GHSA-2pc9-4j83-qjmr).

In affected vLLM versions (>= 0.10.1, < 0.14.0), the function try_get_class_from_dynamic_module at registry.py:856 delegates to Hugging Face's dynamic_module_utils without first checking the trust_remote_code configuration flag. The dynamic module loading proceeds unconditionally, regardless of the operator's security configuration (GHSA-2pc9-4j83-qjmr).
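The control difference between affected and patched behaviour can be expressed as a minimal sketch. This is illustrative logic only, not the actual vLLM source — the function names and signatures here are invented for clarity:

```python
# Illustrative sketch (not vLLM source): the vulnerable path resolves
# auto_map classes without consulting trust_remote_code; the patched
# behaviour gates dynamic loading on the flag being explicitly enabled.

def resolve_auto_map_vulnerable(auto_map: dict, class_name: str, loader):
    # Affected versions (>= 0.10.1, < 0.14.0): loader runs unconditionally,
    # executing repository-supplied Python regardless of configuration.
    if class_name in auto_map:
        return loader(auto_map[class_name])
    return None

def resolve_auto_map_patched(auto_map: dict, class_name: str, loader,
                             trust_remote_code: bool = False):
    # v0.14.0 behaviour: dynamic loading only when explicitly trusted.
    if class_name in auto_map and trust_remote_code:
        return loader(auto_map[class_name])
    return None
```

In the real code path, loader corresponds to the delegation into Hugging Face's dynamic_module_utils; the essential fix is the restored trust_remote_code check before that delegation.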

Exploitation Mechanism

The attack requires the adversary to control the contents of a model repository. This can be achieved through several vectors:

  1. Public hub publishing — The attacker publishes a model on Hugging Face Hub with a name resembling a legitimate or popular model (typosquatting, namespace confusion).
  2. Repository compromise — The attacker gains write access to an existing model repository and injects a malicious auto_map entry and Python module.
  3. Local path substitution — The attacker substitutes a model at a locally mounted or network-accessible path used by the vLLM server.

The malicious model repository contains two key artefacts:

  • A config.json with an auto_map field mapping a class name to a dotted Python module path (e.g., "AutoModelForCausalLM": "malicious_module.BackdoorModel")
  • A Python source file at the specified module path containing the attacker's payload at module level

When vLLM loads this model, the Python interpreter imports the attacker's module. Code at module level (outside function or class definitions) executes immediately during the import. The payload runs with the full privileges of the vLLM process (GHSA-2pc9-4j83-qjmr).
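The import-time execution property can be demonstrated directly. The config fragment and module body below are invented illustrations, not observed samples; the "payload" merely records that it ran:

```python
# Hypothetical illustration of the two artefacts and of why module-level
# code executes at import time, before any model class is instantiated.
import importlib.util
import json
import sys
import tempfile
from pathlib import Path

CONFIG_FRAGMENT = json.loads("""
{
  "auto_map": {
    "AutoModelForCausalLM": "malicious_module.BackdoorModel"
  }
}
""")

MODULE_BODY = """
EXECUTED = []
EXECUTED.append("module-level code ran at import time")  # stand-in payload
class BackdoorModel:
    pass
"""

def import_from_repo(repo_dir: Path, module_name: str):
    """Import a .py file the way a dynamic loader would; module-level
    statements execute immediately inside exec_module()."""
    spec = importlib.util.spec_from_file_location(
        module_name, repo_dir / f"{module_name}.py")
    module = importlib.util.module_from_spec(spec)
    sys.modules[module_name] = module
    spec.loader.exec_module(module)  # payload runs here
    return module

with tempfile.TemporaryDirectory() as repo:
    module_name = CONFIG_FRAGMENT["auto_map"]["AutoModelForCausalLM"].split(".")[0]
    (Path(repo) / f"{module_name}.py").write_text(MODULE_BODY)
    mod = import_from_repo(Path(repo), module_name)
    print(mod.EXECUTED[0])
```

No method on BackdoorModel is ever called; the mere act of resolving the auto_map reference is sufficient for the payload to execute.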

CVSS Vector Analysis

The GHSA CVSS:3.1 vector AV:N/AC:L/PR:N/UI:R/S:U/C:H/I:H/A:H (score 8.8 HIGH) reflects (GHSA-2pc9-4j83-qjmr):

  • Attack Vector: Network — the attacker supplies the malicious model via a remote repository
  • Attack Complexity: Low — no special conditions beyond model repository control
  • Privileges Required: None — no credentials required on the vLLM instance
  • User Interaction: Required — an operator must initiate loading of the malicious model
  • Scope: Unchanged — impact is confined to the vLLM host process
  • Impact: High across Confidentiality, Integrity, and Availability

The NVD primary vector AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:H (score 9.8 CRITICAL) differs only on the User Interaction dimension (UI:N), reflecting scenarios where model loading is fully automated (NVD).

Weakness Classification

CWE-94: Improper Control of Generation of Code ('Code Injection') (NVD). The vulnerability allows an attacker to inject and execute arbitrary code through the model configuration mechanism.

Patch Analysis

The fix was delivered in vLLM version 0.14.0 via pull request #32194 (commit 78d13ea9de4b1ce5e4d8a5af9738fea71fb024e5). The patch gates dynamic module execution on the trust_remote_code flag being explicitly enabled, restoring the intended security boundary (GHSA-2pc9-4j83-qjmr). The vulnerability was reported by bugbunny.ai, the remediation was developed by DarkLight1337, and the advisory was coordinated by russellb (GHSA-2pc9-4j83-qjmr).


Confidence & Validation

Assessment Confidence: High

Aspect Status Detail
Vendor Advisory Confirmed GHSA-2pc9-4j83-qjmr published, vendor-acknowledged (GHSA-2pc9-4j83-qjmr)
CVE Assigned Yes CVE-2026-22807, published 2026-01-21, last modified 2026-01-30 (NVD)
PoC Available Conceptual Attack mechanism described in advisory; no public exploit code at time of writing (GHSA-2pc9-4j83-qjmr)
Patch Available Yes Version 0.14.0, PR #32194, commit 78d13ea (GHSA-2pc9-4j83-qjmr)
Exploited in Wild Not known No reports of active exploitation at time of writing (NVD)
EPSS 0.056% (17th percentile) Low observed exploitation probability at time of writing (FIRST.org EPSS)

Detection Signatures (Formal Rules)

Detection Scope: The following rules are environment-dependent hunting rules, not broad signatures of compromise. YARA Rule 1 will match legitimate auto_map usage. The Sigma rule for .py downloads from Hugging Face is effective only in environments where such retrieval is anomalous. Tune thresholds and allowlists for your deployment before enabling in production.

YARA Rule 1 -- Suspicious auto_map in Model config.json

Detects model config.json files containing auto_map entries with dotted Python module paths or filesystem path separators, which are the precondition for CVE-2026-22807 exploitation (GHSA-2pc9-4j83-qjmr).

rule RAXE_2026_023_AutoMap_ConfigJson
{
    meta:
        id          = "RAXE-2026-023-001"
        description = "Detects model config.json with auto_map entries that may indicate CVE-2026-22807 exploitation precondition"
        reference   = "https://github.com/advisories/GHSA-2pc9-4j83-qjmr"
        cve         = "CVE-2026-22807"
        severity    = "HIGH"
        author      = "RAXE Labs (M. Hirani)"
        date        = "2026-03-09"
        tlp         = "TLP:GREEN"

    strings:
        $auto_map_key = { 22 61 75 74 6f 5f 6d 61 70 22 }
        $module_path = /\"[a-zA-Z_][a-zA-Z0-9_]*\.[a-zA-Z_][a-zA-Z0-9_]*\"/
        $path_separator = /\"[^\"]*[\/\\][^\"]*\"/

    condition:
        filesize < 2MB and
        $auto_map_key and
        ($module_path or $path_separator)
}

Tuning note: Scope to files named config.json within model directories or Hugging Face cache paths (typically ~/.cache/huggingface/hub/ on Linux). False-positive rate is moderate in environments with legitimate custom models that use auto_map; correlate with source reputation before escalating.

YARA Rule 2 -- Python Source in Model Directory with Suspicious Imports

Detects Python files co-located with model artefacts that import network, process-launching, or encoding standard library modules — consistent with payload delivery via auto_map (GHSA-2pc9-4j83-qjmr).

rule RAXE_2026_023_SuspiciousPy_In_ModelDir
{
    meta:
        id          = "RAXE-2026-023-002"
        description = "Detects Python source files with network or process-launch imports co-located with model files, consistent with CVE-2026-22807 payload delivery via auto_map"
        reference   = "https://github.com/advisories/GHSA-2pc9-4j83-qjmr"
        cve         = "CVE-2026-22807"
        severity    = "HIGH"
        author      = "RAXE Labs (M. Hirani)"
        date        = "2026-03-09"
        tlp         = "TLP:GREEN"

    strings:
        $import_subproc = { 69 6d 70 6f 72 74 20 73 75 62 70 72 6f 63 65 73 73 }
        $import_socket  = { 69 6d 70 6f 72 74 20 73 6f 63 6b 65 74 }
        $import_b64     = { 69 6d 70 6f 72 74 20 62 61 73 65 36 34 }
        $b64decode_call = { 62 36 34 64 65 63 6f 64 65 }

    condition:
        filesize < 500KB and
        2 of them
}

Tuning note: Scope to Hugging Face cache directories and vLLM model staging paths only. Python files in site-packages are expected and should be excluded. Flag only those co-located with config.json model files.

Sigma Rule 1 -- Unexpected Child Process from vLLM Server

Detects unexpected child processes spawned by a Python process running vLLM during model initialisation, which may indicate exploitation of CVE-2026-22807 (GHSA-2pc9-4j83-qjmr).

title: Unexpected Child Process Spawned During vLLM Model Initialisation
id: raxe-2026-023-003
status: experimental
description: >
  Detects unexpected child processes spawned by a Python process running vLLM
  during model initialisation, which may indicate exploitation of CVE-2026-22807
  (vLLM RCE via auto_map dynamic module loading, GHSA-2pc9-4j83-qjmr).
references:
  - https://github.com/advisories/GHSA-2pc9-4j83-qjmr
  - https://nvd.nist.gov/vuln/detail/CVE-2026-22807
tags:
  - attack.execution
  - attack.t1059.006
  - attack.t1203
  - cve.2026-22807
author: RAXE Labs (M. Hirani)
date: 2026-03-09
logsource:
  category: process_creation
  product: linux
detection:
  selection_parent:
    ParentImage|contains:
      - 'python3'
      - 'python'
    ParentCommandLine|contains:
      - 'vllm'
      - 'vllm.entrypoints'
      - 'vllm serve'
      - '-m vllm'
  selection_suspicious_child:
    Image|endswith:
      - '/bash'
      - '/sh'
      - '/dash'
      - '/zsh'
      - '/curl'
      - '/wget'
      - '/nc'
      - '/ncat'
      - '/netcat'
  filter_internal_images:
    Image|contains:
      - 'ray'
      - 'vllm'
  filter_legitimate_workers:
    CommandLine|contains:
      - '--worker-use-ray'
      - 'ray::RayWorkerWrapper'
      - 'vllm.worker'
  condition: selection_parent and selection_suspicious_child and not 1 of filter_*
falsepositives:
  - Legitimate vLLM multi-GPU worker processes that match the parent filter
  - Debugging or monitoring scripts invoked alongside vLLM
  - Container health-check processes that happen to match parent criteria
level: high

Sigma Rule 2 -- vLLM Process Fetches Python Source from Hugging Face

Detects outbound HTTP(S) requests by a vLLM process to Hugging Face Hub endpoints that retrieve Python (.py) source files. In environments where trust_remote_code is not intentionally enabled, such retrieval is anomalous and may indicate CVE-2026-22807 exploitation; however, legitimate custom models that use trust_remote_code=True also trigger this pattern (GHSA-2pc9-4j83-qjmr).

title: vLLM Process Downloads Python Source File from HuggingFace Hub
id: raxe-2026-023-004
status: experimental
description: >
  Detects outbound HTTP(S) requests by a vLLM process to HuggingFace Hub
  endpoints that retrieve Python (.py) source files. In environments where
  trust_remote_code is not intentionally enabled, .py file retrieval during
  model loading is anomalous and may indicate CVE-2026-22807 exploitation.
  Note: legitimate models that use trust_remote_code=True (e.g., custom
  architectures) also fetch .py files — tune allowlists accordingly
  (GHSA-2pc9-4j83-qjmr).
references:
  - https://github.com/advisories/GHSA-2pc9-4j83-qjmr
  - https://nvd.nist.gov/vuln/detail/CVE-2026-22807
tags:
  - attack.execution
  - attack.t1059.006
  - attack.t1071.001
  - cve.2026-22807
author: RAXE Labs (M. Hirani)
date: 2026-03-09
logsource:
  category: proxy
  product: linux
detection:
  selection_process:
    ProcessName|contains:
      - 'python3'
      - 'python'
    CommandLine|contains:
      - 'vllm'
  selection_network:
    DestinationHostname|endswith:
      - 'huggingface.co'
      - 'hf.co'
    RequestPath|endswith:
      - '.py'
    RequestMethod: 'GET'
  condition: selection_process and selection_network
falsepositives:
  - Development environments where researchers intentionally load custom modules
    from HuggingFace using trust_remote_code=True (legitimate, but should still
    be reviewed)
  - Automated CI pipelines testing model loading with dynamic modules
level: high

Sigma Rule 3 -- vLLM Dynamic Module Loader Invocation

Detects log lines indicating that vLLM's model registry has invoked the Hugging Face dynamic module loader (dynamic_module_utils), the code path documented in CVE-2026-22807 (GHSA-2pc9-4j83-qjmr).

title: vLLM Dynamic Module Loader Invocation via registry.py
id: raxe-2026-023-005
status: experimental
description: >
  Detects log lines indicating that vLLM's model registry has invoked the
  HuggingFace dynamic module loader (dynamic_module_utils), which is the
  code path documented in CVE-2026-22807 (GHSA-2pc9-4j83-qjmr).
references:
  - https://github.com/advisories/GHSA-2pc9-4j83-qjmr
  - https://nvd.nist.gov/vuln/detail/CVE-2026-22807
tags:
  - attack.execution
  - attack.t1059.006
  - cve.2026-22807
author: RAXE Labs (M. Hirani)
date: 2026-03-09
logsource:
  category: application
  product: vllm
detection:
  selection_dynamic_module:
    EventData|contains:
      - 'dynamic_module_utils'
  selection_registry_path:
    EventData|contains:
      - 'registry.py'
      - 'try_get_class_from_dynamic_module'
  selection_auto_map:
    EventData|contains:
      - 'auto_map'
  condition: selection_dynamic_module or selection_registry_path or selection_auto_map
falsepositives:
  - Intentional use of trust_remote_code=True with reviewed custom model code
  - Debug log output from vLLM developers testing dynamic loading functionality
level: medium

Recommended deployment order: Rules 003 and 004 (process creation and network) first — lowest false-positive risk in production inference environments. Rules 001 and 002 (YARA file scan) during threat-hunting sweeps of model cache directories. Rule 005 (application log) after vLLM log pipeline is validated in the SIEM.


Detection & Mitigation

Immediate Actions

  1. Upgrade to vLLM 0.14.0 — This is the primary remediation. The patch gates dynamic module loading on trust_remote_code being explicitly enabled, restoring the intended security boundary (GHSA-2pc9-4j83-qjmr).

  2. Audit model sources — Review all model repositories loaded by vLLM instances. Identify any models sourced from public hubs, third-party repositories, or user-supplied paths. Inspect config.json files for auto_map entries that reference unexpected Python modules.

  3. Restrict model loading to trusted repositories — Until patched, implement an allowlist of approved model repositories. Do not load models from untrusted or unverified sources.

Detection Guidance

  • File-system scanning — Deploy YARA Rules 001 and 002 against Hugging Face cache directories (~/.cache/huggingface/hub/) and vLLM model staging paths to identify suspicious auto_map entries and co-located Python files with network/process-launch imports.
  • Process monitoring — Deploy Sigma Rule 003 on hosts running vLLM to detect unexpected child processes (shells, network tools) spawned during model initialisation. Requires EDR process creation events (Sysmon, auditd, or equivalent).
  • Network monitoring — Deploy Sigma Rule 004 to detect outbound HTTP GET requests for .py files from vLLM processes to Hugging Face Hub endpoints. Requires proxy or network flow log sources with process attribution.
  • Application log monitoring — Deploy Sigma Rule 005 to detect invocations of dynamic_module_utils or try_get_class_from_dynamic_module in vLLM application logs. Requires vLLM stdout/stderr captured in a SIEM.

Strategic Recommendations

  • Implement model integrity verification — Adopt model signature verification where supported. Verify checksums and provenance of model artefacts before loading into inference servers.
  • Apply least privilege — Run vLLM processes with minimal operating system privileges. Use containerisation and network segmentation to limit the blast radius of a compromised inference server.
  • Audit model-loading pipelines — Review automated CI/CD pipelines that fetch and deploy models. Ensure human approval gates exist for models sourced from external repositories, especially when auto_map entries are present.
  • Retain detection post-patch — Organisations that have upgraded to 0.14.0 should retain Sigma Rule 005 (log-based) to confirm that the vulnerable code path is no longer reachable in their deployment and to detect any attempts to load models with trust_remote_code=True.

Indicators of Compromise

Type Indicator Context
File config.json containing "auto_map" with dotted Python module path values Exploitation precondition: malicious model configuration (GHSA-2pc9-4j83-qjmr)
File .py files co-located with model weights in Hugging Face cache directories containing import subprocess, import socket, or import base64 Payload artefact: suspicious Python module in model repository (GHSA-2pc9-4j83-qjmr)
Behavioural Child processes (bash, sh, curl, wget, nc) spawned by Python process running vLLM during startup Post-exploitation: payload-spawned shell or network tool (GHSA-2pc9-4j83-qjmr)
Network Outbound HTTP GET for .py files from vLLM process to *.huggingface.co or *.hf.co Payload delivery: dynamic module fetch from remote repository (GHSA-2pc9-4j83-qjmr)
Process try_get_class_from_dynamic_module or dynamic_module_utils in vLLM application logs Vulnerable code path invocation (GHSA-2pc9-4j83-qjmr)

Strategic Context

The vLLM auto_map vulnerability illustrates a growing class of supply chain attacks against AI inference infrastructure. Several converging trends make this finding strategically significant for organisations deploying LLM-based systems.

The model artefact is an underappreciated attack surface. Traditional software supply chain security focuses on code dependencies (packages, libraries, containers). AI systems introduce a novel supply chain dimension: the model itself. Model configuration files, custom code modules, and serialised weights are all vectors for injecting malicious logic. CVE-2026-22807 demonstrates that even configuration-level fields (auto_map) can trigger arbitrary code execution when the loading framework does not enforce adequate trust boundaries (GHSA-2pc9-4j83-qjmr).

Security controls must survive integration. The trust_remote_code flag is a well-established security boundary in the Hugging Face Transformers library. The vulnerability arose because vLLM integrated Transformers' dynamic module loading functionality without consistently enforcing this boundary. This is a recurring pattern in AI framework security: security controls that work in isolation fail when frameworks compose or wrap third-party components without propagating trust decisions (RAXE assessment).

Inference infrastructure runs with elevated privileges. vLLM deployments typically require GPU access, large memory allocations, and access to model storage — privileges that amplify the impact of a successful compromise. Code execution on an inference server is not equivalent to code execution on a standard web application server; it grants access to potentially sensitive model weights, inference data, and high-value compute resources (RAXE assessment).

EPSS indicates low current exploitation, but risk may increase. The EPSS score of 0.056% (17th percentile) at the time of writing indicates low observed exploitation probability (FIRST.org EPSS). However, the simplicity of the attack mechanism (publish a model with a malicious auto_map entry) and vLLM's growing adoption as an inference engine may attract future exploitation, particularly if a public proof of concept is released (RAXE assessment).

Regulatory implications are emerging. Frameworks such as the EU AI Act increasingly require security assessments of AI system components. Supply chain vulnerabilities in inference engines — where a compromised model can lead to full server compromise — will fall under regulatory scrutiny as AI system security requirements mature (RAXE assessment).


References

  1. GHSA-2pc9-4j83-qjmr -- vLLM: RCE via auto_map dynamic module loading, CVSS 8.8 HIGH, CWE-94 (GHSA-2pc9-4j83-qjmr)
  2. CVE-2026-22807 -- NVD entry, NVD primary CVSS 9.8, GHSA CVSS 8.8, CWE-94, published 2026-01-21 (NVD)
  3. vllm-project/vllm#32194 -- Fix pull request (GHSA-2pc9-4j83-qjmr)
  4. Commit 78d13ea -- Fix commit (GHSA-2pc9-4j83-qjmr)
  5. vLLM v0.14.0 Release -- Patched release (GHSA-2pc9-4j83-qjmr)
  6. FIRST EPSS -- Exploit Prediction Scoring System data for CVE-2026-22807