RAXE-2026-042 HIGH S3

llama.cpp GGUF Integer Overflow Leading to Heap Buffer Overflow via Crafted Tensor Dimensions (CVE-2026-33298)

Supply Chain AML.T0010 2026-03-26 M. Hirani TLP:GREEN

1. Executive Summary

An integer overflow vulnerability in llama.cpp's ggml library (CVE-2026-33298) allows an attacker to craft a malicious GGUF model file that triggers a heap buffer overflow when loaded, potentially enabling arbitrary code execution (GHSA-96jg-mvhq-q7q7). The vulnerability is fixed in release b7824 (GHSA-96jg-mvhq-q7q7, NVD). Primary sources disagree on the upper bound of the affected range: NVD says "prior to b7824" while the GHSA advisory says affected versions are < b7437 with the patch in b7824. Organisations should upgrade to b7824 or later — the earliest release both sources agree is fixed. The flaw resides in the ggml_nbytes function, where tensor dimension calculations can overflow int64_t values, causing the loader to allocate a drastically undersized memory buffer before writing tensor data into it (GHSA-96jg-mvhq-q7q7).

Why this matters: llama.cpp is a commonly deployed open-source LLM inference engine and the GGUF format is a popular format for distributing quantised model weights (RAXE assessment). Any application or service that loads GGUF files via the ggml library — including popular tools such as Ollama, LM Studio, and koboldcpp — is potentially affected (RAXE assessment). This vulnerability transforms the model file itself into an attack vector, targeting the supply chain at the model distribution layer.


2. Risk Rating

Dimension Rating Detail
Severity HIGH CVSS 7.8 (CNA-submitted via NVD)
Urgency HIGH Patch available since b7824; unpatched deployments at risk (GHSA-96jg-mvhq-q7q7)
Scope BROAD llama.cpp underpins a large proportion of local LLM deployments (RAXE assessment)
Confidence HIGH GHSA advisory with detailed technical mechanism; fix confirmed in release (GHSA-96jg-mvhq-q7q7)
Business Impact HIGH Code execution on systems loading untrusted models; supply chain compromise potential (RAXE assessment)

3. Affected Products

Product Affected Versions Fixed Version Status
llama.cpp (ggml library) Prior to b7824 (NVD); GHSA says < b7437 with patch in b7824. Sources disagree on the upper bound of the affected range — see Version Range Discrepancy below. b7824 (both NVD and GHSA agree) Patch available

Am I Affected?

  • Check if you use llama.cpp or ggml: Search your dependencies for llama.cpp, ggml, or applications that load GGUF files
  • Check version: Run ./llama-cli --version or check the build tag — upgrade to b7824 or later, the earliest release both NVD and GHSA agree is fixed
  • Check deployment: If your application loads GGUF files from community sources (e.g., HuggingFace Hub), you are in the exposure surface for this vulnerability (RAXE assessment)

4. Abstract

CVE-2026-33298 is an integer overflow vulnerability in the ggml_nbytes function (ggml/src/ggml.c) used by llama.cpp to calculate tensor memory allocations during GGUF model loading (GHSA-96jg-mvhq-q7q7). The function computes byte sizes using (tensor->ne[i] - 1) * tensor->nb[i] on int64_t values without overflow protection (GHSA-96jg-mvhq-q7q7). An attacker can craft tensor dimensions — for example, ne = [1024, 1024, 2^42+1, 1] — that cause this multiplication to wrap to zero, resulting in a memory allocation many orders of magnitude smaller than required (GHSA-96jg-mvhq-q7q7). Subsequent writes of tensor data into this undersized buffer produce a heap buffer overflow (CWE-122) that can lead to arbitrary code execution (GHSA-96jg-mvhq-q7q7).

The vulnerability is classified as CWE-122 (Heap-based Buffer Overflow) and CWE-190 (Integer Overflow or Wraparound) (CNA, via NVD). The CNA-submitted CVSS score is 7.8 HIGH with vector CVSS:3.1/AV:L/AC:L/PR:N/UI:R/S:U/C:H/I:H/A:H, reflecting a local attack vector requiring user interaction (loading the crafted file) but no privileges (NVD). Note: NVD status is Awaiting Analysis as of 2026-03-26; this score has not been independently verified by NVD analysts.


5. Vulnerability Details

Root Cause

The ggml_nbytes function in ggml/src/ggml.c calculates the total byte size of a tensor for memory allocation (GHSA-96jg-mvhq-q7q7). The calculation uses the expression:

nbytes += (tensor->ne[i] - 1) * tensor->nb[i]

where ne[i] is the element count per dimension and nb[i] is the byte stride per dimension, both int64_t (GHSA-96jg-mvhq-q7q7).

The GGUF loader validates that the element count does not exceed INT64_MAX, but it does not validate that the total byte size does not exceed SIZE_MAX (GHSA-96jg-mvhq-q7q7). When tensor dimensions are chosen such that the multiplication overflows int64_t, the result wraps to zero or a small value, and the subsequent malloc allocates a buffer that is many orders of magnitude too small (GHSA-96jg-mvhq-q7q7).
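The wraparound can be reproduced arithmetically. The sketch below simulates two's-complement int64_t semantics in Python for a plain f32 tensor with the dimensions cited in the advisory; it mirrors the shape of the ggml_nbytes calculation, not ggml's exact code (which also accounts for quantised block sizes).

```python
# Simulate the int64_t wraparound in a ggml_nbytes-style size calculation
# for an f32 tensor with ne = [1024, 1024, 2**42 + 1, 1]. Python integers
# are arbitrary-precision, so the wrap is applied explicitly.

MASK64 = (1 << 64) - 1

def wrap_i64(x: int) -> int:
    """Reduce an integer to the two's-complement int64_t range."""
    x &= MASK64
    return x - (1 << 64) if x >= (1 << 63) else x

ne = [1024, 1024, 2**42 + 1, 1]  # attacker-chosen element counts per dim
type_size = 4                    # bytes per f32 element

# Byte strides chained the way ggml computes them: nb[0] = element size,
# nb[i] = nb[i-1] * ne[i-1]; each product wraps like int64_t would.
nb = [type_size]
for i in range(1, 4):
    nb.append(wrap_i64(nb[i - 1] * ne[i - 1]))

# Accumulate the byte size: start from the row size, then add
# (ne[i] - 1) * nb[i] for the remaining dimensions, wrapping throughout.
nbytes = type_size * ne[0]
for i in range(1, 4):
    nbytes = wrap_i64(nbytes + wrap_i64((ne[i] - 1) * nb[i]))

true_bytes = type_size * ne[0] * ne[1] * ne[2] * ne[3]  # exact, no wrap

print(f"allocated (wrapped): {nbytes:,} bytes (~{nbytes / 2**20:.0f} MiB)")
print(f"actually required:   {true_bytes:,} bytes (~{true_bytes / 2**60:.0f} EiB)")
```

The (ne[2] - 1) * nb[2] term is exactly 2^64 and wraps to zero, so the allocation collapses to about 4 MiB while the tensor data is in the exabyte range — matching the ~4 MB figure in the GHSA advisory.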

Exploitation Mechanism

  1. The attacker constructs a GGUF file with tensor metadata specifying dimensions that trigger integer overflow — e.g., ne = [1024, 1024, 2^42+1, 1] (GHSA-96jg-mvhq-q7q7)
  2. The element count validation passes because the product of dimensions fits within INT64_MAX (GHSA-96jg-mvhq-q7q7)
  3. The byte size calculation overflows, producing a small allocation (~4 MB instead of exabytes) (GHSA-96jg-mvhq-q7q7)
  4. The loader reads tensor data from the file into the undersized buffer
  5. A heap buffer overflow occurs, corrupting adjacent heap memory (GHSA-96jg-mvhq-q7q7)

6. Attack Flow

+------------------------------------+
| 1. Attacker delivers a crafted     |
|    GGUF model file via model hub,  |
|    direct transfer, or poisoned    |
|    repository.                     |
+-----------------+------------------+
                  |
                  v
+------------------------------------+
| 2. Victim application loads the    |
|    file with llama.cpp / ggml.     |
+-----------------+------------------+
                  |
                  v
+------------------------------------+
| 3. ggml_nbytes() overflows while   |
|    calculating tensor allocation   |
|    size from attacker dimensions.  |
+-----------------+------------------+
                  |
                  v
+------------------------------------+
| 4. Loader allocates a heap buffer  |
|    that is far smaller than the    |
|    tensor data requires.           |
+-----------------+------------------+
                  |
                  v
+------------------------------------+
| 5. Tensor bytes are copied into    |
|    the undersized buffer, causing  |
|    heap corruption.                |
+-----------------+------------------+
                  |
                  v
+------------------------------------+
| 6. Result: process crash at        |
|    minimum, and possible code      |
|    execution in the model-loading  |
|    context at worst.               |
+------------------------------------+

This flow is derived from the GHSA technical description of the integer overflow and subsequent heap overwrite, with the final impact bounded by the NVD/CNA severity assessment (GHSA-96jg-mvhq-q7q7, NVD).


7. Attack Requirements

Requirement Detail
Attack Vector Local (AV:L) — attacker must deliver a crafted GGUF file to the target system (NVD)
User Interaction Required (UI:R) — victim must initiate model loading (NVD)
Privileges Required None (PR:N) — no prior access to the target system needed (NVD)
Attack Complexity Low (AC:L) — no special conditions required beyond crafting the file (NVD)
Delivery Via model-sharing platforms, direct file transfer, or compromised model repositories (RAXE assessment)

8. Impact Assessment

  • Confidentiality: HIGH — arbitrary code execution enables data exfiltration (NVD)
  • Integrity: HIGH — arbitrary code execution enables system modification (NVD)
  • Availability: HIGH — process crash at minimum; persistent compromise at maximum (NVD)
  • Scope: Unchanged — exploitation occurs within the process loading the model (NVD)

The practical impact depends on the deployment context:

  • Desktop/consumer: RCE in the user's context; full access to local files and credentials (RAXE assessment)
  • Server/API: RCE in the service context; potential lateral movement if the model-serving process runs with elevated privileges (RAXE assessment)
  • Supply chain: A compromised model on a popular distribution platform could affect all users who download and load it (RAXE assessment)


9. Detection Methods

YARA — Suspicious GGUF File (Heuristic)

rule RAXE_2026_042_GGUF_Integer_Overflow_Tensor_Dims
{
    meta:
        description = "Detects GGUF files under 1 MB — heuristic for oversized tensor dimension claims in small files (CVE-2026-33298)"
        author = "RAXE Labs"
        date = "2026-03-26"
        reference = "https://github.com/ggml-org/llama.cpp/security/advisories/GHSA-96jg-mvhq-q7q7"
        cve = "CVE-2026-33298"
        severity = "HIGH"
        tlp = "GREEN"
    strings:
        $gguf_magic = { 47 47 55 46 }
    condition:
        $gguf_magic at 0 and filesize < 1048576
}

Note: This is a heuristic pre-filter. Legitimate quantised LLM models are typically several gigabytes; a GGUF file under 1 MB claiming to contain large tensors warrants manual inspection of the tensor dimension metadata in the file header (RAXE assessment). False positive rate: MEDIUM.
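For triage of flagged files, the fixed-size GGUF header can be inspected directly. The sketch below assumes the documented GGUF layout (4-byte magic, then little-endian uint32 version, uint64 tensor count, uint64 metadata KV count); the looks_suspicious helper and its byte-per-tensor threshold are illustrative, not part of any existing tool.

```python
import os
import struct

def gguf_header(path: str):
    """Read the fixed-size GGUF header: magic, version, tensor count,
    and metadata key/value count (all fields little-endian)."""
    with open(path, "rb") as f:
        data = f.read(24)
    if len(data) < 24 or data[:4] != b"GGUF":
        raise ValueError("not a GGUF file")
    version, n_tensors, n_kv = struct.unpack("<IQQ", data[4:24])
    return version, n_tensors, n_kv

def looks_suspicious(path: str, min_bytes_per_tensor: int = 4096) -> bool:
    """Heuristic: a file claiming tensors but holding far too few bytes
    to plausibly contain them warrants manual inspection of its
    tensor dimension metadata."""
    _, n_tensors, _ = gguf_header(path)
    return n_tensors > 0 and os.path.getsize(path) < n_tensors * min_bytes_per_tensor
```

Full triage would go on to parse the tensor info records and check each dimension product; this header-only pass is just a cheap first filter to pair with the YARA rule.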

Sigma — llama.cpp Process Crash

title: llama.cpp Heap Corruption Crash Detection
id: raxe-2026-042-sigma-001
status: experimental
description: Detects abnormal termination of llama.cpp processes due to heap corruption (CVE-2026-33298)
author: RAXE Labs
date: 2026-03-26
references:
    - https://github.com/ggml-org/llama.cpp/security/advisories/GHSA-96jg-mvhq-q7q7
tags:
    - attack.execution
    - cve.2026.33298
logsource:
    category: process_termination
    product: linux
detection:
    selection_process:
        Image|endswith:
            - '/llama-server'
            - '/llama-cli'
            - '/ollama'
    selection_signal:
        ExitCode:
            - -6
            - -11
            - 134
            - 139
    condition: selection_process and selection_signal
falsepositives:
    - Legitimate out-of-memory conditions
    - Unrelated segmentation faults
level: medium

10. Mitigation Options

  1. Patch — Upgrade to llama.cpp release b7824 or later (GHSA-96jg-mvhq-q7q7)
  2. Audit dependencies — Identify all applications using llama.cpp or ggml for model loading and verify their version (RAXE assessment)
  3. Restrict model sources — Avoid loading GGUF files from untrusted sources on unpatched builds (RAXE assessment)
  4. Hash verification — Implement integrity checks for model files before loading where infrastructure permits (RAXE assessment)
  5. Sandboxing — Run model loading in sandboxed or containerised environments to limit the blast radius of potential exploitation (RAXE assessment)
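Hash verification (option 4) can be a short pre-load gate: stream the file through SHA-256 and compare against a digest published out of band. The function below is an illustrative sketch, not an existing API.

```python
import hashlib

def verify_model(path: str, expected_sha256: str) -> bool:
    """Stream the model file through SHA-256 and compare against the
    expected digest before handing the file to the loader."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):  # 1 MiB chunks
            h.update(chunk)
    return h.hexdigest() == expected_sha256.lower()
```

Note that this check is only as strong as the channel carrying the expected digest; it does not help if the upstream publisher itself is compromised, which is why it pairs with options 3 and 5 rather than replacing them.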

11. Timeline

Date Event Source
2026-01-24 llama.cpp release b7824 (fix) published GitHub
2026-03-18 GHSA-96jg-mvhq-q7q7 advisory published GHSA
2026-03-24 CVE-2026-33298 published in NVD NVD
2026-03-26 RAXE-2026-042 signal detected and investigated RAXE Labs
2026-03-26 NVD status: Awaiting Analysis NVD

Version Range Discrepancy: NVD's description says the vulnerability affects versions "prior to b7824" (NVD). The GHSA advisory says affected versions are < b7437 with the patch in b7824 (GHSA-96jg-mvhq-q7q7). These are materially different claims: NVD implies every version before b7824 is vulnerable; GHSA implies only versions before b7437 are vulnerable, with b7824 as the confirmed fix. The status of versions b7437 through b7823 is unresolved in public advisory data. This advisory therefore recommends upgrading to b7824 or later, the earliest release both sources agree is fixed. The security fix in b7824 was available approximately two months before the CVE was published in NVD.


12. References

  1. (CVE) CVE-2026-33298 — NVD Entry
  2. (Advisory) GHSA-96jg-mvhq-q7q7 — llama.cpp Integer Overflow Advisory
  3. (Advisory) llama.cpp Release b7824 — Security Fix
  4. (ATLAS) AML.T0010 — AI Supply Chain Compromise

13. CVSS Scoring

  • CVSS v3.1: 7.8 HIGH (NVD)
  • Vector: CVSS:3.1/AV:L/AC:L/PR:N/UI:R/S:U/C:H/I:H/A:H (NVD)
  • NVD Status: Awaiting Analysis as of 2026-03-26 — the score above is CNA-submitted and has not yet been independently verified by NVD analysts (NVD)

Metric Value Meaning
Attack Vector Local (AV:L) Requires local file delivery (NVD)
Attack Complexity Low (AC:L) No special conditions (NVD)
Privileges Required None (PR:N) No prior access needed (NVD)
User Interaction Required (UI:R) Victim must load the file (NVD)
Scope Unchanged (S:U) Same process boundary (NVD)
Confidentiality High (C:H) Full read access possible (NVD)
Integrity High (I:H) Full write access possible (NVD)
Availability High (A:H) Process crash or persistent compromise (NVD)

14. Metadata

Field Value
Finding ID RAXE-2026-042
CVE CVE-2026-33298
GHSA GHSA-96jg-mvhq-q7q7
CWE CWE-122 (Heap-based Buffer Overflow), CWE-190 (Integer Overflow)
CVSS 7.8 HIGH (CNA-submitted via NVD)
ATLAS AML.T0010 — AI Supply Chain Compromise
Admiralty Grade B2 — Usually reliable source, probably true
TLP TLP:GREEN
Stream S3 — Supply Chain
Reporter alexanderkent (GHSA-96jg-mvhq-q7q7)