1. Executive Summary
An integer overflow vulnerability in llama.cpp's ggml library (CVE-2026-33298) allows an attacker to craft a malicious GGUF model file that triggers a heap buffer overflow when loaded, potentially enabling arbitrary code execution (GHSA-96jg-mvhq-q7q7). The vulnerability is fixed in release b7824 (GHSA-96jg-mvhq-q7q7, NVD). Primary sources disagree on the lower bound of the affected range: NVD says "prior to b7824" while the GHSA advisory says affected versions are < b7437 with the patch in b7824. Organisations should upgrade to b7824 or later — the only version both sources agree is fixed. The flaw resides in the ggml_nbytes function where tensor dimension calculations can overflow int64_t values, causing the loader to allocate a drastically undersized memory buffer before writing tensor data into it (GHSA-96jg-mvhq-q7q7).
Why this matters: llama.cpp is a commonly deployed open-source LLM inference engine and the GGUF format is a popular format for distributing quantised model weights (RAXE assessment). Any application or service that loads GGUF files via the ggml library — including popular tools such as Ollama, LM Studio, and koboldcpp — is potentially affected (RAXE assessment). This vulnerability transforms the model file itself into an attack vector, targeting the supply chain at the model distribution layer.
2. Risk Rating
| Dimension | Rating | Detail |
|---|---|---|
| Severity | HIGH | CVSS 7.8 (CNA-submitted via NVD) |
| Urgency | HIGH | Patch available since b7824; unpatched deployments at risk (GHSA-96jg-mvhq-q7q7) |
| Scope | BROAD | llama.cpp underpins a large proportion of local LLM deployments (RAXE assessment) |
| Confidence | HIGH | GHSA advisory with detailed technical mechanism; fix confirmed in release (GHSA-96jg-mvhq-q7q7) |
| Business Impact | HIGH | Code execution on systems loading untrusted models; supply chain compromise potential (RAXE assessment) |
3. Affected Products
| Product | Affected Versions | Fixed Version | Status |
|---|---|---|---|
| llama.cpp (ggml library) | Prior to b7824 (NVD); GHSA says < b7437 with patch in b7824. Sources disagree on the lower bound — see Version Range Discrepancy below. | b7824 (both NVD and GHSA agree) | Patch available |
Am I Affected?
- Check if you use llama.cpp or ggml: Search your dependencies for `llama.cpp`, `ggml`, or applications that load GGUF files
- Check version: Run `./llama-cli --version` or check the build tag — upgrade to b7824 or later, the only version both NVD and GHSA agree is fixed
- Check deployment: If your application loads GGUF files from community sources (e.g., HuggingFace Hub), you are in the exposure surface for this vulnerability (RAXE assessment)
4. Abstract
CVE-2026-33298 is an integer overflow vulnerability in the ggml_nbytes function (ggml/src/ggml.c) used by llama.cpp to calculate tensor memory allocations during GGUF model loading (GHSA-96jg-mvhq-q7q7). The function computes byte sizes using (tensor->ne[i] - 1) * tensor->nb[i] on int64_t values without overflow protection (GHSA-96jg-mvhq-q7q7). An attacker can craft tensor dimensions — for example, ne = [1024, 1024, 2^42+1, 1] — that cause this multiplication to wrap to zero, resulting in a memory allocation many orders of magnitude smaller than required (GHSA-96jg-mvhq-q7q7). Subsequent writes of tensor data into this undersized buffer produce a heap buffer overflow (CWE-122) that can lead to arbitrary code execution (GHSA-96jg-mvhq-q7q7).
The vulnerability is classified as CWE-122 (Heap-based Buffer Overflow) and CWE-190 (Integer Overflow or Wraparound) (CNA, via NVD). The CNA-submitted CVSS score is 7.8 HIGH with vector CVSS:3.1/AV:L/AC:L/PR:N/UI:R/S:U/C:H/I:H/A:H, reflecting a local attack vector requiring user interaction (loading the crafted file) but no privileges (NVD). Note: NVD status is Awaiting Analysis as of 2026-03-26; this score has not been independently verified by NVD analysts.
5. Vulnerability Details
Root Cause
The ggml_nbytes function in ggml/src/ggml.c calculates the total byte size of a tensor for memory allocation (GHSA-96jg-mvhq-q7q7). The calculation uses the expression:
nbytes += (tensor->ne[i] - 1) * tensor->nb[i]
where ne[i] is the element count per dimension and nb[i] is the byte stride per dimension, both int64_t (GHSA-96jg-mvhq-q7q7).
The GGUF loader validates that the element count does not exceed INT64_MAX, but it does not validate that the total byte size does not exceed SIZE_MAX (GHSA-96jg-mvhq-q7q7). When tensor dimensions are chosen such that the multiplication overflows int64_t, the result wraps to zero or a small value, and the subsequent malloc allocates a buffer that is many orders of magnitude too small (GHSA-96jg-mvhq-q7q7).
Exploitation Mechanism
- The attacker constructs a GGUF file with tensor metadata specifying dimensions that trigger integer overflow — e.g., ne = [1024, 1024, 2^42+1, 1] (GHSA-96jg-mvhq-q7q7)
- The element count validation passes because the product of dimensions fits within INT64_MAX (GHSA-96jg-mvhq-q7q7)
- The byte size calculation overflows, producing a small allocation (~4 MB instead of exabytes) (GHSA-96jg-mvhq-q7q7)
- The loader reads tensor data from the file into the undersized buffer
- A heap buffer overflow occurs, corrupting adjacent heap memory (GHSA-96jg-mvhq-q7q7)
6. Attack Flow
+------------------------------------+
| 1. Attacker delivers a crafted |
| GGUF model file via model hub, |
| direct transfer, or poisoned |
| repository. |
+-----------------+------------------+
|
v
+------------------------------------+
| 2. Victim application loads the |
| file with llama.cpp / ggml. |
+-----------------+------------------+
|
v
+------------------------------------+
| 3. ggml_nbytes() overflows while |
| calculating tensor allocation |
| size from attacker dimensions. |
+-----------------+------------------+
|
v
+------------------------------------+
| 4. Loader allocates a heap buffer |
| that is far smaller than the |
| tensor data requires. |
+-----------------+------------------+
|
v
+------------------------------------+
| 5. Tensor bytes are copied into |
| the undersized buffer, causing |
| heap corruption. |
+-----------------+------------------+
|
v
+------------------------------------+
| 6. Result: process crash at |
| minimum, and possible code |
| execution in the model-loading |
| context at worst. |
+------------------------------------+
This flow is derived from the GHSA technical description of the integer overflow and subsequent heap overwrite, with the final impact bounded by the NVD/CNA severity assessment (GHSA-96jg-mvhq-q7q7, NVD).
7. Attack Requirements
| Requirement | Detail |
|---|---|
| Attack Vector | Local (AV:L) — attacker must deliver a crafted GGUF file to the target system (NVD) |
| User Interaction | Required (UI:R) — victim must initiate model loading (NVD) |
| Privileges Required | None (PR:N) — no prior access to the target system needed (NVD) |
| Attack Complexity | Low (AC:L) — no special conditions required beyond crafting the file (NVD) |
| Delivery | Via model-sharing platforms, direct file transfer, or compromised model repositories (RAXE assessment) |
8. Impact Assessment
- Confidentiality: HIGH — arbitrary code execution enables data exfiltration (NVD)
- Integrity: HIGH — arbitrary code execution enables system modification (NVD)
- Availability: HIGH — process crash at minimum; persistent compromise at maximum (NVD)
- Scope: Unchanged — exploitation occurs within the process loading the model (NVD)
The practical impact depends on the deployment context:
- Desktop/consumer: RCE in the user's context; full access to local files and credentials (RAXE assessment)
- Server/API: RCE in the service context; potential lateral movement if the model-serving process runs with elevated privileges (RAXE assessment)
- Supply chain: A compromised model on a popular distribution platform could affect all users who download and load it (RAXE assessment)
9. Detection Methods
YARA — Suspicious GGUF File (Heuristic)
rule RAXE_2026_042_GGUF_Integer_Overflow_Tensor_Dims
{
meta:
description = "Detects GGUF files under 1 MB — heuristic for oversized tensor dimension claims in small files (CVE-2026-33298)"
author = "RAXE Labs"
date = "2026-03-26"
reference = "https://github.com/ggml-org/llama.cpp/security/advisories/GHSA-96jg-mvhq-q7q7"
cve = "CVE-2026-33298"
severity = "HIGH"
tlp = "GREEN"
strings:
$gguf_magic = { 47 47 55 46 }
condition:
$gguf_magic at 0 and filesize < 1048576
}
Note: This is a heuristic pre-filter. Legitimate quantised LLM models are typically several gigabytes; a GGUF file under 1 MB claiming to contain large tensors warrants manual inspection of the tensor dimension metadata in the file header (RAXE assessment). False positive rate: MEDIUM.
Sigma — llama.cpp Process Crash
title: llama.cpp Heap Corruption Crash Detection
id: raxe-2026-042-sigma-001
status: experimental
description: Detects abnormal termination of llama.cpp processes due to heap corruption (CVE-2026-33298)
author: RAXE Labs
date: 2026-03-26
references:
- https://github.com/ggml-org/llama.cpp/security/advisories/GHSA-96jg-mvhq-q7q7
tags:
- attack.execution
- cve.2026.33298
logsource:
category: process_termination
product: linux
detection:
selection_process:
Image|endswith:
- '/llama-server'
- '/llama-cli'
- '/ollama'
selection_signal:
ExitCode:
- -6
- -11
- 134
- 139
condition: selection_process and selection_signal
falsepositives:
- Legitimate out-of-memory conditions
- Unrelated segmentation faults
level: medium
10. Mitigation Options
- Patch — Upgrade to llama.cpp release b7824 or later (GHSA-96jg-mvhq-q7q7)
- Audit dependencies — Identify all applications using llama.cpp or ggml for model loading and verify their version (RAXE assessment)
- Restrict model sources — Avoid loading GGUF files from untrusted sources on unpatched builds (RAXE assessment)
- Hash verification — Implement integrity checks for model files before loading where infrastructure permits (RAXE assessment)
- Sandboxing — Run model loading in sandboxed or containerised environments to limit the blast radius of potential exploitation (RAXE assessment)
11. Timeline
| Date | Event | Source |
|---|---|---|
| 2026-01-24 | llama.cpp release b7824 (fix) published | GitHub |
| 2026-03-18 | GHSA-96jg-mvhq-q7q7 advisory published | GHSA |
| 2026-03-24 | CVE-2026-33298 published in NVD | NVD |
| 2026-03-26 | RAXE-2026-042 signal detected and investigated | RAXE Labs |
| 2026-03-26 | NVD status: Awaiting Analysis | NVD |
Version Range Discrepancy: NVD's description says the vulnerability affects versions "prior to b7824" (NVD). The GHSA advisory says affected versions are < b7437 with the patch in b7824 (GHSA-96jg-mvhq-q7q7). These are materially different claims: NVD implies every version before b7824 is vulnerable; GHSA implies only versions before b7437 are vulnerable, with b7824 as the confirmed fix. The status of versions b7437 through b7823 is unresolved in public advisory data. This draft recommends upgrading to b7824 or later — the only version both sources agree is fixed. The security fix in b7824 was available approximately two months before the CVE was published in NVD.
12. References
- (CVE) CVE-2026-33298 — NVD Entry
- (Advisory) GHSA-96jg-mvhq-q7q7 — llama.cpp Integer Overflow Advisory
- (Advisory) llama.cpp Release b7824 — Security Fix
- (ATLAS) AML.T0010 — AI Supply Chain Compromise
13. CVSS Scoring
- CVSS v3.1: 7.8 HIGH (CNA-submitted; NVD status: Awaiting Analysis) (NVD)
- Vector: CVSS:3.1/AV:L/AC:L/PR:N/UI:R/S:U/C:H/I:H/A:H (NVD)
- NVD Status: Awaiting Analysis as of 2026-03-26 — the CVSS score shown is CNA-submitted; NVD has not completed independent analysis (NVD)
| Metric | Value | Meaning |
|---|---|---|
| Attack Vector | Local (AV:L) | Requires local file delivery (NVD) |
| Attack Complexity | Low (AC:L) | No special conditions (NVD) |
| Privileges Required | None (PR:N) | No prior access needed (NVD) |
| User Interaction | Required (UI:R) | Victim must load the file (NVD) |
| Scope | Unchanged (S:U) | Same process boundary (NVD) |
| Confidentiality | High (C:H) | Full read access possible (NVD) |
| Integrity | High (I:H) | Full write access possible (NVD) |
| Availability | High (A:H) | Process crash or persistent compromise (NVD) |
14. Metadata
| Field | Value |
|---|---|
| Finding ID | RAXE-2026-042 |
| CVE | CVE-2026-33298 |
| GHSA | GHSA-96jg-mvhq-q7q7 |
| CWE | CWE-122 (Heap-based Buffer Overflow), CWE-190 (Integer Overflow) |
| CVSS | 7.8 HIGH (CNA-submitted via NVD) |
| ATLAS | AML.T0010 — AI Supply Chain Compromise |
| Admiralty Grade | B2 — Usually reliable source, probably true |
| TLP | TLP:GREEN |
| Stream | S3 — Supply Chain |
| Reporter | alexanderkent (GHSA-96jg-mvhq-q7q7) |