RAXE-2026-062: Ollama GGUF Heap Out-of-Bounds Read: Memory Disclosure via /api/create + Exfiltration via /api/push (CVE-2026-7482)

Executive Summary

RAXE Labs has independently verified CVE-2026-7482, an unauthenticated heap out-of-bounds read in Ollama's GGUF model loader for versions before 0.17.1, disclosed by the Echo CNA on 2026-05-04 and credited to the Cyera Research Team (Dor Attias, Ofek Itach). The Echo CNA filed two CVSS Secondary scores: CVSS v3.1 = 9.1 CRITICAL and CVSS v4.0 = 8.8 HIGH. NVD has not assigned its own Primary CVSS score; the scores visible on the NVD record are the Echo CNA's Secondary submissions. The vulnerability is reachable from the network on any deployment that exposes /api/create, including the documented OLLAMA_HOST=0.0.0.0 configuration. The architecturally novel element is the exfiltration chain: a second unauthenticated control endpoint, /api/push, lets the attacker ship the leaked memory back out as a published model artefact to a registry of their choosing. Operators should upgrade to Ollama 0.17.1 or later immediately and confirm /api/create and /api/push are not reachable from untrusted networks.

Risk Rating

Dimension	Rating	Detail
Severity	HIGH (`CVSS v4.0` = 8.8, CNA Secondary) / CRITICAL (`CVSS v3.1` = 9.1, CNA Secondary)	Both scores are CNA Secondary submissions from Echo; NVD has not assigned its own Primary CVSS score. RAXE uses the v4.0 figure as the headline per Publication Language Standards. See CVSS Divergence Note below.
Urgency	High	Public CVE since 2026-05-04; social-media discussion observed within days (see Confidence & Validation); fix shipped silently in v0.17.1.
Scope	Wide	Any Ollama deployment before 0.17.1 with network reach to `/api/create`. The default `127.0.0.1` binding is the safe path; `OLLAMA_HOST=0.0.0.0` (documented) inverts that.
Confidence	High (RAXE assessment)	Fix diff read in full; CVE record cross-validated against NVD and CVE.org; no working exploit produced; one-shot exfiltration chain reasoned from public diff and CNA description.
Business Impact	High	Echo CNA description identifies leaked content as including environment variables, API keys, system prompts, and concurrent users' conversation data.

Affected Products

Product	Affected Versions	Fixed Version	Status
Ollama (`github.com/ollama/ollama`)	All versions before 0.17.1 (NVD wording, no lower bound published)	0.17.1	Patched. Latest stable at publication time: v0.23.2.
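
The "before 0.17.1" boundary can be checked mechanically once a version string is in hand. The following is a minimal Go sketch, not part of Ollama: the helper names `parseVersion` and `isAffected` are hypothetical, and the handling of pre-release suffixes is a simplifying assumption (a pre-release build of 0.17.1 compares equal to 0.17.1 and should be checked by hand).

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// fixedVersion is the first release carrying the fix, per the table above.
var fixedVersion = [3]int{0, 17, 1}

// parseVersion turns "0.17.1" or "v0.17.1-rc2" into numeric components.
// Assumption: any "-" or "+" suffix is dropped before comparison.
func parseVersion(s string) ([3]int, error) {
	var out [3]int
	s = strings.TrimPrefix(strings.TrimSpace(s), "v")
	if i := strings.IndexAny(s, "-+"); i >= 0 {
		s = s[:i]
	}
	parts := strings.Split(s, ".")
	if len(parts) != 3 {
		return out, fmt.Errorf("unexpected version format %q", s)
	}
	for i, p := range parts {
		n, err := strconv.Atoi(p)
		if err != nil {
			return out, fmt.Errorf("non-numeric component in %q: %w", s, err)
		}
		out[i] = n
	}
	return out, nil
}

// isAffected reports whether version sorts strictly before 0.17.1.
func isAffected(version string) (bool, error) {
	v, err := parseVersion(version)
	if err != nil {
		return false, err
	}
	for i := range v {
		if v[i] != fixedVersion[i] {
			return v[i] < fixedVersion[i], nil
		}
	}
	return false, nil // equal to the fixed version
}
```

Under these assumptions, `isAffected("0.17.0")` reports true, while `isAffected("0.17.1")` and `isAffected("v0.23.2")` report false. An unparseable string returns an error rather than a guess, so an inventory script can flag it for manual review.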

Am I Affected?

Check if you run Ollama: ollama --version (CLI) or curl -s http://<host>:11434/api/version (HTTP).
Check exposure: confirm whether the Ollama HTTP listener is bound to 127.0.0.1 (default, safe) or 0.0.0.0 (documented, exposed). If you set OLLAMA_HOST in your unit file, Docker compose, or shell profile, audit the value.
Check upstream firewalling: if Ollama sits behind a reverse proxy, confirm /api/create and /api/push are not exposed to untrusted networks. Both endpoints are unauthenticated in the upstream distribution.
Plan upgrade path: any release at or after v0.17.1 carries the fix. The current latest stable is v0.23.2.
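
For the exposure check above, an audit script needs to classify whatever value it finds in `OLLAMA_HOST`. The sketch below is one hedged way to do that in Go. The function name `bindExposure` and its three labels are hypothetical, and the parsing is a simplification rather than a copy of Ollama's own environment handling. In particular, treating a value with an empty host such as `:11434` as all-interfaces follows Go's `net.Listen` convention and is an assumption here.

```go
package main

import (
	"net"
	"strings"
)

// bindExposure classifies an OLLAMA_HOST value as "loopback",
// "all-interfaces", or "specific-address". Simplified parsing (assumption).
func bindExposure(ollamaHost string) string {
	h := strings.TrimSpace(ollamaHost)
	if h == "" {
		return "loopback" // unset: the report describes 127.0.0.1 as the default
	}
	// Drop an optional scheme prefix such as "http://".
	if i := strings.Index(h, "://"); i >= 0 {
		h = h[i+3:]
	}
	// Drop any trailing path.
	if i := strings.Index(h, "/"); i >= 0 {
		h = h[:i]
	}
	// Split off a port if one is present; otherwise keep the value as the host.
	if host, _, err := net.SplitHostPort(h); err == nil {
		if host == "" {
			return "all-interfaces" // ":11434" style value (assumption, see lead-in)
		}
		h = host
	}
	h = strings.Trim(h, "[]")
	if h == "localhost" {
		return "loopback"
	}
	ip := net.ParseIP(h)
	switch {
	case ip == nil:
		return "specific-address" // a hostname; resolve it out of band
	case ip.IsLoopback():
		return "loopback"
	case ip.IsUnspecified():
		return "all-interfaces"
	default:
		return "specific-address"
	}
}
```

Anything other than "loopback" means the listener may be reachable from other hosts, so the firewall and reverse-proxy checks in this section apply.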

Abstract

CVE-2026-7482 is an unauthenticated heap out-of-bounds read in Ollama's GGUF model loader that, when paired with the unauthenticated /api/push endpoint, becomes a one-shot exfiltration chain. Echo and Cyera describe that chain in the original CVE record, and RAXE highlights it in this report. The fix landed silently in Ollama v0.17.1 with no security note in the release; the CVE was published by the Echo CNA on 2026-05-04. RAXE Labs has independently verified the CVE record and the upstream patch by reading PR #14406 and the published advisory. We have not produced a working exploit. The Echo CNA's Secondary scores are CVSS v4.0 = 8.8 HIGH and CVSS v3.1 = 9.1 CRITICAL; per RAXE convention we treat the v4.0 figure as the headline severity until NVD assigns its own Primary CVSS score. RAXE confidence in our reading of the fix is High.

Key Findings

Two unauthenticated control endpoints, one chain. /api/create triggers the OOB read on a malformed GGUF, which the attacker first uploads through the blob endpoint /api/blobs/sha256:<digest>, and /api/push ships the resulting poisoned model artefact to an attacker-named registry. All three endpoints (blob upload, create, push) are unauthenticated in the upstream distribution.
Silent fix. PR #14406 ("ggml: ensure tensor size is valid") was merged on 2026-02-25 by maintainer BruceMacD and shipped in v0.17.1. Release notes for v0.17.1 list six product changes and contain no security mention.
Exposure is configuration-shaped, not default. The documented OLLAMA_HOST=0.0.0.0 mode (used in Docker images, multi-host inference setups, and reverse-proxy fronting) is the path that converts a localhost-bounded bug into a network-reachable one. The Echo CNA description states "large public-internet exposure observed".
Leaked content is high-value. Per the Echo CNA description, candidate contents include environment variables, API keys, system prompts, and concurrent users' conversation data.
CVSS divergence is real and disclosed. Both v4.0 (8.8 HIGH) and v3.1 (9.1 CRITICAL) Secondary scores are filed by the CNA. NVD has not assigned its own Primary CVSS score. RAXE uses the v4.0 figure as the headline severity in this report.

Attack Flow

[ Attacker ]                                  [ Victim Ollama < 0.17.1 ]
     |                                                    |
     | 1. Craft GGUF whose header declares                |
     |    tensor_offset + tensor.Offset + Size()          |
     |    > actual file length                            |
     |                                                    |
     | 2. POST /api/blobs/sha256:<digest>  (no auth)      |
     |    Uploads the crafted GGUF as a blob              |
     |--------------------------------------------------->|
     |                                                    |
     | 3. POST /api/create  (no auth)                     |
     |    Modelfile references the uploaded blob          |
     |--------------------------------------------------->|
     |                                                    |
     |                                                    | 4. Decode() trusts header.
     |                                                    |    quantizer.WriteTo() reads
     |                                                    |    past the short backing
     |                                                    |    buffer; heap residue is
     |                                                    |    quantised into the output
     |                                                    |    model artefact.
     |                                                    |
     | 5. POST /api/push  (no auth)                       |
     |    target = attacker-controlled registry           |
     |--------------------------------------------------->|
     |                                                    |
     | 6. Pull from attacker registry, dequantise,        |
     |    recover heap bytes (env vars, API keys,         |
     |    system prompts, concurrent users' data).        |
     |                                                    |

Technical Details

Root cause

Pre-fix, (*gguf).Decode in fs/ggml/gguf.go parsed each tensor's Offset and Size() from the GGUF header without comparing the resulting end-of-tensor offset against the actual file length. A second pre-fix gap in (quantizer).WriteTo in server/quantization.go allowed the quantiser to proceed even when len(data) < q.from.Size(), causing the downstream conversion path to consume bytes past the backing buffer.

The patch (PR #14406, merge commit 9d902d63ce9e741c8c9f0b9716183905785e132e) adds two cooperating checks:

// fs/ggml/gguf.go (Decode), added in PR #14406
fileSize, _ := rs.Seek(0, io.SeekEnd)
...
for _, tensor := range llm.tensors {
    tensorEnd := llm.tensorOffset + tensor.Offset + tensor.Size()
    if tensorEnd > uint64(fileSize) {
        return fmt.Errorf("tensor %q offset+size (%d) exceeds file size (%d)", tensor.Name, tensorEnd, fileSize)
    }
    ...
}

// server/quantization.go (WriteTo), added in PR #14406
if uint64(len(data)) < q.from.Size() {
    return 0, fmt.Errorf("tensor %s data size %d is less than expected %d from shape %v",
        q.from.Name, len(data), q.from.Size(), q.from.Shape)
}

The maintainer added regression tests in both fs/ggml/gguf_test.go (truncated_tensor_data) and server/quantization_test.go (f32_short_data, f16_short_data) that exercise exactly this condition and assert that decoding now fails fast. RAXE has read the patch and the regression tests; we have not built a working exploit.
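
To make the Decode-side check concrete outside the Ollama codebase, the following is a standalone Go sketch of the same comparison. The function `checkTensorBounds` and its parameter names are hypothetical. The two overflow guards are our own addition for illustration and are not part of the quoted patch; we have not assessed whether the upstream arithmetic can wrap in practice.

```go
package main

import (
	"errors"
	"fmt"
	"math"
)

// errOverflow signals that the declared offsets cannot be summed in a uint64.
var errOverflow = errors.New("tensor offset arithmetic overflows uint64")

// checkTensorBounds mirrors the comparison added to Decode: the end of the
// tensor's data region, as declared by the header, must not lie beyond the
// end of the file. dataStart plays the role of llm.tensorOffset.
func checkTensorBounds(name string, dataStart, offset, size, fileSize uint64) error {
	// Guard each addition so a huge declared value cannot wrap around and
	// slip under fileSize (illustrative addition, not from the patch).
	if offset > math.MaxUint64-dataStart {
		return errOverflow
	}
	start := dataStart + offset
	if size > math.MaxUint64-start {
		return errOverflow
	}
	end := start + size
	if end > fileSize {
		return fmt.Errorf("tensor %q offset+size (%d) exceeds file size (%d)", name, end, fileSize)
	}
	return nil
}
```

For a 192-byte file whose tensor data starts at byte 128, a declared 64-byte tensor at offset 0 passes, while a declared 65-byte tensor is rejected. The pre-fix loader had no equivalent rejection, which is what let the quantiser run on a short buffer.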

Exfiltration chain

The OOB read on its own would only leak bytes into an in-memory representation of the (re-)quantised model. The exfiltration step that makes this issue impactful in practice, and which the Echo CNA highlights in the original CVE record, is Ollama's /api/push endpoint, which lets a client publish a model artefact to a registry of the client's choosing. Because the leaked bytes have already been baked into the output tensor data by the time /api/push is called, and because /api/push is unauthenticated upstream, an attacker can reliably round-trip leaked memory off the host via two unauthenticated control endpoints (/api/create triggers the OOB read; /api/push exports the poisoned artefact). In the Cyera-published flow this is exercised as three API calls: the malformed GGUF is first uploaded as a blob via /api/blobs/sha256:<digest>, then referenced from /api/create, then exfiltrated via /api/push.

Why "before 0.17.1" is the right wording

RAXE confirmed via the GitHub compare API that the merge commit 9d902d63ce9e741c8c9f0b9716183905785e132e is reachable from the v0.17.1 tag and not from v0.17.0. The original head commit 88d57d0483cca907e0b23a968c83627a20b21047 (referenced by NVD) is the rebase-time HEAD of PR #14406 and is not itself reachable from any published release tag; the merge commit is. The NVD descriptor "before 0.17.1" is therefore correct as a user-facing version boundary.

CVSS Divergence Note (v4.0 vs v3.1)

The Echo CNA filed two CVSS scores as Secondary metrics:

Vector	Score	Severity	Notes
`CVSS:4.0/AV:N/AC:L/AT:N/PR:N/UI:N/VC:H/VI:N/VA:H/SC:N/SI:N/SA:N` plus Supplemental metrics `AU:Y / R:A / V:D / RE:L / U:Red`	8.8	HIGH	Headline figure used in this report
`CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:N/A:H`	9.1	CRITICAL	Disclosed for completeness; the v3.1 base formula scores the combination of high confidentiality and high availability impact at 9.1 in the unchanged-scope case, above the v4.0 result for the equivalent metrics

NVD has not assigned its own Primary CVSS score for CVE-2026-7482. Every score currently visible on the NVD record is a CNA Secondary submission from Echo, not an NVD Primary assessment. RAXE will revisit this report when NVD publishes its Primary score; until then we use the v4.0 figure.
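
The v3.1 figure can be reproduced by hand from the published base-score formula, which helps when checking that the filed vector and score agree. The Go sketch below covers only the unchanged-scope case for this one vector. The metric weights are those given in the CVSS v3.1 specification, and the function names are hypothetical. The v4.0 score is derived from a lookup-based method and is not reproduced here.

```go
package main

import "math"

// Weights from the CVSS v3.1 specification for
// AV:N/AC:L/PR:N/UI:N with scope unchanged.
const (
	avNetwork = 0.85
	acLow     = 0.77
	prNone    = 0.85
	uiNone    = 0.85
	impHigh   = 0.56 // C, I, or A set to High
	impNone   = 0.0  // C, I, or A set to None
)

// roundUp1 follows the specification's Roundup: the smallest number with one
// decimal place that is >= x, computed on integers to avoid float drift.
func roundUp1(x float64) float64 {
	i := int64(math.Round(x * 100000))
	if i%10000 == 0 {
		return float64(i) / 100000
	}
	return float64(i/10000+1) / 10
}

// baseScoreUnchanged computes the v3.1 base score for scope unchanged, given
// the confidentiality, integrity, and availability weights.
func baseScoreUnchanged(c, i, a float64) float64 {
	iss := 1 - (1-c)*(1-i)*(1-a)
	impact := 6.42 * iss
	exploitability := 8.22 * avNetwork * acLow * prNone * uiNone
	if impact <= 0 {
		return 0
	}
	return roundUp1(math.Min(impact+exploitability, 10))
}
```

With C:H/I:N/A:H the impact sub-score is 6.42 × 0.8064 ≈ 5.18 and the exploitability sub-score is ≈ 3.89, which sum to ≈ 9.06 and round up to 9.1. Setting availability to None instead, as in `baseScoreUnchanged(impHigh, impNone, impNone)`, yields 7.5, which shows how much of the v3.1 figure comes from the A:H component.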

Confidence & Validation

Assessment Confidence: High (RAXE assessment).

Aspect	Status	Detail
Vendor Advisory	Not published	The Ollama project's GitHub Security Advisories page lists no published advisory at draft time. The fix shipped via PR #14406 with no security framing.
CVE Assigned	Yes	`CVE-2026-7482` reserved 2026-04-30, published 2026-05-04 by the Echo CNA.
PoC Available	Conceptual only	RAXE has produced a conceptual PoC in `poc/CONCEPTUAL-POC.md` based on the public diff and the CNA description. No working exploit code is published. The maintainer's regression tests demonstrate the bug class on the patched branch.
Patch Available	Yes	Ollama v0.17.1 and later.
Exploited in Wild	Reported, not RAXE-confirmed	Multiple Twitter accounts (@TheHackersNews, @Dinosn, @Team_D4rkn3ttz) discussed the CVE within days of disclosure, and the Echo CNA description states "large public-internet exposure observed". Both indicate attention and exposure rather than confirmed exploitation. RAXE has not independently corroborated active exploitation.

Detection Signatures

The full Sigma rule cluster ships with this finding at detection/ollama-gguf-oob-cluster.yml. Four rules:

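Ollama GGUF Malformed Tensor Bounds: application log signal. The post-fix error strings (offset+size (...) exceeds file size (...) and data size ... is less than expected ...) only appear on patched servers. Their presence shows that a malformed or truncated GGUF was rejected; on an exposed host this should be treated as a probable blocked exploit attempt until a benign cause, such as a corrupted upload, is confirmed.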
Ollama HTTP /api/push to Unknown Registry: proxy/SIEM signal. Detects an HTTP POST to /api/push whose documented model field contains an explicit registry hostname that is not on the operator's allow-list. This is the rule that catches the network-tier exploitation chain when the destination registry is visible in request-body telemetry; it must be evaluated at the proxy or SIEM tier, not on the victim host's process telemetry.
Ollama CLI 'ollama push' to Unknown Registry: host-tier hunting telemetry. Companion to the HTTP /api/push rule for the case where an attacker has shell on the host and uses the local CLI rather than the HTTP API. Does not detect the network-tier chain; pair it with the HTTP /api/push rule, do not substitute it.
Ollama Unauthenticated /api/create From External Source: proxy signal. Flags external POSTs to /api/create on deployments where Ollama is meant to be private. Operators who want mass-scan detection should add backend-native aggregation, for example more than 8 external /api/create requests from the same source in five minutes.

All four are status: experimental. They were derived from the public diff and the CNA description and have not been lab-validated against a live patched or unpatched Ollama. Operators should expect to tune thresholds and field names against real telemetry before promoting them.

Detection & Mitigation

Recommended actions, in priority order:

Upgrade Ollama to v0.17.1 or later. The latest stable at publication time is v0.23.2.
Audit OLLAMA_HOST configuration across Docker images, systemd unit files, Kubernetes manifests, and shell profiles. If 0.0.0.0 is set, ensure the listener is fronted by a proxy that authenticates inbound calls to /api/blobs, /api/create, and /api/push.
Apply the Sigma rules in detection/ollama-gguf-oob-cluster.yml. The post-patch log-signature rule is the highest-confidence indicator; the push-to-unknown-registry rule is the highest-impact catch.
Constrain network egress from the Ollama host to your sanctioned registry list. The exfiltration chain depends on /api/push reaching an attacker-controlled registry; an egress-allow-list breaks the chain even if the OOB read is exploited.
Rotate any secrets that an Ollama process had access to between (a) the deployment of an exposed unpatched version and (b) the upgrade. Per the Echo CNA description, candidate leaked content includes environment variables and API keys.
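
Both the push-to-unknown-registry rule and the egress allow-list depend on deciding which registry a given model name points at. The Go sketch below shows one hedged way to make that decision at a proxy. It assumes model names follow a `host/namespace/model:tag` shape in which the host is present only when there are at least three slash-separated parts, and that names without a host resolve to `registry.ollama.ai`. Both assumptions should be checked against the deployed Ollama version. The function names are hypothetical.

```go
package main

import "strings"

// defaultRegistry is the registry assumed for names that carry no host part.
const defaultRegistry = "registry.ollama.ai"

// registryHost extracts the explicit registry host (including any port) from
// a model name, or returns "" when the name carries no host part.
// Simplified parsing (assumption): three or more "/"-separated parts mean the
// first part is the host.
func registryHost(model string) string {
	m := strings.TrimSpace(model)
	// Drop an optional scheme prefix such as "http://".
	if i := strings.Index(m, "://"); i >= 0 {
		m = m[i+3:]
	}
	parts := strings.Split(m, "/")
	if len(parts) < 3 {
		return "" // "model" or "namespace/model"
	}
	return strings.ToLower(parts[0])
}

// pushAllowed reports whether the registry targeted by model is on the
// operator's allow-list, keyed by lower-case host (with port if any).
func pushAllowed(model string, allow map[string]bool) bool {
	host := registryHost(model)
	if host == "" {
		host = defaultRegistry
	}
	return allow[host]
}
```

With an allow-list containing only `registry.ollama.ai`, a push of `library/llama3` is permitted and a push of `evil.example.com/ns/leak:latest` is refused. A proxy-side check like this complements network egress filtering and does not replace it, since it only sees requests that pass through the proxy.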

Indicators of Compromise

Type	Indicator	Context
Ollama log line	`tensor "<name>" offset+size (<n>) exceeds file size (<m>)`	Post-patch indicator of a blocked GGUF malformation; from `fs/ggml/gguf.go`.
Ollama log line	`tensor <name> data size <n> is less than expected <m> from shape ...`	Post-patch indicator of a blocked quantiser short-read; from `server/quantization.go`.
HTTP request	`POST /api/create` from non-RFC1918 source IP at high rate	Mass-scan signal on internet-exposed Ollama.
Outbound connection	Ollama process to registry hostname not on the operator's allow-list	Exfiltration channel via `/api/push`.
Configuration	`OLLAMA_HOST=0.0.0.0` in a unit file, `docker run -p`, or shell environment, on a host running an Ollama version before 0.17.1	Prerequisite for network reach to the unauthenticated endpoints.
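
The two log-line indicators in the table can be matched with regular expressions built from the format strings in the patch. The Go sketch below is illustrative. The function `classifyLogLine` and its labels are hypothetical, and the patterns assume the error text reaches the log unmodified apart from any prefix or quoting the logger adds, which we have not verified against a live server.

```go
package main

import "regexp"

var (
	// Derived from: tensor %q offset+size (%d) exceeds file size (%d)
	reBounds = regexp.MustCompile(`tensor \\?"[^"]*" offset\+size \(\d+\) exceeds file size \(\d+\)`)
	// Derived from: tensor %s data size %d is less than expected %d from shape %v
	reShort = regexp.MustCompile(`tensor \S+ data size \d+ is less than expected \d+ from shape`)
)

// classifyLogLine returns a label for lines that match one of the post-patch
// rejection messages, or "" for everything else.
func classifyLogLine(line string) string {
	switch {
	case reBounds.MatchString(line):
		return "gguf-bounds-rejected"
	case reShort.MatchString(line):
		return "quantizer-short-data-rejected"
	default:
		return ""
	}
}
```

Neither pattern is anchored, so timestamps and log-level prefixes before the message do not prevent a match. Field names and wrapping vary by deployment, so the patterns should be tested against real log samples before use in alerting.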

Strategic Context

This finding sits in a pattern RAXE Labs has now seen across multiple AI-infrastructure CVEs in 2026: the inference-server tier ships wide trust surfaces by default, including unauthenticated control endpoints, model-loader code that trusts attacker-supplied headers, and model-publishing endpoints that do not gate destinations. CVE-2026-7482 is differentiated from generic memory-disclosure bugs by the fact that the same product also ships the matching exfiltration channel (/api/push) with the same trust posture (none).

For platform engineers building on top of Ollama or similar OSS inference stacks, the operational lesson is that authenticating the model-management plane is now table stakes. It is not optional, not a v2 feature. RAXE expects to see further bugs in this class as more independent researchers (Cyera here, others elsewhere) audit AI-inference servers built rapidly during the 2024-2026 model-deployment wave.

A secondary observation worth tracking: the silent-fix pattern, where a security-relevant patch lands with a one-line commit message and no release-note mention, leaves a measurable disclosure-gap window for defenders. RAXE recommends operators of OSS AI infrastructure subscribe to commit-level changelogs for their inference dependencies, not just release notes, until upstream practice catches up.

References

CVE-2026-7482 (NVD): https://nvd.nist.gov/vuln/detail/CVE-2026-7482
CVE-2026-7482 (CVE.org, CNA: Echo): https://www.cve.org/CVERecord?id=CVE-2026-7482
Ollama PR #14406 ("ggml: ensure tensor size is valid"): https://github.com/ollama/ollama/pull/14406
Ollama fix commit (head of PR #14406): https://github.com/ollama/ollama/commit/88d57d0483cca907e0b23a968c83627a20b21047
Ollama v0.17.1 release (silent fix, no security note): https://github.com/ollama/ollama/releases/tag/v0.17.1
MITRE ATLAS technique AML.T0010 (AI Supply Chain Compromise): https://atlas.mitre.org/techniques/AML.T0010
CWE-125 (Out-of-bounds Read): https://cwe.mitre.org/data/definitions/125.html
CAPEC-540 (Overread Buffers): https://capec.mitre.org/data/definitions/540.html