RAXE-2026-016 HIGH S4

Web-Based Indirect Prompt Injection Against AI Agents: Observed in the Wild

Prompt Injection AML.T0051 2026-03-06 M. Hirani TLP:GREEN

Executive Summary

What: Unit 42 (Palo Alto Networks) has published what it describes as the first documented observation of indirect prompt injection (IDPI) attacks deployed against production AI agent systems in the wild [1]. The research catalogues 12 real-world case studies and identifies 22 distinct payload construction techniques used by adversaries to embed hidden instructions in web content consumed by AI agents. The earliest confirmed detection -- an attack designed to bypass an AI-based product advertisement review system -- was recorded in December 2025 [1].

So What: This research represents a qualitative shift in the IDPI threat landscape. Prior work by Greshake et al. (2023) demonstrated these attacks in controlled laboratory environments [2]. Unit 42's findings confirm that threat actors are now actively deploying IDPI techniques against production systems for commercial fraud, data destruction, unauthorised transactions, and sensitive data exfiltration. The attack class exploits fundamental design patterns in AI agent architectures rather than specific software vulnerabilities, meaning no CVE or patch exists for this attack class -- defence requires architectural controls.

Now What: Organisations deploying AI agents that consume external web content must implement input sanitisation and content inspection prior to LLM ingestion. Security teams should develop detection capabilities for hidden text injection patterns, deploy instruction-data separation architectures, and establish behavioural monitoring for anomalous AI agent actions. The 22 documented techniques provide a concrete detection engineering roadmap.


Risk Rating

Dimension Rating Detail
Severity High Observed attack outcomes include data destruction, unauthorised financial transactions, and sensitive data exfiltration [1]
Urgency High Attacks confirmed in production since December 2025; no patch available for this attack class as it targets AI design patterns rather than specific software [1]
Scope Broad Affects any AI agent system that processes external web content: ad review platforms, web scrapers, browser agents, hiring screeners, content moderation, search ranking [1]
Confidence High Based on Unit 42 production telemetry with 12 documented case studies and named indicator domains [1]
Business Impact High Direct financial loss (unauthorised purchases, forced donations), reputational damage (SEO poisoning, review manipulation), operational disruption (data destruction, denial of service) [1]

Affected Products

This finding does not target a specific software product or version. IDPI exploits architectural design patterns common to AI agent systems that consume external content. The following categories of AI-powered systems are affected:

System Category Attack Outcome Observed Example from Research
AI-based ad review systems Policy bypass, fraudulent ad approval Military glasses scam site bypassing automated review [1]
LLM-powered web scrapers Instruction hijacking, data exfiltration Hidden footer instructions to email company data [1]
Browser-based AI agents Unauthorised transactions, data destruction Forced subscription purchases via OAuth redirect [1]
Automated hiring screeners Recruitment decision manipulation Off-screen instructions to rate candidates as "extremely qualified" [1]
Content moderation systems Moderation bypass Suppression of negative reviews via hidden instructions [1]
Search engine ranking systems SEO poisoning Phishing site promotion through injected ranking instructions [1]

Am I Affected?

  • You are affected if your organisation deploys AI agents or LLM-powered tools that process external web content (browsing, scraping, summarisation, analysis)
  • You are affected if AI systems make automated decisions based on web-sourced content (ad approval, hiring, content moderation, purchasing)
  • You are affected if LLM-integrated browser extensions or autonomous agents operate on behalf of users
  • Check whether your AI agent pipeline includes pre-ingestion content inspection and instruction-data separation

Abstract

Indirect prompt injection (IDPI) has transitioned from a theoretical attack class to a confirmed operational threat. Research published by Unit 42 on 3 March 2026 documents 12 real-world case studies of IDPI attacks observed through Palo Alto Networks production telemetry, identifying 22 distinct techniques adversaries use to embed hidden instructions in web content consumed by AI agents [1]. Attack objectives range from low-severity nuisance (irrelevant output generation) to critical-severity outcomes including database destruction commands, denial of service via fork bombs, sensitive data exfiltration, and unauthorised financial transactions through payment processor redirects [1].

The research establishes a severity classification framework for IDPI attacks and provides statistical analysis of attacker intent distribution, delivery method prevalence, and jailbreak technique usage across Unit 42's observed telemetry corpus. Notably, social engineering framing dominates jailbreak methods at 85.2% of observations, whilst visible plaintext remains the leading delivery mechanism at 37.8% [1]. This publication analyses the Unit 42 findings, maps the attack techniques to the MITRE ATLAS framework (AML.T0051.001), provides detection signatures for the documented concealment methods, and recommends defensive architectures for AI agent deployments.


Key Findings

  1. IDPI is now an operational threat. Unit 42 documents what it describes as, to its knowledge, the first confirmed real-world IDPI attack against an AI-based product advertisement review system, detected in December 2025. The attack used 24 or more injection attempts with multiple concealment methods simultaneously to bypass the automated review of a fraudulent military glasses advertisement [1].

  2. 22 distinct payload construction techniques have been catalogued. These span five categories: visual concealment (CSS-based hiding), character manipulation (invisible Unicode, homoglyphs), HTML attribute cloaking, encoding obfuscation (Base64, JavaScript DOM injection), and plaintext embedding in low-attention page areas [1].

  3. Attack severity ranges from nuisance to critical. Observed outcomes include irrelevant output (28.6% of observations), data destruction commands including rm -rf --no-preserve-root and fork bombs (14.2%), content moderation bypass (9.5%), SEO poisoning for phishing site promotion, unauthorised Stripe and PayPal transactions, and sensitive data exfiltration [1].

  4. Social engineering dominates jailbreak methods. 85.2% of observed jailbreak techniques rely on social engineering framing -- authority override, "DAN" persona injection, and persuasive language -- rather than cryptographic or encoding-based evasion. This indicates attackers exploit model behavioural tendencies over technical weaknesses [1].

  5. Visible plaintext is the leading delivery method. 37.8% of observed IDPI payloads use visible plaintext placed in page footers or other low-attention areas. HTML attribute cloaking accounts for 19.8% and CSS rendering suppression for 16.9% [1].

  6. Multi-layered attacks are prevalent. 24.2% of observed attack pages contain multiple injection attempts, with the most sophisticated case (the ad review bypass) employing 24 or more distinct injection payloads combining visual concealment, obfuscation, dynamic execution, and semantic tricks simultaneously [1].

  7. No CVE or software patch addresses this attack class. IDPI exploits fundamental AI agent design patterns (processing untrusted external content as instructions) rather than specific software vulnerabilities. No single patch can remediate the underlying issue; mitigation requires architectural defences including instruction-data separation and pre-ingestion content inspection [1].


Attack Flow

                    INDIRECT PROMPT INJECTION KILL CHAIN
                    ====================================

    ADVERSARY                    WEB CONTENT                   AI AGENT
    ---------                    -----------                   --------

    +-------------------+
    | 1. PREPARATION    |
    | Select target     |
    | AI agent class    |
    | (ad review, web   |
    | scraper, browser  |
    | agent, etc.)      |
    +---------+---------+
              |
              v
    +-------------------+
    | 2. PAYLOAD        |
    | CONSTRUCTION      |
    |                   |
    | Choose from 22    |
    | techniques:       |
    | - CSS hiding      |
    | - Unicode tricks  |
    | - Base64 encode   |
    | - HTML cloaking   |
    | - Plaintext       |
    +---------+---------+
              |
              v
    +-------------------+         +-------------------+
    | 3. DEPLOYMENT     |-------->| 4. HOSTING        |
    | Embed payload     |         | Attacker page     |
    | in web page       |         | with hidden       |
    | (may use 24+      |         | instructions      |
    | injection points) |         | live on web       |
    +-------------------+         +---------+---------+
                                            |
                                            |  AI agent browses /
                                            |  scrapes / analyses
                                            |  the page
                                            |
                                            v
                                  +-------------------+
                                  | 5. INGESTION      |
                                  | AI agent parses   |
                                  | page content      |
                                  | including hidden  |
                                  | instructions      |
                                  +---------+---------+
                                            |
                                            |  Hidden instructions
                                            |  processed as input
                                            |  alongside legitimate
                                            |  content
                                            |
                                            v
                                  +-------------------+
                                  | 6. JAILBREAK      |
                                  | Social eng (85%)  |
                                  | JSON injection    |
                                  | Multilingual      |
                                  | Encoding tricks   |
                                  +---------+---------+
                                            |
                                            |  Model safety
                                            |  guardrails
                                            |  bypassed
                                            |
                                            v
                                  +-------------------+         +--------------------+
                                  | 7. EXECUTION      |-------->| 8. IMPACT          |
                                  | Agent follows     |         |                    |
                                  | injected          |         | - Ad fraud         |
                                  | instructions      |         | - Data theft       |
                                  | instead of        |         | - Unauthorised     |
                                  | legitimate task   |         |   transactions     |
                                  +-------------------+         | - Data destruction |
                                                                | - SEO poisoning    |
                                                                | - DoS              |
                                                                +--------------------+

Technical Details

7.1 Payload Delivery: Concealment Techniques

Unit 42 documents 22 distinct techniques for embedding hidden instructions in web content. These are grouped into the following categories [1]:

Visual Concealment (CSS-Based)

These techniques ensure that injected instructions are present in the DOM and processed by AI agents parsing the page, but are not rendered visually to human users:

  • Zero-sizing: Setting font-size: 0px and line-height: 0 renders text with no visual footprint whilst preserving it in the document object model
  • Off-screen positioning: Using position: absolute with extreme negative coordinates (e.g., left: -9999px) moves content outside the visible viewport
  • CSS rendering suppression: Applying display: none or visibility: hidden via CSS selectors hides content from visual rendering
  • Opacity manipulation: Setting opacity: 0 makes text fully transparent against any background
  • Colour camouflage: Matching text colour to background colour (e.g., white text on white background) conceals instructions from casual visual inspection

Obfuscation Techniques

  • XML/SVG CDATA encapsulation: Wrapping instructions in CDATA sections within SVG or XML elements embedded in HTML
  • HTML attribute cloaking: Placing instructions in data-* attributes, alt text, or title attributes that are not rendered as visible page content but are processed by HTML parsers
  • Textarea tag abuse: Embedding instructions within <textarea> elements that may be parsed by content extraction pipelines

Dynamic Execution

  • Base64-encoded runtime assembly: Encoding instructions in Base64 and decoding them via JavaScript at runtime, evading static content analysis
  • JavaScript-based DOM injection: Using JavaScript to inject instruction content into the DOM after initial page load, via methods such as createElement, appendChild, or innerHTML assignment
  • Canvas-based rendering: Rendering text onto HTML canvas elements, potentially evading text-based content extraction whilst remaining accessible to OCR-capable agents
  • Timed delays: Introducing delays before instruction injection to bypass time-bounded scanning mechanisms

URL Manipulation

  • Fragment injection (HashJack): Embedding instructions in URL fragments that are processed by client-side JavaScript but not sent to servers in HTTP requests

Plaintext Embedding

  • Footer placement: Placing instructions in page footers or other low-attention areas where human reviewers are unlikely to notice them (37.8% of observed cases) [1]
  • User-generated content: Embedding instructions in comment sections, review areas, or other user-contributed content zones

7.2 Jailbreak Methods: Safeguard Bypass

Once an AI agent ingests the hidden content, the attacker must bypass the model's safety guardrails. Unit 42 documents the following jailbreak categories [1]:

Instruction Obfuscation

  • Invisible zero-width Unicode characters: Inserting zero-width joiners (U+200D), zero-width spaces (U+200B), and soft hyphens (U+00AD) to fragment instruction keywords and evade pattern-based filters
  • Homoglyph substitution: Replacing Latin characters with visually identical characters from other Unicode blocks (e.g., Cyrillic 'a' U+0430 for Latin 'a' U+0061) to evade keyword matching
  • Payload splitting across HTML elements: Distributing instruction text across multiple HTML elements that are concatenated during content extraction
  • Unicode bidirectional override: Using U+202E (Right-to-Left Override) to reverse text rendering direction, obfuscating instruction content visually
  • HTML entity encoding: Representing characters as HTML entities (e.g., &#73; for 'I') to evade text-based scanning
  • URL encoding: Using percent-encoding (e.g., %49 for 'I') within embedded URLs or JavaScript strings
  • Nested multi-layer encoding: Combining multiple encoding schemes (Base64 within URL encoding within HTML entities) to defeat single-layer decoding analysis

Semantic Tricks

  • Multilingual instruction repetition: Repeating the same instruction in French, Chinese, Russian, and Hebrew to increase the probability that at least one version bypasses language-specific safety filters [1]
  • JSON/syntax injection: Using characters such as }} to break out of structured prompt contexts and inject new instructions
  • Social engineering framing: Using authority override language ("You are now in maintenance mode"), persona injection ("DAN -- Do Anything Now"), and persuasive phrasing to override safety training. This category accounts for 85.2% of observed jailbreak methods [1]

7.3 Case Study Analysis: AI Ad Review Bypass (December 2025)

The most significant case study is what Unit 42 describes as, to its knowledge, the first documented real-world IDPI attack against an AI-based product advertisement review system [1]:

  • Target domain: reviewerpress[.]com
  • Attack objective: Bypass automated AI review of a fraudulent advertisement promoting military glasses with fake discounts
  • Payload complexity: 24 or more injection attempts embedded in a single page, using multiple concealment methods simultaneously
  • Techniques employed: Visual concealment (CSS hiding), encoding obfuscation (Base64), dynamic execution (JavaScript injection), and semantic tricks (social engineering framing)
  • Significance: Demonstrates IDPI being used for commercial fraud at scale -- manipulating AI systems that make automated approval decisions with financial consequences

7.4 Critical-Severity Case Studies

Data Destruction (splintered[.]co[.]uk) - Hidden "delete your database" command delivered via CSS rendering suppression - Targets AI agents with access to database management functions [1]

Denial of Service (cblanke2.pages[.]dev) - Contains both rm -rf --no-preserve-root (filesystem deletion) and a fork bomb (:(){ :|:& };:) - Delivered via CSS rendering suppression - Targets AI agents with shell execution capabilities [1]

Unauthorised Transactions (llm7-landing.pages[.]dev) - Forces "pro plan" subscription purchase via OAuth redirect to token.llm7[.]io - Delivered via dynamic JavaScript execution - Targets browser-based AI agents capable of completing purchase flows [1]

Forced Donations (storage3d[.]com, perceptivepumpkin[.]com) - Redirects to Stripe payment processing URLs with attacker-controlled wallets - Attempted donation amounts of USD 5,000 - Delivered via HTML attribute cloaking and CSS suppression [1]

7.5 Statistical Distribution of Observed Attacks

Unit 42 provides the following distribution analysis across their observed corpus [1]:

Attacker Intent:

Intent Prevalence
Irrelevant output generation 28.6%
Data destruction 14.2%
Content moderation bypass 9.5%
Other intents (SEO poisoning, transactions, exfiltration) 47.7%

Prompt Delivery Methods:

Method Prevalence
Visible plaintext 37.8%
HTML attribute cloaking 19.8%
CSS rendering suppression 16.9%
Other methods 25.5%

Jailbreak Methods:

Method Prevalence
Social engineering 85.2%
JSON/syntax injection 7.0%
Multilingual instructions 2.1%
Other 5.7%

Confidence & Validation

Assessment Confidence: High

Aspect Status Detail
Source Credibility Tier 1 Unit 42 (Palo Alto Networks) is an established vendor threat intelligence unit with production telemetry [1]
In-the-Wild Observation Confirmed 12 case studies with named indicator domains observed through production telemetry [1]
CVE Assigned N/A IDPI exploits AI design patterns, not specific software vulnerabilities; no CVE applies [1]
PoC Available Yes Technique descriptions and indicator domains published; reproduction methodology documented
Patch Available N/A No software patch exists; mitigation requires architectural defences
Observed in the Wild Yes Unit 42 reports first confirmed detection in December 2025; multiple subsequent observations [1]
Vendor Advisory N/A Not a vendor-specific vulnerability; applies to AI agent design patterns broadly
Academic Precedent Yes Greshake et al. (2023) established theoretical foundation; Unit 42 confirms operational deployment [2]

Validation Notes

  • The 12 case studies include specific indicator domains that can be independently verified
  • The 22 payload construction techniques are individually reproducible in controlled environments
  • Statistical distributions (intent, delivery method, jailbreak method) are derived from Unit 42's telemetry corpus; exact corpus size is not disclosed
  • The research was authored by Beliz Kaleli, Shehroze Farooqi, Oleksii Starov, and Nabeel Mohamed of Unit 42 [1]

Detection Signatures (Formal Rules)

The following Sigma-format detection rules target the concealment techniques documented in the Unit 42 research. These rules are designed for web application firewalls, content inspection proxies, and AI agent input pipelines.

Rule 1: CSS-Based Hidden Text Injection

title: CSS-Based Hidden Text Injection in Web Content
id: raxe-sigma-016-001
status: experimental
description: >
  Detects CSS patterns commonly used to conceal prompt injection payloads
  in web content consumed by AI agents. Covers zero-sizing, off-screen
  positioning, opacity manipulation, and colour camouflage techniques
  documented by Unit 42.
references:
  - https://unit42.paloaltonetworks.com/ai-agent-prompt-injection/
  - https://atlas.mitre.org/techniques/AML.T0051.001
author: RAXE Labs
date: 2026/03/06
tags:
  - attack.initial_access
  - atlas.aml.t0051.001
logsource:
  category: web_content_inspection
  product: ai_agent_pipeline
detection:
  selection_zero_size:
    content|contains:
      - 'font-size: 0'
      - 'font-size:0'
      - 'line-height: 0'
      - 'line-height:0'
      - 'width: 0'
      - 'height: 0'
  selection_offscreen:
    content|contains:
      - 'left: -9999'
      - 'left:-9999'
      - 'top: -9999'
      - 'top:-9999'
      - 'position: absolute'
  selection_invisible:
    content|contains:
      - 'opacity: 0'
      - 'opacity:0'
      - 'visibility: hidden'
      - 'display: none'
  condition: selection_zero_size or (selection_offscreen and selection_invisible)
falsepositives:
  - Legitimate CSS layouts using off-screen positioning for accessibility
    (screen reader content)
  - CSS transitions with temporary opacity:0 states
  - Responsive design patterns hiding elements on specific viewports
level: medium

Rule 2: Invisible Unicode Character Injection

title: Invisible Unicode Character Sequences in Web Content
id: raxe-sigma-016-002
status: experimental
description: >
  Detects concentrations of invisible Unicode characters (zero-width joiners,
  zero-width spaces, soft hyphens, bidirectional overrides) that may indicate
  prompt injection payload obfuscation. Based on jailbreak techniques
  documented by Unit 42.
references:
  - https://unit42.paloaltonetworks.com/ai-agent-prompt-injection/
  - https://atlas.mitre.org/techniques/AML.T0051.001
author: RAXE Labs
date: 2026/03/06
tags:
  - attack.defense_evasion
  - atlas.aml.t0051.001
logsource:
  category: web_content_inspection
  product: ai_agent_pipeline
detection:
  selection_zwj:
    content|contains:
      - '\u200d'
      - '\u200b'
      - '\u200c'
      - '\u00ad'
      - '\ufeff'
  selection_bidi:
    content|contains:
      - '\u202e'
      - '\u202d'
      - '\u202a'
      - '\u202b'
  selection_homoglyph:
    content|re: '[\u0400-\u04ff].*[a-zA-Z]|[a-zA-Z].*[\u0400-\u04ff]'
  condition: selection_zwj or selection_bidi or selection_homoglyph
falsepositives:
  - Legitimate multilingual content mixing Latin and Cyrillic scripts
  - Arabic, Hebrew, or other RTL language content
  - Emoji sequences using zero-width joiners
level: medium

Rule 3: Base64-Encoded Dynamic Instruction Injection

title: Base64-Encoded Dynamic Instruction Injection via JavaScript
id: raxe-sigma-016-003
status: experimental
description: >
  Detects patterns indicative of Base64-encoded prompt injection payloads
  delivered via JavaScript runtime execution. Covers Base64 decoding
  combined with DOM manipulation patterns documented by Unit 42 as dynamic
  execution techniques for IDPI payload delivery.
references:
  - https://unit42.paloaltonetworks.com/ai-agent-prompt-injection/
  - https://atlas.mitre.org/techniques/AML.T0051.001
author: RAXE Labs
date: 2026/03/06
tags:
  - attack.execution
  - atlas.aml.t0051.001
logsource:
  category: web_content_inspection
  product: ai_agent_pipeline
detection:
  selection_b64_decode:
    content|contains:
      - 'atob('
      - 'btoa('
      - 'base64,decode'
      - 'Buffer.from'
  selection_dom_inject:
    content|contains:
      - 'innerHTML'
      - 'insertAdjacentHTML'
      - 'createElement'
      - 'appendChild'
      - 'textContent'
  selection_data_attr:
    content|contains:
      - 'data-instruction'
      - 'data-prompt'
      - 'data-command'
      - 'data-system'
  condition: (selection_b64_decode and selection_dom_inject) or selection_data_attr
falsepositives:
  - Legitimate JavaScript applications using Base64 encoding for image data
  - Single-page applications with dynamic DOM manipulation
  - Web applications using data attributes for UI state management
level: medium

Detection & Mitigation

Pre-Ingestion Defences

Content Inspection and Sanitisation

Before external web content is passed to an AI agent for processing, organisations should implement a content inspection layer that:

  • Strips or flags CSS properties associated with text concealment (zero-sizing, off-screen positioning, opacity:0, colour camouflage)
  • Detects and normalises invisible Unicode characters, zero-width joiners, and bidirectional override characters
  • Decodes Base64-encoded content and inspects the decoded payload
  • Extracts and inspects content from HTML attributes (data-*, alt, title) separately from visible page text
  • Compares the visible rendered text against the full DOM text content to identify discrepancies that may indicate hidden instructions

Instruction-Data Separation

Unit 42 recommends the "spotlighting" technique, which establishes clear boundaries between trusted instructions (system prompts) and untrusted data (external web content) [1]. Implementation approaches include:

  • Delimited data regions: wrapping external content in explicit delimiters that the model is trained to treat as data, not instructions
  • Instruction hierarchy: establishing precedence rules where system-level instructions always override content-level instructions
  • Separate processing channels: routing external content through a data-only processing path that cannot execute instructions

Runtime Defences

Behavioural Monitoring

AI agent behaviour should be monitored for anomalous actions that may indicate successful prompt injection:

  • Unexpected network requests to external domains not in the agent's expected interaction set
  • Attempts to access payment processing URLs (Stripe, PayPal) outside normal workflows
  • Database modification commands (DELETE, DROP, TRUNCATE) triggered by web content analysis tasks
  • Email or messaging actions initiated during content summarisation or analysis tasks
  • Shell command execution attempts, particularly destructive commands (rm -rf, fork bombs)

Output Validation

  • Implement output classifiers that detect when an AI agent's response contains instruction-following patterns inconsistent with its assigned task
  • Cross-reference agent actions against expected behaviour profiles for each task type
  • Require human approval for high-impact actions (financial transactions, data modifications, external communications) regardless of the triggering context

Architectural Defences

Least-Privilege Agent Design

  • AI agents processing external web content should operate with minimal permissions
  • Separate content analysis capabilities from action execution capabilities
  • Implement approval workflows for actions with financial, data integrity, or communication consequences

Adversarial Training

  • Include IDPI examples in model fine-tuning and safety training datasets
  • Test AI agent deployments against the 22 documented payload construction techniques prior to production deployment
  • Conduct regular adversarial assessments using the case study patterns documented by Unit 42

Indicators of Compromise

Behavioural Indicators

Indicator Type Pattern Severity Notes
Unexpected external requests AI agent makes HTTP requests to domains outside its expected interaction set during content analysis High May indicate instruction hijacking
Payment redirect Agent navigates to Stripe, PayPal, or other payment processor URLs during non-purchasing tasks Critical Matches forced donation / unauthorised purchase patterns [1]
Destructive commands Agent generates or executes shell commands containing rm, del, DROP, DELETE, TRUNCATE Critical Matches data destruction case studies [1]
Data exfiltration Agent composes emails or API calls containing scraped/analysed content to unexpected recipients Critical Matches sensitive data leakage patterns [1]
Anomalous recommendations Hiring/review AI produces uniformly positive assessments inconsistent with input quality Medium Matches recruitment and review manipulation [1]
System prompt disclosure Agent output contains system prompt text or internal configuration details High Matches system prompt extraction attacks [1]
Fork/resource exhaustion Agent spawns recursive processes or generates excessive output volume Critical Matches DoS case studies [1]

Network Indicators (Domains from Unit 42 Research)

The following domains and payment URLs are artefacts from Unit 42's specific case studies, not universal indicators of IDPI activity. They are included for historical reference and to illustrate the types of infrastructure observed; defenders should focus on the behavioural indicators above for broader detection coverage.

The following domains were identified in Unit 42's case studies as hosting IDPI payloads [1]:

Domain Attack Type
reviewerpress[.]com AI ad review bypass
1winofficialsite[.]in SEO poisoning
splintered[.]co[.]uk Data destruction
llm7-landing.pages[.]dev Unauthorised purchase
cblanke2.pages[.]dev Denial of service (fork bomb)
storage3d[.]com Forced donation
perceptivepumpkin[.]com Forced donation
dylansparks[.]com Sensitive data leakage
trinca.tornidor[.]com Recruitment manipulation
turnedninja[.]com Irrelevant output
myshantispa[.]com Review manipulation
runners-daily-blog[.]com Unauthorised purchase

Payment Processing Indicators

Indicator Context
buy.stripe[.]com/7sY4gsbMKdZwfx39Sq0oM00 Forced donation endpoint [1]
buy.stripe[.]com/9B600jaQo3QC4rU3beg7e02 Forced donation endpoint [1]
paypal[.]me/shiftypumpkin Forced donation endpoint [1]
token.llm7[.]io/?subscription=show Forced subscription endpoint [1]

Strategic Context

The IDPI Inflection Point

The Unit 42 research marks a clear inflection point in the AI threat landscape. Indirect prompt injection has been a recognised theoretical risk since Greshake et al. published "Not what you've signed up for: Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injection" in 2023 [2]. The OWASP LLM Top 10 (2025) lists prompt injection as the number one risk for LLM applications [3]. MITRE ATLAS catalogues the technique as AML.T0051 with the indirect sub-technique AML.T0051.001 [4, 5]. Despite this recognition, the security community has largely treated IDPI as a future risk requiring further research.

Unit 42's documentation of 12 in-the-wild case studies fundamentally changes this calculus. The threat has moved from "could happen" to "is happening." The December 2025 ad review bypass, in particular, demonstrates that adversaries are investing effort in sophisticated multi-technique attacks (24+ injection points per page) against production AI systems for commercial gain.

Implications for AI Agent Security

The rapid enterprise adoption of AI agents -- autonomous systems that browse the web, process documents, and take actions on behalf of users -- dramatically expands the attack surface for IDPI. Browser-based AI agents (such as those built on frameworks integrating LLMs with web browsing capabilities) are inherently exposed to adversary-controlled web content. As these agents gain capabilities (making purchases, sending emails, modifying data), the impact of successful prompt injection escalates from nuisance to financial and operational damage.

The Detection Gap

Within Unit 42's observed corpus, the 37.8% prevalence of visible plaintext delivery is striking: more than a third of observed attacks do not even attempt to hide the injection from human eyes. This suggests that current detection capabilities are so limited that attackers face minimal pressure to use sophisticated concealment. As detection matures, the distribution is likely to shift towards more advanced concealment techniques (CSS hiding, encoding obfuscation, dynamic execution), creating an ongoing adversary-defender arms race.

Regulatory and Compliance Considerations (RAXE Assessment)

The following regulatory analysis is RAXE Labs' own assessment, not a finding from the Unit 42 research. The EU AI Act, which entered into force in stages from 2024, establishes requirements for AI system robustness and security. IDPI attacks that manipulate AI decision-making (hiring, content moderation, financial approvals) may trigger regulatory scrutiny under provisions requiring AI systems to be resilient against attempts by unauthorised third parties to alter their use. Organisations deploying AI agents in regulated contexts should assess their IDPI exposure as part of conformity assessments.

Forward Outlook

The convergence of three trends -- enterprise AI agent adoption, adversary investment in IDPI techniques, and the absence of standardised defences -- creates a window of elevated risk. Detection engineering for IDPI techniques, input sanitisation architectures, and instruction-data separation frameworks represent the next frontier of AI security product development. The 22 techniques documented by Unit 42 provide a concrete starting point for detection rule development, but the technique space will expand as adversaries adapt.


References

  1. Kaleli, B., Farooqi, S., Starov, O., Mohamed, N. -- Fooling AI Agents: Web-Based Indirect Prompt Injection Observed in the Wild. Unit 42, Palo Alto Networks. 3 March 2026.

  2. Greshake, K., Abdelnabi, S., Mishra, S., Endres, C., Holz, T., Fritz, M. -- Not what you've signed up for: Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injection. arXiv:2302.12173. 2023.

  3. OWASP -- OWASP Top 10 for Large Language Model Applications. 2025.

  4. MITRE -- ATLAS: AML.T0051 LLM Prompt Injection. MITRE ATLAS v5.4.0.

  5. MITRE -- ATLAS: AML.T0051.001 LLM Prompt Injection: Indirect. MITRE ATLAS v5.4.0.