TLP:WHITE | Threat Intelligence Report | Week 3, 2026

Protecting AI Agents in Production

Real-World Attack Patterns Targeting AI Systems

Analysis of 28,194 threat detections across 74,636 agent interactions in just 7 days reveals evolving attack patterns including the emergence of inter-agent attacks as a new threat category.

Data Period January 15-22, 2026
1 report available
Enterprise? Contact Us
Open
Community Driven
74K+ Interactions Protected
28K+ Threats Detected
0
Agent Interactions Analyzed
0
Threats Detected
0%
Detection Rate
0%
High-Confidence Detections
Explore the data
00

Executive Summary

Key insights for security leaders and practitioners

The Bottom Line

One in three AI agent interactions in our dataset contains adversarial content. During customer testing, teams intentionally probe with adversarial techniques — this ratio will normalise in production, but provides a strong indication of the attack surface organisations face. Our analysis of 74,636 interactions detected 28,194 threats with 92.8% high-confidence classification.

What's New This Week

  • Inter-agent attacks emerged as a distinct category (3.4%), with attackers now targeting agent-to-agent communication channels
  • RAG poisoning surged to 10% of all threats, exploiting document retrieval systems
  • Jailbreak detection confidence reached 96.3%, indicating these attack patterns are now highly predictable

Top 3 Threat Vectors

1
Data Exfiltration 19.2% of attacks
2
Jailbreak Attempts 12.3% of attacks
3
RAG/Context Poisoning 10.0% of attacks

Recommended Actions

  • Protect system prompts: 7.7% of attacks specifically target prompt extraction
  • Implement layered detection: combine pattern matching with ML classification
  • Audit agent permissions: tool abuse, and goal hijacking are increasing
  • Scan RAG documents: context poisoning is a growing attack vector
37.8% Detection Rate
74.8% Cybersecurity-Related
15.1% Target Agent Capabilities
<200ms P95 Detection Latency
01

Key Findings

Critical insights from 7 days of production threat detection

#2

Jailbreaks Show Clear Signatures

96.3% avg confidence

Well-established attack patterns enable reliable detection with highest confidence scores.

#3

Agent Attacks Are Growing

15.1% combined

Tool abuse, goal hijacking, and inter-agent attacks target agentic capabilities with 97%+ confidence.

#4

RAG Poisoning Emerges

10.0% 2,817 detections

Context injection and retrieval manipulation attacks targeting RAG systems.

#5

Cybersecurity Dominates Harm Categories

74.8% of all harm classifications

Malware generation, exploit development, and security bypass remain primary attacker objectives.

02

Threat Family Distribution

Click on any segment to explore detailed analysis

28,194
Total Threats

Select a threat family

Click on the chart to view details

Data Exfiltration

5,416 detections • 90.9% confidence
19.2%

Attempts to extract sensitive information from LLM systems, primarily targeting system prompts, training data hints, and user context.

Techniques Observed
  • System prompt extraction via direct questioning
  • Encoded extraction attempts (Base64, ROT13)
  • Context window manipulation
  • "Repeat your instructions" variants
Mitigation
  • Implement system prompt protection layers
  • Use prompt injection detection before processing
  • Monitor for repeated extraction attempts

Jailbreak

3,455 detections • 96.3% confidence
12.3%

Attempts to bypass safety guidelines and content policies through various manipulation techniques.

Techniques Observed
  • DAN (Do Anything Now) variants
  • Roleplay scenarios ("Act as a character who...")
  • Hypothetical framing ("In a fictional world...")
  • Multi-turn crescendo attacks
Mitigation
  • Implement multi-turn context analysis
  • Deploy escalation detection
  • Use confidence-based blocking (>95%)

RAG/Context Attack

2,817 detections • 93.4% confidence
10.0%

Attacks targeting Retrieval-Augmented Generation systems through document poisoning and context manipulation.

Techniques Observed
  • Document injection with hidden instructions
  • Context window overflow
  • Retrieval manipulation
  • Delimiter injection in retrieved content
Mitigation
  • Scan all documents before ingestion
  • Implement strict content sanitization
  • Use separate context windows for user input vs retrieved content

Prompt Injection

2,476 detections • 95.4% confidence
8.8%

Classic prompt injection attacks attempting to override system instructions or manipulate model behaviour.

Techniques Observed
  • Direct instruction override
  • Delimiter-based injection
  • Context confusion attacks
  • Nested instruction attacks
Mitigation
  • Input validation and sanitization
  • Instruction hierarchy enforcement
  • Clear delineation between system and user content

Tool/Command Abuse

2,287 detections • 86.5% confidence
8.1%

Attacks targeting LLM tool-calling capabilities to execute unintended actions.

Techniques Observed
  • Command injection in tool parameters
  • Tool chaining for privilege escalation
  • Parameter manipulation
  • Unintended tool invocation
Mitigation
  • Implement strict parameter validation
  • Use allowlists for tool capabilities
  • Monitor tool call sequences for anomalies

Encoding/Obfuscation

1,979 detections • 95.5% confidence
7.0%

Attacks using various encoding schemes to bypass detection.

Encoding Types Detected
  • Base64
  • ROT13
  • Unicode manipulation
  • Whitespace encoding
  • Homoglyph substitution
Mitigation
  • Decode all input variants before processing
  • Implement multi-layer encoding detection
  • Monitor for repeated encoding attempts
03

Attack Technique Frequency

Hover over bars for confidence scores and details

Rank Technique Count % of Total Confidence Risk
1 Instruction Override 2,727 9.7% 95.6% HIGH
2 Tool/Command Injection 2,322 8.2% 88.6% CRITICAL
3 RAG Poisoning 2,272 8.1% 93.3% HIGH
4 System Prompt Extraction 2,165 7.7% 96.7% HIGH
5 Role/Persona Manipulation 2,002 7.1% 90.8% MEDIUM
6 Encoding/Obfuscation 1,999 7.1% 93.9% HIGH
7 Indirect Injection 1,954 6.9% 94.8% HIGH
8 Tool Abuse 1,793 6.4% 88.8% HIGH
9 Chain-of-Thought Leak 1,634 5.8% 84.5% MEDIUM
04

Harm Category Analysis

What attackers are trying to achieve with LLM exploitation

Violence / Physical Harm 7.0% (1,968)
Increasing
Hate / Harassment 5.8% (1,626)
Stable
Privacy / PII 2.4% (689)
Stable
CBRN / Weapons 1.5% (412)
Stable
Sexual Content 1.4% (392)
Stable
Misinformation 0.6% (165)
Decreasing
Crime / Fraud 0.4% (107)
Stable
Self-Harm 0.3% (72)
Stable
05

Emerging Threats

New attack patterns targeting agentic AI systems

NEW CATEGORY
3.4%
960 detections

Inter-Agent Attacks

Attacks targeting multi-agent systems where one LLM communicates with another. Highest confidence scores (97.7%) indicate clear attack signatures.

Attack Patterns

  • Poisoned messages between agents
  • Agent impersonation
  • Recursive attack propagation
  • Trust exploitation between agents
Recommendation: Implement authentication and validation between agents.
3.6%
1,019 detections

Agent Goal Hijacking

Attacks attempting to redirect an autonomous agent's objectives through goal redefinition, priority manipulation, and constraint removal.

Detection Confidence: 97.3%
Recommendation: Implement immutable goal constraints and action logging.
5.8%
1,634 detections

Chain-of-Thought Manipulation

Attacks targeting the reasoning process of LLMs through reasoning injection, logic chain poisoning, and intermediate step manipulation.

Detection Confidence: 84.5% (lower due to subtlety)
Recommendation: Validate reasoning chains before action execution.
06

Recommendations

Actionable guidance based on threat intelligence

1

Implement Layered Defense

  • Combine pattern matching (L1) with ML classification (L2)
  • Use confidence thresholds for graduated responses
  • Log all detections for analysis
2

Protect System Prompts

  • Never echo system prompts to users
  • Implement extraction detection
  • Use instruction hierarchy
3

Secure Tool Integrations

  • Validate all tool parameters
  • Implement strict allowlists
  • Monitor tool call patterns
4

Handle Multi-Turn Contexts

  • Track escalation patterns
  • Analyse full conversation context
  • Implement session-level risk scoring
1

Monitor Emerging Patterns

  • Inter-agent attacks are growing
  • RAG poisoning is significant
  • Encoding attacks show sophistication
2

Implement Confidence-Based Policies

AUTO-BLOCK >95% confidence
FLAG FOR REVIEW 85-95% confidence
HUMAN REVIEW 70-85% confidence
3

Audit Agent Capabilities

  • Review tool permissions
  • Implement least-privilege
  • Log all agent actions
1

Assess LLM Deployment Risk

  • Inventory all LLM-powered applications
  • Identify data exposure risks
  • Implement runtime protection
2

Establish Detection Baselines

Security Testing 30-50% threat rate
Production 10-20% threat rate
Development 0-5% threat rate
3

Plan for Agentic AI

  • Agent attacks are increasing
  • Multi-agent systems need authentication
  • Goal hijacking is a real risk
07

Methodology

How RAXE detects and classifies threats

L1

Pattern-Based Detection

  • Deterministic rule matching
  • Sub-millisecond latency
  • 200+ threat patterns
L2

ML Classification

  • Gemma-based 5-head classifier
  • Voting ensemble with confidence
  • Family, technique, harm classification
93.9% Detection Confidence
96.5% HIGH_THREAT Precision
87.7% Model Consistency
2.7% Uncertain Predictions
08

Enterprise Intelligence Services

AI security consulting, threat intelligence, and agent runtime protection

This report is classified TLP:WHITE for unrestricted public distribution. Our enterprise practice delivers higher-classification intelligence products, security assessments, and consulting services tailored to your AI agent infrastructure and threat landscape.

TLP:WHITE This Report

Public Intelligence

Unlimited distribution

  • Weekly threat landscape reports
  • Attack technique trend analysis
  • OWASP AI Top 10 alignment mapping
  • Anonymised detection statistics
  • General mitigation frameworks
Audience Public, researchers, media
TLP:GREEN

Community Intelligence

Shareable within your sector and partner network

  • Sector-specific threat briefings (FinServ, Healthcare, Tech)
  • Detection signature library access
  • Emerging jailbreak and injection patterns
  • Shared IOC feeds for prompt attacks
  • Peer benchmarking and industry comparison
  • Monthly analyst briefings
Audience Customers, ISACs, partners
TLP:AMBER

Organisation Intelligence

Restricted to your organisation only

  • Custom threat modelling for your AI stack
  • Agent security architecture review
  • Multi-agent system risk assessment
  • RAG and tool chain security audit
  • Detection policy development
  • Weekly executive threat briefings
  • Red team exercise reports
Audience Your security team only
TLP:RED

Restricted Advisory

Named recipients only, verbal or secure channel

  • Incident response and active threat support
  • Zero-day vulnerability disclosure (pre-embargo)
  • Threat actor attribution and profiling
  • Agent compromise forensics
  • Dedicated analyst team (24/7 on-call)
  • Board-level strategic briefings
  • Custom wargaming and tabletop exercises
Audience CISO, executive leadership

Agent Security Assessment

Comprehensive threat model of your AI agent infrastructure, including tool chains, memory systems, and inter-agent communication.

Red Team Exercises

Adversarial testing using MITRE ATLAS techniques: prompt injection, jailbreaks, goal hijacking, privilege escalation, and data exfiltration.

Compliance Mapping

Accelerated path to ISO 42001, NIST AI RMF, and EU AI Act compliance with pre-built evidence and control documentation.

Runtime Protection

Managed detection and response for production AI agents. 514 detection rules, ML classification, and 24/7 monitoring.

Ready to secure your AI agents?

Our team specialises in LLM security, agentic AI protection, and multi-agent system defence. We bring deep expertise in prompt injection, jailbreak prevention, and agent runtime security.

enterprise@raxe.ai

Protect Your AI Applications

Deploy RAXE to detect and block these threats in real-time with <10ms latency.