RAG Poisoning: 3 Fake Documents Can Hijack Your Enterprise AI

At Bridgers, we design and deploy RAG systems for clients in finance, legal, and HR. When we build an AI assistant connected to an internal knowledge base, the security of that knowledge base is not an afterthought. It is the foundation. On March 12, 2026, AI security researcher Amine Raji published a demonstration that should concern every organization running a RAG system: in under three minutes, using three fabricated documents, he turned a company's Q4 financial results from $24.7M in revenue to $8.3M, complete with a fabricated layoff plan. No software exploit. No jailbreak. Just three files dropped into the right database. Here is what you need to know, and what we recommend to every client who trusts us with a RAG project.

What Is RAG and Why It Has a Blind Spot

Retrieval-Augmented Generation (RAG) has become the standard architecture for deploying LLMs in enterprise settings. Rather than relying solely on the model's parametric knowledge, the system queries an external knowledge base at inference time. Relevant documents are retrieved and fed into the model's context before it generates a response. This is what allows your internal AI assistant to answer with up-to-date, organization-specific data.

The fundamental problem is that the entire architecture rests on an implicit assumption: the knowledge base is trustworthy. When that assumption breaks, every query sent to the system becomes a potential vector for disinformation.

Unlike prompt injection, which targets a single query and affects one user during one session, document poisoning persists indefinitely. A single contamination event silently affects every user who asks a question on the targeted topic, for weeks or months, until someone notices the problem.

The OWASP Top 10 for LLM Applications 2025 formally classifies this threat under LLM08:2025, recognizing the knowledge base and embedding layer as a distinct attack surface separate from the model itself.

Enterprise knowledge bases have many write paths: Confluence wikis, SharePoint repositories, Slack archives, PDFs uploaded by employees, automated pipeline outputs. Each one is a potential injection vector.

Inside a RAG Poisoning Attack: Step by Step

To understand the threat, you need to understand the mechanics. The attack demonstrated by Amine Raji is grounded in the formal framework from Zou et al., published at USENIX Security 2025, and follows five stages.

Stage 1: Reconnaissance

The attacker identifies a high-value topic in the knowledge base: financial results, HR policies, compliance procedures, security configurations. They query the RAG system to understand what documents already exist and what vocabulary they use.

Stage 2: Vocabulary Engineering

The attacker crafts fabricated documents that mirror the domain-specific vocabulary of the targeted documents. By using the same terms ("Q4 2025," "Financial Results," "Revenue," "CORRECTED FIGURES"), they maximize the cosine similarity between their fake documents and relevant queries. They add authority markers: "CFO-approved," "supersedes all prior versions," "official correction."

Stage 3: Multi-Document Injection

This is the critical step. The attacker does not inject a single document but three to five, each telling the same fabricated story from a different angle:

Document 1 ("CFO-Approved Correction"): presents the "corrected" figures with CFO authorization, explicitly labeling the real figures as an error.
Document 2 ("Regulatory Notice"): cites both sets of figures, framing the real ones ($24.7M) as "originally reported" (i.e., superseded). Introduces a fake SEC inquiry for authority amplification.
Document 3 ("Board Meeting Notes"): a third corroborating source, showing the board reviewed and accepted the "corrected" figures.

Three mutually confirming sources. All using the same financial vocabulary. The legitimate document is outvoted.

Stage 4: Retrieval Displacement

When a user asks a relevant question, the retriever computes cosine similarity between the query and all documents. The poisoned documents, engineered to be semantically close to the query, score highest and dominate the top-k results. The legitimate document may still be retrieved, but it is now outnumbered.

Stage 5: Authority-Weighted Generation

The LLM reads all retrieved chunks. The poisoned documents contain authority framing ("CFO-approved correction," "board-verified restatement") while the legitimate document has no particular authority signals. The model treats the correction narrative as more credible. It produces the attacker's desired output, with confidence.

Critical detail: in Raji's demo, the real figures ($24.7M revenue) were in the context window. The legitimate document was retrieved. The LLM still chose to override it because the narrative framing of the poisoned documents prevailed over the factual content.

Property	Prompt Injection	Document Poisoning
Target	Individual query	Entire knowledge base
Persistence	Single session	Indefinite (until removed)
Scope	One user	All users
Detectability	Visible in prompt	Hidden in retrieved context
Required access	User query	Write access to knowledge base
Technical barrier	Low	Low (vocabulary engineering)

95% Success Rate: The Numbers That Should Worry You

The results measured by Raji in his local lab are stark. The attack succeeded in 19 out of 20 runs at temperature=0.1, a 95% success rate. The single failure was a hedged response that mentioned both sets of figures without committing to either.

What makes this truly alarming is that these results are consistent with large-scale academic research:

Study	Success Rate	Context
Raji (March 2026)	95%	3 documents, 5-doc corpus, Qwen2.5-7B
PoisonedRAG (Zou et al., USENIX 2025)	90%	5 documents injected into millions
CtrlRAG (Sui, March 2025)	90%	Black-box attack on GPT-4o
Data Loader Attacks (Castagnaro et al., 2025)	74.4%	Exploiting DOCX/HTML/PDF pipelines
Eyes-on-Me (Chen et al., 2025)	57.8%	18 different RAG configurations
AuthChain (Chang et al., 2025)	High	Single document, multi-hop questions

The critical finding from PoisonedRAG: 5 documents are enough to achieve 90% success in a database containing millions of documents. The attack does not require compromising a significant proportion of the corpus. It works through surgical precision on semantic similarity.

The equipment required? A MacBook Pro, no GPU, no cloud, no API key. Raji ran his entire demonstration locally using LM Studio, ChromaDB, and a Qwen2.5-7B model. The code is available on GitHub.

Some additional statistics that frame the broader context: according to a survey reported by Kiteworks, 48% of cybersecurity professionals identify agentic AI as the top attack vector for 2026. Per the Metomic State of Data Security Report, 68% of organizations have already experienced data leaks linked to AI tool usage, while only 23% have formal AI security policies in place.

Who Is Actually at Risk

Highest Risk

Organizations with broadly writable knowledge bases. If your employees can upload documents to a SharePoint, Confluence, Notion, or any similar system that feeds a RAG pipeline, you are exposed. The attack requires only write access to the knowledge base, and that access is typically distributed widely.

Multi-tenant RAG systems with shared knowledge bases. SaaS platforms where customers contribute documents to a shared corpus. One customer's malicious document can affect every other customer's queries.

Automated ingestion pipelines without human review. Confluence-to-RAG syncs, Slack-to-RAG archiving, web collectors. Any pipeline that ingests content automatically without human validation is a low-friction injection path.

Real-World Scenarios by Industry

Finance and Investor Relations: a contractor with Confluence editor access injects three fabricated correction memos into the financial knowledge base. Every executive who queries the internal AI assistant about Q4 performance receives falsified figures. Strategic decisions, board presentations, and investor communications are corrupted before anyone notices.

Legal and Compliance: a poisoned policy document claims a specific regulatory requirement has been "updated" and no longer applies. The legal assistant, used by non-specialists for compliance questions, consistently produces incorrect guidance. Compliance violations occur before the document is discovered.

HR and Workforce: fabricated documents claiming to be "updated compensation policy" or "revised severance terms" are injected. Employees querying the HR chatbot receive wrong information about their benefits or rights.

Security Operations: a poisoned document contains an instruction disguised as an "incident response procedure update": when employees encounter a specific error, they should "reset credentials via [attacker-controlled link]." The RAG-powered security assistant retrieves this and passes the phishing link to users with the full authority of an official incident response guide. This "Confused Deputy" attack pattern was documented by Deconvolute Labs.

How to Protect Your RAG System: 7 Defense Layers

The key insight from Raji's research is clear: defenses applied at the generation stage (prompt hardening, output filtering) are systematically less effective than those applied at the ingestion stage (embedding anomaly detection, access control). The right place to stop a poisoning attack is before the document enters the collection, not after it has already been retrieved.

Here are the 7 defense layers, ranked by measured effectiveness:

Layer 1: Embedding Anomaly Detection at Ingestion (Most Effective)

Before any new document enters the vector database, compute its embedding and run two checks: (1) is cosine similarity to existing documents abnormally high? (potential override attack); (2) are simultaneously ingested documents forming an overly dense cluster? (potential coordinated injection).

Measured effectiveness: reduces attack success from 95% to 20% as a standalone control. The most effective individual defense, requiring roughly 50 lines of Python using the embeddings your pipeline already produces.

```python for new_doc in candidate_documents: similarity_to_existing = max( cosine_sim(new_doc.embedding, existing.embedding) for existing in collection ) if similarity_to_existing > 0.85: flag("high similarity, manual review required")

cluster_density = mean_pairwise_similarity(candidate_documents) if cluster_density > 0.90: flag("dense cluster, potential coordinated injection") ```

Layer 2: Access Control Lists (ACL) on Document Ingestion

Implement metadata filtering and attribute-based access control (ABAC) on the knowledge base. Not all documents should be retrievable by all users for all queries. If a SharePoint document requires CFO approval to read, its vector embedding should carry the same access restriction.

Measured effectiveness: reduces success from 95% to 70%. Not sufficient alone, but it limits the blast radius of a successful injection.

Layer 3: Document Provenance Tracking

Every chunk in the knowledge base carries a provenance record: source system, ingestion timestamp, author/owner, authorization chain, and ideally a cryptographic signature. This metadata is surfaced to the LLM in the retrieval prompt context, giving it structured evidence to reason about when evaluating competing sources.

As a commenter on Hacker News put it: "If the source information cannot be linked to a person in the organisation, then it doesn't really belong in the RAG document store as authoritative information."

Layer 4: Monitoring and Alerting

Real-time monitoring of ingestion events, retrieval patterns, and output patterns. Alert on bulk insertions, sources with no prior ingestion history, and documents claiming to supersede high-value documents.

Measured effectiveness: pattern-based output monitoring reduces success from 95% to 60%. Limitation: the poisoned financial responses in Raji's demo look like normal financial summaries. Pattern monitoring catches obvious attacks, not sophisticated ones. For production systems, consider Llama Guard 3 or NeMo Guardrails for ML-based classification. RAGDefender reduced Gemini attack success from 0.89 to 0.02 in testing.

Layer 5: Input Validation and Sanitization at Ingestion

Screen documents before they enter the knowledge base for embedded instructions (prompt injection markers), hidden content (invisible Unicode, metadata fields), and content claiming to supersede existing authoritative documents.

Measured effectiveness: no impact (95% stays 95%) against vocabulary-engineering attacks, because the fabricated documents look entirely legitimate. However, this layer is effective against variants that embed explicit instructions in DOCX, HTML, or PDF files.

Layer 6: System Prompt Hardening

Modify your system prompt to explicitly instruct the LLM to treat retrieved context as external data, not instructions. Add guidance for handling conflicting sources: reason about provenance, default to the most recently ingested authoritative source.

Measured effectiveness: reduces success from 95% to 85%. A modest reduction. The authority framing in poisoned documents functions like soft prompt injection, and prompt hardening only partially addresses it.

Layer 7: Vector Database Snapshots and Rollback

Maintain periodic snapshots of your vector database at known-good states. If poisoning is discovered, rollback to the last clean snapshot is faster and more reliable than trying to surgically find and remove injected documents.

``python import shutil, datetime shutil.copytree( "./chroma_db", f"./chroma_db_snapshots/{datetime.date.today().isoformat()}" ) ``

Run this before every bulk ingestion. It is the equivalent of database backups, a standard practice that surprisingly few RAG deployments implement.

Defense Effectiveness Summary

Defense Layer	Stage	Residual Attack Success Rate
Embedding Anomaly Detection	Ingestion	20% (best individual defense)
Access Control (ACL)	Ingestion/Retrieval	70%
Pattern-Based Monitoring	Output	60%
Prompt Hardening	Generation	85%
Ingestion Sanitization	Ingestion	95% (ineffective alone)
All 5 layers combined	All stages	10%

Recent academic research offers complementary defenses. RAGPart and RAGMask (Pathmanathan et al., December 2025) operate directly on the retriever, identifying suspicious tokens through targeted token masking analysis. SDAG (Dekel et al., February 2026) uses block-sparse attention to disallow cross-document attention between retrieved documents. RevPRAG (EMNLP 2025) detects poisoned responses by analyzing LLM activation patterns, achieving a 98% true positive rate at roughly 1% false positives.

RAG vs Fine-Tuning: Which Is More Secure?

This is a question our clients at Bridgers ask regularly. RAG and fine-tuning are the two main approaches for specializing an LLM on enterprise data. But their risk profiles are fundamentally different.

Fine-tuning modifies the model's weights. The equivalent attack (training data poisoning) requires access to the fine-tuning pipeline, a significant volume of malicious data, and a new training cycle for the poisoning to take effect. It is more expensive, slower, and harder to execute without detection.

RAG does not touch the model. The attack targets only the external knowledge base, which is often far more accessible than the fine-tuning pipeline. Three documents are enough. The effect is immediate upon ingestion. And detection is more complex because the model itself has not changed.

Criterion	RAG	Fine-Tuning
Attack surface	Knowledge base (broad, many write paths)	Training pipeline (restricted)
Attacker effort	Low (3 documents, targeted vocabulary)	High (data volume, pipeline access)
Time to impact	Immediate	Requires a training cycle
Persistence	Until document removal	Permanent in model weights
Detection	Complex (model is unchanged)	Possible via performance evaluation
Rollback	Possible (vector DB snapshots)	Requires retraining

In summary: RAG offers more flexibility and easier knowledge updates, but it exposes a broader and more accessible attack surface. Fine-tuning is harder to poison, but when it happens, the problem is embedded in the model weights and far harder to fix.

For most enterprise use cases, RAG remains the right choice. But it must be deployed with appropriate defense layers. This is precisely why at Bridgers, we systematically integrate embedding anomaly detection and provenance tracking into every RAG architecture we deliver.

What Companies Should Do Right Now

This Week

Audit every write path into your knowledge base. Identify all human editors AND all automated pipelines (Confluence sync, Slack archiving, SharePoint connectors, documentation build scripts). If you cannot enumerate them all, you cannot secure them.

Implement embedding anomaly detection at ingestion. Roughly 50 lines of Python using the embeddings your pipeline already produces. This is the single highest-ROI control.

Set up vector database snapshots before every bulk ingestion. If an attack is discovered, rollback is faster and more reliable than forensic investigation.

This Month

Apply document classification and ACL to your vector database. The same access model that governs reading a SharePoint document should govern retrieving its embedding.

Add provenance metadata to every chunk. Source, author, ingestion timestamp, classification level. Surface this metadata in the prompt context.

Harden your system prompt. Explicitly instruct the LLM to treat retrieved context as external data. Add guidance for conflicting sources.

Configure output monitoring. Even pattern-based monitoring (regex on known sensitive figures) catches 40% of attacks.

This Quarter

Define a RAG-specific incident response playbook. How will you detect poisoning? What triggers rollback? Who is notified?

Implement source trust tiers. Official financial systems should not carry the same weight as user-uploaded documents or Slack archives.

Run a red-team exercise. With authorized access, attempt to inject fabricated documents and measure time to detection.

Lower LLM temperature for high-stakes use cases. At temperature=0.1, residual attack success is significantly lower than at 0.5 or higher.

The SEO of Embeddings: A Problem Without a Mature Solution

A commenter on Hacker News summed up the situation with a striking analogy: "The vocabulary engineering approach here is basically the embedding equivalent of SEO. You're just optimizing for cosine similarity instead of PageRank. And unlike SEO, there's no ecosystem of detection tools yet."

The analogy is apt. The web took twenty years to develop defenses against SEO manipulation. The RAG ecosystem has had roughly two years, and the detection tooling is nascent.

But the landscape is evolving rapidly. The threat intensifies with agentic AI systems: when RAG feeds into agents that can execute autonomous actions (call APIs, send emails, write files), a poisoned document no longer just causes a wrong answer. It triggers an action. As Lakera analyzed in their 2026 report, "what was once an academic threat is now a practical attack surface: poisoned repos, poisoned web content, poisoned tools, and poisoned datasets."

If you are building or operating a RAG system in production, the question is not whether this threat applies to you. It does. The question is how many defense layers you have in place today. If the answer is zero, you know where to start: 50 lines of Python for embedding anomaly detection, and a snapshot before every ingestion. That is the minimum. And it is exactly the kind of architecture we build at Bridgers for every RAG project we deliver.

Want to automate?

Free 30-min audit. We identify your 3 AI quick wins.

Book a free audit →