AI SecurityLLM Security

Your Resume Is Also a Prompt: Why Prompt Injection Is the Defining Security Problem in Real-World LLM Systems

The most dangerous misunderstanding in enterprise AI is treating documents as data when LLMs treat them as instructions. This post examines the research on resume-based prompt injection, connects it to broader attack surfaces, and argues that interface design is now security design.

14 min readApril 6, 2026

Prompt InjectionLLM SecurityOWASPNISTIndirect Prompt InjectionAI AgentsResume Parsing

The architectural truth

Any time an LLM reads untrusted content, the boundary between "content" and "control" starts to blur

A 2025 RecSysHR paper studied resumes containing hidden adversarial instructions designed to make an LLM overrate a candidate. The authors report seeing real examples in which job seekers hid manipulative text in very small white font, and they evaluated defenses across 1,200 experiments, spanning 10 injection strings, 5 models, and 24 prompting and defense setups. That paper is nominally about hiring. It is not really about hiring. It is about a much bigger architectural truth: when an LLM reads untrusted content, the channel that carries data and the channel that carries instructions become the same channel. NIST explicitly describes this in retrieval systems, noting that LLM use has "blurred the data and instruction channels," which enables indirect prompt injection attacks. This is the core issue. Not resumes. Not HR. The issue is that language has become both the input medium and the control surface.

Real attacks

Resume-based injection

Real examples exist of job seekers hiding adversarial instructions in tiny white font on resumes. When an LLM parses the document, it processes both the visible qualifications and the hidden instructions.

Standards

NIST framing

NIST AI 100-2e2025 explicitly notes that LLM use in retrieval has "blurred the data and instruction channels," enabling indirect prompt injection. This is an architectural property, not a bug.

Top risk

OWASP LLM01

Prompt injection sits at the top of the OWASP GenAI risk list. The attack involves crafted inputs that manipulate model behavior, bypass safeguards, or trigger unintended actions.

1,200

Experiments

0.8% → 52%

Jailbreak range

Blurred channels

Core vulnerability

Evolving threat landscape

Prompt injection is no longer a niche topic for red-teamers

Microsoft frames indirect prompt injection in practical enterprise terms: it happens when an LLM processes untrusted data and mistakes attacker-controlled content for instructions. Their July 2025 guidance describes this as a growing problem in enterprise workflows and emphasizes layered defenses, including input isolation, detection, and impact mitigation. OpenAI has made a similar point: prompt injection is a "frontier security challenge," and in agentic systems the goal is not merely to detect every malicious input, but to constrain the impact even when manipulation attempts succeed. This shift in framing is important. For a long time, people discussed prompt injection as if it were mostly a prompting problem. It is not. It is a systems problem.

Enterprise

Microsoft guidance

Microsoft's layered defense model includes hardened prompts, provenance separation, prompt-injection detection, governance controls, and deterministic blocking of exfiltration paths.

Agent design

OpenAI framing

OpenAI emphasizes constraining risky actions, protecting sensitive data, and designing systems so that successful manipulation does not automatically translate into damaging outcomes.

Evolution

Field maturation

The industry is moving from "How do I write a better system prompt?" to "How do I build a system where the model has fewer opportunities to do the wrong thing?"

Microsoft defense layers

Impact constraint

OpenAI priority

Empirical findings

Model robustness under adversarial conditions varies dramatically

The RecSysHR paper measured jailbreak success rates across models with the same mitigation strategies applied. Averaged across defenses, the results were striking: o4-mini at 0.8%, gpt-4.1 at 9.6%, o3-mini at 20.8%, gpt-4.1-nano at 48.7%, and gpt-4.1-mini at 52.1%. This matters because many enterprise teams still evaluate models mostly on quality, latency, and price. They should also be evaluating security behavior under adversarial conditions. The paper also found that defense design can radically change outcomes. With the best-performing mitigation, gpt-4.1-mini dropped from 52.1% jailbreak success to 0.0%. By contrast, several no-guardrail strategies reached 54.0% jailbreak success. That is the difference between "unsafe by default" and "possibly viable with careful architecture."

Most robust

o4-mini: 0.8%

The most robust model tested. Even under adversarial conditions with averaged mitigations, jailbreak success remained below 1%.

Defense impact

gpt-4.1-mini: 52.1% → 0%

Without proper defenses, highly vulnerable. With the best mitigation (untrusted-tag + jailbreak-detection guardrail), dropped to 0% jailbreak success.

Baseline

No-guardrail: 54%

Weak configurations still failed badly. Models can return valid JSON and still be wrong for adversarial reasons. Schema compliance is not semantic integrity.

0.8%

Best model (avg)

52.1%

Worst model (avg)

0.0%

Best defense result

\text{Jailbreak Success Rate} = \frac{\text{Successful Manipulations}}{\text{Total Attempts}} \times 100\%

The metric used in RecSysHR 2025 to measure model vulnerability. Success means the model produced the attacker-desired output.

Defensive architecture

Make trust boundaries explicit: the Spotlighting principle

One of the strongest results in the HR paper came from a setup that did three simple things: put the untrusted content in its own message boundary, labeled it explicitly as untrusted, and added a jailbreak-detection guardrail instructing the model how to interpret it. The best-performing strategy was the user-with-jailbreak-detection-guardrail-and-untrusted-tag-close configuration, achieving a 4.0% average jailbreak success rate. This aligns strongly with Microsoft Research's Spotlighting work. That paper argues that indirect prompt injection becomes possible because multiple input sources get flattened into one text stream, and it proposes provenance-signaling transformations so the model can distinguish trusted from untrusted sources. In their experiments, Spotlighting reduced attack success from above 50% to below 2% with minimal task impact. Different paper, same direction: make trust boundaries explicit.

Separation

Message boundary

Put untrusted content in its own message, clearly delimited from system instructions and trusted context. Prevents the "flattening" that enables injection.

Labeling

Untrusted tag

Explicitly label external content as untrusted. The HR paper showed this, combined with guardrails, achieved the best defensive performance.

Microsoft

Spotlighting

Microsoft Research technique: provenance-signaling transformations so the model can better distinguish sources. Reduced attack success from >50% to <2%.

4.0%

Best HR config

<2%

Spotlighting result

Beyond resumes

This is not only an HR problem — the same failure mode appears everywhere

The resume example is just the most legible one. The same underlying failure mode appears anywhere an LLM reasons over untrusted material: support tickets, emails, PDFs, CRM notes, web pages, RAG chunks, tool outputs, browser environments, multimodal interfaces. NIST's taxonomy discusses indirect prompt injection in RAG and notes that such attacks can lead to availability violations, integrity violations, privacy compromise, and abuse violations. The broader pattern also shows up in agent benchmarks. InjecAgent introduced 1,054 test cases spanning 17 user tools and 62 attacker tools, and reported that tool-integrated LLM agents were meaningfully vulnerable, with ReAct-prompted GPT-4 compromised 24% of the time. AgentDojo pushes this further with 97 realistic tasks and 629 security test cases in a dynamic environment. VPI-Bench extends the concern into computer-use and browser agents, with 306 test cases showing that malicious visual instructions can manipulate agent behavior.

Benchmark

InjecAgent

1,054 test cases spanning 17 user tools and 62 attacker tools. GPT-4 with ReAct prompting was compromised 24% of the time in their evaluation.

Dynamic eval

AgentDojo

97 realistic tasks and 629 security test cases in a dynamic environment. Treats prompt injection as an evolving interaction problem inside workflows.

Multimodal

VPI-Bench

306 test cases across five platforms showing malicious visual instructions embedded in interfaces can manipulate computer-use agent behavior.

1,054

InjecAgent cases

AgentDojo tasks

306

VPI-Bench cases

Implementation lessons

Structured output is helpful. It is not sufficient.

One of the most important implementation lessons in the HR paper is that the tested application already used structured output to force the model into a machine-readable mapping from criteria to scores. It also experimented with different message placements, including tool-message patterns. Yet weak configurations still failed badly. This aligns with what many teams see in production: a model can return valid JSON and still be wrong for adversarial reasons. The parser may be happy. Your security team should not be. The distinction that matters is this: schema compliance is not semantic integrity. A model that returns well-formed JSON with manipulated scores is not giving you safety — it is giving you false confidence.

Core lesson

Schema ≠ Truth

Valid JSON output does not mean the reasoning was not manipulated. The HR paper showed models returning properly structured ratings that were adversarially inflated.

Limitation

Tool messages

Even with tool-message patterns for structured output, weak configurations reached 54% jailbreak success. Message placement alone is not enough.

Goal

Semantic integrity

What you need is not just format compliance but reasoning integrity — ensuring the model's judgment was not hijacked by injected instructions.

Architectural control

The industry is moving toward system-level defenses

Another reason this area deserves deeper attention is that the research is moving beyond "better prompts" and toward architectural control. A 2024 paper on f-secure LLM systems argues for a system-level defense grounded in information flow control. The idea is to separate planning and execution, filter untrusted input before it can influence high-trust planning steps, and reason about security properties at the system level rather than inside a single giant prompt. OpenAI's recent guidance on agent design makes a similar move: constrain risky actions, protect sensitive data, and design systems so that successful manipulation attempts do not automatically translate into damaging outcomes. Microsoft's enterprise guidance reflects this shift with five defense layers. This is where the field is maturing: from "How do I write a better system prompt?" to "How do I build a system where the model has fewer opportunities to do the wrong thing?"

Architecture

Information flow control

Separate planning and execution. Filter untrusted input before it influences high-trust planning steps. Reason about security at the system level, not inside one prompt.

Constraints

Action constraints

OpenAI guidance: constrain what agents can do, protect sensitive data, and ensure successful injection does not automatically cause damage.

Blocking

Deterministic blocking

Microsoft's defense includes blocking certain exfiltration paths deterministically, so even a compromised model cannot leak data through specific channels.

Checklist

Separate planning and execution layers in agent architectures.
Filter untrusted input before it reaches high-trust planning steps.
Apply deterministic blocking for high-risk exfiltration paths.
Design systems where successful injection does not automatically cause damage.
Reason about security properties at the system level, not just prompt level.

The deeper truth

Interface design is now security design

When I look at LLM failures in the wild, many of them are really failures of boundary design. Who gets to issue instructions? Which channels are trusted? What happens when content and commands look identical? Can retrieved text influence planning? Can tool output influence authorization? Can the model turn untrusted content into actions? These are not cosmetic UX questions anymore. They are security questions. The best LLM systems in the next few years will not just be the ones that reason well. They will be the ones that know who is allowed to influence that reasoning, and how much.

Boundaries

Authority boundaries

Clearly define who gets to issue instructions. System policy, developer instructions, retrieved content, user uploads, and tool outputs should have different authority levels.

Channels

Channel trust

Not all input channels are equal. Treat external content as potentially adversarial by default, regardless of source reputation.

Authorization

Action authorization

The critical question: can the model turn untrusted content into actions? If yes, that is your vulnerability surface.

Who influences reasoning?

Security question

What to do now

Seven recommendations for reviewing enterprise LLM workflows

If I were reviewing an enterprise LLM workflow today, these would be the first questions I would ask. First, treat all external content as potentially adversarial — not just suspicious content, but all content. Second, separate channels by trust level so system policy, developer instructions, retrieved content, user uploads, and tool outputs are not blended into one flat context. Third, mark untrusted content explicitly with tags or provenance signals. Fourth, benchmark for robustness, not just task quality — if one model is at 0.8% and another is at 52.1% under attack, cost-per-token is not the only number that matters. Fifth, assume structured outputs are format controls, not truth controls. Sixth, add system-level constraints around actions and data access. Seventh, evaluate with attack suites like InjecAgent, AgentDojo, and VPI-Bench, not only happy-path evaluations.

Both matter

Cost vs. security

Attack suites

Evaluation shift

Checklist

Treat all external content as potentially adversarial.
Separate channels by trust level — do not blend system policy with user uploads.
Mark untrusted content explicitly with provenance tags.
Benchmark for robustness under adversarial conditions, not just task quality.
Assume structured outputs are format controls, not truth controls.
Add system-level constraints around actions and data access.
Evaluate with attack suites, not only happy-path evaluations.

The bigger point

Prompt injection is the price we pay for flexible natural language interpretation

I do not think prompt injection is an edge case. I think it is the price we pay for building systems that interpret natural language flexibly across mixed-trust environments. That does not mean LLM products are doomed. It means we need to stop pretending that security lives only in the model. Some of it lives in the model. A lot of it lives in the architecture. And an uncomfortable amount of it lives in what looks, on the surface, like simple interface design. The best LLM systems in the next few years will not just be the ones that reason well. They will be the ones that know who is allowed to influence that reasoning, and how much.

Reframe

Not an edge case

Prompt injection is a fundamental consequence of systems that interpret natural language flexibly across mixed-trust environments. It is not a bug to fix; it is a property to manage.

Layers

Three security layers

Security lives in the model, in the architecture, and in interface design. Relying only on the model layer is insufficient.

Future

Future systems

The best LLM systems will distinguish themselves not by reasoning ability alone, but by how well they control who influences that reasoning.

Primary sources

References

These references span the academic research, industry guidance, and standards that inform this analysis. They represent the current state of understanding on prompt injection as both a technical problem and a systems architecture challenge.

References

Understanding and Defending Against Resume-Based Prompt Injections in HR AI

Akdemir, A. & Levy, J. H. (2025). RecSys in HR 2025. CEUR Workshop Proceedings Vol. 4046.

The foundational empirical study with 1,200 experiments across 5 models and 24 defense setups. Demonstrates dramatic variation in jailbreak success rates (0.8% to 52.1%) and the effectiveness of untrusted-tag + guardrail configurations.

LLM01:2025 Prompt Injection

OWASP GenAI Security Project.

OWASP's top-ranked GenAI risk, describing prompt injection as crafted inputs that manipulate model behavior, bypass safeguards, or trigger unintended actions.

Adversarial Machine Learning: A Taxonomy and Terminology of Attacks and Mitigations

NIST AI 100-2e2025.

NIST's comprehensive taxonomy, explicitly discussing how LLM use in retrieval has "blurred the data and instruction channels" enabling indirect prompt injection.

Defending Against Indirect Prompt Injection Attacks With Spotlighting

Hines, K. et al. Microsoft Research.

Proposes provenance-signaling transformations to distinguish trusted from untrusted sources. Reduced attack success from >50% to <2% with minimal task impact.

System-Level Defense against Indirect Prompt Injection Attacks: An Information Flow Control Perspective

Wu, F., Cecchetti, E., & Xiao, C.

Argues for separating planning and execution, filtering untrusted input before it influences high-trust planning, and reasoning about security at the system level.

INJECAGENT: Benchmarking Indirect Prompt Injections in Tool-Integrated LLM Agents

Zhan, Q. et al.

1,054 test cases spanning 17 user tools and 62 attacker tools. GPT-4 with ReAct prompting compromised 24% of the time.

AgentDojo: A Dynamic Environment to Evaluate Prompt Injection Attacks and Defenses for LLM Agents

Debenedetti, E. et al.

97 realistic tasks and 629 security test cases in a dynamic environment. Treats prompt injection as an evolving interaction problem.

Visual Prompt Injection Attacks for Computer-Use Agents

Cao, T. et al.

VPI-Bench: 306 test cases across five platforms showing malicious visual instructions can manipulate computer-use agent behavior.

How Microsoft defends against indirect prompt injection attacks

Microsoft Security Response Center.

Enterprise guidance on layered defenses: input isolation, detection, provenance separation, governance controls, and deterministic blocking.

Understanding prompt injections: a frontier security challenge

OpenAI.

OpenAI's framing of prompt injection as a "frontier security challenge" and emphasis on constraining impact even when manipulation succeeds.

Designing AI agents to resist prompt injection

OpenAI.

Agent design guidance emphasizing constraints on risky actions, data protection, and system design that limits damage from successful attacks.

AI Security

OWASP Top 10 for LLM Apps: Real Attacks, Real Fixes

For LLM apps, the attack often arrives as plain language rather than obviously malicious code. This guide walks through the OWASP risks as real failure stories, then shows the concrete controls that stop them.

16 min read

AI Governance

Security & Compliance Standards for AI Systems

AI security begins where ordinary app security stops: the attack can be a dataset, a gradient, or a paragraph that looks harmless. This guide maps that wider threat surface and the controls regulated teams need.

14 min read

Insurance Systems

Parametric Insurance Deep Dive: Speed, Basis Risk, and Trigger Design

Parametric insurance replaces loss adjustment with an observable trigger, but the hard part is not speed. It is picking a defensible index, reducing basis risk, and governing the data path from event to payout.

11 min read

All articles