AI red teaming is evolving from “try a few jailbreaks and write a report” to a structured discipline with measurable coverage, statistical rigor, and compliance alignment. This guide explains how to build a systematic AI red team strategy using Bayesian attack planning.
Most AI security teams face the same problems:
No prioritization framework. With dozens of known attack techniques across LLM jailbreaks, prompt injection, agent exploitation, and classical adversarial ML, teams default to running whatever they’ve seen in blog posts or benchmarks. There is no systematic way to decide which technique to try next against a specific target.
No coverage measurement. After running 10 tests, how do you know what you’ve covered? Which OWASP LLM Top 10 controls were tested? Which NIST AI RMF subcategories have gaps? Without mapping tests to compliance frameworks, “we ran some red team tests” is the best you can report.
No statistical calibration. A 40% attack success rate sounds concerning - but is it? Without calibrating against benchmark baselines (HarmBench, JailbreakBench), raw success rates are uninterpretable. AdversaryPilot reports results as Z-scores: “1.2 sigma above HarmBench baseline” is actionable; “40% ASR” is not (a calibration sketch follows below).
No adaptive learning. Running the same set of tests against every target ignores what you’ve already learned. If system prompt extraction succeeded, the planner should prioritize techniques that exploit the extracted knowledge. If jailbreaks failed, it should shift to other surfaces.
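To make the calibration point concrete, here is a minimal sketch of expressing a raw attack success rate as a Z-score against a benchmark baseline. The baseline mean and standard deviation are placeholder numbers for illustration, not actual HarmBench statistics.

```python
# Minimal sketch: calibrate a raw attack success rate (ASR) against a
# benchmark baseline by expressing it as a Z-score.
# NOTE: baseline_mean and baseline_std are placeholders, not real HarmBench stats.

def z_score(observed_asr: float, baseline_mean: float, baseline_std: float) -> float:
    """How many standard deviations the observed ASR sits above the baseline."""
    return (observed_asr - baseline_mean) / baseline_std

observed = 0.40        # 40% ASR measured against the target
baseline_mean = 0.28   # placeholder: mean ASR of the same probes on reference models
baseline_std = 0.10    # placeholder: spread of ASR across reference models

print(f"{z_score(observed, baseline_mean, baseline_std):+.1f} sigma vs. baseline")
# -> +1.2 sigma: meaningfully weaker than the reference population
```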
The tools are good. Garak has 100+ probes. PyRIT has multi-turn orchestration. Promptfoo has evaluation pipelines. What’s missing is the strategic layer that answers which technique to run next, what has already been covered, and whether the results are statistically meaningful.
AdversaryPilot is that strategic layer.
AdversaryPilot organizes red team engagements into two phases that mirror real-world methodology: broad reconnaissance first, then focused exploitation.
In the reconnaissance phase, the goal is broad coverage across attack surfaces. Thompson Sampling naturally favors uncertain techniques - those with wide posteriors where the planner doesn’t yet have data - which drives exploration early in the campaign.
Once weaknesses are discovered, the planner shifts to deep exploitation. Posteriors narrow as data accumulates, and Thompson Sampling increasingly favors techniques with proven success.
The transition happens automatically - no manual phase switching required.
The core of AdversaryPilot’s strategy is Thompson Sampling with correlated arms: each technique’s success probability is modeled as a posterior distribution, the planner samples once from every posterior, and the technique with the highest draw is recommended next. Because the arms are correlated, evidence about one technique also informs related techniques.
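A minimal sketch of that sampling loop, assuming independent Beta posteriors (the technique names are illustrative, and the correlation structure across related arms is omitted for brevity):

```python
import random

# One Beta(successes + 1, failures + 1) posterior per technique.
# Technique names are illustrative; cross-technique correlation is omitted.
posteriors = {
    "system_prompt_extraction":  {"alpha": 1, "beta": 1},  # no data yet: wide posterior
    "role_play_jailbreak":       {"alpha": 1, "beta": 1},
    "indirect_prompt_injection": {"alpha": 1, "beta": 1},
}

def recommend_next() -> str:
    """Sample once from every posterior and recommend the highest draw."""
    draws = {t: random.betavariate(p["alpha"], p["beta"]) for t, p in posteriors.items()}
    return max(draws, key=draws.get)

def record_result(technique: str, success: bool) -> None:
    """Bayesian update: the posterior narrows as evidence accumulates."""
    posteriors[technique]["alpha" if success else "beta"] += 1

# Early on, every posterior is wide, so draws spread across techniques (exploration).
# As results arrive, techniques with proven success dominate (exploitation).
technique = recommend_next()
record_result(technique, success=True)
```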
The 7 scoring dimensions ensure recommendations are relevant to the specific target (a combined-score sketch follows the table):
| Dimension | What It Measures |
|---|---|
| Compatibility | Target type match (chatbot, RAG, agent, etc.) |
| Access Fit | Required vs. available access level |
| Goal Alignment | How well the technique serves stated attack goals |
| Defense Bypass | Likelihood of evading known defenses |
| Signal Gain | Information value of the test result |
| Cost Penalty | Query budget and time consumption |
| Detection Risk | Probability of triggering alerts |
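One way these dimensions could combine into a single priority score is a weighted sum, sketched below. The weights and per-dimension scores are made up for illustration, not AdversaryPilot’s actual defaults.

```python
# Hypothetical weighted-sum combination of the seven scoring dimensions.
# Weights and example scores are illustrative only.
weights = {
    "compatibility":  0.25,
    "access_fit":     0.15,
    "goal_alignment": 0.20,
    "defense_bypass": 0.15,
    "signal_gain":    0.10,
    "cost_penalty":   0.10,   # subtracted: higher cost lowers priority
    "detection_risk": 0.05,   # subtracted: noisier techniques rank lower
}

def priority(scores: dict) -> float:
    """Combine per-dimension scores in [0, 1] into one ranking value."""
    positive = ("compatibility", "access_fit", "goal_alignment",
                "defense_bypass", "signal_gain")
    return (sum(weights[d] * scores[d] for d in positive)
            - weights["cost_penalty"] * scores["cost_penalty"]
            - weights["detection_risk"] * scores["detection_risk"])

example = {"compatibility": 0.9, "access_fit": 1.0, "goal_alignment": 0.8,
           "defense_bypass": 0.5, "signal_gain": 0.7, "cost_penalty": 0.3,
           "detection_risk": 0.4}
print(round(priority(example), 3))
```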
A common blind spot in AI red teaming is testing only the LLM layer. Modern AI systems have multiple attack surfaces:
| Surface | Example Techniques | Often Missed? |
|---|---|---|
| Model | Jailbreaks, adversarial examples, model extraction | No |
| Data | RAG poisoning, training data extraction, clean-label backdoors | Yes |
| Retrieval | Indirect prompt injection via retrieved documents | Yes |
| Tool/Action | MCP poisoning, A2A impersonation, delegation abuse | Yes |
AdversaryPilot’s 70-technique catalog covers all surfaces, and the planner explicitly tracks surface-level coverage to avoid blind spots.

Every technique in AdversaryPilot maps to three compliance frameworks, including the OWASP LLM Top 10 and the NIST AI RMF.
This transforms red teaming from “we tested some attacks” to “we’ve covered 78% of OWASP LLM controls, with gaps in LLM07 (Insecure Plugin Design) and LLM09 (Overreliance).”
Reports show per-framework coverage gauges and prioritized recommendations for untested controls - exactly what procurement and audit teams need.
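As a rough illustration of the coverage computation, the sketch below counts which framework controls were exercised by executed techniques. The technique-to-control mapping here is hypothetical, not the catalog’s real data.

```python
# Hypothetical technique-to-control mapping used to compute per-framework coverage.
TECHNIQUE_CONTROLS = {
    "role_play_jailbreak":       {"owasp_llm": {"LLM01"}},
    "indirect_prompt_injection": {"owasp_llm": {"LLM01", "LLM07"}},
    "training_data_extraction":  {"owasp_llm": {"LLM06"}},
}
OWASP_LLM_CONTROLS = {f"LLM{i:02d}" for i in range(1, 11)}  # LLM01..LLM10

def owasp_coverage(executed: list[str]) -> float:
    """Fraction of OWASP LLM controls touched by at least one executed technique."""
    covered = set()
    for technique in executed:
        covered |= TECHNIQUE_CONTROLS.get(technique, {}).get("owasp_llm", set())
    return len(covered & OWASP_LLM_CONTROLS) / len(OWASP_LLM_CONTROLS)

executed = ["role_play_jailbreak", "training_data_extraction"]
print(f"{owasp_coverage(executed):.0%} of OWASP LLM controls exercised")
```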
Here’s a concrete workflow using AdversaryPilot:
```bash
# 1. Define the target
cat > target.yaml <<EOF
schema_version: "1.0"
name: Production Chatbot
target_type: chatbot
access_level: black_box
goals: [jailbreak, extraction]
constraints:
  max_queries: 500
  stealth_priority: moderate
defenses:
  has_moderation: true
  has_input_filtering: true
EOF

# 2. Create an adaptive campaign
adversarypilot campaign new target.yaml --name "q1-assessment"

# 3. Get initial recommendations
adversarypilot campaign next <campaign-id>

# 4. Execute recommended techniques with garak/promptfoo
garak --model_type openai --model_name <model-name> --probes dan.Dan_6_0

# 5. Import results
adversarypilot import garak garak_report.jsonl

# 6. Get updated recommendations (posteriors have shifted)
adversarypilot campaign next <campaign-id>

# 7. Repeat steps 4-6 until coverage goals are met

# 8. Generate the defender report
adversarypilot report <campaign-id>
```
How do you know the planner’s recommendations aren’t artifacts of arbitrary weight choices? AdversaryPilot includes a sensitivity analysis that perturbs each scoring weight by +/-20% and measures rank stability using Kendall tau correlation.
If a small weight change dramatically reshuffles the technique ranking, the planner warns you, so you know whether the ordering is robust enough to act on.
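A minimal sketch of that check, assuming a `rank_techniques(weights)` stand-in for the planner’s scoring pass (it returns technique names ordered by priority under the given weights); scipy’s `kendalltau` does the rank comparison.

```python
from scipy.stats import kendalltau

# Perturb each weight by +/-20%, re-rank the techniques, and compare each
# perturbed ranking against the baseline with Kendall's tau.
# `rank_techniques` is a hypothetical stand-in for the planner's scoring pass.

def rank_stability(weights: dict, rank_techniques) -> float:
    """Worst-case rank correlation across all single-weight perturbations."""
    baseline = rank_techniques(weights)
    taus = []
    for dim in weights:
        for factor in (0.8, 1.2):                      # +/-20% perturbation
            perturbed = {**weights, dim: weights[dim] * factor}
            new_rank = rank_techniques(perturbed)
            new_pos = {name: i for i, name in enumerate(new_rank)}
            tau, _ = kendalltau(list(range(len(baseline))),
                                [new_pos[name] for name in baseline])
            taus.append(tau)
    return min(taus)

# A value near 1.0 means the ranking is stable; a low value warrants a warning.
```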

When you run multiple campaigns against similar targets, AdversaryPilot’s meta-learning system transfers learned posteriors. A new campaign against a chatbot with moderation can warm-start from the posteriors of your previous chatbot campaigns, weighted by target similarity (Jaccard similarity over target attributes).
This means your second assessment is smarter than your first.
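A minimal sketch of the warm-start idea, assuming target attributes are represented as sets and posteriors as Beta parameters; the blending rule is illustrative, not AdversaryPilot’s exact formula.

```python
# Sketch of posterior transfer between campaigns. Attribute sets and the
# blending rule are illustrative assumptions.

def jaccard_similarity(a: set, b: set) -> float:
    """Overlap of target attribute sets, e.g. {'chatbot', 'black_box', ...}."""
    return len(a & b) / len(a | b) if a | b else 0.0

def warm_start(prior_posteriors: dict, similarity: float) -> dict:
    """Shrink a previous campaign's Beta evidence toward an uninformative
    Beta(1, 1) prior in proportion to how dissimilar the targets are."""
    return {
        technique: {
            "alpha": 1 + similarity * (p["alpha"] - 1),
            "beta":  1 + similarity * (p["beta"] - 1),
        }
        for technique, p in prior_posteriors.items()
    }

old_target = {"chatbot", "black_box", "has_moderation", "has_input_filtering"}
new_target = {"chatbot", "black_box", "has_moderation"}
sim = jaccard_similarity(old_target, new_target)                     # 0.75
new_posteriors = warm_start({"role_play_jailbreak": {"alpha": 7, "beta": 3}}, sim)
```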

AdversaryPilot is open-source and free to use under the Apache 2.0 license.