Prompt InjectionAI 安全開源BuildInPublicCiscoMicrosoftAI Agent

Cisco Merged My PR in 39 Minutes — Why Prompt Defense Is the Next SQL Injection

· 54 min read

39 Minutes

That's how long it took Cisco AI Defense to go from receiving my PR to merging it into main.

An 873-star repo (cisco-ai-defense/mcp-scanner). 27 minutes to approval, 12 more to merge. I was on a subway watching GitHub notifications, hands shaking enough I almost missed my stop.

But this post isn't about those 39 minutes.

It's about the four months that made those 39 minutes possible.


The Trigger: A Casual Scan

Rewind to January 2026.

I was building UltraProbe — an AI security scanner. One core function: check whether LLM system prompts have basic prompt-injection defenses.

I thought: "Let me dogfood this. Run it across a hundred or two public prompts."

After the scan completed, I stared at the screen for five minutes.

78% scored F.

Not "could be designed better" F. No defensive language at all F. No role-escape mitigation, no output-manipulation guards, no input-validation boundaries. Nothing.

Including some prompts I'd written myself a few weeks earlier.

It was a strange moment. On one hand, I understood why OWASP ranked Prompt Injection #1 in the LLM Top 10 — not as an academic concern, but field reality. On the other hand, I started thinking:

If even people building AI products aren't doing this, what do enterprise customer service bots, internal agents, and automation prompts actually look like?

That question became the spine of the next four months.


The Research: Make It a Package

The first version was crude: extract UltraProbe's scanner core, wrap it in a CLI.

12 attack vectors, pure regex, zero dependencies, runs in under 1ms.

npx prompt-defense-audit "You are a helpful assistant."
# Grade: F (8/100, 1/12 defenses)

I deliberately avoided using an LLM to check an LLM. Reasons:

  1. Reproducible — regex gives identical output for identical input. LLMs don't.
  2. Free — running 10,000 times costs the same as running once.
  3. Auditable — every finding traces to a single regex pattern.
  4. CI-friendly — drop it into a pipeline as a gate. No network. No API key.

Pushed it to npm (prompt-defense-audit). Then did the thing I assumed nobody would care about: scanned major open-source AI tools.

Scanned modelcontextprotocol/servers — 6 of 7 official servers got F. Scanned LangChain example prompts — mostly D or F. Scanned my own OpenClaw fleet's SOUL.md — 50/100, grade D, 6/12 defenses.

The data started carrying weight.


Adoption (1): Cisco — 39 Minutes

Early April 2026.

I noticed a thread in Cisco AI Defense's mcp-scanner discussing systematic checks for MCP server prompt exposure.

Three thoughts:

  1. I have the tool already
  2. Their codebase is Python; mine is TypeScript
  3. So port it to Python and submit as a PR

Spent an afternoon translating 12 vectors to Python, wrote 23 unit tests, conformed to their existing Analyzer interface. PR #146 submitted.

27 minutes later: ✅ Approved
12 minutes later: ✅ Merged

Cisco isn't a small shop. Their AI Defense team doesn't merge PRs casually — review standards are strict. Walking through review + merge in 39 minutes meant one thing:

They were already waiting for this.

The market just hadn't shipped it. So I shipped it. Right place, right time.


Adoption (2): Microsoft — Self-Assigned

Days later, I left an issue #821 in Microsoft's agent-governance-toolkit repo proposing a PromptDefenseEvaluator component.

Not a PR. Just an issue. Wrote the problem statement, the 12-vector framework, design notes from prompt-defense-audit, then went to dinner.

Got home and opened my inbox:

Hi! Thanks for the proposal. I'm assigning this to you. Please proceed with a draft PR.
— imran-siddique (Microsoft Engineering Architect, Bellevue)

A Microsoft engineering architect assigned an internal issue to an external contributor.

I spent the following week writing 1,110 lines of code with 58 tests, following their existing SupplyChainGuard design pattern. black / ruff / mypy --strict all green. Draft PR #854 submitted.

It wasn't a same-day merge — big-company review cycles are slow, still in review. But it's there. An official proposal in Microsoft's AI governance toolkit.


Adoption (3): NVIDIA — 14 Days of Silence

Not every story has a clean ending.

NVIDIA garak (LLM red-team toolkit) had issue #1666 discussing static prompt-defense audit. I wrote a 40k-character methodology comment with two Python implementation options.

leondz (core maintainer) has strict review standards — when reviewing PR #1668 he required "every vector must have a trigger, must have tests, minimum 30 prompts." I conformed to all of it this time.

Posted that comment. 14 days. No response.

Not necessarily bad — could be the maintainer is busy, the issue isn't priority, or they have a different direction. But this is open source reality:

You can control submission quality. You can't control response speed.

Cisco 39 minutes. Microsoft a week. NVIDIA 14 days of silence. Same tool. Three different fates.


Why This Matters — The Trend Argument

I'm not writing this to celebrate three PRs. I'm writing it to argue what the next 24-36 months will look like.

1. AI agents and chatbots are growing exponentially

2024: enterprise LLM = chatbots 2025: enterprise LLM = RAG everywhere 2026: enterprise LLM = agents + tool use as the new baseline

Every agent needs a system prompt. Every customer service bot needs a system prompt. Every internal automation flow needs a system prompt.

And 78% of production prompts have zero defense lines.

This ratio won't fix itself. Because:

2. Models update faster than humans learn

GPT-4 → GPT-4o → GPT-5. Claude 3 → Claude 4 → Claude Opus 4.7. Gemini 1.5 → 2.0 → 2.5.

Every 3-6 months, the underlying model behavior gets reset. A prompt you tuned perfectly for one version may collapse in the next.

But attackers don't need to relearn. The core patterns of prompt injection — role escape, instruction override, context confusion — are cross-model universal because they exploit the structural nature of LLMs, not any specific version's quirks.

This asymmetry compounds. Defenders must continuously re-adapt. Attackers learn one trick and reuse it for years.

3. Enterprises are AI's first adopters

Not individual developers. Not startups. Enterprises.

Because they have:

  • Budget — API cost isn't a constraint
  • Existing surfaces — call centers, sales systems, internal knowledge bases — LLM integration is a natural extension
  • Motivation — one agent can replace 30% of entry-level headcount

But enterprises also have:

  • Security pressure — when something breaks, the boardroom heat is 10x louder than at a startup
  • Compliance requirements — GDPR, HIPAA, SOC2 are all reframing around LLM risks
  • Reputation risk — a chatbot saying the wrong thing makes news for a week

In other words, enterprises are the customers who care most about defense — and have the least time to build it themselves.

4. Prompt defense will become the new SQL Injection

Think back to 2005. SQL Injection was the most common web attack. The solution was simple: parameterized queries. The problem was most developers either didn't know, or shipped too fast to do it.

OWASP kept it as #1 in the Top 10 for an entire decade before the industry caught up.

Prompt Injection in 2026 is positioned similarly to SQL Injection in 2005:

  • ✅ Attack vectors known
  • ✅ Defense patterns known
  • ✅ Tooling exists
  • ❌ Most production deployments haven't done it

Difference — prompt injection's blast radius is potentially worse. Worst case for SQL injection is a database dump. Worst case for prompt injection is the agent executing any action it has permission to perform: send emails, delete files, transfer funds, leak internal conversations.


So What

I built prompt-defense-audit not because it's cool — because it's simple enough that it shouldn't be a problem, yet everyone missed it.

12 regex patterns. 1ms. Zero dependencies. Drops into a CI/CD pipeline as a gate.

If your product has any LLM-related prompt — customer service bot, agent system instructions, RAG templates, a chatbot still in development — spend 30 seconds:

npx prompt-defense-audit "paste your system prompt here"

Getting F isn't shameful. Not getting F is — because that means you haven't run it.


What's Next

prompt-defense-audit is one of my main focus areas for the next two years. Upcoming versions:

  1. On-prem enterprise edition — no prompt upload, all evaluation runs inside customer VPC
  2. CI/CD Action — already on GitHub Marketplace, automatic PR comments with scores
  3. Vector expansion — from 12 to 24 vectors, covering multi-modal injection

If you handle AI security, compliance, or procurement at an enterprise, find me on Discord or GitHub. We need more real-world case data to validate vector design.


Four months ago, I just wanted to dogfood my own tool.

Four months later, three major US tech repos have my commits.

There was no genius moment in between. Just a visible gap, a coincidence that nobody was filling it, and a coincidence that I happened to have the tool to fill it with.

Before AI agents go mainstream, prompt defense is a niche topic. After they go mainstream, it becomes infrastructure.

The infrastructure window is opening right now — these few months are the quietest, and the most decisive.


Resources

Weekly AI Automation Playbook

No fluff — just templates, SOPs, and technical breakdowns you can use right away.

Join the Solo Lab Community

Free resource packs, daily build logs, and AI agents you can talk to. A community for solo devs who build with AI.

Need Technical Help?

Free consultation — reply within 24 hours.