open-sourceai-securityprompt-injectionlessons-learned

12 Submissions, 0 Merges: What I Learned Contributing to Open Source AI Security

· 40 min read

12 Submissions, 0 Merges: What I Learned Contributing to Open Source AI Security

We built an LLM prompt defense scanner — prompt-defense-audit. It scans system prompts and uses pure regex to detect whether defenses exist against 12 attack vectors. No LLM, no API calls, runs in under 5ms.

We had real data behind it: 1,646 production system prompts scanned across 4 public datasets. 97.8% lacked indirect injection defense. Average score: 36/100.

We thought this research was valuable and decided to contribute it back to the open source community.

Then we spent two weeks learning a painful lesson.


The Shotgun Approach

We submitted PRs or issues to 12 open source projects:

Project Type Result
NVIDIA garak (7,500⭐) PR #1669 Closed
Cisco AI Defense skill-scanner Issue #81 Closed
OWASP LLM Top 10 PR #816 No response
Anthropic cookbook PR #502 No response
Microsoft agent-governance-toolkit Issue #821 Later responded
Microsoft presidio Issue #1933 No response
awesome-llm-security PR #134 No response
awesome-ai-tools PR #1031 No response
Awesome-Prompt-Engineering PR #91 No response
agent-audit Issue #5 No response
NVIDIA NeMo-Guardrails Issue #1764 No response

12 submissions. 0 merges. 2 outright closures. 9 no-replies.


Why Everything Failed

In hindsight, the reasons were obvious.

Mistake 1: Using the Same Key for Every Door

We submitted the same YAML-formatted "defense posture patterns" to different projects. Each project has its own architecture, its own language, its own plugin format. We didn't bother to understand any of them.

NVIDIA's garak is a Python framework. Its core concepts are Probe (generates attacks) and Detector (evaluates responses), using class inheritance, a detect() method, and pytest.

We submitted 6 YAML files.

Maintainer Jeffrey Martin's response:

"Declining as this PR does not even attempt to integrate with garak usage and code standards."

He was right. We submitted a spec document when they wanted runnable Python modules.

Mistake 2: Ship First, Ask Never

garak's creator, Leon Derczynski (a professor at ITU Copenhagen), had actually asked us a question in the issue thread:

"Can you give references to the principles behind the defense assessment approach & quantification method?"

We didn't answer his question. We just opened the PR.

That's skipping the "get alignment" step. In open source, this is a cardinal sin. You're supposed to discuss direction in the issue, confirm architectural fit, get some level of buy-in, then write code.

"Build it then ask" isn't efficiency in open source. It's arrogance.

Mistake 3: Following the Wrong Guide

In garak's issue discussion, an independent researcher (not a garak maintainer) enthusiastically responded: "go ahead and open that PR against community_modules/contrib/."

We did. But that directory structure was from his own repo, not garak's. He didn't have merge authority.

Lesson: Verify who you're talking to. Enthusiasm ≠ authority.

Mistake 4: The Tool Itself Wasn't Credible

When maintainers clicked through to our prompt-defense-audit repo, they saw:

  • 3 stars
  • 3 commits
  • Zero CI/CD
  • No test framework (just a hand-rolled assert file)
  • No CONTRIBUTING.md
  • No SECURITY.md — a security tool without a security policy

This looks like a weekend side project, not a library worth integrating into enterprise-grade tools.


The Turning Point

Cisco AI Defense's skill-scanner rejected our issue, but maintainer vineethsai7 said something crucial:

"I don't think this is part of the scope of skill-scanner. If you think the MCP specific ones can fit in the MCP scanner, please open a PR there!"

He pointed us to mcp-scanner.

This time, we changed our approach.

Read the Code Before Writing Code

We spent time reading through mcp-scanner's architecture — how their threat detection modules work, how tests run, what their Python code style looks like. Then we wrote a Prompt Defense Analyzer in their language, not our own YAML format.

Polish the Tool

We upgraded prompt-defense-audit from toy to professional:

  • Hand-rolled asserts → Vitest, 84 tests, 100% coverage
  • No CI → GitHub Actions, Node 20/22 matrix, green badge
  • No docs → CONTRIBUTING.md, SECURITY.md, CHANGELOG.md, issue templates
  • 3 commits → v1.3.0 with proper release notes

Respond Correctly to garak

Back on garak's issue thread, we answered Leon's methodology question:

  • Attached academic references (Greshake et al. 2023 on indirect injection, Schulhoff et al. 2023 on injection taxonomy, OWASP LLM Top 10)
  • Acknowledged the PR's architectural failure
  • Proposed two directions using Python probe/detector classes
  • Explicitly said "I'd rather get alignment before writing code this time"

Then waited.


The First Merge

Cisco mcp-scanner PR #146 — from opening the PR to merge: 51 minutes.

Honestly, this wasn't a high-difficulty technical achievement. mcp-scanner is a young project, actively accepting contributions, with a relatively low bar. Our PR was pure addition (955 lines added, 0 deleted), touching no existing code — low risk for the reviewer.

But it represents one thing: we learned to speak in their language.


Current Status

Project Status
Cisco mcp-scanner Merged
NVIDIA garak Awaiting maintainer response — right direction, outcome unknown
Microsoft agent-governance-toolkit Positive engagement — maintainer proposed collaboration direction
Remaining 9 Dormant

One merge doesn't equal success. But the distance from zero merges to one is greater than from one to ten.


For Those Currently Getting Rejected

If you're trying to contribute to open source AI security projects, here's what we paid tuition to learn:

1. Discuss in the issue first. Get alignment before writing code.

Most rejected PRs fail not because the code is bad, but because the direction wasn't aligned. Three sentences of discussion in an issue can save you a week of work.

2. Write in their language.

If the project uses Python, write Python. If they have a base class, inherit from it. Don't invent your own format and expect them to adapt to you.

3. Your tool must match your ambition.

If you're submitting contributions to NVIDIA and Cisco, your repo can't look like a weekend project. CI, tests, docs, security policy — these aren't decoration. They're signals that tell maintainers whether you're reliable.

4. Rejection is information, not a dead end.

"Not in our scope" → Find the right repo. "Doesn't integrate with our architecture" → Read their code. "Can you provide methodology references?" → They're interested, but need you to prove rigor.

Every rejection tells you where to go next.

5. Verify who you're talking to.

An enthusiastic reply ≠ merge authority. Check whether the person is a maintainer, a contributor, or a passerby.


This Post Will Be Updated

The garak story isn't over. If Leon Derczynski accepts our direction, we'll write a proper Python probe/detector module — that will be a real technical challenge. If we get rejected again, we'll write about that too.

Open source contribution isn't a hero's story. It's the process of hitting walls repeatedly and learning to turn.


This post is by the Ultra Lab team. We build AI security tools. prompt-defense-audit is MIT-licensed and open for contributions.

Weekly AI Automation Playbook

No fluff — just templates, SOPs, and technical breakdowns you can use right away.

Join the Solo Lab Community

Free resource packs, daily build logs, and AI agents you can talk to. A community for solo devs who build with AI.

Need Technical Help?

Free consultation — reply within 24 hours.