開源AI 安全Prompt Injectionlessons-learned

12 Submissions, 0 Merges: What I Learned Contributing to Open Source AI Security

April 7, 2026 · 40 min read

Table of Contents

The Shotgun Approach
Why Everything Failed
Mistake 1: Using the Same Key for Every Door
Mistake 2: Ship First, Ask Never
Mistake 3: Following the Wrong Guide
Mistake 4: The Tool Itself Wasn't Credible
The Turning Point
Read the Code Before Writing Code
Polish the Tool
Respond Correctly to garak
The First Merge
Current Status
For Those Currently Getting Rejected
This Post Will Be Updated

12 Submissions, 0 Merges: What I Learned Contributing to Open Source AI Security

We built an LLM prompt defense scanner — prompt-defense-audit. It scans system prompts and uses pure regex to detect whether defenses exist against 12 attack vectors. No LLM, no API calls, runs in under 5ms.

We had real data behind it: 1,646 production system prompts scanned across 4 public datasets. 97.8% lacked indirect injection defense. Average score: 36/100.

We thought this research was valuable and decided to contribute it back to the open source community.

Then we spent two weeks learning a painful lesson.

The Shotgun Approach

We submitted PRs or issues to 12 open source projects:

Project	Type	Result
NVIDIA garak (7,500⭐)	PR #1669	Closed
Cisco AI Defense skill-scanner	Issue #81	Closed
OWASP LLM Top 10	PR #816	No response
Anthropic cookbook	PR #502	No response
Microsoft agent-governance-toolkit	Issue #821	Later responded
Microsoft presidio	Issue #1933	No response
awesome-llm-security	PR #134	No response
awesome-ai-tools	PR #1031	No response
Awesome-Prompt-Engineering	PR #91	No response
agent-audit	Issue #5	No response
NVIDIA NeMo-Guardrails	Issue #1764	No response

12 submissions. 0 merges. 2 outright closures. 9 no-replies.

Why Everything Failed

In hindsight, the reasons were obvious.

Mistake 1: Using the Same Key for Every Door

We submitted the same YAML-formatted "defense posture patterns" to different projects. Each project has its own architecture, its own language, its own plugin format. We didn't bother to understand any of them.

NVIDIA's garak is a Python framework. Its core concepts are Probe (generates attacks) and Detector (evaluates responses), using class inheritance, a detect() method, and pytest.

We submitted 6 YAML files.

Maintainer Jeffrey Martin's response:

"Declining as this PR does not even attempt to integrate with garak usage and code standards."

He was right. We submitted a spec document when they wanted runnable Python modules.

Mistake 2: Ship First, Ask Never

garak's creator, Leon Derczynski (a professor at ITU Copenhagen), had actually asked us a question in the issue thread:

"Can you give references to the principles behind the defense assessment approach & quantification method?"

We didn't answer his question. We just opened the PR.

That's skipping the "get alignment" step. In open source, this is a cardinal sin. You're supposed to discuss direction in the issue, confirm architectural fit, get some level of buy-in, then write code.

"Build it then ask" isn't efficiency in open source. It's arrogance.

Mistake 3: Following the Wrong Guide

In garak's issue discussion, an independent researcher (not a garak maintainer) enthusiastically responded: "go ahead and open that PR against community_modules/contrib/."

We did. But that directory structure was from his own repo, not garak's. He didn't have merge authority.

Lesson: Verify who you're talking to. Enthusiasm ≠ authority.

Mistake 4: The Tool Itself Wasn't Credible

When maintainers clicked through to our prompt-defense-audit repo, they saw:

3 stars
3 commits
Zero CI/CD
No test framework (just a hand-rolled assert file)
No CONTRIBUTING.md
No SECURITY.md — a security tool without a security policy

This looks like a weekend side project, not a library worth integrating into enterprise-grade tools.

The Turning Point

Cisco AI Defense's skill-scanner rejected our issue, but maintainer vineethsai7 said something crucial:

"I don't think this is part of the scope of skill-scanner. If you think the MCP specific ones can fit in the MCP scanner, please open a PR there!"

He pointed us to mcp-scanner.

This time, we changed our approach.

Read the Code Before Writing Code

We spent time reading through mcp-scanner's architecture — how their threat detection modules work, how tests run, what their Python code style looks like. Then we wrote a Prompt Defense Analyzer in their language, not our own YAML format.

Polish the Tool

We upgraded prompt-defense-audit from toy to professional:

Hand-rolled asserts → Vitest, 84 tests, 100% coverage
No CI → GitHub Actions, Node 20/22 matrix, green badge
No docs → CONTRIBUTING.md, SECURITY.md, CHANGELOG.md, issue templates
3 commits → v1.3.0 with proper release notes

Respond Correctly to garak

Back on garak's issue thread, we answered Leon's methodology question:

Attached academic references (Greshake et al. 2023 on indirect injection, Schulhoff et al. 2023 on injection taxonomy, OWASP LLM Top 10)
Acknowledged the PR's architectural failure
Proposed two directions using Python probe/detector classes
Explicitly said "I'd rather get alignment before writing code this time"

Then waited.

The First Merge

Cisco mcp-scanner PR #146 — from opening the PR to merge: 51 minutes.

Honestly, this wasn't a high-difficulty technical achievement. mcp-scanner is a young project, actively accepting contributions, with a relatively low bar. Our PR was pure addition (955 lines added, 0 deleted), touching no existing code — low risk for the reviewer.

But it represents one thing: we learned to speak in their language.

Current Status

Project	Status
Cisco mcp-scanner	Merged ✅
NVIDIA garak	Awaiting maintainer response — right direction, outcome unknown
Microsoft agent-governance-toolkit	Positive engagement — maintainer proposed collaboration direction
Remaining 9	Dormant

One merge doesn't equal success. But the distance from zero merges to one is greater than from one to ten.

For Those Currently Getting Rejected

If you're trying to contribute to open source AI security projects, here's what we paid tuition to learn:

1. Discuss in the issue first. Get alignment before writing code.

Most rejected PRs fail not because the code is bad, but because the direction wasn't aligned. Three sentences of discussion in an issue can save you a week of work.

2. Write in their language.

If the project uses Python, write Python. If they have a base class, inherit from it. Don't invent your own format and expect them to adapt to you.

3. Your tool must match your ambition.

If you're submitting contributions to NVIDIA and Cisco, your repo can't look like a weekend project. CI, tests, docs, security policy — these aren't decoration. They're signals that tell maintainers whether you're reliable.

4. Rejection is information, not a dead end.

"Not in our scope" → Find the right repo. "Doesn't integrate with our architecture" → Read their code. "Can you provide methodology references?" → They're interested, but need you to prove rigor.

Every rejection tells you where to go next.

5. Verify who you're talking to.

An enthusiastic reply ≠ merge authority. Check whether the person is a maintainer, a contributor, or a passerby.

This Post Will Be Updated

The garak story isn't over. If Leon Derczynski accepts our direction, we'll write a proper Python probe/detector module — that will be a real technical challenge. If we get rejected again, we'll write about that too.

Open source contribution isn't a hero's story. It's the process of hitting walls repeatedly and learning to turn.

This post is by the Ultra Lab team. We build AI security tools. prompt-defense-audit is MIT-licensed and open for contributions.

12 Submissions, 0 Merges: What I Learned Contributing to Open Source AI Security

12 Submissions, 0 Merges: What I Learned Contributing to Open Source AI Security

The Shotgun Approach

Why Everything Failed

Mistake 1: Using the Same Key for Every Door

Mistake 2: Ship First, Ask Never

Mistake 3: Following the Wrong Guide

Mistake 4: The Tool Itself Wasn't Credible

The Turning Point

Read the Code Before Writing Code

Polish the Tool

Respond Correctly to garak

The First Merge

Current Status

For Those Currently Getting Rejected

This Post Will Be Updated

Join the Solo Lab Community

Need Technical Help?

#12 Submissions, 0 Merges: What I Learned Contributing to Open Source AI Security

#The Shotgun Approach

#Why Everything Failed

#Mistake 1: Using the Same Key for Every Door

#Mistake 2: Ship First, Ask Never

#Mistake 3: Following the Wrong Guide

#Mistake 4: The Tool Itself Wasn't Credible

#The Turning Point

#Read the Code Before Writing Code

#Polish the Tool

#Respond Correctly to garak

#The First Merge

#Current Status

#For Those Currently Getting Rejected

#This Post Will Be Updated

Related Posts

We Audited 7 Official MCP Servers — 6 Got F

Cisco Merged My PR in 39 Minutes — Why Prompt Defense Is the Next SQL Injection

One Line to Block 92% of Prompt Injection Attacks

Weekly AI Automation Playbook

Join the Solo Lab Community

Need Technical Help?

12 Submissions, 0 Merges: What I Learned Contributing to Open Source AI Security

The Shotgun Approach

Why Everything Failed

Mistake 1: Using the Same Key for Every Door

Mistake 2: Ship First, Ask Never

Mistake 3: Following the Wrong Guide

Mistake 4: The Tool Itself Wasn't Credible

The Turning Point

Read the Code Before Writing Code

Polish the Tool

Respond Correctly to garak

The First Merge

Current Status

For Those Currently Getting Rejected

This Post Will Be Updated