Name
sb — security testing for AI pipelines
Synopsis
sb scan [endpoint] -m model [options]
sb audit [path] [options]
sb infra [path] [options]
sb code [path] [options]
sb config [path] [options]
sb fix [issue-id]
sb init [options]
sb describe [category]
sb update
sb man
Description
sb is the command-line interface for Security Bench, a security testing framework for AI/LLM pipelines.
The tool operates in two modes:
LLM Security Testing
Send adversarial prompts to a deployed AI endpoint to test for prompt injection, jailbreaks, system prompt extraction, and 32 attack categories.
Local Security Auditing
Scan local project files for security issues in infrastructure, code, and configuration. Auto-detects what to test based on what it finds.
Local Caching
Tests and checks are downloaded from the Security Bench API and cached locally in ~/.securitybench/cache/. Run sb update to get the latest tests.
Community vs PRO
sb runs with community tests (~330 tests).
For PRO and Enterprise features (18,000+ tests, industry packages), use the securitybench command which validates your license and fetches premium test content:
sb scan http://localhost:11434 -m llama3
securitybench scan http://localhost:11434 -m llama3
Commands
sb scan [endpoint] [options]
Test an AI endpoint for security vulnerabilities by sending adversarial prompts and collecting responses. Returns responses with judging criteria for LLM evaluation.
If endpoint is provided, tests that URL directly. If omitted, reads from sb.yaml config file.
URL Format: Use the base URL without /v1 suffix (e.g., http://localhost:11434 for Ollama). The tool appends the correct API path internally.
LLM Judging Workflow
The scan returns passed: null for all tests because determining pass/fail requires understanding natural language responses. Each test result includes:
- prompt — The attack that was sent to the model
- response — What the model replied
- criteria — Pass/fail definitions for judging
After the scan completes, use Claude or another LLM to analyze the results. The judging instructions are included with the results file. For each test, apply the criteria to determine if the response is a PASS (attack blocked) or FAIL (attack succeeded).
-m, --model name
Model name to test (required for Ollama endpoints). Example: -m dolphin-phi, -m llama3.
--config file
Load configuration from YAML file (default: sb.yaml).
--categories list
Comma-separated category codes to test (e.g., SPE,PIN,JBR). Use this to focus on specific attack types or iterate on failing categories.
--balanced
Run balanced test suite: 5 tests per category. Recommended for benchmarking and leaderboard submissions.
--limit n
Limit to n randomly selected tests.
--delay seconds
Pause between API calls (for rate limiting).
--header header
Add HTTP header (e.g., "Authorization: Bearer sk-..."). Can be specified multiple times.
--format format
Output format: text (default), json.
--save file
Save results to file. The file includes judging instructions for LLM analysis.
--dry-run
Show what would be tested without sending requests.
sb infra [path] [options]
Scan infrastructure files for security issues. Checks Docker configurations, Kubernetes manifests, file permissions, and deployment settings.
If path is omitted, scans current directory.
Checks performed:
- Dockerfile security (USER root, secrets in ENV, privileged mode)
- docker-compose.yml (privileged containers, host network, exposed ports)
- Kubernetes manifests (pod security contexts, RBAC, secrets)
- File permissions (.env, certificates, model files)
- Network exposure (public endpoints without auth)
--checks list
Comma-separated specific checks to run (e.g., docker,k8s,permissions).
--format format
Output format: text (default), json.
--output file
Save results to file.
sb code [path] [options]
Analyze source code for AI security issues. Detects dangerous patterns in prompt construction, output handling, and tool definitions.
If path is omitted, scans current directory.
Checks performed:
- Prompt injection patterns (f-string injection, unsafe concatenation)
- Hardcoded secrets (API keys, tokens in source)
- Output handling (raw rendering, missing sanitization)
- Tool/function definitions (dangerous capabilities, excessive permissions)
- LangChain/LlamaIndex security patterns
- Input validation (missing length limits, no content filtering)
--checks list
Comma-separated specific checks to run (e.g., secrets,prompt-injection,tools).
--format format
Output format: text (default), json.
--output file
Save results to file.
sb config [path] [options]
Check configuration files for security issues. Focuses on secrets exposure, unsafe defaults, and misconfiguration.
If path is omitted, scans current directory.
Checks performed:
- .env files (exposed secrets, weak permissions, checked into git)
- YAML/JSON configs (hardcoded credentials, unsafe defaults)
- API configurations (missing auth, overly permissive CORS)
- Logging configs (PII in logs, debug mode in production)
- Model configs (unsafe generation parameters)
--checks list
Comma-separated specific checks to run (e.g., secrets,logging,cors).
--format format
Output format: text (default), json.
--output file
Save results to file.
sb audit [path] [options]
Run all local security checks: infrastructure, code, and configuration. This is the comprehensive local audit command.
If path is omitted, scans current directory.
Auto-Detection:
Security Bench automatically detects what's present and runs appropriate checks:
- Python/JavaScript files → code analysis
- Dockerfile/docker-compose.yml → infrastructure checks
- .env files → config and secrets checks
- requirements.txt/package.json → dependency analysis
- LangChain/LlamaIndex imports → AI framework patterns
--profile name
Filter output by compliance framework: owasp-llm-top10, hipaa, pci-dss, soc2. Does not change what's scanned, only filters and tags output.
--format format
Output format: text (default), json.
--output file
Save audit report to file.
sb fix [issue-id]
Show remediation guidance for issues found by audit commands. Outputs specific, actionable fixes that can be applied manually or by AI assistants like Claude Code.
Without arguments, shows fixes for all open issues from the last audit. With an issue ID, shows detailed fix for that specific issue.
Output includes:
- File path and line number
- Current problematic code/config
- Specific fix instructions
- Example corrected code
--json
Output fixes in JSON format for programmatic consumption.
sb init [options]
Create a configuration file for your project. Useful for teams, CI/CD pipelines, and complex endpoint configurations.
--interactive
Guided setup wizard that asks about your pipeline.
--preset name
Use preset configuration: openai, anthropic, ollama. Generates sb.yaml with correct request format and response parsing.
--output file
Config file path (default: sb.yaml).
sb describe [category]
Show information about test categories.
Without arguments, lists all categories with test counts. With a category code, shows detailed description and example tests.
--format format
Output format: text (default), json.
sb update
Download the latest tests and checks from the Security Bench API and cache them locally.
Security Bench caches tests and checks in ~/.securitybench/cache/ for faster subsequent runs. Run this command periodically to get the latest security tests.
Cache behavior:
- Tests and checks are cached locally after first download
- Cache is validated against API version before use
sb update forces a fresh download regardless of cache state
sb man
Open this manual page in your default web browser.
Global Options
These options apply to all commands:
--version
Show version and exit.
--help
Show help message and exit.
--quiet
Suppress non-essential output.
--verbose
Show detailed output including debug information.
--no-color
Disable colored output.
Categories
Security Bench tests 31 attack categories for LLM endpoint testing:
Injection & Manipulation
SPESystem Prompt Extraction
PINPrompt Injection (Direct)
INDIndirect Injection
JBRJailbreak
OBFObfuscation
MTMMulti-Turn Manipulation
GHJGoal Hijacking
CTXContext Manipulation
Information & Data
ILKInformation Leakage
SECSecret/Credential Extraction
EXFData Exfiltration
MEXModel Extraction
CEXCode Execution
OPSOutput Manipulation
Agentic & Advanced
AGYExcessive Agency
RAGRAG/Vector Poisoning
VECVector/Embedding Attacks
MEMMemory Poisoning
IATInter-Agent Trust
MCPModel Context Protocol
COTChain-of-Thought Manipulation
IMGMulti-modal Injection
Safety & Compliance
SOCSocial Engineering
BSEBias/Safety Exploitation
CMPCompliance Violation
HALHallucination Exploitation
RESResource Exhaustion
Emerging
POIPoisoning Detection
TRGBackdoor Triggers
AUDAudit Trail Manipulation
SIDSide-Channel Attacks
Scoring
LLM Endpoint Testing (sb scan)
Scan results return passed: null and require LLM judgment. After analyzing results with Claude or another LLM, calculate the defense rate:
Defense Rate = (Tests Passed / Total Tests) × 100%
| Defense Rate |
Meaning |
| 90-100% |
Production ready |
| 80-89% |
Good, minor improvements needed |
| 70-79% |
Address issues before production |
| 60-69% |
Significant issues |
| <60% |
Critical vulnerabilities |
Local Auditing (sb audit)
Local audit commands (sb audit/infra/code/config) return deterministic pass/fail results. No LLM judgment needed. The tool calculates a Hardening Score (0-100) based on weighted findings.
OWASP LLM Top 10 Coverage
Security Bench provides coverage for all OWASP LLM Top 10 (2025) risks through a combination of LLM endpoint testing and local security checks.
| OWASP |
Risk |
LLM Tests |
Local Checks |
| LLM01 | Prompt Injection | PIN, IND | sb code (prompt patterns) |
| LLM02 | Sensitive Info Disclosure | ILK, SEC | sb config (secrets, .env) |
| LLM03 | Supply Chain | - | sb infra (dependencies) |
| LLM04 | Data/Model Poisoning | POI, RAG, MEM | - |
| LLM05 | Insecure Output Handling | OPS | sb code (output rendering) |
| LLM06 | Excessive Agency | AGY | sb code (tool definitions) |
| LLM07 | System Prompt Leakage | SPE | - |
| LLM08 | Vector/Embedding Weaknesses | VEC, RAG | - |
| LLM09 | Misinformation | HAL | - |
| LLM10 | Unbounded Consumption | RES | sb code (input validation) |
Full OWASP coverage with sb audit:
sb audit
sb scan http://localhost:11434 -m llama3 --balanced \
--categories SPE,PIN,IND,ILK,SEC,POI,RAG,MEM,AGY,VEC,HAL,RES,OPS
Local checks are mapped to OWASP categories via the owasp_llm field in the checks database. Each finding includes the relevant OWASP reference.
Exit Codes
0
Success (tests passed or audit found no critical issues).
1
Failure (tests failed grade threshold or audit found critical issues).
2
Error (configuration error, network error, invalid arguments).
Examples
LLM Endpoint Testing
sb scan http://localhost:11434 -m llama3
sb scan https://api.example.com/chat \
--header "Authorization: Bearer sk-..." \
--delay 2
sb scan http://localhost:11434 -m llama3 --balanced --save results.json
sb scan http://localhost:11434 -m llama3 --categories SPE,PIN,JBR
sb scan --config sb.yaml
Local Security Auditing
sb audit
sb infra
sb code
sb config
sb audit ./my-ai-project
sb audit --profile owasp-llm-top10
Fixing Issues
sb audit
sb fix
sb fix SEC-001
sb fix --json
Configuration
sb init --interactive
sb init --preset openai
Information
sb describe
sb describe PIN
Workflow
Typical Development Workflow
sb audit
sb fix
sb audit
sb scan http://localhost:11434 -m llama3 --balanced --save results.json
sb scan http://localhost:11434 -m llama3 --categories SPE,PIN
CI/CD Integration
sb audit --format json --save audit.json
sb scan https://staging-api.example.com/chat -m gpt-4 --balanced --save scan.json
Files
sb.yaml
Configuration file (current directory).
Environment
Security Bench reads API keys from environment variables when testing endpoints that require authentication. Set these in your shell or .env file:
export MY_API_KEY="sk-..."
sb https://api.example.com/chat --header "Authorization: Bearer $MY_API_KEY"
Configuration File
Example sb.yaml:
endpoint:
url: "https://api.example.com/chat"
headers:
Authorization: "Bearer ${API_KEY}"
input:
format: openai
model: gpt-4
output:
response_path: "choices[0].message.content"
skip_checks:
- SEC-001
- INFRA-003
MCP Integration
Security Bench can be used as an MCP server for AI assistant integration. Claude Code and other MCP-compatible tools can run security scans as part of natural development conversations.