Promptfoo
What is Promptfoo?
Promptfoo is an open-source CLI and library for evaluating and red-teaming LLM applications. It supports test-driven LLM development by giving developers a systematic approach to prompt engineering, model evaluation, and security testing. The tool runs completely locally to protect sensitive prompts, and it supports multiple LLM providers, including OpenAI, Anthropic, Azure, Google, and HuggingFace, as well as open-source models like Llama. Originally built for LLM apps serving over 10 million users in production, Promptfoo helps developers build reliable prompts, secure their apps with automated red teaming, and speed up evaluations with caching, concurrency, and live reloading. Its matrix views and high-level vulnerability reports let teams compare outputs across multiple prompts and spot security risks quickly.
How to use Promptfoo?
To use Promptfoo, start by running 'npx promptfoo@latest init', which fetches the latest release through npm and initializes a project without a separate install step. The interactive CLI guides you through selecting your evaluation goal (such as improving prompt performance, improving RAG performance, or running red team evaluations) and choosing model providers. Configure the evaluation by defining prompts, test cases, and assertions in the generated promptfooconfig.yaml file. Then run 'npx promptfoo eval' to execute the evaluation, which tests your prompts against the selected models and displays the results in both the terminal and the web UI. Review the results to analyze model performance, identify vulnerabilities, and iterate on your prompts based on metrics rather than trial and error.
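As a rough illustration of what the configuration step involves, the Python sketch below assembles a minimal config as a dictionary and serializes it. The top-level keys (prompts, providers, tests, with vars and assert per test) follow the layout Promptfoo documents for promptfooconfig.yaml as best understood here; the prompt text, the provider ID, and the helper names build_config and render_config are illustrative assumptions, not part of the generated project.

```python
import json


def build_config() -> dict:
    """Assemble a minimal evaluation config as a plain dictionary (illustrative)."""
    return {
        "description": "Translation smoke test (illustrative)",
        # One prompt template; {{text}} is filled in from each test's vars.
        "prompts": ["Translate the following text to French: {{text}}"],
        # Assumed provider ID; replace it with the provider you selected.
        "providers": ["openai:gpt-4o-mini"],
        "tests": [
            {
                "vars": {"text": "Hello, world"},
                # Case-insensitive substring check on the model output.
                "assert": [{"type": "icontains", "value": "bonjour"}],
            }
        ],
    }


def render_config(config: dict) -> str:
    """Serialize the config as JSON, which YAML parsers generally accept as well."""
    return json.dumps(config, indent=2)


if __name__ == "__main__":
    print(render_config(build_config()))
```

Saving the printed text as promptfooconfig.yaml and running 'npx promptfoo eval' should exercise one prompt against one provider with one assertion, assuming the provider's API key is available in the environment.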
Promptfoo's Core Features
Open-source CLI and library for LLM evaluation that runs 100% locally to protect sensitive prompts.
Automated red teaming and vulnerability scanning to identify security risks and compliance issues.
Support for multiple LLM providers including OpenAI, Anthropic, Azure, Bedrock, Ollama, and custom APIs.
Matrix view comparisons that display prompt outputs side-by-side across multiple models and test cases.
Declarative test case definitions without requiring code or heavy notebooks.
Live reload and caching features for fast, efficient evaluation workflows.
CI/CD integration for automated checks in continuous deployment pipelines.
Web UI and command-line interface for flexible evaluation review and analysis.
Automatic scoring of outputs based on custom-defined metrics and assertions.
RBAC controls and team-based configurations in enterprise versions.
Detailed vulnerability reports with remediation suggestions.
Language-agnostic support for Python, JavaScript, and other programming languages.
Built-in sharing functionality for team collaboration on evaluation results.
Battle-tested performance with usage in production apps serving 10M+ users.
Enterprise deployment options including SaaS and on-premises solutions.
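To make the matrix view, automatic scoring, and caching features more concrete, here is a conceptual Python sketch. It is not Promptfoo's implementation: the providers are stub functions, the only assertion is a case-insensitive substring check, and the names make_cached and evaluate are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Provider = Callable[[str], str]


def make_cached(provider: Provider, cache: Dict[str, str]) -> Provider:
    """Wrap a provider so that repeated prompts are answered from the cache."""
    def cached(prompt: str) -> str:
        if prompt not in cache:
            cache[prompt] = provider(prompt)
        return cache[prompt]
    return cached


def evaluate(
    prompts: List[str],
    providers: Dict[str, Provider],
    tests: List[Tuple[Dict[str, str], str]],
) -> Dict[Tuple[str, str], float]:
    """Return {(prompt template, provider name): pass rate over all tests}."""
    matrix: Dict[Tuple[str, str], float] = {}
    for template in prompts:
        for name, provider in providers.items():
            passed = 0
            for variables, expected in tests:
                output = provider(template.format(**variables))
                # Stand-in assertion: case-insensitive substring match.
                if expected.lower() in output.lower():
                    passed += 1
            matrix[(template, name)] = passed / len(tests)
    return matrix


def upper_model(prompt: str) -> str:
    """Stub 'model' that simply upper-cases the prompt."""
    return prompt.upper()


if __name__ == "__main__":
    providers = {"upper": make_cached(upper_model, {})}
    prompts = ["Say hello to {name}", "Greet {name} politely"]
    tests = [({"name": "Ada"}, "ada"), ({"name": "Linus"}, "hello")]
    for cell, rate in evaluate(prompts, providers, tests).items():
        print(cell, rate)
```

Each cell of the resulting dictionary corresponds to one prompt and provider pair, which is the shape a side-by-side matrix view needs. Re-running the same evaluation hits the cache instead of calling the provider again.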
Promptfoo's Use Cases
1. Automated regression testing of prompts and models in CI/CD pipelines
2. Security vulnerability scanning and red teaming for LLM applications before deployment
3. Side-by-side comparison of multiple LLM providers to select the best model for specific use cases
4. Testing RAG (Retrieval-Augmented Generation) pipeline performance and accuracy
5. Evaluating agent and chain-of-thought reasoning capabilities
6. Systematic prompt engineering with data-driven metrics instead of manual testing
7. Pre-deployment security audits to identify compliance risks
8. Team collaboration on evaluation configurations with shared results and reports
9. Performance benchmarking across different model versions and providers
10. Catching output quality regressions during model or prompt updates
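The regression-testing use cases can be pictured as a simple gate in a CI job. The Python sketch below is a hypothetical illustration, not a Promptfoo feature or API: it assumes you have already extracted per-suite pass rates from two evaluation runs, and the names find_regressions and exit_code are made up for this example.

```python
from typing import Dict, List, Tuple


def find_regressions(
    baseline: Dict[str, float],
    current: Dict[str, float],
    tolerance: float = 0.0,
) -> List[Tuple[str, float, float]]:
    """List suites whose pass rate dropped by more than `tolerance`."""
    regressions = []
    for suite, old_rate in sorted(baseline.items()):
        # A suite missing from the current run is treated as fully failing.
        new_rate = current.get(suite, 0.0)
        if old_rate - new_rate > tolerance:
            regressions.append((suite, old_rate, new_rate))
    return regressions


def exit_code(regressions: List[Tuple[str, float, float]]) -> int:
    """Map the result to a process exit code a CI job can act on."""
    return 1 if regressions else 0


if __name__ == "__main__":
    baseline = {"summarize": 0.9, "translate": 1.0}
    current = {"summarize": 0.95, "translate": 0.8}
    found = find_regressions(baseline, current)
    for suite, old, new in found:
        print(f"{suite}: pass rate fell from {old:.2f} to {new:.2f}")
    raise SystemExit(exit_code(found))
```

A nonzero exit code fails the pipeline step, so a prompt or model update that lowers a suite's pass rate is caught before deployment. The tolerance parameter leaves room for run-to-run variation in model output.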
Analytics of Promptfoo
Top Regions
| Region | Traffic Share |
|---|---|
| United States | 21.70% |
| India | 7.17% |
| Vietnam | 6.69% |
| Turkey | 4.87% |
| Australia | 3.98% |
Top Keywords
| Keyword | Traffic | CPC |
|---|---|---|
| promptfoo | 9.7K | $2.61 |
| prompt foo | 280 | -- |
| red teaming promptfoo | 150 | -- |
| how to prompt injection to extract data in fragment | 160 | -- |
| promptfoo checker | 170 | -- |






