Phoenix

The open-source platform for AI agent development, tracing, and evaluation

Free tier available · Technical · Powered by Vendor Agnostic · API available · Open source

Key strengths

  • Full OpenTelemetry-native tracing for LLM agents
  • LLM-as-a-judge and human annotation for evaluation
  • Vendor-agnostic: works with any model, framework, or language
  • Self-hostable with zero data leaving your infrastructure
  • End-to-end iteration loop: trace → annotate → experiment → measure
Free tier + paid plans
US
Self-hostable
  • Distributed LLM tracing: Instrument multi-step agentic workflows with OpenTelemetry spans conforming to the OpenInference spec, capturing full execution context across tool calls, retrievals, and LLM completions.
  • LLM-as-a-judge evaluation pipelines: Use phoenix.evals to run scalable automated evaluations (relevance, toxicity, hallucination, Q&A correctness) using any LLM as the judge.
  • Dataset curation from traces: Export production traces into structured datasets for fine-tuning, regression testing, or benchmarking new model versions.
  • A/B experimentation on prompts and retrievers: Run controlled experiments in the Prompt IDE comparing prompt variants or retrieval strategies against the same dataset using custom eval metrics.
  • Self-hosted observability backend: Deploy Phoenix on-prem via Docker or Kubernetes (Helm) to keep all trace data within your own infrastructure — critical for sensitive or regulated environments.
  • MCP-based agent automation: Connect coding agents to Phoenix via the MCP skill interface to programmatically instrument, query traces, and trigger evaluation runs as part of CI/CD workflows.
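
The prompt A/B experimentation and LLM-as-a-judge items above can be illustrated with a minimal, library-free Python sketch. It deliberately does not call the Phoenix API: Example, judge_correctness, run_experiment, variant_a, and variant_b are hypothetical names invented for this illustration, the two "prompt variants" return canned answers instead of calling a model, and a simple substring check stands in for the LLM judge. The only point is the shape of the loop: run every variant against the same dataset, score each output with the same evaluator, and compare the averages.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Callable


@dataclass(frozen=True)
class Example:
    """One dataset row: an input question and its reference answer."""
    question: str
    reference: str


def judge_correctness(answer: str, reference: str) -> float:
    """Stand-in for an LLM judge (assumption): 1.0 if the reference
    string appears in the answer, case-insensitively, else 0.0."""
    return 1.0 if reference.lower() in answer.lower() else 0.0


def run_experiment(
    task: Callable[[str], str],
    dataset: list[Example],
    evaluator: Callable[[str, str], float],
) -> float:
    """Run one variant over every example and return its mean score."""
    return mean(evaluator(task(ex.question), ex.reference) for ex in dataset)


def variant_a(question: str) -> str:
    """Hypothetical prompt variant A, stubbed with canned answers."""
    canned = {
        "What is the capital of France?": "The capital is Paris.",
        "What is 2 + 2?": "I am not sure.",
    }
    return canned.get(question, "")


def variant_b(question: str) -> str:
    """Hypothetical prompt variant B, stubbed with canned answers."""
    canned = {
        "What is the capital of France?": "Paris is the capital of France.",
        "What is 2 + 2?": "2 + 2 equals 4.",
    }
    return canned.get(question, "")


# Both variants are scored against the same dataset with the same metric.
dataset = [
    Example("What is the capital of France?", "Paris"),
    Example("What is 2 + 2?", "4"),
]

results = {
    name: run_experiment(task, dataset, judge_correctness)
    for name, task in {"variant_a": variant_a, "variant_b": variant_b}.items()
}

print(results)
```

In a real setup the stubbed variants would be replaced by actual model calls and the substring check by a model-based evaluator. Keeping the dataset and metric fixed across variants is what makes the comparison controlled.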