
Phoenix

Free tier

The open-source platform for AI agent development, tracing, and evaluation

Free tier available·Technical·Powered by Vendor Agnostic·API available·Open source

Key strengths

Full OpenTelemetry-native tracing for LLM agents
LLM-as-a-judge and human annotation for evaluation
Vendor-agnostic: works with any model, framework, or language
Self-hostable with zero data leaving your infrastructure
End-to-end iteration loop: trace → annotate → experiment → measure
Free tier + paid plans
US
Self-hostable

Developer Documentation

Phoenix ingests traces through OpenTelemetry and the OpenInference semantic conventions, so it is compatible with any OTel-capable SDK or framework.

Quick Setup

pip install arize-phoenix
phoenix serve  # starts local instance at localhost:6006

Tracing

Instrument your LLM app using Phoenix's auto-instrumentation libraries or manually via OTel spans. Traces capture prompts, completions, retrieval chunks, tool calls, latency, and token counts.
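
As a rough illustration of what a manually created span carries, the sketch below builds the attribute dictionary for a single LLM call using OpenInference attribute names. The helper `llm_span_attributes` and the sample values are hypothetical and not part of Phoenix; in a real app the dictionary would be attached to an OpenTelemetry span, for example with `span.set_attributes(...)`.

```python
from typing import Dict, Union

AttrValue = Union[str, int]


def llm_span_attributes(
    prompt: str,
    completion: str,
    model_name: str,
    prompt_tokens: int,
    completion_tokens: int,
) -> Dict[str, AttrValue]:
    """Flatten one LLM call into OpenInference-style span attributes.

    Hypothetical helper for illustration; not part of Phoenix.
    """
    return {
        "openinference.span.kind": "LLM",
        "input.value": prompt,
        "output.value": completion,
        "llm.model_name": model_name,
        "llm.token_count.prompt": prompt_tokens,
        "llm.token_count.completion": completion_tokens,
        "llm.token_count.total": prompt_tokens + completion_tokens,
    }


attrs = llm_span_attributes(
    prompt="What is Phoenix?",
    completion="An open-source LLM tracing and evaluation tool.",
    model_name="example-model",  # placeholder model name
    prompt_tokens=5,
    completion_tokens=9,
)
```

Keeping the attributes in a flat dictionary mirrors how OTel span attributes are stored, which is why token counts and model name appear as dotted keys instead of nested objects.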

Evaluation

Use the phoenix.evals module to run LLM-as-a-judge evaluations on your traces:

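from phoenix.evals import llm_classify, RAG_RELEVANCY_PROMPT_TEMPLATE, RAG_RELEVANCY_PROMPT_RAILS_MAP
# df: one row per example, with columns matching the template's variables; model: a phoenix.evals model wrapper
rails = list(RAG_RELEVANCY_PROMPT_RAILS_MAP.values())  # the labels the judge is allowed to return
results = llm_classify(dataframe=df, model=model, template=RAG_RELEVANCY_PROMPT_TEMPLATE, rails=rails)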
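
Once labels come back, a common next step is to aggregate them into a single metric. The sketch below assumes the evaluation produced one label per row and that the labels are "relevant" and "unrelated" (an assumption about the template's rails; check the values in your installed version). The `relevance_rate` helper is hypothetical and works on a plain list so it runs without Phoenix installed.

```python
from collections import Counter
from typing import Iterable


def relevance_rate(labels: Iterable[str], positive: str = "relevant") -> float:
    """Return the fraction of labels equal to `positive` (0.0 if empty)."""
    counts = Counter(labels)
    total = sum(counts.values())
    return counts[positive] / total if total else 0.0


# Assumed label values; real runs would supply the labels from the eval output.
sample_labels = ["relevant", "unrelated", "relevant", "relevant"]
rate = relevance_rate(sample_labels)
```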

Experiments

Create datasets from production traces, define evaluation functions, and run comparative experiments to benchmark prompt or retrieval changes directly in the Prompt IDE.
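
Conceptually, a comparative experiment runs each variant over the same dataset, scores every output with an evaluation function, and compares the aggregates. The following sketch illustrates that shape in plain Python; it does not use the Phoenix client, and `run_experiment`, `exact_match`, and the toy tasks are hypothetical stand-ins for a real prompt or retrieval pipeline.

```python
from statistics import mean
from typing import Callable, Dict, List

Example = Dict[str, str]  # {"input": ..., "expected": ...}


def exact_match(output: str, expected: str) -> float:
    """Evaluation function: 1.0 on a case-insensitive exact match, else 0.0."""
    return 1.0 if output.strip().lower() == expected.strip().lower() else 0.0


def run_experiment(task: Callable[[str], str], dataset: List[Example]) -> float:
    """Run `task` on every example and return the mean evaluation score."""
    return mean(exact_match(task(ex["input"]), ex["expected"]) for ex in dataset)


dataset = [
    {"input": "2+2", "expected": "4"},
    {"input": "capital of France", "expected": "Paris"},
]

# Two toy variants standing in for a baseline and a candidate prompt.
baseline = lambda q: "4"
candidate = lambda q: {"2+2": "4", "capital of France": "paris"}.get(q, "")

scores = {
    "baseline": run_experiment(baseline, dataset),
    "candidate": run_experiment(candidate, dataset),
}
```

Holding the dataset and the evaluation function fixed while swapping only the task is what makes the two scores comparable.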

Agent Integration (MCP)

npx skills add Arize-ai/phoenix

Connects coding agents to Phoenix so automations can instrument, query, and evaluate agents programmatically.