Phoenix
Free tier
The open-source platform for AI agent development, tracing, and evaluation
Developer Documentation
Phoenix is instrumented via OpenTelemetry and the OpenInference semantic conventions, making it compatible with any OTel-capable SDK or framework.
Quick Setup
pip install arize-phoenix
phoenix serve # starts local instance at localhost:6006
Tracing
Instrument your LLM app using Phoenix's auto-instrumentation libraries or manually via OTel spans. Traces capture prompts, completions, retrieval chunks, tool calls, latency, and token counts.
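To make the captured fields concrete, here is a minimal sketch of how one LLM call might be flattened into span attributes. The attribute keys follow the OpenInference semantic conventions as I understand them and should be checked against the spec. The helper name `llm_span_attributes` and the sample values are hypothetical and are not part of Phoenix.

```python
import json


def llm_span_attributes(prompt, completion, model_name, prompt_tokens, completion_tokens):
    """Flatten one LLM call into a flat dict of OpenInference-style attributes.

    Hypothetical helper for illustration; the keys are assumed from the
    OpenInference semantic conventions.
    """
    return {
        "openinference.span.kind": "LLM",
        "llm.model_name": model_name,
        "input.value": prompt,
        "output.value": completion,
        "llm.token_count.prompt": prompt_tokens,
        "llm.token_count.completion": completion_tokens,
        "llm.token_count.total": prompt_tokens + completion_tokens,
    }


# Sample values are made up for the sketch.
attrs = llm_span_attributes(
    prompt="What port does Phoenix use locally?",
    completion="6006",
    model_name="example-model",
    prompt_tokens=9,
    completion_tokens=2,
)
print(json.dumps(attrs, indent=2))
```

When instrumenting manually, a dict like this would typically be attached to an OpenTelemetry span with `span.set_attributes(attrs)`. The auto-instrumentation libraries are meant to populate such fields without this step.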
Evaluation
Use the phoenix.evals module to run LLM-as-a-judge evaluations on your traces:
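# df needs "input" and "reference" columns; model is a phoenix.evals model wrapper such as OpenAIModel
from phoenix.evals import llm_classify, RAG_RELEVANCY_PROMPT_RAILS_MAP, RAG_RELEVANCY_PROMPT_TEMPLATE
rails = list(RAG_RELEVANCY_PROMPT_RAILS_MAP.values())  # labels the judge is allowed to return
results = llm_classify(dataframe=df, model=model, template=RAG_RELEVANCY_PROMPT_TEMPLATE, rails=rails)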
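LLM-as-a-judge classification generally constrains the judge's free-text answer to a fixed set of labels. The sketch below shows that idea with the standard library only. `snap_to_rail`, `stub_judge`, and the sample rows are hypothetical, the stub replaces a real model call, and the two labels are assumed. This is not Phoenix's implementation.

```python
import re

RAILS = ["relevant", "unrelated"]  # assumed label set for a relevance judge


def snap_to_rail(raw_output, rails, default="NOT_PARSABLE"):
    """Map free-text judge output to exactly one allowed single-word label."""
    words = set(re.findall(r"[a-z_]+", raw_output.lower()))
    hits = [rail for rail in rails if rail.lower() in words]
    # Ambiguous or missing labels are not guessed at.
    return hits[0] if len(hits) == 1 else default


def stub_judge(question, reference):
    """Stand-in for an LLM call: 'relevant' if a longer question word appears in the reference."""
    q_words = {w for w in re.findall(r"[a-z]+", question.lower()) if len(w) > 3}
    r_words = set(re.findall(r"[a-z]+", reference.lower()))
    return "Relevant." if q_words & r_words else "I would say unrelated"


rows = [
    {"input": "How do I start Phoenix locally?",
     "reference": "Run phoenix serve to start a local instance."},
    {"input": "How do I start Phoenix locally?",
     "reference": "Bananas are rich in potassium."},
]

labels = [snap_to_rail(stub_judge(r["input"], r["reference"]), RAILS) for r in rows]
print(labels)  # ['relevant', 'unrelated']
```

Returning a sentinel for unparsable output, rather than defaulting to one of the labels, keeps judge failures visible when the results are aggregated.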
Experiments
Create datasets from production traces, define evaluation functions, and run comparative experiments to benchmark prompt or retrieval changes directly in the Prompt IDE.
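The shape of a comparative experiment (a fixed dataset, two task variants, and one evaluation function) can be sketched in plain Python. Everything here is hypothetical and for illustration only: the dataset, both tasks, and the `score_task` helper are invented, and the code does not call Phoenix's experiments API.

```python
from statistics import mean

# A tiny fixed dataset; in practice the examples would come from production traces.
dataset = [
    {"input": "capital of France", "expected": "Paris"},
    {"input": "capital of Japan", "expected": "Tokyo"},
    {"input": "capital of Peru", "expected": "Lima"},
]

BASELINE_KNOWLEDGE = {"capital of France": "Paris", "capital of Japan": "Tokyo"}
CANDIDATE_KNOWLEDGE = {**BASELINE_KNOWLEDGE, "capital of Peru": "Lima"}


def baseline_task(example):
    """Stand-in for the current prompt or retriever."""
    return BASELINE_KNOWLEDGE.get(example["input"], "unknown")


def candidate_task(example):
    """Stand-in for the changed prompt or retriever."""
    return CANDIDATE_KNOWLEDGE.get(example["input"], "unknown")


def exact_match(output, example):
    """Evaluation function: 1.0 for an exact match, 0.0 otherwise."""
    return 1.0 if output == example["expected"] else 0.0


def score_task(task, evaluator, examples):
    """Run the task on every example and average the evaluator scores."""
    return mean(evaluator(task(ex), ex) for ex in examples)


scores = {
    "baseline": score_task(baseline_task, exact_match, dataset),
    "candidate": score_task(candidate_task, exact_match, dataset),
}
print(scores)
```

Running both variants against the same examples and the same evaluator is what makes the two scores comparable. Here the baseline misses one of three examples and the candidate matches all of them.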
Agent Integration (MCP)
npx skills add Arize-ai/phoenix
Connects coding agents to Phoenix so automations can instrument, query, and evaluate agents programmatically.
