Developer Documentation

Phoenix ingests traces through OpenTelemetry (OTel) and the OpenInference semantic conventions, so it works with any OTel-capable SDK or framework.

Quick Setup

pip install arize-phoenix
phoenix serve  # starts local instance at localhost:6006

Tracing

Instrument your LLM app using Phoenix's auto-instrumentation libraries or manually via OTel spans. Traces capture prompts, completions, retrieval chunks, tool calls, latency, and token counts.
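As a rough illustration of the manual route, the sketch below wraps a single LLM call in an OTel span and attaches OpenInference-style attributes. The helper names (llm_span_attributes, traced_llm_call) and the call_model callable are hypothetical. The attribute keys are taken from the OpenInference semantic conventions and are worth checking against the current spec before relying on them.

```python
from typing import Any, Callable, Dict, Tuple


def llm_span_attributes(model_name: str, prompt: str, completion: str,
                        prompt_tokens: int, completion_tokens: int) -> Dict[str, Any]:
    """Build a flat attribute dict for one LLM call (OpenInference-style keys)."""
    return {
        "openinference.span.kind": "LLM",
        "llm.model_name": model_name,
        "input.value": prompt,
        "output.value": completion,
        "llm.token_count.prompt": prompt_tokens,
        "llm.token_count.completion": completion_tokens,
        "llm.token_count.total": prompt_tokens + completion_tokens,
    }


def traced_llm_call(tracer: Any, model_name: str, prompt: str,
                    call_model: Callable[[str], Tuple[str, int, int]]) -> str:
    """Run call_model(prompt) inside a span and record what happened.

    tracer is any OTel tracer, e.g. opentelemetry.trace.get_tracer(__name__).
    call_model is a hypothetical callable returning
    (completion, prompt_tokens, completion_tokens).
    """
    # The span's start and end timestamps give the latency of the call.
    with tracer.start_as_current_span("llm_call") as span:
        completion, n_in, n_out = call_model(prompt)
        span.set_attributes(
            llm_span_attributes(model_name, prompt, completion, n_in, n_out)
        )
        return completion
```

With the OTel SDK configured to export to your Phoenix instance, a real tracer can be passed in unchanged. Passing the tracer as an argument also keeps the helper easy to test with a stub.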

Evaluation

Use the phoenix.evals module to run LLM-as-a-judge evaluations on your traces:

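from phoenix.evals import RAG_RELEVANCY_PROMPT_RAILS_MAP, RAG_RELEVANCY_PROMPT_TEMPLATE, llm_classify
# df: one row per (query, retrieved chunk) pair; model: a phoenix.evals model wrapper
rails = list(RAG_RELEVANCY_PROMPT_RAILS_MAP.values())  # allowed output labels
results = llm_classify(dataframe=df, model=model, template=RAG_RELEVANCY_PROMPT_TEMPLATE, rails=rails)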
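
To make the mechanics concrete, here is a minimal conceptual sketch of an LLM-as-a-judge classification loop: fill a template for each row, ask a judge, and snap the raw answer onto a fixed set of allowed labels (often called rails). It is not Phoenix's implementation. The names classify_rows and snap_to_rails, the template text, and the stub keyword_judge are all hypothetical stand-ins for a real model call.

```python
from typing import Callable, Dict, List

# Hypothetical template; the placeholders must match the row keys.
TEMPLATE = (
    "Question: {input}\n"
    "Reference: {reference}\n"
    "Answer with one word: relevant or unrelated."
)
RAILS = ["relevant", "unrelated"]


def snap_to_rails(raw: str, rails: List[str]) -> str:
    """Map free-form judge output onto exactly one allowed label."""
    text = raw.strip().lower()
    matches = [r for r in rails if r in text]
    # Zero or several matching labels means the answer is ambiguous.
    return matches[0] if len(matches) == 1 else "NOT_PARSABLE"


def classify_rows(rows: List[Dict[str, str]], judge: Callable[[str], str],
                  template: str = TEMPLATE, rails: List[str] = RAILS) -> List[str]:
    """Render the template per row, query the judge, and return one label per row."""
    return [snap_to_rails(judge(template.format(**row)), rails) for row in rows]


def keyword_judge(prompt: str) -> str:
    """Stub judge: 'Relevant.' if question and reference share any word."""
    question_line, reference_line = prompt.split("\n")[0:2]
    q_words = set(question_line.split(": ", 1)[1].lower().split())
    r_words = set(reference_line.split(": ", 1)[1].lower().split())
    return "Relevant." if q_words & r_words else "Unrelated."


rows = [
    {"input": "What is Phoenix tracing?", "reference": "Phoenix tracing records spans."},
    {"input": "What is Phoenix tracing?", "reference": "Bananas are yellow."},
]
labels = classify_rows(rows, keyword_judge)
```

Swapping keyword_judge for a function that calls a real model leaves the rest of the loop unchanged, which is the main reason for keeping the judge as a plain callable here.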

Experiments

Create datasets from production traces, define evaluation functions, and run comparative experiments to benchmark prompt or retrieval changes directly in the Prompt IDE.
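
The shape of a comparative experiment can be sketched without a running server: a small dataset of examples, two task variants to compare, and an evaluation function that scores each output. Everything here (DATASET, run_local_experiment, the two task functions, contains_expected) is a hypothetical local stand-in, not the Phoenix experiments API.

```python
from statistics import mean
from typing import Callable, Dict, List

# Hypothetical dataset; in practice the examples would come from production traces.
DATASET: List[Dict[str, str]] = [
    {"input": "capital of France", "expected": "Paris"},
    {"input": "capital of Japan", "expected": "Tokyo"},
]


def baseline_task(question: str) -> str:
    """Baseline variant: only knows one answer."""
    known = {"capital of France": "The capital is Paris."}
    return known.get(question, "I don't know.")


def candidate_task(question: str) -> str:
    """Candidate variant: knows both answers."""
    known = {"capital of France": "Paris", "capital of Japan": "Tokyo"}
    return f"The capital is {known.get(question, 'unknown')}."


def contains_expected(output: str, expected: str) -> float:
    """Evaluation function: 1.0 if the expected answer appears in the output."""
    return 1.0 if expected.lower() in output.lower() else 0.0


def run_local_experiment(dataset: List[Dict[str, str]],
                         task: Callable[[str], str],
                         evaluator: Callable[[str, str], float]) -> float:
    """Run the task on every example and return the mean evaluator score."""
    return mean(evaluator(task(ex["input"]), ex["expected"]) for ex in dataset)


baseline_score = run_local_experiment(DATASET, baseline_task, contains_expected)
candidate_score = run_local_experiment(DATASET, candidate_task, contains_expected)
print(f"baseline={baseline_score:.2f} candidate={candidate_score:.2f}")
```

In Phoenix itself the dataset would be uploaded and the task and evaluation functions handed to the experiments runner. Check the current client documentation for the exact entry points.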

Agent Integration (MCP)

npx skills add Arize-ai/phoenix

Connects coding agents to Phoenix so automations can instrument, query, and evaluate agents programmatically.