LangSmith vs Langfuse vs Phoenix: 2026 LLM Observability Comparison
Three observability platforms, three philosophies. LangSmith is proprietary and has the deepest LangChain integration. Langfuse is open-source and vendor-neutral, the default for cost-conscious teams. Arize Phoenix has an open-source core with a commercial tier and is strong on ML-native evaluation. Here is how they compare across tracing, evals, pricing, and lock-in.
TL;DR — Which platform should you pick?
Pick Langfuse if you want open-source, self-hostable, vendor-neutral observability with no per-trace lock-in — it is the 2026 default for cost-conscious teams and has 23,000+ GitHub stars. Pick LangSmith if your stack is deeply LangChain-based and you want the tightest integration with LangGraph and LangChain agents. Pick Arize Phoenix if you want an open-source core with a path to commercial ML observability support and strong evaluation tooling.
Cost-conscious + self-host + vendor-neutral → Langfuse. LangChain-heavy stack → LangSmith. Open-source core + commercial option → Phoenix.
The three platforms at a glance
All three solve the same problem — trace LLM calls, evaluate outputs, monitor production — but with different licensing models and ecosystem strengths. The licensing difference is the single biggest decision factor: it determines whether you can self-host, whether you pay per trace, and how much vendor lock-in you accept.
| Platform | License | Self-host | Best for |
|---|---|---|---|
| LangSmith | Proprietary (free tier) | Enterprise plan only | LangChain-heavy stacks |
| Langfuse | MIT (open source) | ✓ Full | Cost-conscious, vendor-neutral |
| Arize Phoenix | Apache 2.0 core + commercial | ✓ Core only | ML-native teams, eval focus |
[IMAGE: Architecture diagram — LLM app → SDK instrumentation → observability platform (traces + evals + dashboards)]
Tracing capabilities
All three capture request-level traces as nested spans: the prompt, the model call, tool invocations, and retrieval steps that make up an agent turn. LangSmith has the deepest automatic instrumentation for LangChain and LangGraph; chains, agents, and tools are traced with near-zero config if you use those frameworks. Langfuse is framework-agnostic, with SDKs for Python and JS/TypeScript plus an OpenTelemetry integration; it takes a few lines of instrumentation but works with any stack. Phoenix is built on OpenTelemetry and offers auto-instrumentation for popular frameworks (LlamaIndex, Haystack, and others) through Arize's OpenInference instrumentation packages.
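To make "nested spans" concrete, here is a minimal, stdlib-only sketch of the data every platform's SDK records for one agent turn. The `Tracer` and `Span` classes are hypothetical illustrations, not the LangSmith, Langfuse, or Phoenix API; real SDKs (and OpenTelemetry) add trace IDs, exporters, and async-safe context propagation on top of this idea.

```python
from __future__ import annotations

import time
from contextlib import contextmanager
from dataclasses import dataclass, field
from typing import Iterator, Optional


@dataclass
class Span:
    """One unit of work inside a trace: an LLM call, a tool call, a retrieval step."""
    name: str
    kind: str
    parent: Optional[Span] = field(default=None, repr=False, compare=False)
    attributes: dict = field(default_factory=dict)
    children: list = field(default_factory=list, repr=False, compare=False)
    start: float = 0.0
    end: float = 0.0


class Tracer:
    """Hypothetical minimal tracer: a stack of open spans defines the nesting."""

    def __init__(self) -> None:
        self._stack: list[Span] = []
        self.roots: list[Span] = []

    @contextmanager
    def span(self, name: str, kind: str, **attributes) -> Iterator[Span]:
        parent = self._stack[-1] if self._stack else None
        current = Span(name=name, kind=kind, parent=parent, attributes=dict(attributes))
        # A span with no open parent starts a new trace; otherwise it nests under the parent.
        (parent.children if parent else self.roots).append(current)
        self._stack.append(current)
        current.start = time.perf_counter()
        try:
            yield current
        finally:
            current.end = time.perf_counter()
            self._stack.pop()


def render(span: Span, depth: int = 0) -> list[str]:
    """Indent each span under its parent, like the trace view in a dashboard."""
    lines = ["  " * depth + f"{span.kind}:{span.name}"]
    for child in span.children:
        lines.extend(render(child, depth + 1))
    return lines


tracer = Tracer()
with tracer.span("agent_turn", "agent", session_id="s-1"):
    with tracer.span("search_docs", "retrieval", top_k=4):
        pass
    with tracer.span("chat_completion", "llm", input_tokens=812, output_tokens=96) as llm:
        llm.attributes["finish_reason"] = "stop"
    with tracer.span("get_weather", "tool", city="Berlin"):
        pass

print("\n".join(render(tracer.roots[0])))
```

The stack-based design is why "auto-instrumentation" is possible at all: a framework that already knows when a chain, tool, or retriever starts and ends can open and close spans for you, while a framework-agnostic SDK asks you to mark those boundaries yourself.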
For agent workloads specifically, the Latitude 2026 comparison notes that tracing multi-turn agent conversations — where a single user turn triggers many tool calls and sub-agent spans — is where the platforms differentiate most. Langfuse's per-session cost visibility and LangSmith's deeper LangGraph integration each appeal to different teams.
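Per-session cost visibility boils down to rolling token usage up from individual LLM spans to the conversation (and turn) that produced them. A minimal sketch, assuming each span carries a session ID, a turn number, and token counts; the model names and prices are made-up placeholders, and none of this is a specific platform's API.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical prices in USD per 1M tokens; substitute your provider's real rates.
PRICES_PER_MTOK = {
    "large-model": {"input": 3.00, "output": 15.00},
    "small-model": {"input": 0.15, "output": 0.60},
}


@dataclass(frozen=True)
class LLMSpan:
    session_id: str   # groups every turn of one conversation
    turn: int         # which user turn triggered this call
    model: str
    input_tokens: int
    output_tokens: int


def span_cost(span: LLMSpan) -> float:
    price = PRICES_PER_MTOK[span.model]
    return (span.input_tokens * price["input"] + span.output_tokens * price["output"]) / 1_000_000


def rollup(spans, key) -> dict:
    """Sum span costs under any grouping key, e.g. per session or per turn."""
    totals = defaultdict(float)
    for span in spans:
        totals[key(span)] += span_cost(span)
    return dict(totals)


spans = [
    LLMSpan("A", 1, "large-model", 2_000, 500),  # planner call
    LLMSpan("A", 1, "small-model", 1_000, 100),  # tool-routing call
    LLMSpan("A", 1, "small-model", 1_000, 100),  # tool-routing call
    LLMSpan("A", 2, "large-model", 4_000, 800),
    LLMSpan("B", 1, "large-model", 1_000, 200),
]

per_session = rollup(spans, key=lambda s: s.session_id)
per_turn = rollup(spans, key=lambda s: (s.session_id, s.turn))
for session, cost in per_session.items():
    print(f"session {session}: ${cost:.4f}")
```

One user turn in session A fans out into three model calls, which is exactly the case where per-request dashboards understate what a conversation really costs.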
Evaluation capabilities
Evaluation is where the three diverge in philosophy. All three support LLM-as-a-judge (using a strong model to score outputs), human annotation, and custom evaluators. Phoenix (via Arize) leans hardest into ML-native evaluation, with built-in evaluators for hallucination, toxicity, and relevance plus a strong evaluation-dataset workflow. Langfuse offers flexible evaluation pipelines that integrate with your CI/CD and support both LLM-judge and heuristic evaluators. LangSmith ties evaluation to its own datasets and experiments, which slot naturally into LangChain and LangGraph projects.
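Whatever the platform, evaluators share one shape: a function from an example to a score. The sketch below is a hypothetical, platform-neutral pipeline that mixes heuristic checks with an LLM-as-a-judge scorer and a CI-style threshold gate. `fake_judge_model` is an offline stand-in for a real model call, and the 1-5 rubric and thresholds are illustrative assumptions, not any platform's built-in evaluator.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Example:
    question: str
    answer: str      # output produced by the application under test
    reference: str   # expected answer from the evaluation dataset


Evaluator = Callable[[Example], float]  # every evaluator returns a score in [0, 1]


def exact_match(ex: Example) -> float:
    """Heuristic evaluator: case-insensitive string equality."""
    return float(ex.answer.strip().lower() == ex.reference.strip().lower())


def max_words(limit: int) -> Evaluator:
    """Heuristic evaluator factory: pass if the answer stays within a word budget."""
    return lambda ex: float(len(ex.answer.split()) <= limit)


JUDGE_PROMPT = (
    "Score how well the answer agrees with the reference on a 1-5 scale. "
    "Reply with a single integer.\n"
    "Question: {question}\nReference: {reference}\nAnswer: {answer}\n"
)


def llm_judge(call_model: Callable[[str], str]) -> Evaluator:
    """LLM-as-a-judge evaluator; `call_model` wraps whatever model client you use."""
    def judge(ex: Example) -> float:
        reply = call_model(JUDGE_PROMPT.format(
            question=ex.question, reference=ex.reference, answer=ex.answer))
        try:
            score = int(reply.strip())
        except ValueError:
            return 0.0  # treat unparseable judge output as a failure
        return (min(max(score, 1), 5) - 1) / 4  # map 1..5 onto 0..1
    return judge


def run_evals(examples: List[Example], evaluators: Dict[str, Evaluator]) -> Dict[str, float]:
    """Average each evaluator's score over the dataset."""
    return {name: sum(fn(ex) for ex in examples) / len(examples) for name, fn in evaluators.items()}


def fake_judge_model(prompt: str) -> str:
    # Offline stand-in for a strong judge model, so the sketch runs without an API key.
    answer = prompt.split("Answer:")[-1]
    return "5" if "Paris" in answer else "2"


dataset = [
    Example("What is the capital of France?", "Paris", "Paris"),
    Example("What is the capital of France?", "The capital of France is Lyon.", "Paris"),
]
scores = run_evals(dataset, {
    "exact_match": exact_match,
    "concise": max_words(20),
    "llm_judge": llm_judge(fake_judge_model),
})
# Hypothetical CI gate: list every metric that falls below its minimum.
failing = [name for name, minimum in {"exact_match": 0.9, "llm_judge": 0.6}.items()
           if scores[name] < minimum]
print(scores, "failing:", failing)
```

In a CI job, a non-empty `failing` list would be the signal to fail the build (for example with `sys.exit(1)`). Keeping the judge behind a plain `call_model` function makes it easy to swap judge models or replay cached responses in tests.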
For the methodology behind LLM-as-a-judge scoring — including its biases and how to validate judge reliability — see our LLM-as-a-Judge deep dive.
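One common way to check judge reliability is to have humans label a sample of the same outputs and measure chance-corrected agreement with Cohen's kappa. A minimal sketch; the labels below are invented for illustration, and what counts as "good enough" agreement is a judgment call for your use case.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohens_kappa(rater_a: Sequence[Hashable], rater_b: Sequence[Hashable]) -> float:
    """Chance-corrected agreement between two raters labeling the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two equally long, non-empty label sequences")
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    # Agreement expected if each rater labeled at random with its own label frequencies.
    labels = counts_a.keys() | counts_b.keys()
    expected = sum(counts_a[label] * counts_b[label] for label in labels) / (n * n)
    if expected == 1.0:
        # Both raters used one identical label everywhere; kappa is undefined, so treat as perfect.
        return 1.0
    return (observed - expected) / (1 - expected)


human = ["pass", "pass", "fail", "pass", "fail", "fail", "pass", "pass"]
judge = ["pass", "pass", "fail", "fail", "fail", "pass", "pass", "pass"]
raw = sum(h == j for h, j in zip(human, judge)) / len(human)
print(f"raw agreement {raw:.2f}, kappa {cohens_kappa(human, judge):.2f}")
```

Here the judge matches the human labels on 75% of items, yet kappa is only about 0.47 once chance agreement is removed, a gap that a raw agreement percentage hides.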
Pricing — where LangSmith's per-trace model bites
Pricing is the dimension that drives the most platform switching. Langfuse is free to self-host (MIT license) with no trace limits; you pay only for your own infrastructure. A managed cloud tier is also available, and the self-hosted build runs the same core product. LangSmith introduced per-trace billing in 2024, and a Reddit r/LangChain thread from that period documents teams moving to alternatives over cost. For high-volume production tracing (millions of spans), LangSmith's bill grows with the number of traces you send. Phoenix's core is free (Apache 2.0); Arize AX is a paid commercial tier for enterprise support and managed hosting.
The Pydantic 2026 pricing analysis makes the key point: the billing unit matters more than the headline price. Platforms that bill per-trace versus per-span versus per-token produce wildly different bills for the same workload. Always model your expected volume against each platform's billing unit before committing.
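A team tracing 10 million spans/month faces very different economics on LangSmith (billed per trace, so the bill depends on how many spans each trace contains), Langfuse self-hosted (infrastructure cost only), and Phoenix (free open-source core). At scale, self-hosting usually wins on cost, as long as you budget for the infrastructure and engineering time needed to run it.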
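The billing-unit point is easy to check with back-of-the-envelope arithmetic. The sketch below prices the 10-million-spans/month workload from the example above under each billing unit. Every rate, the spans-per-trace ratio, and the tokens-per-span figure is a made-up placeholder, not a vendor price; plug in current list prices before drawing conclusions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Workload:
    spans_per_month: int
    spans_per_trace: int   # how many spans one request or agent turn produces
    tokens_per_span: int   # average tokens logged per span

    @property
    def traces_per_month(self) -> int:
        return self.spans_per_month // self.spans_per_trace

    @property
    def tokens_per_month(self) -> int:
        return self.spans_per_month * self.tokens_per_span


# Every rate below is a HYPOTHETICAL placeholder, not any vendor's list price.
def per_trace_bill(w: Workload, usd_per_1k_traces: float, free_traces: int = 0) -> float:
    return max(w.traces_per_month - free_traces, 0) / 1_000 * usd_per_1k_traces


def per_span_bill(w: Workload, usd_per_1k_spans: float) -> float:
    return w.spans_per_month / 1_000 * usd_per_1k_spans


def per_token_bill(w: Workload, usd_per_1m_tokens: float) -> float:
    return w.tokens_per_month / 1_000_000 * usd_per_1m_tokens


def self_host_bill(infra_usd: float, ops_hours: float, usd_per_ops_hour: float) -> float:
    # Self-hosting has no usage fee, but servers and engineer time still cost money.
    return infra_usd + ops_hours * usd_per_ops_hour


def compare(w: Workload) -> dict:
    return {
        "per-trace": per_trace_bill(w, usd_per_1k_traces=1.00),
        "per-span": per_span_bill(w, usd_per_1k_spans=0.10),
        "per-token": per_token_bill(w, usd_per_1m_tokens=0.05),
        "self-host": self_host_bill(infra_usd=400, ops_hours=8, usd_per_ops_hour=100),
    }


for spans_per_trace in (8, 20):
    w = Workload(spans_per_month=10_000_000, spans_per_trace=spans_per_trace, tokens_per_span=1_500)
    bills = compare(w)
    print(spans_per_trace, {unit: round(usd) for unit, usd in bills.items()})
```

With these placeholder rates, moving from 8 to 20 spans per trace cuts the per-trace bill from $1,250 to $500 while the per-span and per-token bills do not move, so the same workload can rank platforms differently depending on how your application groups spans into traces.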
Open-source and self-hosting
Langfuse is the only one of the three that is fully open-source and self-hostable from the same codebase that powers its cloud, so you can run it in your own VPC; only a small set of enterprise administration features requires a commercial license key. This matters for data residency, compliance, and avoiding vendor lock-in. Phoenix's core is open-source (Apache 2.0), but the full Arize AX feature set is commercial. LangSmith is proprietary: on its standard plans there is no self-host option and all trace data transits LangChain's infrastructure, and self-hosting is available only as an Enterprise plan add-on.
For teams in regulated industries (healthcare, finance, EU GDPR), Langfuse's self-hosting is often the deciding factor. Our self-host Langfuse guide walks through a production Docker Compose deployment.
Feature comparison matrix
| Feature | LangSmith | Langfuse | Phoenix |
|---|---|---|---|
| Open source | ✗ Proprietary | ✓ MIT | ✓ Apache 2.0 core |
| Self-hostable | Enterprise plan only | ✓ Full | ✓ Core only |
| Pricing model | Per-trace (free tier included) | Free self-host; paid cloud tier | Free core; paid Arize AX |
| LangChain auto-instrumentation | ✓ Deepest | ✓ | ✓ |
| OpenTelemetry support | Limited | ✓ | ✓ |
| LLM-as-a-judge evals | ✓ | ✓ | ✓ Strong |
| Human annotation | ✓ | ✓ | ✓ |
| Agent multi-span tracing | ✓ | ✓ | ✓ |
| Cost-per-request tracking | ✓ | ✓ Per-session | ✓ |
| Prompt management | ✓ | ✓ | Limited |
| GitHub stars (2026) | — | 23,000+ | 9,000+ |
When to pick Langfuse
Pick Langfuse when open-source, cost control, and vendor neutrality are the priority. It is the 2026 default for engineering teams that want full observability without per-trace bills or lock-in. The 23,000+ GitHub stars reflect a large community and rapid feature development. The trade-off is modest: you write a few lines of instrumentation (or use the OpenTelemetry integration) and optionally self-host the backend.
For a production self-host setup, see our self-host Langfuse tutorial.
When to pick LangSmith
Pick LangSmith when your application is built deeply on LangChain and LangGraph and you want near-zero-config tracing. LangSmith's auto-instrumentation for LangChain primitives (chains, agents, tools, retrievers) is the deepest of the three, and its datasets and evaluation workflow fit naturally alongside LangChain code. The costs are per-trace pricing and no self-host option outside the Enterprise plan, which is acceptable when your volume is moderate and your stack is LangChain-native.
When to pick Phoenix
Pick Phoenix (Arize) when you want an open-source core with a commercial escalation path and you prioritize ML-native evaluation tooling. Phoenix's built-in evaluators for hallucination, relevance, and toxicity, plus strong evaluation-dataset workflows, appeal to teams with an ML engineering bent. The open-source core lets you start free and move to Arize AX only if you need enterprise support.
FAQ
Is Langfuse really free?
Langfuse is open-source under the MIT license and free to self-host with no trace limits. A managed cloud tier exists for teams that want hosted infrastructure; the self-hosted version ships the same core features with no per-trace pricing, while a small set of enterprise administration features requires a commercial license.
Does LangSmith charge per trace?
Yes, beyond the free tier's monthly trace allowance. LangSmith introduced per-trace billing in 2024 and still offers a free tier with a limited number of traces per month. For high-volume production tracing, costs scale with trace count, which is why many teams evaluate Langfuse or Phoenix as alternatives.
Is Arize Phoenix open source?
Yes. The core Phoenix project is open source (Apache 2.0). Arize AI offers a commercial Arize AX tier with additional enterprise features, support, and managed hosting on top of the open-source core.
Related deep dives
- How to Self-Host Langfuse — production Docker setup for the open-source option
- LLM-as-a-Judge Methodology — the evaluation technique all three platforms support
- Self-Host LiteLLM — pair gateway tracing with observability
Sources
- Kanerika, "LLMOps Observability: LangSmith vs Arize vs Langfuse vs W&B," 2026
- Maxim AI, "Choosing the Right AI Evaluation and Observability Platform," 2026
- Pydantic, "AI Observability Pricing Comparison: Logfire vs LangSmith vs Langfuse vs Arize," 2026
- ZenML, "Langfuse vs Phoenix: Which One's the Better Open-Source Option?," 2026
- Latitude, "Best LLM Observability Tools for AI Agents," 2026
- Birjob, "AI Agent Observability in 2026," 2026 (market size: $2.69B → $9.26B by 2030)
- Confident AI, "Top 5 Arize AI Alternatives and Competitors, Compared (2026)," 2026
Feature sets, pricing, and star counts change frequently. Verify against each platform's current docs and GitHub before committing.