Langfuse | Awesome Repository

Langfuse is an open-source observability and evaluation platform for language model applications. It provides a central place to record execution traces, monitor performance metrics, and manage prompt templates. Because it captures each request as a hierarchy of nested units of work along with its telemetry data, developers can debug complex application flows and analyze token usage, latency, and model interactions in production.
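To make the idea of "hierarchical units of work" concrete, here is a minimal sketch of how a trace tree can carry per-step latency and token counts and roll them up. The `Observation` class, its fields, and the example step names are hypothetical illustrations of the concept; this is not the Langfuse SDK or its data model.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Observation:
    """One unit of work in a trace (hypothetical model, not the Langfuse SDK)."""
    name: str
    start_ms: float
    end_ms: float
    input_tokens: int = 0
    output_tokens: int = 0
    children: list[Observation] = field(default_factory=list)

    @property
    def latency_ms(self) -> float:
        # Wall-clock duration of this observation, including its children.
        return self.end_ms - self.start_ms

    def total_tokens(self) -> int:
        # Tokens used by this node plus every nested child, recursively.
        own = self.input_tokens + self.output_tokens
        return own + sum(child.total_tokens() for child in self.children)

    def walk(self, depth: int = 0):
        # Depth-first traversal yielding (depth, observation) pairs.
        yield depth, self
        for child in self.children:
            yield from child.walk(depth + 1)


# A hypothetical request that retrieves documents, then calls a model twice.
trace = Observation("handle-request", 0, 1200, children=[
    Observation("retrieve-docs", 10, 210),
    Observation("generate-answer", 220, 900, input_tokens=850, output_tokens=120),
    Observation("summarize", 910, 1190, input_tokens=300, output_tokens=40),
])

for depth, obs in trace.walk():
    print(f"{'  ' * depth}{obs.name}: {obs.latency_ms:.0f} ms")
print("total tokens:", trace.total_tokens())
```

Keeping usage on the leaf steps and summing upward means a slow or expensive step can be located by walking the tree, which is the kind of drill-down a tracing UI offers.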

The platform stands out for its integrated evaluation framework, which supports systematic benchmarking and automated scoring of model outputs. Users can run comparative experiments that execute multiple prompt or model versions side by side, and they can turn production traces into versioned test datasets to validate performance against ground truth. A dedicated prompt management system decouples prompt logic from application code, offering a playground for iterating on prompts and fetching versioned templates at runtime.
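The comparative-experiment workflow described above can be sketched in a few lines: run every dataset item through each version, score the output against the expected answer, and compare the averages. The dataset, the two `version_*` functions, and the `exact_match` scorer are hypothetical stand-ins chosen for illustration; they are not part of any Langfuse API.

```python
from typing import Callable, Dict, List

# Hypothetical dataset items: an input plus its ground-truth output.
dataset: List[Dict[str, str]] = [
    {"input": "2 + 2", "expected": "4"},
    {"input": "capital of France", "expected": "Paris"},
    {"input": "3 * 3", "expected": "9"},
]


def exact_match(output: str, expected: str) -> float:
    # Score 1.0 when the normalized output equals the ground truth, else 0.0.
    return 1.0 if output.strip().lower() == expected.strip().lower() else 0.0


# Two hypothetical "versions" standing in for different prompts or models.
def version_a(question: str) -> str:
    answers = {"2 + 2": "4", "capital of France": "paris", "3 * 3": "6"}
    return answers.get(question, "")


def version_b(question: str) -> str:
    answers = {"2 + 2": "4", "capital of France": "Paris", "3 * 3": "9"}
    return answers.get(question, "")


def run_experiment(app: Callable[[str], str], items, scorer) -> float:
    # Run every dataset item through one version and average its scores.
    scores = [scorer(app(item["input"]), item["expected"]) for item in items]
    return sum(scores) / len(scores)


results = {
    name: run_experiment(fn, dataset, exact_match)
    for name, fn in [("A", version_a), ("B", version_b)]
}
for name, score in results.items():
    print(f"version {name}: {score:.2f}")
```

Because both versions see the same items and the same scorer, the difference in averages can be attributed to the version itself. A real setup would typically use richer scorers (for example, model-based grading) in place of exact matching.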

Beyond core observability, the project includes administrative and operational tools such as organization-level access controls, identity provider integration, and automated workflow triggers. It is built for flexible deployment, supporting containerized orchestration in private, cloud, or Kubernetes-based environments so that teams keep control of their data and can scale for high availability.

The platform is designed for self-hosting and provides infrastructure-as-code templates for consistent environment setup. It integrates with standard observability ecosystems through OpenTelemetry support and offers programmatic interfaces for headless management and automated deployment workflows.

Features

LLM Observability - Monitors and debugs language model applications by tracking prompts, completions, latency, and token usage.
AI Observability and Evaluation - Provides systematic experiments and automated scoring against datasets to validate model performance and output quality.
Prompt Registries - Decouples prompt logic from application code by serving versioned templates through a managed interface for dynamic retrieval.
Automated Trace Evaluation - Runs automated scoring and custom evaluation logic on captured traces to check output quality against defined performance benchmarks.
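The Prompt Registries idea (application code fetches a prompt by name instead of embedding its text) can be illustrated with a small in-memory registry. This is a minimal sketch assuming a scheme of immutable numbered versions plus movable labels such as "production"; the `PromptRegistry` class and its methods are hypothetical and do not reflect the actual Langfuse SDK.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from string import Template


@dataclass
class PromptRegistry:
    """In-memory stand-in for a prompt registry (hypothetical, not Langfuse's API)."""
    _versions: dict[str, list[str]] = field(default_factory=dict)
    _labels: dict[tuple[str, str], int] = field(default_factory=dict)

    def create(self, name: str, template: str) -> int:
        # Append a new immutable version and return its 1-based number.
        versions = self._versions.setdefault(name, [])
        versions.append(template)
        return len(versions)

    def set_label(self, name: str, label: str, version: int) -> None:
        # Point a label such as "production" at an existing version.
        if not 1 <= version <= len(self._versions.get(name, [])):
            raise ValueError(f"unknown version {version} for {name!r}")
        self._labels[(name, label)] = version

    def get(self, name: str, label: str = "production") -> Template:
        # Application code fetches by name and label, never by hard-coded text.
        version = self._labels[(name, label)]
        return Template(self._versions[name][version - 1])


registry = PromptRegistry()
registry.create("greeting", "Hello $name!")
v2 = registry.create("greeting", "Hi $name, how can I help?")
registry.set_label("greeting", "production", v2)
print(registry.get("greeting").substitute(name="Ada"))
```

Promoting or rolling back a prompt then becomes a matter of moving the label to another version; the calling code stays unchanged.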