Lmnr

Lmnr is an LLM observability platform and evaluation framework designed for tracing, logging, and monitoring language model executions. It provides the tools necessary to debug agent behavior, analyze performance, and identify failure patterns in AI agents.

The platform differentiates itself through a trace-to-dataset pipeline that converts production logs into labeled test sets for regression testing. It includes a prompt-variant replay engine to compare different prompts or models side-by-side and a state-cached debugging system to replay agent loops without restarting the process.
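The trace-to-dataset idea can be sketched in a few lines of plain Python. This is only an illustration and not Lmnr's API: the `TraceRecord` class, its field names, and `traces_to_dataset` are all hypothetical, and the sketch assumes a reviewer has attached a corrected answer to some traces.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TraceRecord:
    """Hypothetical shape of one production trace (field names are assumptions)."""
    trace_id: str
    prompt: str
    output: str
    label: Optional[str] = None  # reviewer-supplied expected answer, None if unlabeled


def traces_to_dataset(traces: list[TraceRecord]) -> list[dict[str, str]]:
    """Keep only labeled traces and emit input / expected-output pairs."""
    dataset = []
    for trace in traces:
        if trace.label is None:
            continue  # an unlabeled trace has no expected output to test against
        dataset.append(
            {
                "input": trace.prompt,
                "expected_output": trace.label,
                "source_trace": trace.trace_id,  # keeps the link back to production
            }
        )
    return dataset
```

Keeping the source trace ID on each row means a failing regression case can be traced back to the production run that produced it.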

The system covers a broad range of capabilities, including event analysis via natural language extraction, SQL-based observability storage, and the creation of time-synchronized dashboards. It also manages AI datasets with versioning and annotation, provides real-time alerting through external integrations, and supports PII data redaction for privacy compliance.
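As a rough illustration of SQL-based observability storage, the sketch below keeps spans in a relational table and aggregates them with standard SQL. It uses an in-memory SQLite database as a stand-in. The `spans` table, its columns, and the model names are assumptions made for the example and do not reflect Lmnr's actual schema or storage engine.

```python
import sqlite3


def build_demo_store() -> sqlite3.Connection:
    """Create an in-memory table of spans (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE spans ("
        "trace_id TEXT, name TEXT, model TEXT, latency_ms REAL, status TEXT)"
    )
    conn.executemany(
        "INSERT INTO spans VALUES (?, ?, ?, ?, ?)",
        [
            ("t1", "llm.call", "model-a", 120.0, "ok"),
            ("t2", "llm.call", "model-a", 180.0, "error"),
            ("t3", "llm.call", "model-b", 90.0, "ok"),
        ],
    )
    return conn


def stats_by_model(conn: sqlite3.Connection) -> list[tuple]:
    """Per-model call count, average latency, and error rate."""
    query = """
        SELECT model,
               COUNT(*) AS calls,
               AVG(latency_ms) AS avg_latency_ms,
               SUM(CASE WHEN status = 'error' THEN 1 ELSE 0 END) * 1.0
                   / COUNT(*) AS error_rate
        FROM spans
        GROUP BY model
        ORDER BY model
    """
    return conn.execute(query).fetchall()
```

Calling `stats_by_model(build_demo_store())` returns one row per model. The same kind of query could feed a dashboard chart or an alert threshold.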

The software is available as a self-hosted observability stack that can be deployed using container orchestration and cloud provider images.

Features

AI Observability Tracing - Searches and visualizes traces in real time using full-text search and raw database access.

Agent Execution Traces - Captures and visualizes detailed execution traces of language model calls and tool usage.

Agent Interaction Dashboards - Provides a visual interface for tracking agent execution flows, clustering recurring errors, and alerting on system failures.

AI-Assisted Trace Analysis - Transforms unstructured logs into structured events using AI and SQL for deep trace analysis.

AI-Powered Observability Analysis - Uses language models to analyze observability telemetry and convert natural language into database queries for system insights.

Trace-to-Dataset Converters - Converts production execution logs into labeled test sets for regression testing and model fine-tuning.

Dataset Management - Manages pairs of inputs and expected outputs to facilitate model tuning and automated performance testing.

Event Analysis - Transforms unstructured trace data into structured events to trigger alerts and find failure patterns.

LLM Evaluation Frameworks - Implements a framework for measuring accuracy and detecting regressions via systematic experiments.

Model Performance Evaluators - Includes a framework for scoring model outputs against datasets to detect regressions.

Natural Language Entity Extraction - Transforms unstructured trace text into structured database records for alerting and analysis using language models.

Prompt Variant Experimentation - Executes historical trace inputs against different prompt variants or models to compare output quality side-by-side.

Production-to-Test Dataset Converters - Converts identified error clusters from production traces into regression test sets.

Agent Execution Tracing - Records model calls and tool usage to provide visual debugging of end-to-end agent reasoning.

AI Instrumentation Libraries - Uses an industry-standard SDK to instrument AI libraries and record application execution flow.

Trace Annotation - Assigns markers to specific traces or results to build training sets for fine-tuning and evaluation.

Agent Observability Platforms - Provides a complete platform for tracing, logging, and monitoring AI agent execution flows.

AI Agent Behavior Monitors - Provides tools to monitor AI agent behavior, detect failure patterns, and send real-time alerts.

SQL-Based Trace Queries - Stores observability traces and logs in a relational database to enable complex aggregations via standard SQL.

LLM Evaluation - Provides tools for measuring the quality of model outputs using custom metrics and automated judges.

CLI Evaluation Runners - Executes collections of evaluation files from a local directory via CLI for use in automated testing pipelines.

Remote Evaluation Execution - Executes performance tests locally or in pipelines with a UI for comparing result sets.

Data Visualization Dashboards - Arranges time-series charts and data tables to monitor signals across a shared time window.

Full Text Search - Provides full-text search across all inputs, outputs, and attributes within the execution trace data.

Metric Query Interfaces - Allows writing custom SQL expressions to calculate complex aggregations and metrics from traces.

Agent Flow Visualizations - Displays execution flow through transcripts and timelines that surface reasoning and sub-agent activity.

Log Event Clustering - Groups related error events into named categories to track frequency and resolution status.

Self-Hosted AI Platforms - Offers a self-hosted observability stack deployable via container orchestration on private infrastructure.

Agent State Debugging - Stores execution snapshots to allow developers to replay and fix agent loops without restarting the entire process.

Execution Logs - Logs calls to language models and custom functions to provide a detailed history of behavior.

Custom Metric Dashboards - Builds statistical tracking views and custom charts by executing queries against platform data.

Performance Visualization - Creates visual dashboards and charts based on custom database queries to monitor real-time performance.

OpenTelemetry Standard Integrations - Uses OpenTelemetry-compatible SDKs to capture application spans and events for interoperability with AI libraries.

Self-Hosted Monitoring Suites - Provides a full monitoring suite deployable on private infrastructure using container composition tools.

Model Evaluation and Benchmarking - Provides a platform for tracing, evaluating, and analyzing LLM data.

Monitoring and Observability - Provides a platform for tracing, evaluating, and labeling LLM products.
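The evaluation features listed above (scoring outputs against a dataset and detecting regressions) can be sketched generically as follows. This is a minimal illustration under stated assumptions and not Lmnr's SDK: `exact_match`, `run_evaluation`, and `is_regression` are hypothetical names, the dataset rows are assumed to carry `input` and `expected_output` keys, and exact matching stands in for whatever custom metric or automated judge a real setup would use.

```python
from statistics import mean
from typing import Callable


def exact_match(output: str, expected: str) -> float:
    """Score 1.0 when the output matches the expected text, ignoring case and edge whitespace."""
    return 1.0 if output.strip().lower() == expected.strip().lower() else 0.0


def run_evaluation(
    dataset: list[dict[str, str]],
    executor: Callable[[str], str],
    evaluator: Callable[[str, str], float] = exact_match,
) -> float:
    """Run the executor on every input and return the mean score.

    Raises statistics.StatisticsError if the dataset is empty.
    """
    scores = [
        evaluator(executor(row["input"]), row["expected_output"])
        for row in dataset
    ]
    return mean(scores)


def is_regression(baseline: float, candidate: float, tolerance: float = 0.0) -> bool:
    """Flag a regression when the candidate falls below the baseline by more than the tolerance."""
    return candidate < baseline - tolerance
```

In use, the same dataset is run against a baseline executor and a candidate executor (for example, two prompt variants), and the two mean scores are passed to `is_regression`.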
