30 open-source projects similar to livekit/livekit, ranked by how many features they have in common. Compare stars, activity, and what each one does to find the best LiveKit alternative.
This project is a framework for developing multimodal AI agents that function as programmable participants in real-time communication rooms. It enables the construction of agents that can see, hear, and speak by integrating speech-to-text, large language models, and text-to-speech pipelines to facilitate low-latency, natural conversations. The system is distinguished by its advanced orchestration of real-time media and conversational flow, including support for full-duplex speech, preemptive response generation, and sophisticated interruption management. It further differentiates itself throu
Pipecat is a framework and software development kit for building real-time multimodal AI agents and speech-to-speech systems. It utilizes a frame-based data pipeline to route audio, video, and text through a modular sequence of processors, enabling the orchestration of low-latency conversational AI. The project is distinguished by its ability to coordinate complex multimodal services, including speech-to-text, language models, and text-to-speech, within a single pipeline. It features semantic voice activity detection for natural turn-taking, state-machine conversation flows for dialogue manag
Vocode-core is a framework for building real-time conversational AI voice agents. It serves as a conversational orchestrator and pipeline that integrates speech-to-text, large language models, and text-to-speech services to enable low-latency voice interactions. The project features a provider-agnostic interface that allows for swappable speech and language model providers, including support for both cloud APIs and local binaries. It distinguishes itself through a specialized telephony integration layer that enables agents to be deployed across phone lines, WebRTC, and virtual meeting platfor
This project is a Python framework for building autonomous, event-driven agent systems. It provides a unified runtime for orchestrating multi-agent workflows, managing persistent conversation state, and executing code within secure, isolated sandbox environments. The framework is designed to handle complex task delegation, allowing agents to invoke other agents as tools while maintaining context across multi-turn interactions. The framework distinguishes itself through its deep integration with the Model Context Protocol, enabling agents to connect to external data sources and remote services
Mastra is an orchestration framework designed for building, deploying, and managing autonomous AI agents and multi-agent systems. It provides a comprehensive suite of primitives for creating resilient AI applications, including durable workflow orchestration, event-driven agent loops, and semantic memory management. By integrating these core components, the platform enables developers to build complex, multi-step processes that can reason about goals and execute tasks without manual intervention. The framework distinguishes itself through its focus on observability and secure, isolated execut
Kilocode is an autonomous engineering platform designed to orchestrate AI agents for complex software development tasks. It functions as a comprehensive system for automating coding, testing, and repository management by integrating directly with the developer's codebase and terminal. The platform provides a unified gateway for model orchestration, allowing for the management of agentic workflows, event-driven automation, and persistent session state across distributed development environments. The platform distinguishes itself through its federated task management and policy-based access control, which
Duix-Mobile is a software development kit for deploying real-time conversational AI characters on mobile devices. It enables the creation of interactive digital humans capable of fluid voice-to-voice interactions, featuring low-latency speech recognition and synchronized lip movements. The project distinguishes itself through the ability to integrate custom external language models and speech providers to define an avatar's intelligence and voice. It supports the generation of real-time multilingual subtitles and provides mechanisms to track the training status of newly created digital charac
FastRTC is a Python framework for building low-latency, bidirectional audio and video streams. It serves as a real-time communication library that provides a wrapper for WebRTC media servers, allowing users to create streaming applications with integrated media handling. The project distinguishes itself by providing a gateway for telephony integration, which maps temporary phone numbers to streaming media endpoints. It also includes built-in voice activity detection to manage automatic turn-taking and speech boundary identification in real-time conversations. The library supports mounting me
This project is a community-curated directory of open-source software designed for deployment in private server environments and home labs. It serves as a comprehensive resource for discovering independent, self-hosted alternatives to mainstream cloud services, enabling users to maintain full data ownership and control over their digital infrastructure. The directory is structured through a hierarchical taxonomy that organizes a vast collection of applications into logical categories, ranging from media management and data analytics to private communication and team productivity tools. It dis
Zeroclaw is a modular framework for building and deploying autonomous agents that integrate AI models, messaging platforms, and hardware interfaces. It functions as a multi-agent orchestrator and embedded systems controller, providing a unified runtime for managing agent lifecycles, memory, and security policies across diverse environments. The system distinguishes itself through its focus on secure, verifiable hardware and software orchestration. It enforces strict security boundaries, including command allowlisting, resource throttling, and interactive human-in-the-loop approval for sensiti
This project serves as a curated directory and resource hub for developers working with generative artificial intelligence. It provides a comprehensive index of open-source software solutions, frameworks, and project examples designed to help users discover and implement advanced AI systems. The repository focuses on practical implementations of agentic, multimodal, and retrieval-augmented generation architectures. It highlights tools for building conversational assistants, voice-enabled agents, and automated workflows that leverage large language models. By showcasing diverse technical domai
Sherpa-ONNX is an ONNX-based speech processing toolkit that provides a local speech recognition engine, an on-device voice synthesis tool, and a speaker identification framework. It is designed as a cross-platform speech API that enables speech-to-text, text-to-speech, and speaker verification tasks to be executed locally on a device without requiring network access. The project is distinguished by its ability to perform zero-shot voice cloning and speaker diarization on-device. It supports a wide range of hardware accelerators, including GPUs and various NPU architectures, and provides a Web
Fonoster is a conversational AI framework and multi-tenant communications platform as a service. It serves as a programmable voice gateway and SIP telephony platform, enabling the creation of voice-based assistants and automated communication workflows using large language models. The project distinguishes itself through a vendor-agnostic speech integration engine that abstracts speech-to-text and text-to-speech providers. It features a multi-tenant architecture that isolates telephony resources and user identities into distinct organizational workspaces. The system covers a broad range of t
This project is a comprehensive toolkit for on-device speech recognition, synthesis, and audio processing, specifically engineered for Apple Silicon. It provides a framework for building real-time, full-duplex voice agents that operate entirely offline, leveraging native hardware acceleration to maintain performance and privacy. By utilizing optimized machine learning models, the library enables local execution of complex audio tasks without reliance on external cloud services. The library distinguishes itself through its specialized focus on local, high-performance voice interaction. It incl
This project is a framework for building local voice assistants and a real-time audio streaming server. It functions as a containerized inference engine and a multilingual speech pipeline that orchestrates speech-to-text, language models, and text-to-speech components to convert spoken input into spoken output. The system is distinguished by its use of WebSocket-based bidirectional streaming for low-latency interactions. It features a voice activity detection system that manages speech boundaries and handles user barge-in interruptions during assistant playback. It also supports custom voice
Ten Framework is a multimodal large language model agent framework designed for building low-latency conversational agents. It integrates voice, text, and visual inputs in real time to facilitate human interaction. The project includes a real-time speech processing pipeline for streaming transcription, voice activity detection, and speaker diarization. It also features an avatar synchronization engine that coordinates character lip animations and visual outputs with synthesized speech. The framework covers edge AI deployment through containerized packaging and direct integration with embedde
This project is an autonomous agent framework designed to integrate large language models with popular messaging platforms. It functions as a middleware platform that enables automated, multimodal interactions by decomposing complex user goals into sequential plans, executing them through external tools, and maintaining persistent context across sessions. The framework distinguishes itself through a modular skill architecture and a hybrid memory system. Users can extend system capabilities by installing custom logic modules from community hubs or generating them through natural language. The
PraisonAI is an autonomous AI agent platform that coordinates multiple LLM-powered agents for research, planning, and execution of complex workflows. It functions as a multi-agent orchestration framework, a workflow builder, and a Model Context Protocol server, while also providing retrieval-augmented generation through vector knowledge bases. Agents can interact via CLI, web, or standardized protocols with sandboxed code execution. The platform distinguishes itself with a rich set of agent communication protocols, including A2A, REST, WebSocket, voice and telephony integration, and MCP, allo
This project is a comprehensive framework for building and managing autonomous agent systems. It provides a unified architecture for orchestrating multi-agent societies, where specialized agents collaborate through roleplay to decompose and solve complex tasks. The system integrates language models with external environments, enabling agents to perform real-world actions through a standardized tool-calling abstraction layer. The framework distinguishes itself through its focus on iterative reasoning and data reliability. It employs automated feedback loops to refine agent outputs and self-eva
Mediasoup is a selective forwarding unit used for real-time media routing. It enables the development of low-latency audio and video communication systems by routing streams between participants without transcoding. The project provides embedded media routing logic that can be integrated directly into an application. It supports simulcast and quality layering, allowing the system to adapt resolution and bitrate based on real-time bandwidth estimations to maintain connection stability. The capability surface includes media track management for audio, video, and screen capture, as well as bidi
BigBlueButton is an open-source virtual classroom software and meeting server designed for hosting real-time online teaching sessions. It functions as a WebRTC video conferencing platform and collaboration suite, providing a self-hosted environment for virtual classroom management and online course administration. The system distinguishes itself through specialized educational tools, including an interactive quiz engine, breakout room coordination, and live polling. It provides a comprehensive suite of collaborative assets such as a shared infinite whiteboard, real-time co-authored notes, and
Coturn is a network server that facilitates peer-to-peer media traffic for real-time communication applications. It functions as a relay platform for voice, video, and data transmission, carrying traffic between clients that cannot connect directly because they sit behind restrictive firewalls and network address translators. The server implements the standard network traversal protocols (STUN and TURN) to manage media packet exchange and client authentication. It utilizes a multi-threaded architecture and event-driven polling to handle high-throughput traffic, while employing hash-based message authentication codes to verify client identi
This is an open-source Python SDK for building and orchestrating production-grade AI agents. It provides a unified framework for creating conversational agents that can use tools, maintain state, and coordinate across multiple language model providers including OpenAI, Anthropic, Google, Amazon Bedrock, and locally-hosted models. The SDK supports multi-agent orchestration through graphs, teams, and swarms, allowing several specialized agents to collaborate on complex tasks. Agents can be composed as callable tools that other agents invoke, and the framework includes policy handlers that inspe
NemoClaw is an LLM agent orchestrator and sandboxed execution environment designed to deploy and manage the lifecycles of large language model agents. It provides a secure runtime that isolates persistent agents from the underlying host system to ensure operational security. The system includes a secure LLM inference gateway that acts as a managed routing layer, securing communication between AI agents and inference engines to prevent unauthorized access. It also integrates with NVIDIA OpenShell to run specialized agents within a secure shell environment. Operational control is provided thro
This project is a WebRTC screen sharing server designed to facilitate the streaming of desktop views between multiple participants. It functions as a signaling server to coordinate connection metadata and a relay server to ensure connectivity for users behind restrictive firewalls or symmetric NATs. The server enables real-time screen sharing by establishing direct peer-to-peer connections to reduce latency and server load. It utilizes a relay architecture to maintain stable communication when direct paths are blocked by network firewalls. The system provides coordination for session managem
Personaplex is an LLM speech-to-speech framework and conversational AI persona engine designed for real-time voice interfaces. It provides a system for defining AI identities and vocal characteristics through a combination of text-based role prompts and audio reference files. The project features a real-time AI voice interface that supports full-duplex human-AI dialogue, enabling multiple parties to speak and listen simultaneously via bidirectional audio streaming. It includes a GPU-accelerated audio processor and a speech-to-speech pipeline to facilitate low-latency conversations. The frame
SimpleWebRTC is a communication framework and real-time media streaming library designed to establish peer-to-peer audio, video, and data streams between web clients. It provides a conference room manager to organize multiple participants into virtual rooms for group interaction. The framework includes a dedicated system for peer-to-peer file transfers and low-latency data messaging. It also features a network traversal configuration tool for managing the servers required to maintain connectivity across firewalls and restrictive network environments. The project covers broad capability areas
Janus is a WebRTC media gateway that routes real-time audio, video, and data between web browsers and server-side application logic. It functions as a central media relay that manages session negotiation and encryption for multiple browser endpoints. The project utilizes a modular plugin architecture that decouples the core server from specific media logic, allowing developers to implement custom modules for media processing, event handling, and transport protocols. This design enables the server to act as a protocol translation bridge, converting WebRTC streams into legacy formats such as SI