30 open-source projects similar to basedhardware/omi, ranked by how many features they have in common. Compare stars, activity and what each one does to find the best Omi alternative.
zvec is an embedded vector database engine and indexing library designed for high-dimensional similarity search. It functions as a hybrid search engine and a retrieval-augmented generation knowledge base, allowing for the storage and retrieval of dense and sparse vectors. The system is distinguished by its hybrid retrieval pipeline, which fuses vector similarity, full-text keyword matching, and scalar metadata filtering into single query operations. It supports a plugin-based model integration system for registering custom embedding models and rerankers, as well as language bindings for nativ
This project is a comprehensive framework for building AI-powered applications, providing a unified toolkit for orchestrating language models, autonomous agents, and interactive user interfaces. It serves as a central library for managing the entire lifecycle of AI interactions, from initial prompt generation and model provider abstraction to complex, multi-step reasoning and tool execution. The framework distinguishes itself through its deep integration with frontend development, specifically by enabling generative user interfaces that render dynamic components directly from model outputs. I
This project is an AI meeting assistant and interview copilot that monitors system audio and screen content to generate real-time responses during video calls. It functions as a system audio transcription tool and a context-aware prompt manager, injecting user documents and behavioral profiles into large language model prompts to tailor AI outputs. The system features a stealth screen overlay, utilizing a transparent window that displays information on top of other applications while remaining invisible to screen-sharing software and proctoring tools. It employs a process-hiding mechanism to
WhisperLiveKit is a real-time speech-to-text server that transcribes streaming audio into text with ultra-low latency using Whisper models. It serves transcription capabilities through REST endpoints and WebSocket connections, enabling external applications to send audio and receive transcriptions as words are spoken, making it suitable for live captioning or voice interfaces. The project distinguishes itself by combining real-time transcription with speaker diarization, assigning transcribed words to individual speakers during live audio streams for meeting or interview transcripts. It also
langchaingo is an LLM application framework for Go designed for building language model-powered applications and autonomous agents. It serves as an orchestration library and tool integration framework that allows developers to link prompt sequences and model calls into complex, multi-step workflows. The project provides a toolkit for implementing retrieval-augmented generation pipelines by processing unstructured documents and retrieving relevant context via vector search. It includes a dedicated integration layer for indexing high-dimensional embeddings and performing similarity searches acr
Vocode-core is a framework for building real-time conversational AI voice agents. It serves as a conversational orchestrator and pipeline that integrates speech-to-text, large language models, and text-to-speech services to enable low-latency voice interactions. The project features a provider-agnostic interface that allows for swappable speech and language model providers, including support for both cloud APIs and local binaries. It distinguishes itself through a specialized telephony integration layer that enables agents to be deployed across phone lines, WebRTC, and virtual meeting platfor
LanceDB is a vector database and columnar data store designed to function as a versioned dataset manager and vector search engine. It serves as a high-performance backend for indexing and retrieving high-dimensional embeddings, providing the foundation for machine learning data pipelines. The system distinguishes itself through a combination of cloud-native object storage and immutable version tracking, allowing for data time-travel and reproducible AI experiments. It integrates hybrid search capabilities, merging dense vector similarity with BM25 full-text search and SQL-like scalar filters
supabase-js is a comprehensive client library designed to integrate frontend applications with a hosted backend-as-a-service. It provides a unified interface for interacting with a PostgreSQL database, identity management systems, cloud object storage, and real-time data synchronization. The library features an isomorphic client design that operates across both browser and server environments. It distinguishes itself through a type-safe approach, utilizing TypeScript to map database schemas directly to client-side definitions, and employs a PostgREST-based API to translate JavaScript calls in
AdalFlow is an autonomous AI agent framework and LLM application library designed for building modular workflows. It serves as a model-agnostic interface and RAG pipeline orchestrator, allowing users to develop ReAct agents that utilize iterative reasoning and external tool execution to solve complex tasks. The project distinguishes itself through a prompt optimization system that uses textual gradient descent to automatically refine prompt templates and few-shot examples. It treats model feedback as a differentiable signal, enabling a form of LLM backpropagation to iteratively improve output
RealtimeSTT is a local speech-to-text engine and real-time automatic speech recognition server. It utilizes transformer-based recognition and omnilingual pipelines to convert live audio streams into text, providing a WebSocket-based streaming API for raw PCM audio transmission. The project is distinguished by a dual-backend transcription pipeline that uses a lightweight engine for immediate partial suggestions and a heavier model for final high-accuracy results. It includes a wake word detection system to trigger recording and employs a shared-resource inference model to distribute heavy spee
Qdrant is a high-performance vector similarity database designed to store, index, and search high-dimensional vectors alongside structured metadata. It functions as a distributed search engine that manages large-scale data clusters, providing low-latency retrieval and complex filtering capabilities. The system is built to serve as a specialized middleware layer, connecting machine learning pipelines and AI agents to persistent storage for intelligent information retrieval and recommendation tasks. The platform distinguishes itself through advanced retrieval techniques, including support for h
Agent Squad is an LLM multi-agent orchestration framework designed to coordinate specialized agents to solve complex tasks. It functions as a system for managing agent teams and supervisors, utilizing a supervisor-led orchestration model to decompose large problems into manageable steps. The framework distinguishes itself through a combination of intent-based query routing and human-in-the-loop automation. It employs a hierarchical routing system to direct requests to the most appropriate agent or model, while integrating asynchronous messaging queues to route complex cases to human operators
This is an open-source Python SDK for building and orchestrating production-grade AI agents. It provides a unified framework for creating conversational agents that can use tools, maintain state, and coordinate across multiple language model providers including OpenAI, Anthropic, Google, Amazon Bedrock, and locally-hosted models. The SDK supports multi-agent orchestration through graphs, teams, and swarms, allowing several specialized agents to collaborate on complex tasks. Agents can be composed as callable tools that other agents invoke, and the framework includes policy handlers that inspe
Sherpa-ONNX is an ONNX-based speech processing toolkit that provides a local speech recognition engine, an on-device voice synthesis tool, and a speaker identification framework. It is designed as a cross-platform speech API that enables speech-to-text, text-to-speech, and speaker verification tasks to be executed locally on a device without requiring network access. The project is distinguished by its ability to perform zero-shot voice cloning and speaker diarization on the device. It supports a wide range of hardware accelerators, including GPUs and various NPU architectures, and provides a Web
Vibe is a cross-platform transcription tool that converts spoken audio into text by running Whisper neural models directly on your device, with no cloud dependency. It can transcribe audio from files, microphones, system output, and network streams, and supports both batch processing of multiple files and real-time captioning from continuous input. Beyond basic transcription, Vibe identifies and labels different speakers through speaker diarization, and offers a choice of a command-line interface or an HTTP API for automated and remote workflows. It also includes plugins to export transcripts to c
Redis is a high-performance in-memory key-value store that functions as a distributed cache, message broker, and NoSQL database. It provides sub-millisecond read and write access to data stored in RAM and can operate as a vector database for indexing high-dimensional embeddings. The system supports a wide range of data storage and synchronization primitives, including the management of strings, hashes, lists, sets, and JSON documents. It enables real-time data operations through atomic transactions, hybrid persistence using snapshots and append-only logs, and high-availability configurations
This project is a Python framework for building autonomous, event-driven agent systems. It provides a unified runtime for orchestrating multi-agent workflows, managing persistent conversation state, and executing code within secure, isolated sandbox environments. The framework is designed to handle complex task delegation, allowing agents to invoke other agents as tools while maintaining context across multi-turn interactions. The framework distinguishes itself through its deep integration with the Model Context Protocol, enabling agents to connect to external data sources and remote services
LibSQL is a high-performance, distributed SQL database engine that extends SQLite to support remote network access, edge computing, and real-time synchronization. It functions as an embedded database library that integrates directly into application processes while providing the infrastructure to maintain consistency across multiple geographic regions. The platform distinguishes itself by enabling database interaction over standard HTTP protocols, allowing applications to query remote data sources in serverless and edge environments without requiring local filesystem access. It includes nativ
Mastra is an orchestration framework designed for building, deploying, and managing autonomous AI agents and multi-agent systems. It provides a comprehensive suite of primitives for creating resilient AI applications, including durable workflow orchestration, event-driven agent loops, and semantic memory management. By integrating these core components, the platform enables developers to build complex, multi-step processes that can reason about goals and execute tasks without manual intervention. The framework distinguishes itself through its focus on observability and secure, isolated execut
This project is a self-hosted meeting transcription and summarization tool that converts audio recordings into text transcripts and structured notes using large language models. It functions as an enterprise meeting documentation manager, allowing for the organization and editing of timestamped records. The system prioritizes data privacy through local-first processing and the ability to deploy on private infrastructure. It supports a provider-agnostic architecture, enabling users to connect to local AI engines, self-hosted servers, or cloud-based API endpoints for both transcription and summ
OmniRoute is a unified LLM API gateway that connects multiple AI providers to a single endpoint. Its primary purpose is to simplify the integration of various AI models into tools and agents by translating different provider formats into a standardized API. The project distinguishes itself through a multi-strategy request routing system that optimizes for cost, speed, and availability, including automatic model fallbacks and a circuit-breaker resilience model to isolate provider failures. It employs a local-first security posture, using AES-256-GCM encryption to store API keys and conversatio
This is an open-source platform for creating, hosting, and interacting with persistent AI characters that maintain personality and memory across conversations. The system orchestrates the full lifecycle of an AI companion by combining character definitions, conversation history, memory retrieval, model abstraction, and external communication channels into a unified runtime pipeline. The platform enables users to define detailed character personalities through structured configuration files that shape conversational behavior, and supports multi-turn dialogue through a memory system that stores
MIRIX is an AI agent state orchestrator and long-term memory system designed to provide persistent context for large language models. It functions as a multi-modal AI memory pipeline that processes text, voice, and screen captures into structured knowledge stores, including a dedicated screen activity knowledge base. The project distinguishes itself by integrating a multi-modal observation pipeline that monitors desktop activity in real-time to build a searchable history of user actions. It utilizes a multi-tiered memory hierarchy—separating episodic, semantic, procedural, and core stores—and
WhisperLive is a real-time speech-to-text server that converts live audio streams into text using Whisper models. It functions as a backend service that receives microphone input via WebSockets and provides incremental transcriptions with word-level timestamps. The system utilizes a GPU-accelerated inference engine and a keyword-boosted transcription API to improve the recognition accuracy of domain-specific jargon, acronyms, and product names. It also includes a speaker diarization tool that clusters audio embeddings to identify and label different participants within a recording. Additiona
This project provides a TypeScript software development kit for the Model Context Protocol, a standard designed to facilitate bidirectional communication between AI applications and external data sources or tools. It serves as a foundational framework for building both clients and servers, enabling language models to interact with external systems through a unified, decoupled interface. The SDK distinguishes itself by implementing a transport-agnostic connection layer that supports both local standard input/output streams and remote HTTP endpoints. It utilizes a JSON-RPC message bus to manage
Moonshine is a complete on-device voice interface toolkit that provides speech recognition, text-to-speech synthesis, phonetic processing, speaker diarization, and intent recognition, all running locally on edge hardware without any cloud dependency. It executes quantized neural networks for speech and language tasks directly on the device, enabling fully offline conversational AI capabilities. The toolkit distinguishes itself by orchestrating multi-turn spoken exchanges through a conversational flow manager that maintains context across interactions and manages branching dialog flows. It inc
This project is a Chinese automatic speech recognition framework and deep learning system designed to convert spoken Chinese audio into written text. It functions as a toolkit for training, evaluating, and deploying speech-to-text models, utilizing a specialized pinyin-to-text converter that transforms phonetic sequences into Chinese characters using a probability graph model. The system is distinguished by its deployment flexibility, offering a dockerized recognition server that provides transcription capabilities as a remote API. It supports high-performance streaming through a gRPC speech-
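Several entries in this list (zvec, LanceDB) describe hybrid retrieval that combines vector similarity, full-text keyword matching, and scalar metadata filtering in a single query. The descriptions do not say how the results are fused, so the sketch below assumes reciprocal rank fusion as a stand-in. The function names, the `k=60` constant, and the sample data are all hypothetical and do not reflect any listed project's API.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document ids into one list.

    Each document scores 1 / (k + rank) per list it appears in.
    Ties are broken alphabetically so the result is deterministic.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


def hybrid_search(vector_hits, keyword_hits, metadata, predicate, top_n=3):
    """Apply a scalar metadata filter, then fuse the two ranked lists.

    vector_hits and keyword_hits are assumed to be already ranked,
    best match first, by whatever engines produced them.
    """
    filtered = [
        [doc_id for doc_id in hits if predicate(metadata[doc_id])]
        for hits in (vector_hits, keyword_hits)
    ]
    return reciprocal_rank_fusion(filtered)[:top_n]


if __name__ == "__main__":
    # Hypothetical sample data: ids ranked by each retriever.
    vector_hits = ["a", "b", "c", "d"]
    keyword_hits = ["c", "a", "e"]
    metadata = {
        "a": {"lang": "en"},
        "b": {"lang": "en"},
        "c": {"lang": "en"},
        "d": {"lang": "de"},
        "e": {"lang": "en"},
    }
    results = hybrid_search(
        vector_hits, keyword_hits, metadata, lambda m: m["lang"] == "en"
    )
    print(results)
```

Documents that rank well in both lists ("a" and "c" here) rise above those found by only one retriever, which is the practical appeal of fusing results rather than picking a single signal.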