30 open-source projects similar to pathwaycom/llm-app, ranked by how many features they have in common. Compare stars, activity, and what each one does to find the best llm-app alternative.
Pathway is a high-performance data processing framework designed for building unified batch and streaming pipelines. It functions as an orchestrator for complex data transformations, utilizing a differential dataflow engine to process updates incrementally. By treating static datasets and continuous event streams with identical logic, the platform ensures exactly-once processing semantics and consistent results across diverse data sources. The framework distinguishes itself through its specialized support for real-time artificial intelligence and retrieval-augmented generation. It features in
Fluvio is a distributed event streaming platform and cloud-native streaming engine designed for collecting, persisting, and replicating real-time data streams across a distributed cluster. It functions as a real-time data pipeline for building stateful workflows that ingest, enrich, and export data between external sources and sinks. The platform is distinguished by its use of WebAssembly to execute compiled modules for in-line data transformations and filtering. This allows for the execution of custom business logic to reshape information in motion without requiring a restart of the cluster.
This project serves as a comprehensive technical reference for the architecture and design of data-intensive applications. It provides a structured analysis of the fundamental principles required to build reliable, scalable, and maintainable software systems, covering the core trade-offs inherent in modern data infrastructure. The repository explores the mechanics of distributed data management, including strategies for replication, partitioning, and achieving consensus across multiple nodes. It details the design of storage engines, indexing techniques, and transaction management models, whi
This platform serves as a comprehensive environment for managing private language models, document knowledge bases, and automated agent workflows within secure local infrastructure. It functions as a document-aware workspace that enables users to ingest diverse file formats into searchable repositories, ensuring that all data processing and model inference remain within private, local environments to maintain data sovereignty. The system distinguishes itself through a modular agentic engine that allows for the definition of custom skills and external tool execution. By utilizing a multi-model
DSPy is a declarative programming framework designed for building complex language model applications. It treats model interactions as modular, composable programs, allowing developers to define task logic through typed class schemas rather than relying on manually written prompts. By organizing workflows into hierarchical, reusable Python objects, the framework enables the construction of sophisticated AI systems that manage state and execution flow independently. The framework distinguishes itself through an automated optimization engine that iteratively refines prompt instructions and few-
Verba is a retrieval-augmented generation interface and chatbot that uses Weaviate to provide factual answers based on private datasets. It functions as a vector database knowledge base, combining a hybrid search engine with an orchestration interface to connect various large language model providers and embedding services. The system differentiates itself through a RAG pipeline manager for adjusting text chunking rules and retrieval settings, alongside a 3D vector space visualization tool for analyzing the spatial organization and clustering of high-dimensional embeddings. It employs a modul
The open-source RAG platform: built-in citations, deep research, 22+ file formats, partitions, MCP server, and more.
Dify is an open-source platform for building, orchestrating, and deploying generative AI applications and autonomous agents. It provides a visual development environment that allows users to design complex, multi-step logic chains and conversational flows, which can then be published as APIs, web interfaces, or embedded widgets. The platform acts as a centralized infrastructure layer, managing model connections, prompt templates, and knowledge retrieval to support scalable AI-powered services. What distinguishes the platform is its focus on stateful application design and workflow orchestrati
Cognita is a retrieval-augmented generation orchestration framework used to build pipelines that connect document stores and language models to provide grounded answers. It functions as a document ingestion pipeline and a vector database integrator, managing the process of loading, parsing, and indexing files into a searchable knowledge base. The system includes a language model gateway proxy that provides a unified API to interact with multiple model providers. This routing layer decouples the application from specific vendors, allowing requests to be proxied through a provider-agn
Vanna is a Python framework designed to build conversational interfaces that translate natural language into executable database queries. It functions as an enterprise-grade toolkit that connects language models to relational databases, allowing users to retrieve information through conversational prompts rather than manual code. The system maintains context across interactions by utilizing vector databases to store historical query patterns and schema metadata. The framework distinguishes itself through a focus on security and schema-aware generation. It incorporates granular access control,
The collaborative spreadsheet for AI. Chain cells into powerful pipelines, experiment with prompts and models, and evaluate LLM responses in real-time. Work together seamlessly to build and iterate on AI applications.
Haystack is an orchestration framework designed for building complex search and generative AI pipelines. It functions as an agentic workflow engine, enabling the construction of automated sequences that allow AI agents to perform multi-step reasoning and data analysis. The framework utilizes a modular, component-based architecture that connects processing steps into directed acyclic graphs. By employing a provider-agnostic integration layer, it decouples core logic from specific external AI services and vector databases, allowing for the flexible exchange of underlying technologies. This desi
Apache Beam is a distributed data pipeline framework and unified data processing model designed to handle both bounded batch data and unbounded real-time streams. It provides a system for building scalable, data-parallel workflows that operate across compute clusters using a single programming model. The framework utilizes a cross-runner pipeline abstraction that decouples the data processing logic from the underlying execution backend, allowing the same pipeline to run on different distributed compute engines. It supports multi-language pipeline development by translating high-level code fro
Apache Flink is a distributed processing engine designed for both high-throughput, low-latency data streams and finite batch workloads. It functions as a stateful stream processor and a SQL stream processing engine, providing a unified runtime to execute relational queries and event-based transformations. The system is distinguished by its ability to manage persistent operator state to ensure exactly-once processing guarantees and consistency during failures. It features specialized capabilities for complex event processing to detect temporal patterns and handles out-of-order events using eve
Redisson is a Java client library for Redis and Valkey that provides a distributed data structure library, a distributed lock manager, and a distributed MapReduce framework. It enables application instances in a cluster to share state through thread-safe collections and objects. The project implements a JCache-compliant caching layer for standardized data storage and retrieval. It also functions as a probabilistic data store, providing memory-efficient structures such as Bloom filters for membership testing and HyperLogLog for cardinality estimation on high-volume data. The library covers distributed state management usi
Datasets is a library designed for the management, processing, and sharing of large-scale data collections for machine learning workflows. It functions as both a data processing framework and a versioning platform, providing tools to organize, filter, and transform massive datasets while ensuring reproducibility across research and development teams. The library distinguishes itself by enabling the handling of datasets that exceed available system memory. It utilizes memory-mapped file access, disk-based caching, and lazy iterative streaming to maintain performance when working with large-sca
Redis is an in-memory, key-value database designed to provide sub-millisecond latency for read and write operations. It functions as a versatile data platform, serving as a distributed cache, a message broker, a NoSQL document store, and a vector database. The system utilizes an event-driven, single-threaded loop to process requests efficiently, while maintaining data durability through append-only persistence logs and asynchronous snapshotting mechanisms. What distinguishes Redis is its ability to handle complex data structures—including strings, hashes, lists, sets, and sorted sets—alongsid
Pentaho Kettle is an enterprise ETL data integration platform designed to extract, transform, and load data between disparate sources and target databases. It functions as a metadata-driven orchestrator that utilizes a visual workflow designer to create and manage complex sequences of data tasks and transformation pipelines. The system is distinguished by its distributed data processing engine, which executes workloads across clusters of server nodes to increase throughput. It employs a plugin-based architecture, allowing the platform to be extended via external JAR files to provide connectiv
Superlinked is a development framework designed for building semantic search and retrieval pipelines. It functions as a machine learning data pipeline and semantic retrieval engine, providing the tools necessary to unify data schema definition, embedding generation, and vector database integration within a single application. The framework distinguishes itself by acting as a vector database orchestrator that manages the lifecycle of machine learning models alongside complex search logic. It enables developers to construct structured data models that map raw content and metadata into unified r
Orleans is a .NET distributed actor framework designed for building scalable, cloud-native applications. It implements a virtual actor model where entities with stable identities manage their own state and lifecycle across a cluster of servers. The framework provides a distributed state management system with ACID transaction support and a distributed pub/sub streaming engine for real-time data processing. It distinguishes itself through location-transparent routing, automatic actor activation and deactivation, and elastic cluster scaling that redistributes workloads during node failures. Th
Faust is a Python library for building distributed stream processing applications that integrate with Kafka. It functions as an asynchronous stream processor that handles high-throughput event streams and real-time data analysis using asynchronous functions. The system also serves as a distributed state store, using sharding and partitioned topics to scale processing workloads horizontally across multiple worker nodes. It maintains state through a replicated key-value storage system backed by local databases to ensure high availability and fast recovery. The frame
Seata is a distributed transaction coordinator and consistency framework designed to maintain data integrity across multiple microservices. It functions as a manager that synchronizes state across separate databases to ensure atomic commits or rollbacks of global transactions. The project provides a toolkit for implementing distributed transaction patterns, using a two-phase commit protocol and centralized status tracking to prevent data anomalies. It orchestrates eventual consistency through state-machine-based tracking and message-driven coordination to handle timeouts and failures in distr
EOS is a Layer 1 blockchain infrastructure and high-throughput transaction engine. It serves as a WebAssembly smart contract platform that manages state transitions and network consensus across a peer-to-peer network. The system utilizes a sandboxed virtual machine for executing smart contract logic and employs a Byzantine Fault Tolerant delegated proof-of-stake consensus mechanism to finalize the global ledger state. It features a resource-based stake model for CPU and memory allocation and an asynchronous messaging system for inter-contract communication to prevent recursive call overflows.
Akka is an actor model framework and distributed systems platform used to build concurrent and distributed applications. It provides a toolkit for managing multi-threaded state and behavior through asynchronous message passing, allowing developers to create concurrent applications without manual locks or synchronization. The system functions as a cluster management and event sourcing framework, automating the scaling and coordination of high-availability clusters. It enables the deployment of elastic services that coordinate workloads across multiple network nodes and ensures fault tolerance
Materialize is a streaming SQL database that continuously ingests live data from sources such as Kafka, Redpanda, PostgreSQL, and MySQL, and incrementally maintains materialized views. It provides a PostgreSQL-compatible query engine that accepts standard SQL over the PostgreSQL wire protocol, enabling any existing SQL client or BI tool to query real-time data. The system also includes a Model Context Protocol (MCP) server that exposes live materialized view data to AI agents, providing fresh context without polling. Materialize distinguishes itself through its ability to offer configurable c
Rivet is a distributed infrastructure for managing the lifecycle, addressing, and persistence of stateful actors and durable execution engines. It provides a distributed process sandbox that executes application logic within lightweight isolates, ensuring resource isolation and fast cold starts. The system is designed to coordinate multi-step operations using persistent queues and timers to guarantee reliable task completion across distributed environments. The platform specifically enables the orchestration of stateful AI agents that maintain persistent memory and state across long-running i
Akka.NET is an actor model framework used for building concurrent and distributed applications. It functions as a distributed computing platform and state manager that enables isolated actors to communicate via asynchronous message passing, ensuring thread-safe state management without manual locks. The project is distinguished by its decentralized coordination capabilities, including a distributed state manager that uses sharding and dynamic rebalancing to maintain high availability. It incorporates an event sourcing engine that persists state as a sequence of events in an append-only log an
This project is a Substrate-based blockchain protocol implementation and a modular Rust blockchain runtime. It provides a framework for executing a distributed ledger protocol to maintain a synchronized network of nodes. The system features a peer-to-peer node implementation that manages decentralized communication and local key-value database storage. It includes a BFT-based consensus engine for coordinating block authoring and finality, as well as a WebAssembly-based runtime for isolated execution of state transitions. The protocol organizes domain-specific logic through a modular pallet a
Pipecat is a framework and software development kit for building real-time multimodal AI agents and speech-to-speech systems. It utilizes a frame-based data pipeline to route audio, video, and text through a modular sequence of processors, enabling the orchestration of low-latency conversational AI. The project is distinguished by its ability to coordinate complex multimodal services, including speech-to-text, language models, and text-to-speech, within a single pipeline. It features semantic voice activity detection for natural turn-taking, state-machine conversation flows for dialogue manag
node-redis is a Node.js client and database driver for interacting with Redis key-value stores. It functions as a connection manager and data iterator, allowing applications to execute database commands and manage atomic transactions on a remote server using asynchronous JavaScript. The project provides specialized capabilities for handling large datasets through an asynchronous interface and supports connection pooling to isolate blocking operations. It includes support for client-side caching and token-based authentication via Entra ID. The driver covers a broad range of operational areas,