15 Repos
Systems for executing and optimizing data retrieval queries.
Distinguishing note: Focuses on server-side execution logic rather than database-level indexing.
Explore 15 awesome GitHub repositories matching data & databases · Query Processing.
Cheat.sh is a command line knowledge base that provides instant access to programming syntax, code snippets, and technical documentation. Designed to minimize context switching, it functions as a developer productivity tool that allows users to retrieve information directly within their terminal or code editor. The service distinguishes itself through a terminal-agnostic interface that relies on standard input and output streams, ensuring compatibility across various shell environments and operating systems. It supports persistent query sessions to maintain workflow continuity and offers a co
Executes search logic and content formatting on the host machine to minimize client-side requirements.
Apache Flink is a distributed processing engine designed for both high-throughput, low-latency data streams and finite batch workloads. It functions as a stateful stream processor and a SQL stream processing engine, providing a unified runtime to execute relational queries and event-based transformations. The system is distinguished by its ability to manage persistent operator state to ensure exactly-once processing guarantees and consistency during failures. It features specialized capabilities for complex event processing to detect temporal patterns and handles out-of-order events using eve
Processes data streams and batches using a language-integrated API for selections, filters, and joins.
Chat2DB is an AI-powered SQL client and multi-database GUI manager designed for managing various relational and NoSQL database systems. It serves as a visual database management tool and a natural language to SQL interface, allowing users to convert plain text descriptions into executable and optimized queries. The platform distinguishes itself through automated business intelligence capabilities, which include the generation of real-time data visualization dashboards and AI-driven data analysis from spreadsheets. To ensure data privacy, it supports secure local AI deployment, enabling large
Executes and optimizes complex queries against massive datasets for enterprise-scale environments.
localGPT is a private AI knowledge base and retrieval-augmented generation application. It provides a local document indexer, a hybrid search engine, and an inference interface to enable chatting with private documents and managing a self-hosted information repository without sending data to external servers. The system distinguishes itself through a dual-pass verification pipeline that ensures generated answers are grounded in retrieved sources, accompanied by explicit source attribution. It employs a hybrid retrieval approach combining semantic vector search with keyword matching and rerank
Breaks complex user requests into multiple sub-queries executed in parallel to synthesize a final comprehensive answer.
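The sub-query pattern described above can be sketched roughly as follows. This is a minimal illustration, not localGPT's actual code: `decompose`, `answer_sub_query`, and `answer` are hypothetical names, the splitter is a naive string split standing in for an LLM call, and the per-sub-query step is a stub where retrieval and generation would go.

```python
from concurrent.futures import ThreadPoolExecutor


def decompose(question: str) -> list[str]:
    """Hypothetical splitter: break a compound question on ' and '.

    A real system would typically ask a language model to do this.
    """
    parts = [p.strip(" ?") for p in question.split(" and ")]
    return [p + "?" for p in parts if p]


def answer_sub_query(sub_query: str) -> str:
    """Stand-in for a retrieval + generation call (assumption)."""
    return f"[answer to: {sub_query}]"


def answer(question: str) -> str:
    sub_queries = decompose(question)
    # Run the sub-queries concurrently; pool.map preserves input order.
    with ThreadPoolExecutor(max_workers=4) as pool:
        partials = list(pool.map(answer_sub_query, sub_queries))
    # Naive synthesis step: concatenate the partial answers.
    return " ".join(partials)
```

Threads suit this shape because each sub-query is usually I/O-bound (a search or model call), so the work overlaps even under Python's global interpreter lock.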
Vercel is a cloud platform for building, deploying, and scaling web applications. It provides a unified infrastructure that automates the build process by detecting project frameworks and distributing static and dynamic content through a global content delivery network. The platform executes application logic using serverless functions that scale automatically based on real-time traffic demand. The platform distinguishes itself through a centralized AI gateway that proxies requests to multiple model providers, enabling standardized authentication, observability, and cost tracking. It supports
Provides advanced conversational capabilities for handling complex, multi-step user queries.
Entity Framework Core is an object-relational mapper that enables developers to interact with database systems using strongly-typed code. It serves as a comprehensive data access framework, providing a unified interface for mapping application objects to relational and non-relational database schemas while managing the lifecycle of data operations through a central context. The project distinguishes itself through a provider-based architecture that decouples core data access logic from specific database engines, allowing for consistent interaction across diverse storage systems. It features a
Processes query results in application memory when server-side translation is unavailable or explicitly requested.
EdgeDB is a graph-relational database that combines a PostgreSQL backend with a graph-based schema and query language. It functions as an object-relational mapper and graph query engine, allowing data to be modeled as objects and links to align storage with modern programming language structures. The system features a composable query language designed to retrieve deeply nested or interconnected data without the use of manual SQL joins. It includes an integrated AI-driven data retrieval solution with built-in support for vector embeddings. The platform provides a schema migration tool for tr
Enables the retrieval and manipulation of deeply nested or interconnected data without complex joins.
Local Deep Research is an autonomous research system consisting of an LLM research agent, a local model orchestrator, and a multi-engine search aggregator. It is designed to execute deep research by decomposing complex questions into atomic facts and synthesizing cited reports from academic, technical, and private document sources. The system features an encrypted research workspace that ensures zero-knowledge privacy through isolated, per-user encrypted databases. It utilizes a local RAG knowledge base to index research sources into searchable vector stores, allowing for retrieval-augmented
Decomposes complex research questions into smaller, atomic sub-queries to enable targeted multi-engine searches.
Albert is a keyboard launcher that opens files, applications, and runs commands by typing search queries into a search bar. It functions as a keyboard-driven workflow tool, enabling users to navigate their file system, launch installed applications, and execute shell commands without touching a mouse. The launcher processes user input through a plugin-based modular architecture, where functionality is extended by dynamically loaded C++ and Python plugins. Queries are dispatched to all enabled handlers in parallel, with results merged and ranked by a combination of match quality and historical
Processes user input through registered query handlers that return relevant results from various sources.
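The dispatch-merge-rank flow described above might look something like the sketch below. It is written in Python for brevity and does not use Albert's plugin API: the handlers, the `Result` type, the `USAGE` table, and the scoring formula are all hypothetical, and because the description is cut off after "historical", the history-based weight is an assumption.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass
class Result:
    title: str
    match: float  # match quality in the range 0..1


def app_handler(query: str) -> list[Result]:
    """Hypothetical handler: prefix-match against installed applications."""
    apps = ["Firefox", "Files", "Terminal"]
    q = query.lower()
    return [Result(a, len(q) / len(a)) for a in apps if a.lower().startswith(q)]


def file_handler(query: str) -> list[Result]:
    """Hypothetical handler: substring-match against file names."""
    files = ["finance.ods", "notes.txt"]
    q = query.lower()
    return [Result(f, len(q) / len(f)) for f in files if q in f.lower()]


# Hypothetical usage counts standing in for the history signal (assumption).
USAGE = {"Files": 5}


def run_query(query: str, handlers, history_weight: float = 0.1) -> list[Result]:
    # Dispatch the query to every handler concurrently.
    with ThreadPoolExecutor() as pool:
        batches = list(pool.map(lambda handler: handler(query), handlers))
    # Merge all handler results, then rank by match quality plus history.
    merged = [result for batch in batches for result in batch]
    return sorted(
        merged,
        key=lambda r: r.match + history_weight * USAGE.get(r.title, 0),
        reverse=True,
    )
```

With these sample values, `run_query("fi", [app_handler, file_handler])` ranks "Files" first because its usage count outweighs the closer-length matches.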
This project is a reference implementation and application template for Retrieval-Augmented Generation (RAG). It integrates Azure OpenAI with Azure AI Search to enable conversational chat interfaces that provide grounded responses based on private enterprise data. The system is distinguished by its multimodal AI interface, allowing it to process and reason over combined text, image, and PDF content. It employs a hybrid search architecture that combines vector and keyword retrieval with semantic reranking to prioritize the most relevant documents for prompt augmentation. The project covers a
Decomposes complex user requests into targeted sub-queries to retrieve precise information from memory.
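The hybrid search mentioned in this entry has to merge a keyword ranking with a vector ranking before any reranking step. One common way to do that is reciprocal rank fusion (RRF), sketched below. Whether this project uses exactly this method is an assumption; the function name, the document IDs, and the constant `k=60` are illustrative choices.

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked lists of document IDs into one ranking.

    Each document scores 1 / (k + rank) per list it appears in,
    with rank starting at 1; scores are summed across lists.
    """
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)


# Hypothetical result lists from a keyword index and a vector index.
keyword_hits = ["doc_a", "doc_b", "doc_c"]
vector_hits = ["doc_c", "doc_a", "doc_d"]

fused = reciprocal_rank_fusion([keyword_hits, vector_hits])
```

RRF needs only rank positions, so it sidesteps the problem that keyword scores and vector similarities live on incomparable scales.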
Vespa is a distributed search engine, vector database, and machine learning ranking engine. It serves as an AI search platform designed to handle large-scale document indexing and complex query processing across a cluster of nodes, combining keyword retrieval with high-dimensional embedding storage for semantic similarity search. The platform distinguishes itself by integrating machine learning models directly into the search pipeline to perform real-time inference and ranking. It converts these models into ranking expressions to score and order results based on relevance, while providing a s
Executes search query logic and dispatches results through a middleware layer managing the request-response cycle.
MindSearch is an LLM-based multi-agent search engine that decomposes complex user questions into targeted sub-queries and routes each to a specialized agent for parallel investigation. The system orchestrates multiple agents through a large language model, coordinating their tasks and interpreting search results to produce coherent answers from multiple sources. The project provides a configurable search backend interface that allows switching between Google, DuckDuckGo, Brave, and Bing search APIs by updating a configuration attribute. It includes a terminal-based debug interface for testing
Splits complex user questions into parallel sub-queries handled by specialized agents.
GraphQL-Ruby is a Ruby library for building GraphQL APIs with a strongly typed schema and a dedicated query execution engine. It provides a comprehensive framework for mapping application objects to a formal type system, enabling structured data retrieval through defined resolvers. The project distinguishes itself through advanced performance and delivery mechanisms, including a data loader for batching and caching that avoids N+1 query patterns. It supports high-performance data delivery through incremental response streaming, deferred query responses, and parallel data fetching using Fibers. It also offers native support for Relay conventions, including specialized helpers for connections and object identification. The library covers a broad range of API management features, including fine-grained access control, schema versioning to preserve backward compatibility, and real-time updates via subscriptions. It also includes traffic management tools to protect server resources, such as query complexity limits and request rate limiting. Development and observability are supported by AST analysis tools, execution tracing, and specialized test utilities for verifying batch loading.
Processes fields across multiple objects in a single batch to reduce memory usage for large nested lists.
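The batching idea behind a data loader can be illustrated with a small sketch. This is not GraphQL-Ruby's API (that library is Ruby and schedules loads with Fibers); it is a simplified Python illustration in which `BatchLoader`, `fetch_authors`, and the deferred-thunk approach are all hypothetical stand-ins.

```python
class BatchLoader:
    """Collects requested keys and fetches them in one batched call."""

    def __init__(self, batch_fn):
        self.batch_fn = batch_fn  # takes a list of keys, returns values in order
        self.pending = []         # keys requested but not yet fetched
        self.cache = {}           # key -> loaded value

    def load(self, key):
        # Queue the key unless it is already cached or queued.
        if key not in self.cache and key not in self.pending:
            self.pending.append(key)
        # Return a thunk so the actual fetch is deferred.
        return lambda: self._resolve(key)

    def _resolve(self, key):
        if self.pending:
            # Flush every queued key in a single batched call.
            keys, self.pending = self.pending, []
            self.cache.update(zip(keys, self.batch_fn(keys)))
        return self.cache[key]


calls = []  # records each batched fetch, to show how many round trips occur


def fetch_authors(ids):
    """Hypothetical stand-in for one database query over many IDs."""
    calls.append(list(ids))
    return [f"author-{i}" for i in ids]


loader = BatchLoader(fetch_authors)
thunks = [loader.load(i) for i in [1, 2, 1, 3]]
names = [thunk() for thunk in thunks]
```

Here four field resolutions trigger a single call to `fetch_authors` with the three distinct IDs, where a naive resolver would have issued one query per field.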
Zeebe is a cloud-native workflow engine and distributed state machine designed to orchestrate business processes using the BPMN and DMN standards. It operates as a high-performance gRPC workflow runtime that executes complex business processes through a partitioned event-streaming architecture. The system also acts as an orchestrator for large language model agents, coordinating AI reasoning and tool use within deterministic business processes. The engine distinguishes itself through peer-to-peer broker networking and a consensus-based data replication model that ensures high availability and fault tolerance. It uses a partitioned broker cluster to achieve horizontal scalability and applies adaptive request backpressure to regulate the incoming command flow and prevent system overload. The platform covers a broad range of operational features, including real-time execution monitoring with performance heatmaps, automated business decision-making via decision tables, and distributed task execution through a polling-based job worker model. It also provides tools for multi-tenant resource isolation, identity-based access control, and integration with external web APIs and serverless functions. The system can be deployed across various environments, including Kubernetes and Docker, and is managed through a combination of a command-line interface and a programmatic REST API.
Retrieves real-time process state and data via the cluster interface for monitoring and analytics.
Memary is a memory-augmented agent framework that stores and retrieves contextual information from a knowledge graph to personalize responses and maintain long-term memory across interactions. It automatically captures all agent interactions and stores them as structured memories without requiring explicit instrumentation, then injects top-ranked user entities and themes into the active context window to tailor agent responses dynamically. The framework distinguishes itself through a multi-retriever memory search that combines ColBERT reranking with recursive graph queries across databases, e
Splits user queries into sub-questions to retrieve more targeted information from memory stores.