22 repos
Specialized tools for indexing, searching, and retrieving information across diverse data stores.
Explore 22 awesome GitHub repositories matching data & databases · Search and Indexing Technologies. Refine with filters or upvote what's useful.
This project is a centralized, open-access repository that serves as a structured directory for technical education and professional development. It functions as a community-driven knowledge base, aggregating high-quality learning materials to support global accessibility to computer science and software engineering re
Openclaw is a platform for managing agent execution environments, providing the infrastructure to control agent lifecycles, session state, and workspace persistence. It features a centralized gateway that handles model loops, tool invocation, and streaming events, while supporting multi-agent routing and persistent mem
AutoGPT is an orchestration platform designed for building, managing, and deploying autonomous agents. It provides a visual canvas-based environment where users can assemble agents by connecting modular blocks that represent actions, data flows, and conditional logic. The platform supports the entire agent lifecycle, i
Prompts.chat is a community-driven repository and management platform for AI prompts and agent skills. It provides a centralized interface for users to search, retrieve, and save prompts, while offering structured storage for multi-file agent skills that include documentation and supporting assets. The platform distin
HelloGitHub is a centralized discovery platform and technical knowledge repository designed to help developers identify high-quality open-source projects, libraries, and infrastructure. It functions as a structured directory that aggregates specialized development tools and educational materials, organizing them by tec
This project is a community-maintained directory of technical resources, tools, and services that offer free tiers for developers. It serves as a centralized reference point for discovering infrastructure, software, and educational materials, helping individuals and teams minimize operational costs while building and s
HowToCook is a structured culinary knowledge base and computational engine designed for the management and scaling of instructional cooking content. It provides a framework for organizing technical preparation procedures and ingredient data, allowing users to maintain consistent culinary standards across various meal s
This project provides an integrated backend platform built around a relational database. It automatically generates REST and GraphQL APIs from database schemas, allowing for direct data interaction through standard requests and client libraries. The platform includes a comprehensive authentication system that manages u
Firecrawl is a web data extraction platform designed to convert unstructured web content into clean, LLM-ready formats like markdown or JSON. It functions as an autonomous web crawler and scraper, capable of mapping entire domains, performing recursive navigation, and executing complex data gathering tasks. By leveragi
This project is a comprehensive educational curriculum designed to build proficiency across modern infrastructure, cloud-native technologies, and systems administration. It functions as a reference library and interview preparation resource, offering a structured collection of conceptual questions, practical coding cha
This project is a comprehensive, community-sourced repository of technical interview questions and study materials. It serves as a centralized index for software engineers to prepare for technical assessments, benchmark their personal knowledge, and identify gaps in their expertise across a wide range of programming la
This project is a general-purpose command-line filter that provides an interactive interface for processing standard input streams. It enables real-time fuzzy searching, data selection, and transformation, allowing users to navigate complex information or file systems directly within their terminal. By utilizing a pipe
Elasticsearch is a distributed search engine and document store designed for the high-performance indexing and retrieval of massive volumes of unstructured data. It functions as a centralized analytics platform, providing a schema-flexible architecture that organizes information into searchable indices while maintainin
This project is a comprehensive retrieval-augmented generation platform designed for building, managing, and deploying knowledge-based AI applications. It provides a unified environment for organizing datasets, configuring conversational chat assistants, and developing autonomous agents that execute multi-step reasonin
Redis is an in-memory, key-value database designed to provide sub-millisecond latency for read and write operations. It functions as a versatile data platform, serving as a distributed cache, a message broker, a NoSQL document store, and a vector database. The system utilizes an event-driven, single-threaded loop to pr
This project is a community-maintained, open-access directory of high-quality public datasets. It serves as a centralized reference point for researchers, developers, and data scientists to locate reliable information sources across a wide spectrum of industries and scientific fields. By providing a structured index, t
This project is a curated knowledge repository that aggregates high-quality resources, technical documentation, and expert insights focused on distributed systems engineering. It serves as a community-driven learning resource designed to help developers navigate the complexities of building and maintaining large-scale
Pathway is a high-performance data processing framework designed for building unified batch and streaming pipelines. It functions as an orchestrator for complex data transformations, utilizing a differential dataflow engine to process updates incrementally. By treating static datasets and continuous event streams with
This project is a privacy-first backend service designed to facilitate retrieval-augmented generation by processing local documents into searchable vector representations. It provides a modular architecture that allows users to ingest diverse file formats, manage document metadata, and perform semantic searches to prov
This project is a data processing engine and AI application platform designed for building production-grade machine learning workflows. It provides a unified programming model that handles both historical batch data and live stream ingestion, enabling the development of real-time ETL pipelines and scalable data transfo