18 repositories
Distributed platforms that provide full-text indexing, advanced filtering, and fast query capabilities for large datasets.
Explore 18 awesome GitHub repositories matching data & databases · Search Engines. Refine with filters or upvote what's useful.
This project is a comprehensive, community-curated directory that organizes a vast landscape of Python libraries, frameworks, and software tools. It serves as a centralized knowledge base designed to make the ecosystem easier to navigate and to speed up discovery for developers across the entire software development lifecycle. The directory distinguishes itself by providing a structured index of resources categorized by technical domain, ranging from fundamental development utilities to specialized engineering fields. It covers high-level capabilities including artificial intelligence, data science, web development, and infrastructure management, allowing developers to identify vetted solutions for specific technical challenges. The project spans a broad surface of capabilities, including tools for dependency management, static code analysis, and automated testing. It also catalogs resources for persistent data storage, cloud infrastructure orchestration, and interface development, providing a unified reference for building and maintaining complex software systems.
Enable fast, relevant query results across datasets through high-performance indexing and full-text search capabilities.
This project is an enterprise-grade Java framework designed for building scalable, full-stack e-commerce applications. It provides a comprehensive foundation for microservice-based distributed architectures, enabling the development of complex retail platforms that include product management, order processing, and secure user authentication. By leveraging modular service patterns and centralized API gateways, the framework supports the construction of resilient systems that decompose monolithic business logic into independent, manageable services. The platform distinguishes itself through a r
Offloads complex query operations to a distributed cluster to provide high-performance full-text retrieval.
Elasticsearch is a distributed search engine and document store designed for the high-performance indexing and retrieval of massive volumes of unstructured data. It functions as a centralized analytics platform, providing a schema-flexible architecture that organizes information into searchable indices while maintaining global cluster state through a distributed consensus mechanism. The platform distinguishes itself through its integrated approach to observability, security, and advanced analytics. It combines full-text, vector, and hybrid search capabilities with machine learning-driven insi
Delivers high-performance full-text search capabilities with advanced relevance ranking and complex filtering on unstructured datasets.
This project serves as a comprehensive knowledge base and reference for distributed systems engineering and enterprise software architecture. It provides a structured collection of technical resources, design patterns, and methodologies intended to assist in the design, maintenance, and scaling of complex, high-performance software environments. The repository distinguishes itself by offering deep dives into core architectural concepts such as actor-based concurrency, aspect-oriented interception, and inversion-of-control containers. It emphasizes the practical application of distributed syst
Deploy enterprise-grade search platforms to provide advanced filtering, faceting, and relevance ranking for large-scale datasets.
This project is a community-driven library of structured text inputs designed to guide large language models into specific roles, behaviors, and operational modes. It functions as a comprehensive repository of prompt engineering resources, providing reusable templates that allow users to override default model tendencies and enforce domain-specific response patterns through instruction-following logic. The collection distinguishes itself by offering specialized persona-based directives that constrain model output to simulate professional experts or functional technical environments. By utiliz
Simulation prompts replicate search engine query syntax and indexing behaviors for testing and development purposes.
Meilisearch is a Rust-based search engine providing typo-tolerant full-text and vector-based semantic search with real-time conversational capabilities.
Delivers typo-tolerant text retrieval with advanced relevance ranking for high-performance search requirements.
Appwrite is a backend-as-a-service platform that provides a unified development environment for building full-stack applications. It integrates essential infrastructure components—including authentication, databases, storage, and serverless functions—into a single, centralized interface to simplify application development and resource management. The platform distinguishes itself through a container-based microservices architecture that ensures consistent execution across diverse infrastructure. It features a versatile connectivity layer that links frontend applications with third-party servi
Adds typo-tolerant search capabilities to application databases by syncing data with external search engines.
Reddit is a social news aggregator designed for hosting community-driven discussions and content sharing through threaded conversations and user-submitted links. It functions as a platform for managing large volumes of user-generated content, providing a structured interface for programmatic access to site data and core application functionality. The platform utilizes a REST API to expose site data and user interactions to external clients. To maintain performance across large datasets, it employs an external full-text search engine that offloads indexing and query processing from the primary
Provides high-performance indexing and querying for large volumes of user-generated content.
Tantivy is a library for building full-text search engines and indexing frameworks. It provides the core components necessary to organize large collections of text data into searchable structures, enabling the execution of complex queries and the retrieval of information across structured document sets. The engine utilizes an inverted index architecture to map terms to document identifiers, supported by a segment-based storage model that balances search performance with write throughput. It incorporates specialized data structures, including finite state transducers for term dictionaries and
Organizes text data into a searchable format based on a predefined schema to enable fast retrieval across large document collections.
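The inverted index idea mentioned above (mapping terms to document identifiers) can be illustrated with a minimal Python sketch. This is not Tantivy's API or storage format: the function names `build_inverted_index` and `search_all`, the whitespace tokenizer, and the sample documents are all assumptions made for illustration.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each lowercase term to the sorted list of document ids containing it.

    `docs` is a dict of {doc_id: text}. Tokenization is a plain whitespace
    split, which is an illustrative simplification.
    """
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


def search_all(index, query):
    """Return ids of documents that contain every query term (AND semantics)."""
    terms = query.lower().split()
    if not terms:
        return []
    result = set(index.get(terms[0], []))
    for term in terms[1:]:
        result &= set(index.get(term, []))
    return sorted(result)


# Hypothetical sample documents.
docs = {
    1: "fast full text search",
    2: "full text indexing library",
    3: "fast vector search",
}
index = build_inverted_index(docs)
print(search_all(index, "full text"))    # documents containing both terms
print(search_all(index, "fast search"))
```

A production engine would add proper tokenization, compressed posting lists, and segment merging; the sketch only shows the term-to-document mapping itself.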
Laradock is a collection of pre-configured Docker containers and orchestration definitions used to deploy multi-service development sandboxes. It functions as a PHP runtime manager and a Docker-based development environment, providing a set of modular service definitions for deploying web servers, databases, and caches through a single orchestration file. The project enables the creation of a local ecosystem featuring Nginx, MySQL, Redis, and Elasticsearch to mirror production infrastructure. It allows for switching between different versions of PHP and associated extensions, as well as manag
Launches pre-configured search containers providing full-text indexing and retrieval within a local environment.
ParadeDB is a database extension that integrates full-text search, vector database capabilities, and real-time analytics directly into a relational engine. It functions as a plugin that adds new storage and query execution capabilities to an existing database architecture. The project distinguishes itself by supporting hybrid search workflows that combine lexical keyword matching with dense and sparse vector similarity in a single query. It utilizes reciprocal rank fusion to merge these ranked result sets and employs logical replication to synchronize data from external instances, removing th
Integrates advanced text search, highlighting, and ranking capabilities directly into a Postgres database.
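Reciprocal rank fusion, which the description above names as the way ranked result sets are merged, can be sketched in a few lines of Python. This is a generic illustration rather than ParadeDB's implementation: the function name `reciprocal_rank_fusion`, the constant `k=60` (a commonly used default), and the sample rankings are assumptions.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document ids into one ranking.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. `k=60` is an assumed, commonly used constant.
    Ties are broken by the string form of the id so the output is stable.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda d: (-scores[d], str(d)))


# Hypothetical result lists from a keyword query and a vector query.
lexical = ["a", "b", "c"]
vector = ["c", "a", "d"]
fused = reciprocal_rank_fusion([lexical, vector])
print(fused)
```

Because the method uses only rank positions, it does not need the lexical and vector scores to be on comparable scales.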
Searchkick is an integration library and wrapper that connects application models to search engines such as Elasticsearch and OpenSearch. It functions as a search index synchronizer, automatically mirroring database records to a search server to enable full-text and vector retrieval. The project provides a high-level interface for implementing keyword search, semantic vector search, and hybrid search. It distinguishes itself through the ability to combine traditional keyword matching with vector embeddings using reranking and fusion techniques to improve precision. The library covers the end
Provides a high-level library for syncing application models to search engines to enable intelligent full-text searching.
This is a Backend as a Service SDK for Apple platforms, providing a collection of libraries that connect iOS and macOS applications to cloud databases, authentication services, and serverless infrastructure. It serves as a developer kit for integrating real-time data synchronization, file storage, and push notifications into native apps. The SDK is distinguished by its generative AI integration, which routes text and multimodal prompts between on-device models and cloud-hosted large language models. It further differentiates itself with a specialized app distribution tool for managing pre-rel
Provides integration to index application data into Algolia for typo-tolerant full-text search.
Paperless-ng is a self-hosted document management system designed to archive physical paperwork as searchable digital files. It works as a private server for scanning, indexing, and organizing a digital document library through a web interface. The system acts as an encrypted file archive, using a privacy backend to secure stored documents. It provides automatic on-the-fly decryption during download so that archived records remain protected while in storage. The platform incorporates optical character recognition to convert scanned images and PDFs into searchable text for database indexing. It also supports automatic ingestion by monitoring local or network directories for new uploads.
Uses optical character recognition to extract text from scans and images for full-text search indexing.
This project is a rapid application development framework for building back-office interfaces and dashboards within Laravel applications. It works as a backend management UI toolkit and a schema-based UI generator that renders admin panels and data forms by mapping backend logic to predefined frontend components. The framework includes a role-based access control system to restrict application functions and data according to user identity and assigned roles. It also provides a full-text search integration that uses interchangeable drivers to index and retrieve application content. Additional capabilities cover asynchronous content loading to speed up page transitions and a multi-channel notification routing system. The platform also provides tools for query-based data filtering and sorting to manage complex datasets within internal dashboards.
Integrates typo-tolerant search capabilities and interchangeable drivers into the application database.
This project is a comprehensive front-end development guide and roadmap designed to help engineers master the skills and professional standards required for modern web development. It serves as a technical reference for mastering HTML, CSS, and JavaScript, providing structured learning paths and a map of the professional competencies needed to progress from beginner to professional web engineer. The resource works as a categorized directory and an overview of the JavaScript ecosystem. It catalogs industry-standard frameworks, libraries, and utilities, offering specific recommendations for state management, CSS frameworks, and static site generators. The guide covers a broad spectrum of engineering capabilities, including UI architecture, web performance optimization, and accessibility auditing. It also provides guidance on build automation, deployment strategies, and the selection of development tools for professional workflows.
Recommends integration components for adding high-performance, typo-tolerant search capabilities to applications.
fsearch is a high-performance desktop file search tool and filesystem indexing engine. It provides near-instant location of files and folders on a local filesystem by utilizing a background indexing system that monitors filesystem changes in real time. The utility distinguishes itself through advanced query capabilities, including support for boolean search logic using AND, OR, and NOT operators, as well as regular expression and wildcard filtering. It allows for precise result refinement using literal character handling and specific search modifiers such as case sensitivity and exact matches
Delivers immediate search results and term highlighting in real time as the user types.
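The combination of boolean logic, wildcard filtering, and a case-sensitivity modifier described above can be approximated with a short Python sketch. It does not reproduce fsearch's query syntax or internals: the `matches` helper, its `all_of` / `any_of` / `none_of` parameters, and the file names are hypothetical, and the wildcard matching relies on Python's standard `fnmatch` module.

```python
import fnmatch


def matches(name, all_of=(), any_of=(), none_of=(), case_sensitive=False):
    """Check a file name against wildcard patterns combined with AND/OR/NOT.

    all_of:  every pattern must match (AND)
    any_of:  at least one pattern must match, if any are given (OR)
    none_of: no pattern may match (NOT)
    """
    def hit(pattern):
        if case_sensitive:
            return fnmatch.fnmatchcase(name, pattern)
        return fnmatch.fnmatchcase(name.lower(), pattern.lower())

    return (
        all(hit(p) for p in all_of)
        and (not any_of or any(hit(p) for p in any_of))
        and not any(hit(p) for p in none_of)
    )


# Hypothetical file names.
files = ["report_2023.pdf", "Report_draft.docx", "notes.txt", "report_old.pdf"]

# report* AND (*.pdf OR *.docx) AND NOT *old*
insensitive = [
    f for f in files
    if matches(f, all_of=["report*"], any_of=["*.pdf", "*.docx"], none_of=["*old*"])
]
sensitive = [
    f for f in files
    if matches(f, all_of=["report*"], any_of=["*.pdf", "*.docx"],
               none_of=["*old*"], case_sensitive=True)
]
print(insensitive)
print(sensitive)
```

A real tool would evaluate such filters against a prebuilt index rather than scanning a list on every keystroke.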
Papra is a self-hosted document management system designed for digital archiving, organization, and retrieval. It serves as a centralized platform for storing files with a focus on security, providing an encrypted file archive using AES-256-GCM and a programmatic interface for managing documents and metadata via a REST API, SDK, and command line tools. The system distinguishes itself through an automated document ingestion engine that imports files via email forwarding, monitored folders, and webhook listeners. It further enhances discoverability by acting as an OCR document indexer, extracti
Indexes document contents using OCR and text extraction to enable high-precision full-text search.