18 repositories
Distributed platforms that provide full-text indexing, advanced filtering, and fast query capabilities for large datasets.
Explore 18 awesome GitHub repositories matching data & databases · Search Engines.
This project is a comprehensive, community-curated directory that organizes a vast landscape of Python libraries, frameworks, and software tools. It serves as a centralized knowledge base designed to make the ecosystem easier to navigate and to speed up discovery for developers across the entire software development lifecycle. The directory distinguishes itself by providing a structured index of resources categorized by technical domain, ranging from fundamental development utilities to specialized engineering fields. It covers high-level capabilities including artificial intelligence, data science, web development, and infrastructure management, allowing developers to identify vetted solutions for specific technical challenges. The project spans a broad range of capabilities, including tools for dependency management, static code analysis, and automated testing. It also catalogs resources for persistent data storage, cloud infrastructure orchestration, and interface development, offering a unified reference for building and maintaining complex software systems.
Enable fast, relevant query results across datasets through high-performance indexing and full-text search capabilities.
This project is an enterprise-grade Java framework designed for building scalable, full-stack e-commerce applications. It provides a comprehensive foundation for microservice-based distributed architectures, enabling the development of complex retail platforms that include product management, order processing, and secure user authentication. By leveraging modular service patterns and centralized API gateways, the framework supports the construction of resilient systems that decompose monolithic business logic into independent, manageable services. The platform distinguishes itself through a …
Offloads complex query operations to a distributed cluster to provide high-performance full-text retrieval.
Elasticsearch is a distributed search engine and document store designed for the high-performance indexing and retrieval of massive volumes of unstructured data. It functions as a centralized analytics platform, providing a schema-flexible architecture that organizes information into searchable indices while maintaining global cluster state through a distributed consensus mechanism. The platform distinguishes itself through its integrated approach to observability, security, and advanced analytics. It combines full-text, vector, and hybrid search capabilities with machine learning-driven insights…
Delivers high-performance full-text search capabilities with advanced relevance ranking and complex filtering on unstructured datasets.
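Relevance ranking of this kind typically rests on a term-weighting formula, and Elasticsearch's default similarity is Okapi BM25. As a rough illustration, the following is a minimal BM25 scorer over whitespace-tokenized documents, assuming the commonly used parameters k1 = 1.2 and b = 0.75. The function name `bm25_scores` and the toy corpus are hypothetical; this is a sketch of the formula, not Elasticsearch's implementation.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score each tokenized document against a tokenized query with Okapi BM25."""
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: how many documents contain each term at least once.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for doc in docs:
        tf = Counter(doc)
        score = 0.0
        for term in query:
            if tf[term] == 0:
                continue
            # Lucene-style IDF, which stays positive even for very common terms.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term-frequency saturation (k1) combined with length normalization (b).
            norm = tf[term] * (k1 + 1) / (
                tf[term] + k1 * (1 - b + b * len(doc) / avg_len)
            )
            score += idf * norm
        scores.append(score)
    return scores


docs = [
    "fast full text search engine".split(),
    "distributed document store".split(),
    "search search search".split(),
]
print(bm25_scores(["search"], docs))
```

In the toy corpus, the short document that repeats "search" outranks the longer one that mentions it once, and the document without the term scores zero.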
This project serves as a comprehensive knowledge base and reference for distributed systems engineering and enterprise software architecture. It provides a structured collection of technical resources, design patterns, and methodologies intended to assist in the design, maintenance, and scaling of complex, high-performance software environments. The repository distinguishes itself by offering deep dives into core architectural concepts such as actor-based concurrency, aspect-oriented interception, and inversion-of-control containers. It emphasizes the practical application of distributed systems…
Deploy enterprise-grade search platforms to provide advanced filtering, faceting, and relevance ranking for large-scale datasets.
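Faceting, in its simplest form, groups the documents that survive a filter by the values of one field and counts each group. The sketch below assumes documents are plain dictionaries; `facet_counts` and the sample `products` data are hypothetical, and real search engines typically derive these counts from their index structures rather than scanning every document.

```python
from collections import Counter


def facet_counts(docs, field, predicate=lambda d: True):
    """Count how many documents passing `predicate` carry each value of `field`."""
    return Counter(d[field] for d in docs if predicate(d) and field in d)


products = [
    {"name": "laptop", "brand": "acme", "price": 900},
    {"name": "phone", "brand": "acme", "price": 500},
    {"name": "tablet", "brand": "globex", "price": 300},
]

# Facet the "brand" field over items cheaper than 800.
print(facet_counts(products, "brand", lambda d: d["price"] < 800))
```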
This project is a community-driven library of structured text inputs designed to guide large language models into specific roles, behaviors, and operational modes. It functions as a comprehensive repository of prompt engineering resources, providing reusable templates that allow users to override default model tendencies and enforce domain-specific response patterns through instruction-following logic. The collection distinguishes itself by offering specialized persona-based directives that constrain model output to simulate professional experts or functional technical environments. By utilizing…
Simulation prompts replicate search engine query syntax and indexing behaviors for testing and development purposes.
Meilisearch is a Rust-based search engine providing typo-tolerant full-text and vector-based semantic search with real-time conversational capabilities.
Delivers typo-tolerant text retrieval with advanced relevance ranking for high-performance search requirements.
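Typo tolerance generally means accepting index terms within a small edit distance of the query word. The sketch below uses plain Levenshtein distance and, as an assumption, allows one typo for words of 5 or more characters and two for words of 9 or more. These thresholds are in line with Meilisearch's documented defaults, but treat them as illustrative. All function names are hypothetical, and a real engine would avoid scanning the whole vocabulary, for example by running automata over a term dictionary.

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # deletion
                curr[j - 1] + 1,           # insertion
                prev[j - 1] + (ca != cb),  # substitution (free if chars match)
            ))
        prev = curr
    return prev[-1]


def allowed_typos(word):
    # Illustrative thresholds: longer query words tolerate more typos.
    if len(word) >= 9:
        return 2
    if len(word) >= 5:
        return 1
    return 0


def typo_tolerant_match(query_word, vocabulary):
    """Return vocabulary terms within the allowed edit distance of the query."""
    limit = allowed_typos(query_word)
    return [w for w in vocabulary if levenshtein(query_word, w) <= limit]


vocab = ["search", "seaside", "research", "engine"]
print(typo_tolerant_match("serch", vocab))
```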
Appwrite is a backend-as-a-service platform that provides a unified development environment for building full-stack applications. It integrates essential infrastructure components, including authentication, databases, storage, and serverless functions, into a single, centralized interface to simplify application development and resource management. The platform distinguishes itself through a container-based microservices architecture that ensures consistent execution across diverse infrastructure. It features a versatile connectivity layer that links frontend applications with third-party services…
Adds typo-tolerant search capabilities to application databases by syncing data with external search engines.
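Syncing an application database with an external search engine usually means propagating every create, update, and delete to the search index. The sketch below assumes a hypothetical `RecordStore` that publishes change events and a hypothetical `SearchMirror` that keeps a simple term index in step. Neither corresponds to Appwrite's actual API.

```python
from collections import defaultdict


class RecordStore:
    """Stand-in for an application database that publishes change events."""

    def __init__(self):
        self.rows = {}
        self.listeners = []

    def subscribe(self, listener):
        self.listeners.append(listener)

    def upsert(self, row_id, text):
        self.rows[row_id] = text
        for listener in self.listeners:
            listener("upsert", row_id, text)

    def delete(self, row_id):
        self.rows.pop(row_id, None)
        for listener in self.listeners:
            listener("delete", row_id, None)


class SearchMirror:
    """Stand-in for an external search engine kept in sync via change events."""

    def __init__(self):
        self.postings = defaultdict(set)  # term -> ids of rows containing it
        self.terms_by_row = {}            # row id -> terms currently indexed

    def __call__(self, event, row_id, text):
        # Drop the row's old terms first so updates never leave stale hits.
        for term in self.terms_by_row.pop(row_id, ()):
            self.postings[term].discard(row_id)
        if event == "upsert":
            terms = set(text.lower().split())
            self.terms_by_row[row_id] = terms
            for term in terms:
                self.postings[term].add(row_id)

    def search(self, term):
        return sorted(self.postings.get(term.lower(), set()))


store, mirror = RecordStore(), SearchMirror()
store.subscribe(mirror)
store.upsert(1, "Blue widget")
store.upsert(2, "Red widget")
store.upsert(1, "Blue gadget")   # update: row 1 no longer mentions "widget"
print(mirror.search("widget"))
store.delete(2)
print(mirror.search("widget"))
```

The mirror is updated synchronously here for brevity. Real integrations often push changes through a queue or change feed instead, accepting a short delay before new data becomes searchable.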
Reddit is a social news aggregator designed for hosting community-driven discussions and content sharing through threaded conversations and user-submitted links. It functions as a platform for managing large volumes of user-generated content, providing a structured interface for programmatic access to site data and core application functionality. The platform utilizes a REST API to expose site data and user interactions to external clients. To maintain performance across large datasets, it employs an external full-text search engine that offloads indexing and query processing from the primary…
Provides high-performance indexing and querying for large volumes of user-generated content.
Tantivy is a library for building full-text search engines and indexing frameworks. It provides the core components necessary to organize large collections of text data into searchable structures, enabling the execution of complex queries and the retrieval of information across structured document sets. The engine utilizes an inverted index architecture to map terms to document identifiers, supported by a segment-based storage model that balances search performance with write throughput. It incorporates specialized data structures, including finite state transducers for term dictionaries and…
Organizes text data into a searchable format based on a predefined schema to enable fast retrieval across large document collections.
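The inverted-index and segment ideas described above can be illustrated in a few lines. In the following hypothetical sketch, which is not Tantivy's API, each committed batch of documents becomes an immutable `Segment` that maps terms to sorted document ids. Queries union postings across segments and intersect them across terms, and `merge` compacts everything into a single segment.

```python
from collections import defaultdict


class Segment:
    """An immutable mini inverted index: term -> sorted list of doc ids."""

    def __init__(self, postings):
        self.postings = postings

    @classmethod
    def from_docs(cls, docs):
        postings = defaultdict(set)
        for doc_id, text in docs:
            for term in text.lower().split():
                postings[term].add(doc_id)
        return cls({t: sorted(ids) for t, ids in postings.items()})


class Index:
    def __init__(self):
        self.segments = []
        self.buffer = []   # documents added since the last commit
        self.next_id = 0

    def add(self, text):
        self.buffer.append((self.next_id, text))
        self.next_id += 1

    def commit(self):
        # Writes never rewrite existing segments; each commit freezes a new one.
        if self.buffer:
            self.segments.append(Segment.from_docs(self.buffer))
            self.buffer = []

    def merge(self):
        """Fold every segment into one, trading a rewrite for faster reads."""
        merged = defaultdict(set)
        for seg in self.segments:
            for term, ids in seg.postings.items():
                merged[term].update(ids)
        self.segments = [Segment({t: sorted(ids) for t, ids in merged.items()})]

    def search(self, query):
        """Return ids of documents containing every query term (AND semantics)."""
        result = None
        for term in query.lower().split():
            ids = set()
            for seg in self.segments:
                ids.update(seg.postings.get(term, []))
            result = ids if result is None else result & ids
        return sorted(result or [])


index = Index()
index.add("Rust search library")
index.add("inverted index in Rust")
index.commit()
index.add("segment based search")
index.commit()
print(index.search("rust"), index.search("rust search"))
index.merge()
print(len(index.segments))
```

Writes only ever append new segments, which keeps indexing cheap. Reads, however, pay for every extra segment they must consult, which is the trade-off that periodic merging addresses. In this sketch, documents added after the last `commit()` are not yet searchable.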
Laradock is a collection of pre-configured Docker containers and orchestration definitions used to deploy multi-service development sandboxes. It functions as a PHP runtime manager and a Docker-based development environment, providing a set of modular service definitions for deploying web servers, databases, and caches through a single orchestration file. The project enables the creation of a local ecosystem featuring Nginx, MySQL, Redis, and Elasticsearch to mirror production infrastructure. It allows for switching between different versions of PHP and associated extensions, as well as managing…
Launches pre-configured search containers providing full-text indexing and retrieval within a local environment.
ParadeDB is a database extension that integrates full-text search, vector database capabilities, and real-time analytics directly into a relational engine. It functions as a plugin that adds new storage and query execution capabilities to an existing database architecture. The project distinguishes itself by supporting hybrid search workflows that combine lexical keyword matching with dense and sparse vector similarity in a single query. It utilizes reciprocal rank fusion to merge these ranked result sets and employs logical replication to synchronize data from external instances, removing the…
Integrates advanced text search, highlighting, and ranking capabilities directly into a Postgres database.
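Reciprocal rank fusion scores each document by summing 1 / (k + rank) over every ranked list it appears in. Items ranked well by several retrievers rise to the top without their raw scores needing to be comparable. The sketch below assumes k = 60, a value commonly used in the literature. The function name and sample ids are hypothetical, and this is not ParadeDB's implementation.

```python
from collections import defaultdict


def reciprocal_rank_fusion(ranked_lists, k=60):
    """Merge ranked lists of ids; each list contributes 1 / (k + rank)."""
    scores = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id for a stable order.
    return sorted(scores, key=lambda d: (-scores[d], d))


keyword_hits = ["doc_a", "doc_b", "doc_c"]   # e.g. from a lexical (BM25) search
vector_hits = ["doc_c", "doc_a", "doc_d"]    # e.g. from a vector similarity search
print(reciprocal_rank_fusion([keyword_hits, vector_hits]))
```

`doc_a` wins because it sits near the top of both lists, while `doc_b` and `doc_d`, each found by only one retriever, fall to the bottom.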
Searchkick is an integration library and wrapper that connects application models to search engines such as Elasticsearch and OpenSearch. It functions as a search index synchronizer, automatically mirroring database records to a search server to enable full-text and vector retrieval. The project provides a high-level interface for implementing keyword search, semantic vector search, and hybrid search. It distinguishes itself through the ability to combine traditional keyword matching with vector embeddings using reranking and fusion techniques to improve precision. The library covers the end…
Provides a high-level library for syncing application models to search engines to enable intelligent full-text searching.
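One simple way to combine keyword matching with vector embeddings is a weighted blend of the two scores. The following sketch shows that score-level fusion technique: it mixes a keyword-overlap ratio with cosine similarity using a hypothetical weight `alpha`. The toy two-dimensional embeddings stand in for real model output, and none of these names are Searchkick's API.

```python
import math


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def keyword_score(query, text):
    """Fraction of query terms that appear in the text."""
    terms = query.lower().split()
    words = set(text.lower().split())
    return sum(t in words for t in terms) / len(terms) if terms else 0.0


def hybrid_search(query, query_vec, docs, alpha=0.5):
    """Rank docs by alpha * keyword score + (1 - alpha) * vector similarity."""
    scored = []
    for doc in docs:
        kw = keyword_score(query, doc["text"])
        vec = cosine(query_vec, doc["embedding"])
        scored.append((alpha * kw + (1 - alpha) * vec, doc["id"]))
    return [doc_id for _, doc_id in sorted(scored, reverse=True)]


docs = [
    {"id": "shoes", "text": "red running shoes", "embedding": [1.0, 0.0]},
    {"id": "sneakers", "text": "athletic sneakers", "embedding": [0.9, 0.1]},
    {"id": "kettle", "text": "electric kettle", "embedding": [0.0, 1.0]},
]
print(hybrid_search("running shoes", [1.0, 0.0], docs))
```

The "sneakers" document shares no words with the query yet still ranks second because its embedding is close to the query vector, which is the kind of recall hybrid search aims for.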
This is a Backend-as-a-Service SDK for Apple platforms, providing a collection of libraries that connect iOS and macOS applications to cloud databases, authentication services, and serverless infrastructure. It serves as a developer kit for integrating real-time data synchronization, file storage, and push notifications into native apps. The SDK is distinguished by its generative AI integration, which routes text and multimodal prompts between on-device models and cloud-hosted large language models. It further differentiates itself with a specialized app distribution tool for managing pre-release…
Provides integration to index application data into Algolia for typo-tolerant full-text search.
Paperless-ng is a self-hosted document management system designed to archive physical documents as searchable digital files. It works as a private server for scanning, indexing, and organizing a digital document library through a web interface. The system acts as an encrypted file archive, using a GNU Privacy Guard (GPG) backend to secure stored documents. It provides automatic on-the-fly decryption during download so that archived records remain protected while at rest. The platform incorporates optical character recognition (OCR) to convert scanned images and PDFs into searchable text for database indexing. It also supports automatic ingestion by monitoring local or network directories for new uploads.
Uses optical character recognition to extract text from scans and images for full-text search indexing.
This project is a rapid application development framework for building back-office interfaces and dashboards within Laravel applications. It works as a backend-management UI toolkit and a schema-driven UI generator that renders admin panels and data forms by mapping backend logic to predefined frontend components. The framework includes a role-based access control system that restricts application features and data according to user identity and assigned roles. It also offers a full-text search integration that uses interchangeable drivers to index and retrieve application content. Additional capabilities include asynchronous content loading to streamline page transitions and a multi-channel notification routing system. The platform also provides query-based data filtering and sorting tools for managing complex datasets within internal dashboards.
Integrates typo-tolerant search capabilities and interchangeable drivers into the application database.
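With interchangeable drivers, the application typically talks to one search interface while the backend behind it can be swapped through configuration. The sketch below expresses that idea with a `typing.Protocol`. `SearchDriver`, both driver classes, and `Searchable` are hypothetical and are not the framework's actual classes.

```python
from typing import Protocol


class SearchDriver(Protocol):
    def index(self, doc_id: int, text: str) -> None: ...
    def search(self, query: str) -> list[int]: ...


class SubstringDriver:
    """Naive driver: scans stored text for the query as a substring."""

    def __init__(self) -> None:
        self.docs: dict[int, str] = {}

    def index(self, doc_id: int, text: str) -> None:
        self.docs[doc_id] = text.lower()

    def search(self, query: str) -> list[int]:
        q = query.lower()
        return sorted(i for i, t in self.docs.items() if q in t)


class TokenDriver:
    """Inverted-index driver: matches a single whole lowercase token."""

    def __init__(self) -> None:
        self.postings: dict[str, set[int]] = {}

    def index(self, doc_id: int, text: str) -> None:
        for token in text.lower().split():
            self.postings.setdefault(token, set()).add(doc_id)

    def search(self, query: str) -> list[int]:
        return sorted(self.postings.get(query.lower(), set()))


class Searchable:
    """Application-facing facade; the driver would be chosen by configuration."""

    def __init__(self, driver: SearchDriver) -> None:
        self.driver = driver

    def add(self, doc_id: int, text: str) -> None:
        self.driver.index(doc_id, text)

    def find(self, query: str) -> list[int]:
        return self.driver.search(query)


for driver in (SubstringDriver(), TokenDriver()):
    s = Searchable(driver)
    s.add(1, "Admin dashboard")
    s.add(2, "User dashboards")
    print(type(driver).__name__, s.find("dashboard"))
```

The two drivers deliberately disagree on "dashboards". Substring matching finds it and whole-token matching does not, which is why swapping drivers can change results as well as performance.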
This project is a comprehensive front-end development guide and roadmap designed to help engineers master the skills and professional standards required for modern web development. It serves as a technical reference for HTML, CSS, and JavaScript, offering structured learning paths and a map of the competencies needed to progress from beginner to professional web engineer. The resource works as a categorized directory and an overview of the JavaScript ecosystem. It catalogs industry-standard frameworks, libraries, and utilities, offering specific recommendations for state management, CSS frameworks, and static site generators. The guide covers a broad spectrum of engineering capabilities, including UI architecture, web performance optimization, and accessibility auditing. It also provides guidance on build automation, deployment strategies, and the selection of development tools for professional workflows.
Recommends integration components for adding high-performance, typo-tolerant search capabilities to applications.
fsearch is a high-performance desktop file search tool and filesystem indexing engine. It provides near-instant location of files and folders on a local filesystem by utilizing a background indexing system that monitors filesystem changes in real time. The utility distinguishes itself through advanced query capabilities, including support for boolean search logic using AND, OR, and NOT operators, as well as regular expression and wildcard filtering. It allows for precise result refinement using literal character handling and specific search modifiers such as case sensitivity and exact matches…
Delivers immediate search results and term highlighting in real time as the user types.
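Boolean filename queries of this kind can be evaluated by compiling the query into a predicate. The sketch below is a small recursive-descent parser. NOT binds tightest, then AND (also implied by adjacent terms), then OR. Terms may use shell-style wildcards via `fnmatch`, and matching is case-insensitive. These precedence rules and the `compile_query` name are assumptions, not fsearch's actual query grammar.

```python
import fnmatch


def _and(left, right):
    return lambda name: left(name) and right(name)


def _or(left, right):
    return lambda name: left(name) or right(name)


def compile_query(query):
    """Compile e.g. 'report AND NOT *.docx OR *.txt' into a filename predicate."""
    tokens = query.split()
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of query")
        pos += 1
        return tokens[pos - 1]

    def parse_or():
        left = parse_and()
        while peek() == "OR":
            take()
            left = _or(left, parse_and())
        return left

    def parse_and():
        left = parse_not()
        while peek() not in (None, "OR"):
            if peek() == "AND":
                take()  # explicit AND; adjacency means the same thing
            left = _and(left, parse_not())
        return left

    def parse_not():
        if peek() == "NOT":
            take()
            inner = parse_not()
            return lambda name: not inner(name)
        pattern = take().lower()
        if not any(ch in pattern for ch in "*?["):
            pattern = f"*{pattern}*"  # plain words match anywhere in the name
        return lambda name: fnmatch.fnmatchcase(name.lower(), pattern)

    return parse_or()


files = ["report.pdf", "Report-draft.docx", "notes.txt", "photo.jpg"]
match = compile_query("report AND NOT *.docx OR *.txt")
print([f for f in files if match(f)])
```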
Papra is a self-hosted document management system designed for digital archiving, organization, and retrieval. It serves as a centralized platform for storing files with a focus on security, providing an encrypted file archive using AES-256-GCM and a programmatic interface for managing documents and metadata via a REST API, SDK, and command line tools. The system distinguishes itself through an automated document ingestion engine that imports files via email forwarding, monitored folders, and webhook listeners. It further enhances discoverability by acting as an OCR document indexer, extracting…
Indexes document contents using OCR and text extraction to enable high-precision full-text search.