8 repositories
Capabilities for querying and narrowing down document sets based on criteria.
Distinguishing note: Focuses on the filtering logic applied to database queries.
Explore 8 awesome GitHub repositories matching data & databases · Document Filtering.
Payload is a headless content management system and application framework that uses a code-first approach to define data schemas and administrative interfaces. By utilizing a centralized, type-safe configuration object, it automatically generates database schemas, API endpoints, and a fully customizable admin panel. The system is built on a database-agnostic architecture, allowing it to interface with various storage engines while providing a unified, type-safe API for server-side operations, REST, and GraphQL. What distinguishes Payload is its deep extensibility and developer-centric design.
Filters returned document fields to optimize database performance and reduce payload size.
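Returning only the requested fields amounts to a projection over each document. The sketch below shows the idea in plain Python. It is not Payload's API; the `project` helper, the always-kept `id` field, and the sample document are all hypothetical.

```python
from typing import Any, Iterable


def project(doc: dict[str, Any], fields: Iterable[str]) -> dict[str, Any]:
    """Return a copy of `doc` with only the requested top-level fields.

    Hypothetical helper for illustration; nested paths are not supported.
    """
    wanted = set(fields)
    # Assumption: always keep the id so clients can still reference the record.
    wanted.add("id")
    return {key: value for key, value in doc.items() if key in wanted}


post = {
    "id": 1,
    "title": "Hello",
    "body": "A long article body ...",
    "author": {"name": "Ada"},
}

print(project(post, ["title"]))  # {'id': 1, 'title': 'Hello'}
```

Dropping large fields such as `body` before serialization is what shrinks the response.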
NeDB is a JavaScript embedded NoSQL document store designed for Node.js and the browser. It functions as an in-memory data store with the option to persist documents to a local file system, ensuring data survives application restarts. The project utilizes a MongoDB-compatible API to perform data operations, allowing it to serve as a lightweight document indexing system and a persistent file database without requiring a separate database server. Capabilities include querying, inserting, updating, and deleting documents, as well as the ability to create indexes on specific fields to accelerate
Retrieves documents using equality, comparison, and logical operators to filter records.
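The operator-based filtering described above follows MongoDB's query style. As a rough illustration, here is a hypothetical in-memory matcher in Python that supports only a small subset of operators (`$gt`, `$gte`, `$lt`, `$lte`, `$ne`, `$in`, `$and`, `$or`) on top-level fields. It is not NeDB's implementation, and NeDB's own operator set is larger.

```python
from typing import Any

# Comparison operators supported by this sketch (a small subset of MongoDB's).
# Ordering operators never match a missing field (None).
COMPARATORS = {
    "$gt": lambda a, b: a is not None and a > b,
    "$gte": lambda a, b: a is not None and a >= b,
    "$lt": lambda a, b: a is not None and a < b,
    "$lte": lambda a, b: a is not None and a <= b,
    "$ne": lambda a, b: a != b,
    "$in": lambda a, b: a in b,
}


def matches(doc: dict[str, Any], query: dict[str, Any]) -> bool:
    """Return True if `doc` satisfies every clause of `query`."""
    for key, condition in query.items():
        if key == "$and":
            if not all(matches(doc, sub) for sub in condition):
                return False
        elif key == "$or":
            if not any(matches(doc, sub) for sub in condition):
                return False
        elif isinstance(condition, dict):
            # Field with operator clauses, e.g. {"age": {"$gte": 18}}.
            value = doc.get(key)
            for op, operand in condition.items():
                if not COMPARATORS[op](value, operand):
                    return False
        elif doc.get(key) != condition:
            # Plain equality, e.g. {"name": "alice"}.
            return False
    return True


def find(docs: list[dict[str, Any]], query: dict[str, Any]) -> list[dict[str, Any]]:
    """Return the documents that match `query`, in their original order."""
    return [doc for doc in docs if matches(doc, query)]


docs = [
    {"name": "alice", "age": 34, "admin": True},
    {"name": "bob", "age": 17, "admin": False},
    {"name": "carol", "age": 52, "admin": False},
]

query = {"$or": [{"admin": True}, {"age": {"$lt": 18}}]}
print([d["name"] for d in find(docs, query)])  # ['alice', 'bob']
```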
TinaCMS is a headless content management framework that bridges local Git-based file storage with a visual, in-context editing interface. By treating your repository as the single source of truth, it enables developers to manage content as structured data files while providing editors with a browser-based dashboard to modify website content directly within a live preview. The framework distinguishes itself by transforming local files into a unified GraphQL API, which powers both the administrative interface and the application's data retrieval layer. This architecture allows for compile-time
Restricts selectable documents in reference fields based on property values to improve navigation in large datasets.
elasticsearch-dump is a command line tool for importing, exporting, and transferring data between Elasticsearch and OpenSearch instances. It functions as an index dump utility that saves documents, mappings, and analyzers to local files or standard output. The tool enables the movement of data between clusters using local files as an intermediary and can flatten nested JSON documents into CSV files for external analysis. It allows for the modification or anonymization of documents during the transfer process through the use of custom JavaScript functions. The utility covers data extraction a
Allows the use of search queries to filter and select specific subsets of documents for export.
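The search query used to select documents for export is an ordinary Elasticsearch query body. The sketch below builds one in Python using the `bool`/`filter` structure from the Elasticsearch Query DSL. The field names (`status`, `@timestamp`) and the `build_search_body` helper are hypothetical. elasticsearch-dump's README describes a `--searchBody` option for passing such a body, but check the tool's documentation for the exact usage.

```python
import json
from typing import Any


def build_search_body(status: str, since: str) -> dict[str, Any]:
    """Build a query body that keeps documents with `status` newer than `since`.

    Field names are hypothetical; adjust them to the mapping of the index being exported.
    """
    return {
        "query": {
            "bool": {
                # `filter` clauses restrict the result set without affecting scoring.
                "filter": [
                    {"term": {"status": status}},
                    {"range": {"@timestamp": {"gte": since}}},
                ]
            }
        }
    }


body = build_search_body("active", "2024-01-01")
# Serialize to JSON, the form a command-line tool would receive.
print(json.dumps(body))
```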
AIOS is an LLM agent operating system and orchestration kernel designed to manage memory, resource scheduling, and tool execution for multiple autonomous AI agents. It serves as a comprehensive framework for developing and deploying agents, featuring a dedicated resource manager that coordinates model backends, GPU memory, and isolated kernel instances. The system distinguishes itself through a semantic memory engine that uses vector search and autonomous clustering for long-term knowledge management, and a semantic file system that allows users to control computer files and system operations
Searches file collections using text queries and keyword filters to retrieve relevant documents.
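Combining a free-text query with keyword filters can be sketched as a two-stage process. First, discard files that lack any required keyword. Then rank the remaining files by how many query terms they share. This is a hypothetical lexical illustration in Python, not AIOS's implementation, which may rely on semantic rather than word-overlap matching. The `FileEntry` type and sample files are invented for the example.

```python
import re
from dataclasses import dataclass


@dataclass
class FileEntry:
    path: str
    text: str


def tokenize(text: str) -> set[str]:
    """Lowercase word tokens; a deliberately simple tokenizer for the sketch."""
    return set(re.findall(r"\w+", text.lower()))


def search(files: list[FileEntry], query: str, keywords: list[str]) -> list[str]:
    """Return paths of files containing every keyword, ranked by query-term overlap."""
    query_terms = tokenize(query)
    required = {k.lower() for k in keywords}
    scored = []
    for entry in files:
        tokens = tokenize(entry.text)
        # Keyword filter: drop files missing any required keyword.
        if not required <= tokens:
            continue
        score = len(query_terms & tokens)
        if score > 0:
            scored.append((score, entry.path))
    # Highest overlap first; ties broken by path for a stable order.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [path for _, path in scored]


files = [
    FileEntry("notes/budget.txt", "Quarterly budget report for the finance team"),
    FileEntry("notes/trip.txt", "Travel budget and hotel bookings"),
    FileEntry("notes/recipe.txt", "Pasta recipe with tomato sauce"),
]

print(search(files, "budget report", keywords=["budget"]))
# ['notes/budget.txt', 'notes/trip.txt']
```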
ExecuTorch is a lightweight C++ runtime for deploying PyTorch models on mobile, embedded, and edge hardware. It provides an ahead-of-time compilation pipeline that exports, quantizes, and lowers model graphs into compact serialized programs, then executes them through a minimal runtime with hardware acceleration and on-device large language model inference capabilities. The project distinguishes itself through a hardware accelerator delegate system that partitions model subgraphs and offloads computation to specialized backends including NPUs, GPUs, and DSPs from Apple, Arm, Intel, MediaTek,
Provides a utility to decode classification logits into top-1 labels for vision model outputs.
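Decoding logits into a top-1 label is an argmax over the class scores. A softmax is only needed if a confidence value is also wanted. The sketch below uses plain Python with a hypothetical label list. It does not reflect ExecuTorch's own utility or its API.

```python
import math
from typing import Sequence


def softmax(logits: Sequence[float]) -> list[float]:
    """Numerically stable softmax: subtract the largest logit before exponentiating."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def decode_top1(logits: Sequence[float], labels: Sequence[str]) -> tuple[str, float]:
    """Return the top-1 label and its softmax probability."""
    if len(logits) != len(labels):
        raise ValueError("expected one logit per label")
    # Softmax is monotonic, so the argmax of the logits is also the argmax of the probabilities.
    best = max(range(len(logits)), key=lambda i: logits[i])
    return labels[best], softmax(logits)[best]


labels = ["cat", "dog", "bird"]  # hypothetical class names
label, prob = decode_top1([1.2, 3.4, 0.3], labels)
print(label)  # dog
```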
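AdalFlow is an autonomous AI agent framework and LLM application library for building modular workflows. It serves as a model-agnostic interface and RAG pipeline orchestrator, letting users develop ReAct agents that use iterative reasoning and external tool execution to solve complex tasks. The project stands out for its prompt optimization system, which uses textual gradient descent to automatically optimize prompt templates and few-shot examples. It treats model feedback as a differentiable signal, implementing a form of LLM backpropagation that iteratively improves output quality against evaluation metrics. The framework covers a broad feature set, including retrieval-augmented generation with semantic vector search and reranking, span-based execution tracing for observability, and schema-driven structured parsing. It provides a unified communication layer for many proprietary and open-source model providers and supports converting Python functions into standardized tool interfaces. The system is implemented in Python and integrates with MLflow for workflow tracking and analysis.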
Restricts retrieved documents using SQL-like conditions or database-specific metadata filters.
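Restricting retrieved documents with metadata conditions can be sketched as a pre-filter followed by similarity ranking. In the Python below, a `where` predicate stands in for a SQL-like condition, and candidates are ranked by cosine similarity. The `Doc` type, `retrieve` function, and toy two-dimensional embeddings are hypothetical; this is not AdalFlow's API.

```python
import math
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class Doc:
    text: str
    embedding: list[float]
    meta: dict[str, Any] = field(default_factory=dict)


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(
    docs: list[Doc],
    query_vec: list[float],
    where: Callable[[dict[str, Any]], bool],
    top_k: int = 2,
) -> list[Doc]:
    """Apply the metadata filter first, then rank the survivors by cosine similarity."""
    candidates = [d for d in docs if where(d.meta)]
    candidates.sort(key=lambda d: cosine(d.embedding, query_vec), reverse=True)
    return candidates[:top_k]


docs = [
    Doc("2023 pricing page", [0.9, 0.1], {"year": 2023, "lang": "en"}),
    Doc("2024 pricing page", [0.8, 0.2], {"year": 2024, "lang": "en"}),
    Doc("2024 Preisliste", [0.85, 0.15], {"year": 2024, "lang": "de"}),
]

# Roughly equivalent to: WHERE year >= 2024 AND lang = 'en'
hits = retrieve(docs, [1.0, 0.0], where=lambda m: m["year"] >= 2024 and m["lang"] == "en")
print([d.text for d in hits])  # ['2024 pricing page']
```

Filtering before ranking means `top_k` counts only eligible documents. Filtering after ranking can return fewer than `top_k` results when the best-scoring hits are discarded.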
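Codesearch is an indexed code search engine and large-scale source indexer designed to run regular expressions across large source trees. It is a tool for finding specific text patterns in large codebases, and it achieves fast retrieval by analyzing and indexing vast numbers of source files. The system uses a specialized trigram search index to accelerate complex regular expression queries. This indexing approach first filters candidate documents by their three-character sequences before applying the full regular expression scan, which keeps performance high on large datasets. The engine handles Unicode text in UTF-8 and Latin-1 encoded content, ensuring consistent character matching and case handling across languages. Its capabilities cover source code indexing, pattern matching, and high-performance regular expression search.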
Identifies potential matches by executing regular expression queries against an optimized index to narrow document sets.
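The trigram filtering step described above can be sketched in a few lines of Python. This sketch simplifies the idea. A real engine derives the required trigrams from the regular expression itself. Here, the caller is assumed to supply a literal string that every match must contain, so the trigram filter never discards a true match. The `TrigramIndex` class and the sample files are hypothetical.

```python
import re
from collections import defaultdict


def trigrams(text: str) -> set[str]:
    """All overlapping three-character substrings of `text`."""
    return {text[i : i + 3] for i in range(len(text) - 2)}


class TrigramIndex:
    def __init__(self, docs: dict[str, str]) -> None:
        self.docs = docs
        # Posting lists: trigram -> names of the documents that contain it.
        self.postings: dict[str, set[str]] = defaultdict(set)
        for name, text in docs.items():
            for tri in trigrams(text):
                self.postings[tri].add(name)

    def candidates(self, literal: str) -> set[str]:
        """Documents containing every trigram of `literal` (a superset of true matches)."""
        grams = trigrams(literal)
        if not grams:
            # Literals shorter than three characters cannot narrow the set.
            return set(self.docs)
        return set.intersection(*(self.postings.get(g, set()) for g in grams))

    def search(self, pattern: str, literal: str) -> list[str]:
        """Narrow by trigrams first, then confirm each candidate with the full regex."""
        regex = re.compile(pattern)
        return sorted(n for n in self.candidates(literal) if regex.search(self.docs[n]))


docs = {
    "a.go": "func parseConfig(path string) error",
    "b.go": "func parseFlags() {}",
    "c.py": "def parse_config(path):",
}
index = TrigramIndex(docs)
print(index.search(r"parseConfig\(\w+", literal="parseConfig"))  # ['a.go']
```

Only documents that survive the cheap set intersection are scanned with the regular expression, which is where the speedup on large source trees comes from.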