27 repositories
Systems for merging technical data from disparate sources like websites, repositories, and media into a unified structure.
Distinct from Multi-file Aggregators: that category covers real-time telemetry streams or simple file globs, whereas this one covers multi-modal technical content aggregation.
Explore 27 GitHub repositories matching data & databases · Multi-Source Content Aggregation.
Owl is a framework for agentic workflow automation and multi-agent orchestration. It functions as a system for coordinating autonomous large language model agents to decompose and execute complex tasks through shared communication and collaborative planning. The project distinguishes itself through a multi-modal toolset for processing images, audio, and video, alongside a synthetic data generator that produces domain-specific datasets using self-instruct and verifier loops. It further incorporates a retrieval-augmented generation pipeline framework that integrates long-term memory and real-ti
Ships a suite of tools for processing images, audio, and video files alongside structured document parsing.
WebAgent is an autonomous web navigation agent and research system designed to browse the internet and synthesize information to answer complex queries. It functions as a reasoning orchestrator that navigates the web iteratively to perform deep research and extract structured data. The project includes a reinforcement learning training pipeline that generates synthetic interaction datasets for model pre-training and fine-tuning. It employs token-level policy gradients to stabilize training in non-stationary environments and uses a dual-mode inference scaling mechanism to balance execution bet
Normalizes heterogeneous inputs from live web pages and local PDFs into a uniform representation for processing.
Skill Seekers is a toolset for generating large language model knowledge bases, featuring a multi-source content scraper and a dedicated RAG data pipeline. It extracts technical data from documentation, code, and video to create structured assets and configuration files for AI-powered IDE extensions. The project distinguishes itself through the ability to transform raw data into polished tutorials and specialized skills for AI plugin marketplaces. It utilizes abstract syntax tree parsing and optical character recognition to analyze GitHub repositories, PDFs, and video frames, converting these
Combines content from websites, repositories, and media files into a unified knowledge structure.
This project is a self-hosted RSS feed aggregator and reader designed to collect and organize content from RSS, Atom, and JSON feeds. It functions as a privacy-focused client that blocks pixel trackers and strips URL parameters to prevent third-party tracking and referrer leakage. The system is built as a REST API feed reader, exposing its data and user accounts through a programmable interface for third-party clients. It maintains compatibility with the OPML standard for importing and exporting subscriptions and provides tools for web content extraction using readability parsers and custom r
Collects and organizes content from Atom, RSS, and JSON sources into a unified interface.
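The URL-parameter stripping mentioned above can be illustrated with a minimal Python sketch. It is not the project's code: the function name `strip_tracking_params` and the list of tracking parameters (`utm_*`, `fbclid`, `gclid`) are assumptions chosen for illustration, since the description does not say which parameters the reader removes.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical tracking parameters; the reader's actual list is not given here.
TRACKING_PREFIXES = ("utm_",)
TRACKING_NAMES = {"fbclid", "gclid"}


def strip_tracking_params(url: str) -> str:
    """Return url with known tracking query parameters removed."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if not key.lower().startswith(TRACKING_PREFIXES)
        and key.lower() not in TRACKING_NAMES
    ]
    # Rebuild the URL with only the surviving parameters; path and fragment are untouched.
    return urlunsplit(parts._replace(query=urlencode(kept)))


if __name__ == "__main__":
    print(strip_tracking_params("https://example.com/post?id=7&utm_source=rss&fbclid=abc"))
```

Filtering on the parsed query pairs rather than with a regular expression keeps unrelated parameters, their order, and the fragment intact.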
PicaComic is a digital comic and manga reader that enables browsing and reading content from multiple online sources within a single unified interface. It aggregates data from various providers into a local database for consistent searching and browsing. The application supports custom content integration, allowing the registration of new third-party reading sources through a provider-based extension system. It also features cross-device reading synchronization to keep reading progress and favorite lists aligned across different devices. Additional capabilities include offline content manage
Aggregates comic data from multiple third-party sources into a single unified interface.
Gridsome is a Vue.js static site generator designed for building Jamstack websites. It functions as a progressive web app framework that pre-renders components into static HTML files for delivery via content delivery networks. The system includes a GraphQL data orchestrator that unifies content from multiple APIs and local files into a single schema for site queries. It also integrates a frontend asset optimizer to automatically compress images and implement code-splitting. The framework provides support for offline-capable websites through prefetching pages and critical asset loading. Addit
Combines content from various APIs and local files into a single interface to power a website's frontend.
CloudSaver is a multi-cloud file transfer manager and storage aggregator designed to discover remote resources and save them directly to cloud drives. It functions as a cloud file downloader and management platform that enables the movement of data between different cloud storage providers without requiring files to be downloaded to a local device first. The system uses OAuth authentication to manage secure connections to third-party cloud drives, facilitating direct server-to-server data transfers. It incorporates asynchronous streaming to move data between remote sources and destinations, p
Merges searchable file information from disparate cloud providers into a unified structure for cross-platform discovery.
Jazzy is a source code documentation tool and API generator designed for Swift and Objective-C. It analyzes project roots and compiled modules to produce searchable HTML websites or offline docsets. The system functions as a multi-module API documenter, aggregating documentation from separate source modules into a single site with cross-module linking. It serves as a markdown-based documentation engine that integrates technical guides and LaTeX mathematical equations to complement generated API references. The tool covers a broad capability surface including multi-language API generation for
Merges technical API data from disparate source modules into a unified structure with shared search.
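Horizon is an AI-driven news aggregation system designed for building custom pipelines that fetch, filter, and enrich information from diverse web sources. It uses large language models to automate information filtering, scoring content to eliminate noise and surface high-value stories. The system integrates the Model Context Protocol (MCP), exposing pipeline stages as tools for external AI assistants. It uses a unified adapter to standardize different AI model providers for consistent content scoring and summarization tasks. The pipeline aggregates data from RSS feeds, social platforms, financial toolkits, and code repositories. It manages content through deduplication, quota-based category filtering, and contextual enrichment, then delivers multilingual briefings via email, webhooks, or static site deployment. Workflows are orchestrated through recurring cloud automation to manage the scheduled collection and delivery of processed information.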
Aggregates technical content from diverse sources like RSS, social platforms, and repositories into a unified structure.
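Deduplication followed by quota-based category filtering, as described for this pipeline, might look roughly like the Python sketch below. This is an illustration under assumptions, not Horizon's implementation: the `Item` type, the `dedupe` and `apply_quotas` functions, and the choice of the URL as the deduplication key are all hypothetical, and the score is assumed to come from an earlier LLM scoring step.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    url: str
    category: str
    score: float  # assumed to be produced by an upstream scoring step


def dedupe(items):
    """Keep only the highest-scoring item for each URL (hypothetical dedup key)."""
    best = {}
    for item in items:
        current = best.get(item.url)
        if current is None or item.score > current.score:
            best[item.url] = item
    return list(best.values())


def apply_quotas(items, quotas, default_quota=0):
    """Keep at most quotas[category] items per category, highest score first."""
    kept, used = [], {}
    for item in sorted(items, key=lambda i: i.score, reverse=True):
        limit = quotas.get(item.category, default_quota)
        if used.get(item.category, 0) < limit:
            kept.append(item)
            used[item.category] = used.get(item.category, 0) + 1
    return kept


if __name__ == "__main__":
    feed = [
        Item("a", "ai", 0.9),
        Item("a", "ai", 0.5),  # duplicate URL with a lower score
        Item("b", "ai", 0.8),
        Item("c", "ai", 0.7),
        Item("d", "finance", 0.6),
    ]
    for item in apply_quotas(dedupe(feed), {"ai": 2, "finance": 1}):
        print(item.url, item.category, item.score)
```

Running deduplication before the quota step means a story picked up from several sources counts only once against its category's quota.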
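This is an Android package manager and app store client designed for browsing, installing, and updating open-source software from F-Droid and custom third-party repositories. It serves as an open-source repository client that lets users discover software through synchronized catalogs. The system features a local-first repository cache, enabling users to search and manage their software library offline, without an active internet connection. It supports multi-source catalog management, aggregating app data from multiple repository URLs into a single index. The client provides flexible package installation paths, routing deployment through session-based prompts, root privileges, or specialized privilege elevation via Shizuku. It also includes automatic background update polling to keep installed apps up to date.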
Aggregates application data from multiple custom and default repository URLs into a single unified index.
BibiGPT-v1 is an AI-powered media summarizer that generates concise summaries and enables interactive Q&A for audio and video content from multiple platforms. It uses large language models to process transcripts from sources like YouTube, Bilibili, and local files, delivering real-time streaming responses for an interactive chat experience. The project distinguishes itself by combining multi-platform content aggregation with a conversational learning assistant capability, allowing users to query audio and video content through AI-driven dialogue. It also includes export functionality for savi
Fetches and processes media from diverse sources like YouTube, Bilibili, and local files into a unified AI workflow.
DeepChat is a desktop application that connects to multiple cloud and local AI model providers through a single unified chat interface, while also integrating external ACP-compatible coding and task agents as selectable models. It manages local AI agent sessions with project folders, permission modes, and resumable context for long-running tasks, and connects external tools and data sources via the Model Context Protocol using StreamableHTTP, SSE, or Stdio transports. The application distinguishes itself by supporting remote desktop session control, binding messaging app channels to sessions
Displays Markdown, code blocks, images, Mermaid diagrams, and artifacts within conversations for diverse result presentation.
Podcastfy is an AI content-to-podcast generator that converts text, URLs, PDFs, images, and videos into conversational audio podcasts. It integrates with over 100 language models for transcript creation and multiple text-to-speech engines for audio output, with support for customizable dialogue style and optional local transcript generation for privacy. The project distinguishes itself through a flexible architecture that decouples job submission from result retrieval via asynchronous polling, normalizes heterogeneous inputs into uniform text, and routes content through pluggable LLM and TTS
Transforms heterogeneous inputs like text, URLs, images, and PDFs into a uniform text representation.
Returns images or media from tools, allowing the LLM to analyze visual content.
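This project is a framework-agnostic library for building accessible search-as-you-type interfaces. It provides a headless logic layer that decouples search state management and result filtering from visual presentation, giving developers full control over the underlying HTML structure and styling. The library stands out for its highly modular architecture, supporting multi-source data aggregation that combines results from static arrays, remote APIs, and external indices into a single interface. It has a flexible rendering engine that integrates with various virtual DOM libraries, and a plugin-based system for extending features such as query suggestions, recent search history, and custom redirects. The system covers a broad range of search capabilities, including generative AI integration for context-aware answers, real-time result filtering, and relevance tuning. It includes built-in observability tools for tracking user interactions and network status, as well as comprehensive support for WAI-ARIA accessibility standards to ensure inclusive keyboard and screen reader navigation. The library is designed for integration into diverse web environments, providing configuration utilities for data sources, interface localization, and mobile-specific optimizations.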
Aggregates search results from diverse sources like static arrays, remote APIs, and external indices into a single unified interface.
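TAICHI-flet is an AI-integrated resource browser and Windows desktop application built on Flet. It serves as a centralized multimedia hub and web content aggregator, designed to combine artificial intelligence tools with tools for searching and accessing movies, music, and software. The application supports aggregating resources from multiple sources, including cloud storage drives and external web addresses. It provides dedicated tools for streaming and downloading anime and music, reading online novels via text-to-speech playback, and automating Windows operating system operations using artificial intelligence. The interface includes a tabbed navigation system for switching between content categories, and a theme management system for customizing desktop aesthetics and wallpapers. Technical capabilities include using a proxy server to bypass cross-origin security restrictions on remote images, and using daemon threads to keep the interface responsive during long-running tasks.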
Aggregates multimedia and software resources from various web APIs and cloud drives into a unified interface.
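Proxypool is an automated proxy crawler and aggregator for discovering, validating, and curating proxy servers from public pages and subscription addresses. It runs as a background service that collects proxy nodes across multiple protocols and serves the validated list through a web API for external use. The system manages the full lifecycle of proxy discovery by aggregating data from multiple sources, deduplicating entries, and using a connection validator to ensure that only active, functional nodes are maintained. Crawl sources are managed through a configuration file to target specific external addresses. The project handles ongoing proxy list management through scheduled background tasks that automatically refresh and update available nodes. This process includes automated connection testing and removal of inactive servers to keep the curated list current.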
Collects and merges proxy nodes from multiple public pages and channels into a single curated list.
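ShuiZe_0x727 is an open-source intelligence gathering framework and attack surface management tool. It serves as an asset discovery engine and network intelligence aggregator, designed to identify internet-facing assets, map network infrastructure, and visualize total network exposure. The project integrates vulnerability scanning and sensitive data leak detection to identify security weaknesses and unauthorized access points. It combines cyberspace API queries, certificate log analysis, and public repository scanning to extract leaked credentials, API keys, and internal admin paths. The framework provides capabilities for automated information gathering and network intelligence research, using a plugin-based scanning engine to detect vulnerabilities across web services and open ports. Collected asset data and security findings are exported as formatted spreadsheets for offline analysis and auditing.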
Merges technical data from certificate logs, DNS records, and crawlers into a single asset structure.
UserScripts is a collection of JavaScript browser userscripts designed to modify website behavior and add custom functionality to web browsers. It serves as a multi-purpose toolset for web page content automation, web interface enhancement, and specialized web scraping and downloading. The project distinguishes itself through a wide range of specialized utilities, including a browser-based text transformer for character encoding and terminology mapping, and tools for bypassing content censorship. It provides advanced web scraping capabilities such as deciphering obfuscated download links, agg
Aggregates multi-chapter text from web pages into a single file by detecting main content automatically.
Aidoku is a manga reader application and digital library manager. It serves as a modular content aggregator that allows users to discover, download, and read manga from various third-party sources and local files. The application utilizes a modular source plugin system to integrate external provider packages, enabling the ingestion of content from multiple third-party sources. It includes a sync engine that communicates with external tracking APIs to maintain consistent reading progress across different platforms. The system covers manga library management, including the ability to search fo
Merges manga content from disparate third-party sources into a unified internal structure for consistent rendering.