1.2K repositories
Systems and workflows for ingesting, transforming, and orchestrating high-throughput data processing tasks.
Explore 1,176 awesome GitHub repositories matching data & databases · Data Processing Pipelines. Refine with filters or upvote what's useful.
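Developer Roadmap is a community-driven platform that provides structured, graph-based learning paths for software engineering. It serves as a comprehensive knowledge repository, organizing technical domains into visual sequences that guide professional skill acquisition and career growth. The project distinguishes itself through a collaborative ecosystem in which users can contribute roadmaps, curate industry best practices, and maintain personal career profiles. It integrates a diagnostic assessment framework for evaluating technical proficiency, helping developers identify knowledge gaps and prepare for professional interviews through targeted learning sequences. Beyond its core mapping capabilities, the platform offers practical project ideas and interactive tutoring to reinforce engineering concepts. It gives the community a centralized space for sharing resources, tracking skill progress, and navigating complex technical domains.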
Provides sequential access to elements within large data collections during processing.
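This is a comprehensive educational roadmap that guides software engineers in mastering computer science fundamentals and preparing for technical interviews. It provides a structured, dependency-aware learning path that organizes complex computing concepts into a hierarchical curriculum, enabling users to build a professional engineering foundation through iterative study and hands-on implementation. The curriculum combines theoretical knowledge with career development, offering a cross-referenced resource index of books, academic papers, and video tutorials. It emphasizes standardizing the evaluation of algorithmic efficiency through asymptotic complexity analysis, and it breaks topics down into fine-grained modules that support focused, incremental learning across a broad range of technical domains. Beyond core algorithms and data structures, the repository covers a wide range of competency areas, including system architecture design, distributed systems, computer security, and advanced mathematical modeling. It also provides strategic guidance for the entire hiring lifecycle, from résumé optimization and behavioral interview preparation to long-term career growth. The entire knowledge base is maintained as a version-controlled, Markdown-driven repository, allowing technical education to be delivered in a platform-independent and collaborative way.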
Reduces data footprint using encoding algorithms to enhance storage efficiency and transmission performance.
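This is a comprehensive educational resource and study guide on distributed systems architecture and backend infrastructure design. It offers a structured curriculum for mastering the scalability, reliability, and performance principles needed to design complex software systems. The repository distinguishes itself by providing a systematic approach to technical interview preparation, combining design patterns, architectural trade-offs, and spaced-repetition tools that help users memorize complex concepts. It emphasizes constraint-driven analysis, teaching users how to weigh competing requirements such as latency, consistency, and availability when drafting an architecture. The content covers a broad range of system design competencies, including database scaling, traffic management, and infrastructure optimization strategies. It details horizontal scaling, multi-tier caching, asynchronous communication, and service discovery techniques, and provides frameworks for resource estimation and capacity planning. The documentation is organized as a study guide, offering a systematic path through the fundamentals of backend engineering and large-scale system design.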
Provides helper libraries and scripts that assist in the scheduling, monitoring, and management of batch processing jobs.
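This is a comprehensive, community-curated directory that organizes the vast ecosystem of Python libraries, frameworks, and tools. It serves as a centralized knowledge base intended to ease navigation of the ecosystem and speed up discovery for developers across the software development lifecycle. The directory distinguishes itself by providing a structured resource index categorized by technical domain, ranging from basic development tools to specialized engineering fields. It covers advanced capabilities such as artificial intelligence, data science, web development, and infrastructure management, enabling developers to identify proven solutions for specific technical challenges. The project spans a wide range of competency areas, including dependency management, static code analysis, and automated testing tools. It also catalogs resources for persistent data storage, cloud infrastructure orchestration, and interface development, providing a unified reference for building and maintaining complex software systems.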
Enables fast, relevant query results across datasets through high-performance indexing and full-text search capabilities.
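This project is a comprehensive repository of verified computational implementations, intended as an educational resource for computer science and algorithmic problem solving. It provides a structured collection of code examples covering fundamental data structures, mathematical operations, and core programming concepts, allowing users to study the logic and complexity behind various computational approaches. The repository distinguishes itself through a modular, reference-based implementation pattern that organizes code into logical namespaces. This approach supports standalone execution and educational clarity, letting users explore how computational strategies evolve from naive brute-force methods to optimized, high-performance solutions. By decoupling data structure abstractions from algorithmic operations, the project keeps implementations interchangeable and easy to analyze. Its competency areas span a wide range of technical fields, including machine learning, cryptography, scientific computing, and computer vision. It includes implementations for predictive modeling, neural networks, and statistical analysis, as well as tools for digital signal processing, network flow management, and financial modeling. The collection also addresses specialized mathematical needs such as linear algebra, geometric computation, and bit manipulation, providing a broad foundation for research and engineering applications.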
Shrink digital information streams through encoding techniques to improve storage density and transmission speeds.
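Vue is a progressive, component-based JavaScript framework for building reactive user interfaces and single-page applications. It is centered on a declarative template system that compiles HTML into efficient render functions, allowing developers to organize complex interfaces into isolated, reusable units that automatically stay in sync with application state. The framework distinguishes itself through a dependency-tracking reactivity system that monitors data access during rendering to trigger precise updates. It offers a flexible architecture that supports both incremental adoption as a lightweight library and full-scale application development. Developers can use a powerful plugin-based extension model to inject global logic, while the framework's virtual DOM reconciliation keeps interface updates efficient by computing the minimal set of mutations. Beyond its core rendering capabilities, the project includes a comprehensive set of tools for managing application state, URL-based routing, and server-side rendering. It provides extensive support for component composition, content distribution, and animation management, with built-in safety measures such as automatic content escaping to guard against common vulnerabilities. The framework ships with official type declarations to support static analysis, and can be installed through standard package managers or integrated directly into browser environments via a script tag.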
Renders filtered or sorted data sets using computed properties without modifying the original source.
TensorFlow is a comprehensive machine learning framework designed for the construction, training, and deployment of complex mathematical models. It utilizes a graph-based execution model that represents operations as directed acyclic graphs, enabling automatic differentiation and efficient parallel processing. The system provides high-level interfaces for defining neural network architectures, alongside a robust engine for managing multidimensional array structures and tensor mathematics. The framework distinguishes itself through a scalable distributed runtime that orchestrates workloads acr
Applies optimized routines to perform element-wise operations and shape manipulations on multi-dimensional data structures.
n8n is a workflow automation platform that combines a visual interface with code-based extensibility to design, orchestrate, and manage automated processes. It provides a comprehensive suite of tools for data transformation, filtering, and storage, allowing users to build complex logic through conditional branching, looping, and sub-workflow execution. The platform supports both pre-built integration nodes and custom code execution in JavaScript or Python, enabling connectivity with a wide range of external services and APIs. The platform includes a suite of generative AI capabilities, such a
Eliminates redundant entries within data streams to maintain unique event records throughout automated sequences.
AutoGPT is an orchestration platform designed for building, managing, and deploying autonomous agents. It provides a visual canvas-based environment where users can assemble agents by connecting modular blocks that represent actions, data flows, and conditional logic. The platform supports the entire agent lifecycle, including task scheduling, execution monitoring, and configuration management, while offering a marketplace for discovering and sharing community-built workflows. The project includes a legacy framework for command-line agent execution and an extensible component system for devel
Transforms unstructured keyword objects into structured, typed fields for metric analysis.
This project serves as a comprehensive language ecosystem index, functioning as a centralized, community-curated directory for the Go programming language. It organizes a vast landscape of software components, libraries, and development tools into a structured, navigable hierarchy, enabling developers to efficiently discover resources tailored to specific functional domains. The repository distinguishes itself through a decentralized contribution model, where community-driven updates ensure the index remains current with the rapidly evolving software landscape. Beyond simple resource listing,
Streamlines reactive programming and data stream transformations using specialized toolkits.
This project is a command-line media downloader designed for the systematic retrieval and organization of digital content from diverse online platforms. It functions as an extensible extraction engine that utilizes a declarative format-selection pipeline to automate the identification, merging, and downloading of specific audio and video streams based on user-defined criteria. The system distinguishes itself through a modular architecture that supports custom plugins and site-specific scripts, allowing for the bypass of platform restrictions and the handling of complex authentication challeng
Evaluates stream metadata against defined criteria to transform and restructure raw media into desired file formats.
Transformers is a comprehensive library for machine learning that provides a unified interface for training, fine-tuning, and deploying transformer-based models. It supports a wide range of tasks, including text classification, language modeling, question answering, and sequence-to-sequence translation, while offering specialized architectures for both text and vision processing. The framework includes tools for managing the entire model lifecycle, from data preprocessing and tokenization to distributed training and inference. The library features extensive support for model optimization and
Structures keyword arguments by modality to ensure type-safe configuration and model-specific overrides during document processing.
This project is an AI-powered document processing engine designed to transform diverse file formats into structured Markdown. By leveraging multimodal language models, it performs complex layout analysis and semantic text extraction, allowing for the conversion of both unstructured files and scanned images into machine-readable content. The toolkit distinguishes itself through a modular, plugin-based architecture that orchestrates multi-stage extraction pipelines. Users can steer the parsing behavior by injecting custom instructions, enabling the system to adapt to domain-specific document st
Converts diverse document formats into structured text output by executing programmatic parsing logic to automate complex data extraction workflows.
LangChain is an orchestration framework designed for building, managing, and deploying applications powered by large language models. It provides a unified integration layer that normalizes disparate model provider APIs into a consistent set of primitives, enabling developers to build complex, multi-step AI workflows that manage state, memory, and tool execution. The project distinguishes itself through a durable execution runtime that maintains persistent state across long-running processes by checkpointing progress to external storage. It models agent workflows as directed graphs, allowing
Processes diverse binary and multimodal data types through unified interfaces designed for complex AI pipelines.
Firecrawl is a headless browser automation tool and web crawling engine designed to extract structured data from the web. It functions as an API that transforms raw website content and documents into clean markdown and JSON formats to serve as context for large language models. The project distinguishes itself by using natural language prompts to translate human instructions into targeted data extraction tasks and browser actions. It can execute interactive page navigation, such as clicking and scrolling, and perform automated web research to retrieve structured data without manual interventi
Transforms unstructured web pages and documents into standardized, machine-readable formats using natural language prompts.
Firecrawl is a web data extraction platform designed to convert unstructured web content into clean, LLM-ready formats like markdown or JSON. It functions as an autonomous web crawler and scraper, capable of mapping entire domains, performing recursive navigation, and executing complex data gathering tasks. By leveraging headless browser orchestration, the system handles dynamic, JavaScript-heavy pages to ensure comprehensive data capture. The platform distinguishes itself through its focus on agentic workflows, providing a programmatic interface that allows autonomous agents to perform live
Prepares raw web content for AI by converting it into clean, structured formats like markdown or JSON.
This project is a community-maintained, open-source repository that functions as a centralized directory for streaming metadata. It aggregates publicly available network stream links and organizes them into standardized, machine-readable playlist formats. By acting strictly as a metadata-only index, the platform enables users to access and organize live broadcast content across various third-party media playback applications without hosting or distributing any actual video files. The repository distinguishes itself through a collaborative, crowdsourced workflow where contributors actively mai
Merges distributed community updates into a unified, structured dataset of verified streaming links.
D3 is a modular library providing low-level primitives for creating data-driven visualizations. It functions as a flexible framework that allows for direct control over visual presentation by mapping abstract data dimensions to graphical properties, such as position, color, and size, without imposing predefined chart abstractions. The library distinguishes itself by offering specialized tools for complex data representation, including algorithmic layouts for hierarchical structures and geographic projection utilities for mapping spherical coordinates. It also includes a comprehensive suite fo
Comprehensive utilities handle the ordering, searching, summarizing, binning, and grouping of complex data sets.
Godot is a comprehensive, node-based game engine designed for building interactive 2D and 3D applications. It provides an integrated development environment that utilizes a hierarchical scene system to organize objects, propagate spatial transformations, and manage lifecycle events. The engine functions as a cross-platform development suite, allowing developers to author, test, and export software to desktop, mobile, and web environments from a single, unified codebase. The engine distinguishes itself through a modular, component-based architecture that relies on signals-based decoupling for
Implements native data types for vectors, transforms, and arrays to enable high-performance mathematical operations.
Axios is a promise-based HTTP client used to make asynchronous network requests in both browser and Node.js environments. It functions as a multi-environment network adapter that abstracts the transport layer to ensure consistent behavior across different runtimes. The project distinguishes itself through a request lifecycle management system that allows for the cancellation of active requests, the setting of timeouts, and the monitoring of upload and download transfer progress. It includes a mechanism for intercepting network traffic, enabling the transformation of outgoing requests and inco
Implements automatic serialization of JavaScript objects into JSON, multipart form data, or URL-encoded formats for transmission.