36 مستودعات
High-performance engines designed for parallelized data processing across clusters, distinct from high-level workflow orchestration tools.
Explore 36 awesome GitHub repositories matching data & databases · Distributed Compute Frameworks. Refine with filters or upvote what's useful.
Apache Spark is a unified distributed data processing engine designed for large-scale data analysis and computation graphs. It functions as a distributed machine learning framework, a graph processing system, a real-time stream processor, and a SQL analytics engine. The system enables the execution of distributed SQL querying, large-scale graph analysis, and real-time stream analytics across clusters of machines. It also provides a scalable environment for implementing machine learning algorithms and predictive model development on massive datasets. The engine incorporates relational query e
Analyzes and transforms continuous real-time data streams for immediate insight and analytics.
fhevm is a full-stack blockchain framework designed to integrate Fully Homomorphic Encryption into smart contracts. It provides a platform for developing confidential smart contracts that can process encrypted data and execute private on-chain computations without decrypting the underlying information. The framework utilizes a coprocessor system to offload resource-intensive encrypted operations to an asynchronous service, improving blockchain performance and scalability. It incorporates a secure key management service based on multi-party computation and a zero-knowledge proof verifier to en
Provides an asynchronous computation service that offloads resource-intensive encrypted operations to maintain scalability.
Deepface is a comprehensive deep learning library for facial recognition and demographic analysis. It provides a modular pipeline that handles the entire lifecycle of facial processing, including detection, geometric alignment, and the transformation of facial images into high-dimensional numerical vector embeddings for identity verification and similarity comparison. The library distinguishes itself through a model ensemble approach, which combines predictions from multiple pre-trained neural networks to improve classification accuracy and reduce bias. It also integrates advanced security fe
Handles asynchronous processing of data streams to support real-time facial analysis tasks.
Backtrader is a Python framework designed for the development, backtesting, and live execution of algorithmic trading strategies. It provides a comprehensive environment for quantitative finance, allowing users to simulate trading logic against historical market data or connect directly to brokerage platforms for automated real-time trading. The project distinguishes itself through a unified event-driven architecture that treats backtesting and live trading with the same API. This consistency is supported by a flexible data-feed abstraction layer that normalizes diverse financial sources, ena
Synchronizes data streams of varying granularities to evaluate long-term trends alongside short-term price movements.
NATS Server is a high-performance, lightweight messaging system designed for cloud-native applications, edge computing, and distributed microservices. It functions as a distributed publish-subscribe broker that routes messages using hierarchical, dot-separated subject strings, enabling decoupled communication between services without requiring centralized broker lookups. The system supports core messaging patterns including asynchronous publish-subscribe, request-reply, and load-balanced queue processing. The platform distinguishes itself through a decentralized architecture that eliminates t
Supports push and pull patterns for consuming persistent log data at scale.
Lila is an open-source chess server and multiplayer platform designed for playing, analyzing, and streaming games. It functions as a comprehensive environment for hosting competitive play and managing player profiles. The platform integrates a distributed chess engine interface to evaluate complex positions and a collaborative analysis board that allows multiple users to study and coordinate insights in real time. It also includes an online tournament platform for organizing competitive events, simultaneous exhibitions, and structured player leagues. The system maintains a searchable game da
Implements a distributed computing engine specialized for evaluating chess positions and calculating optimal moves in parallel.
Lila is a comprehensive, open-source chess gaming platform designed for real-time multiplayer interaction, competitive tournament management, and deep strategic analysis. It provides a global environment where users can engage in live matches, participate in structured competitions, and access extensive archives of historical game data for research and study. The platform distinguishes itself through a highly scalable architecture that utilizes actor-model concurrency and event-sourced game states to ensure precise match reconstruction and fault tolerance. It integrates distributed engine eva
Offloads computationally intensive move evaluations to a cluster of specialized servers for real-time tactical insights.
TiKV is a cloud-native distributed transactional key-value store and storage engine. It provides a distributed database designed for horizontal scalability and strong consistency across a cluster of physical nodes. The system uses a Raft-based consensus mechanism to maintain data availability and state synchronization. It ensures ACID compliance for distributed transactions through a two-phase commit workflow and manages data distribution via multi-Raft sharding. The engine handles massive datasets using automated range splitting and cluster load balancing to distribute data across different
Implements a coprocessor for executing filtering and aggregation logic directly on storage nodes to minimize network latency.
Gensim is an unsupervised natural language processing toolkit designed for topic modeling, word embedding training, and the processing of large-scale text corpora. It provides a framework for discovering latent themes and semantic structures in text without the need for labeled data. The toolkit is distinguished by its ability to handle datasets that exceed system memory through iterator-based data streaming from disk. It also supports distributed model training, allowing complex modeling tasks to be executed across computer clusters. The library covers a broad range of analysis capabilities
Functions as a distributed computing engine for processing and transforming massive text corpora.
This project is an asynchronous network framework for Python that provides both a client and a server for HTTP communication. It is designed to handle high-concurrency network operations by leveraging cooperative multitasking, allowing for the management of thousands of simultaneous connections without the overhead of traditional thread-per-request models. The framework distinguishes itself through its focus on efficient resource management and persistent communication. It utilizes connection pooling to reuse network sockets, which reduces latency during sequential requests, and supports full
Handles high-volume network traffic through memory-efficient chunked data processing.
Stockfish is a high-performance chess engine designed to evaluate board positions and calculate optimal moves. It functions as a command-line tool that utilizes neural network-based search algorithms to assess complex game states and determine strategic advantages. The engine is fully compliant with the Universal Chess Interface, allowing it to exchange commands and move data with external graphical user interfaces and professional analysis software. The engine distinguishes itself through advanced computational strategies that maximize hardware efficiency and search depth. It employs multi-t
Functions as a high-performance UCI-compliant engine that evaluates board positions and calculates optimal moves.
Hammerspoon is a programmable automation engine for macOS that enables deep system-level control through a Lua scripting environment. By bridging high-level scripts with native Objective-C APIs, it allows users to interact with the operating system's accessibility tree, intercept hardware input streams, and manage the lifecycle of running applications. The project distinguishes itself through an event-driven architecture that registers asynchronous hooks for system notifications and hardware events. This allows for real-time automation, such as remapping keyboard and mouse inputs, managing wi
Provides callback-based processing for incoming network data streams.
This project is a curated directory of software, frameworks, and educational resources designed for building, scaling, and maintaining distributed data processing and storage architectures. It serves as a comprehensive index for the distributed computing ecosystem, helping users identify the appropriate tools for managing large-scale information systems. The repository functions as a central hub for data engineering, offering categorized access to technologies that support batch and stream processing, machine learning, and interactive querying. By organizing these resources, it assists in the
Indexes a wide range of distributed computing engines and frameworks for batch, stream, and interactive data processing.
Ydata-profiling is an automated exploratory data analysis framework designed to generate comprehensive statistical reports and visual summaries from dataframes. It functions as a diagnostic tool for assessing data quality, identifying missing values, duplicates, and outliers, while providing a scalable engine for profiling massive datasets across distributed enterprise environments. The project distinguishes itself through its ability to handle large-scale data through distributed task orchestration and lazy stream processing, which minimizes memory overhead during complex computations. It in
Scales data profiling tasks across distributed enterprise environments to handle massive datasets efficiently.
The AWS Cloud Development Kit is an infrastructure-as-code framework that enables developers to define and provision cloud resources using familiar programming languages. By utilizing construct-based synthesis, it translates high-level, object-oriented code into declarative templates, allowing for the automated management of complex cloud environments through a centralized, code-driven control plane. The framework distinguishes itself through its ability to model infrastructure as a dependency-aware resource graph, ensuring that components are provisioned and updated in the correct order. It
Consumes sequential data modification logs to trigger downstream workflows and maintain system state.
This project is a Python wrapper for the TA-Lib library, providing a technical analysis library for computing moving averages, momentum, and volatility metrics for financial time series analysis. It serves as a financial indicator calculator that processes price and volume arrays to generate technical signals and pattern recognition. The library includes an incremental data processor capable of computing the most recent technical indicator values as new streaming market data arrives. This allows for real-time price monitoring and the processing of streaming data without recalculating entire d
Processes continuous streams of market data in real time to update indicator values incrementally.
Orleans is a .NET distributed actor framework designed for building scalable, cloud-native applications. It implements a virtual actor model where entities with stable identities manage their own state and lifecycle across a cluster of servers. The framework provides a distributed state management system with ACID transaction support and a distributed pub/sub streaming engine for real-time data processing. It distinguishes itself through location-transparent routing, automatic actor activation and deactivation, and elastic cluster scaling that redistributes workloads during node failures. Th
Provides a managed system for processing continuous data streams in near-real time with checkpoints and batch delivery.
nom is a parser combinator framework for Rust used to build complex parsers by combining small, reusable parsing functions. It functions as a zero-copy parsing tool that minimizes memory overhead by returning slices of the original input instead of allocating new memory. The framework is designed for diverse data formats, serving as a binary data parser with configurable endianness and a bitstream processing library capable of extracting values of arbitrary bit length. It also functions as a streaming data parser that can process data arriving in chunks and signal when additional input is req
Processes data arriving in chunks and signals when more input is required for a result.
nom is a Rust parser combinator framework used to build complex parsers for binary and text data. It functions as an abstract syntax tree generator and a bit-level binary parser, allowing users to construct structured data by combining small, reusable parsing functions. The framework provides specialized support for zero-copy binary parsing, extracting data as slices from raw byte arrays to avoid memory allocations. It also includes a streaming data parser capable of processing partial input chunks from networks or files and signaling when additional input is required. The project covers a b
Handles partial data chunks and requests more input to complete parsing operations.
Modin is a distributed dataframe library and parallel data processing engine designed to handle large datasets that exceed system memory. It functions as a distributed computing framework that parallelizes data manipulation tasks across multiple CPU cores or clusters to increase throughput and avoid memory errors. The project mirrors the Pandas API, allowing for the distribution of data workflows without changing core code logic. It utilizes a pluggable backend interface, which enables users to switch between different distributed execution engines to optimize performance based on available h
Provides a high-performance engine that parallelizes Pandas dataframe operations across multiple CPU cores or clusters.