15 repositories
Frameworks designed for processing and transforming massive datasets across distributed computing environments.
15 GitHub repositories in Data & Databases · Distributed Computing Engines.
fhevm is a full-stack blockchain framework designed to integrate Fully Homomorphic Encryption into smart contracts. It provides a platform for developing confidential smart contracts that can process encrypted data and execute private on-chain computations without decrypting the underlying information. The framework utilizes a coprocessor system to offload resource-intensive encrypted operations to an asynchronous service, improving blockchain performance and scalability. It incorporates a secure key management service based on multi-party computation and a zero-knowledge proof verifier to en
Provides an asynchronous computation service that offloads resource-intensive encrypted operations to maintain scalability.
Lila is an open-source chess server and multiplayer platform designed for playing, analyzing, and streaming games. It functions as a comprehensive environment for hosting competitive play and managing player profiles. The platform integrates a distributed chess engine interface to evaluate complex positions and a collaborative analysis board that allows multiple users to study and coordinate insights in real time. It also includes an online tournament platform for organizing competitive events, simultaneous exhibitions, and structured player leagues. The system maintains a searchable game da
Implements a distributed computing engine specialized for evaluating chess positions and calculating optimal moves in parallel.
Lila is a comprehensive, open-source chess gaming platform designed for real-time multiplayer interaction, competitive tournament management, and deep strategic analysis. It provides a global environment where users can engage in live matches, participate in structured competitions, and access extensive archives of historical game data for research and study. The platform distinguishes itself through a highly scalable architecture that utilizes actor-model concurrency and event-sourced game states to ensure precise match reconstruction and fault tolerance. It integrates distributed engine eva
Offloads computationally intensive move evaluations to a cluster of specialized servers for real-time tactical insights.
TiKV is a cloud-native distributed transactional key-value store and storage engine. It provides a distributed database designed for horizontal scalability and strong consistency across a cluster of physical nodes. The system uses a Raft-based consensus mechanism to maintain data availability and state synchronization. It ensures ACID compliance for distributed transactions through a two-phase commit workflow and manages data distribution via multi-Raft sharding. The engine handles massive datasets using automated range splitting and cluster load balancing to distribute data across different
Implements a coprocessor for executing filtering and aggregation logic directly on storage nodes to minimize network latency.
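The idea of running filters and aggregations on the storage nodes can be illustrated with a minimal sketch. This is not TiKV's API: `StorageNode`, `coprocess`, and `pushdown_sum` are hypothetical names, and the rows are plain in-memory dictionaries. It only shows why pushdown reduces traffic: each node returns one small partial aggregate instead of every matching row.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List, Tuple

Row = Dict[str, int]


@dataclass
class StorageNode:
    """Hypothetical storage node holding one shard of rows in memory."""

    rows: List[Row]

    def coprocess(self, predicate: Callable[[Row], bool], column: str) -> Tuple[int, int]:
        """Filter and aggregate locally, returning a partial (sum, count)."""
        total = 0
        count = 0
        for row in self.rows:
            if predicate(row):
                total += row[column]
                count += 1
        return total, count


def pushdown_sum(
    nodes: Iterable[StorageNode],
    predicate: Callable[[Row], bool],
    column: str,
) -> Tuple[int, int]:
    """Merge the partial aggregates; only two integers travel per node."""
    total = 0
    count = 0
    for node in nodes:
        partial_sum, partial_count = node.coprocess(predicate, column)
        total += partial_sum
        count += partial_count
    return total, count
```

Partial sums and counts merge by simple addition, which is what makes this kind of aggregate easy to split across nodes; an average would be derived from the merged pair rather than averaged per node.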
Gensim is an unsupervised natural language processing toolkit designed for topic modeling, word embedding training, and the processing of large-scale text corpora. It provides a framework for discovering latent themes and semantic structures in text without the need for labeled data. The toolkit is distinguished by its ability to handle datasets that exceed system memory through iterator-based data streaming from disk. It also supports distributed model training, allowing complex modeling tasks to be executed across computer clusters. The library covers a broad range of analysis capabilities
Functions as a distributed computing engine for processing and transforming massive text corpora.
Stockfish is a high-performance chess engine designed to evaluate board positions and calculate optimal moves. It functions as a command-line tool that utilizes neural network-based search algorithms to assess complex game states and determine strategic advantages. The engine is fully compliant with the Universal Chess Interface, allowing it to exchange commands and move data with external graphical user interfaces and professional analysis software. The engine distinguishes itself through advanced computational strategies that maximize hardware efficiency and search depth. It employs multi-t
Functions as a high-performance UCI-compliant engine that evaluates board positions and calculates optimal moves.
This project is a curated directory of software, frameworks, and educational resources designed for building, scaling, and maintaining distributed data processing and storage architectures. It serves as a comprehensive index for the distributed computing ecosystem, helping users identify the appropriate tools for managing large-scale information systems. The repository functions as a central hub for data engineering, offering categorized access to technologies that support batch and stream processing, machine learning, and interactive querying. By organizing these resources, it assists in the
Indexes a wide range of distributed computing engines and frameworks for batch, stream, and interactive data processing.
Ydata-profiling is an automated exploratory data analysis framework designed to generate comprehensive statistical reports and visual summaries from dataframes. It functions as a diagnostic tool for assessing data quality, identifying missing values, duplicates, and outliers, while providing a scalable engine for profiling massive datasets across distributed enterprise environments. The project distinguishes itself through its ability to handle large-scale data through distributed task orchestration and lazy stream processing, which minimizes memory overhead during complex computations. It in
Scales data profiling tasks across distributed enterprise environments to handle massive datasets efficiently.
Modin is a distributed dataframe library and parallel data processing engine designed to handle large datasets that exceed system memory. It functions as a distributed computing framework that parallelizes data manipulation tasks across multiple CPU cores or clusters to increase throughput and avoid memory errors. The project mirrors the Pandas API, allowing for the distribution of data workflows without changing core code logic. It utilizes a pluggable backend interface, which enables users to switch between different distributed execution engines to optimize performance based on available h
Provides a framework for processing and transforming massive datasets across distributed computing environments.
PySyft is a privacy-preserving machine learning framework and remote computation engine. It functions as a decentralized data analysis orchestrator that allows for the execution of data science workflows on remote servers without requiring the transfer of raw private data from the host device. The platform provides a secure collaboration environment where data owners manage permissions and authorize specific collaborators to run computations. It differentiates its workflow by utilizing mock data for local development and validation before submitting final analysis jobs to private remote serve
Provides a remote computation engine that allows analysis jobs to run on private data sources without raw information leaving the host.
SeaTunnel is a distributed data integration engine designed to synchronize structured and unstructured data across diverse sources and sinks. It functions as a multi-engine execution framework that can run data integration tasks across different distributed computing backends to optimize workload performance. The project is distinguished by a visual data pipeline designer for configuring workflows without manual code and a specialized change data capture tool for streaming incremental database updates. It also includes an enrichment pipeline that integrates large language models and embedding
Functions as a framework that can execute data integration tasks across various distributed computing backends.
Featuretools is a Python data science library and automated feature engineering framework designed to create predictive features from multiple related datasets. It automates the data preparation and transformation steps required for machine learning models through deep feature synthesis. The library enables the automatic generation of comprehensive feature tables by applying recursive transformations to relational data. It supports the transformation of unstructured text into structured numeric features and allows users to define custom primitives to extend the synthesis process with specific
Offloads heavy feature computation to multiple cores or clusters using distributed computing engines.
Feast is an open-source feature store for machine learning that provides a central platform for defining, storing, and serving features across both training and inference workflows. It operates as a declarative system where feature definitions are written as code in Python files, synchronized to a central registry, and made available for low-latency online retrieval or point-in-time correct historical joins for training datasets. The project abstracts storage behind a pluggable architecture, allowing offline and online backends to be swapped without changing retrieval logic, and coordinates ma
Computes and writes the latest batch feature values into an online store for low-latency serving.
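The point-in-time correct joins mentioned in the Feast entry can be illustrated with a small standard-library sketch. This is not Feast's API: `point_in_time_lookup`, `point_in_time_join`, and the integer timestamps are assumptions made for illustration. The rule shown is that each training event receives the latest feature value recorded at or before the event time, never a later one.

```python
from bisect import bisect_right
from typing import Dict, List, Optional, Tuple

# Hypothetical feature log: entity id -> (timestamp, value) pairs sorted by timestamp.
FeatureLog = Dict[str, List[Tuple[int, float]]]


def point_in_time_lookup(log: FeatureLog, entity: str, event_ts: int) -> Optional[float]:
    """Return the latest value with timestamp <= event_ts, or None if there is none."""
    history = log.get(entity, [])
    timestamps = [ts for ts, _ in history]
    idx = bisect_right(timestamps, event_ts)
    if idx == 0:
        return None  # no feature value existed yet at event time
    return history[idx - 1][1]


def point_in_time_join(
    log: FeatureLog,
    events: List[Tuple[str, int]],
) -> List[Tuple[str, int, Optional[float]]]:
    """Attach to each (entity, event_ts) pair the value that was valid at that time."""
    return [(entity, ts, point_in_time_lookup(log, entity, ts)) for entity, ts in events]
```

Looking backward only from the event timestamp is what keeps future information out of a training set; a plain join on the entity key would pick up values that did not exist yet when the event happened.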
FATE is an open-source federated learning platform that enables multiple organizations to collaboratively train machine learning models without exposing raw data to any party. It provides a complete framework for private data collaboration, allowing participants to jointly compute on sensitive information while maintaining data privacy and security guarantees through secure multi-party computation protocols. The platform distinguishes itself through its comprehensive infrastructure management capabilities, supporting automated deployment of multi-party clusters using Ansible-driven provisioni
Runs distributed data processing and computation across parties using a custom cluster manager.
SEAL is a homomorphic encryption library and C++ cryptography framework that enables mathematical operations on encrypted data without decrypting it. It provides a toolkit for performing addition and multiplication on encrypted integers and complex numbers to support privacy-preserving computation. The framework implements the BFV and CKKS schemes, allowing both modular arithmetic on encrypted integers and approximate arithmetic on fixed-precision floating-point numbers. It includes dedicated wrappers for integrating these encryption workflows into .NET environments and supports cross-platform deployment to Android, iOS, and WebAssembly. The library manages computational precision and security through ciphertext modulus switching, noise control, and validation of encryption parameters against security standards. It also provides tools for compressing ciphertexts and for building encrypted storage structures in which the service provider cannot access the decryption keys.
Provides a framework where data stays encrypted throughout its lifecycle and service providers never access keys.