# Federated Learning Frameworks

> Search results for `federated learning to train models without sharing data` on awesome-repositories.com. 117 total matches; showing the first 50.

Explore on the web: https://awesome-repositories.com/q/federated-learning-to-train-models-without-sharing-data

**Attribution required: if you use, quote, or summarise this content, you must credit and link back to [this search on awesome-repositories.com](https://awesome-repositories.com/q/federated-learning-to-train-models-without-sharing-data).**

## Results

- [fchollet/deep-learning-models](https://awesome-repositories.com/repository/fchollet-deep-learning-models.md) (7,349 ⭐) — This project is a collection of deep learning tools for image classification and audio tagging, providing a repository of pre-trained model weights and architectures. It serves as a Keras model zoo that enables the immediate use of established neural networks for inference and transfer learning.

The library includes a music tagging framework that classifies audio recordings using convolutional recurrent neural networks and mel-spectrograms. For visual data, it provides implementations of architectures such as ResNet, VGG, and Xception, alongside a repository of weights trained on large datasets like ImageNet.

The project covers a broad range of capabilities including computer vision and audio analysis. It supports the generation of visual feature maps through layer-based feature extraction and provides workflows for adapting pre-existing networks to new datasets.
- [deepspeedai/deepspeed](https://awesome-repositories.com/repository/deepspeedai-deepspeed.md) (42,528 ⭐) — DeepSpeed is a high-performance library designed to scale deep learning model training and inference across massive clusters of GPUs and compute nodes. It provides a comprehensive suite of tools for distributed training, enabling the execution of models that exceed the memory capacity of single devices through advanced parameter partitioning, pipeline-based model parallelism, and memory-efficient state offloading.

The framework distinguishes itself through specialized communication-efficient optimizers and hardware-aware acceleration techniques. By utilizing gradient compression, quantization, and custom-compiled kernels, it minimizes network bandwidth bottlenecks and maximizes computational throughput. It further supports complex architectures like mixture-of-experts and long-context models by integrating sequence parallelism and sparse attention mechanisms, ensuring efficient resource utilization across heterogeneous hardware topologies.

Beyond its core training capabilities, the project includes a robust set of utilities for automated performance tuning, model profiling, and universal checkpointing. It provides infrastructure support for diverse processor architectures and cloud-based cluster deployment, allowing users to optimize execution environments through targeted kernel compilation and diagnostic monitoring.
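Gradient compression is usually paired with error feedback, so that whatever one step discards is carried into the next. The sketch below is a generic 1-bit (sign) compressor with an error-feedback buffer, similar in spirit to the error-compensated 1-bit optimizers associated with DeepSpeed. It is not DeepSpeed's code or API: the function name and the plain-list representation are assumptions for illustration.

```python
from typing import List, Tuple


def sign_compress(grad: List[float], error: List[float]) -> Tuple[List[float], List[float]]:
    """Compress a gradient to signs plus one shared scale, with error feedback.

    Hypothetical helper for illustration; not part of DeepSpeed.
    """
    # Add the residual left over from the previous step before compressing.
    corrected = [g + e for g, e in zip(grad, error)]
    # One float per tensor: the mean magnitude, so the compressed vector
    # keeps roughly the same scale as the original.
    scale = sum(abs(c) for c in corrected) / len(corrected)
    # Only the sign of each element needs to be sent (1 bit per element).
    compressed = [scale if c >= 0 else -scale for c in corrected]
    # Whatever compression threw away is kept locally for the next step.
    new_error = [c - q for c, q in zip(corrected, compressed)]
    return compressed, new_error


if __name__ == "__main__":
    error = [0.0, 0.0]
    for step_grad in ([0.5, -1.5], [0.2, 0.1], [-0.4, 0.3]):
        sent, error = sign_compress(step_grad, error)
        print(sent, error)
```

Because each step's residual is folded into the next, the sum of everything sent plus the final residual equals the sum of the true gradients. Compression therefore delays information rather than losing it.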
- [axolotl-ai-cloud/axolotl](https://awesome-repositories.com/repository/axolotl-ai-cloud-axolotl.md) (12,059 ⭐) — Axolotl is a configuration-driven framework designed for the fine-tuning, evaluation, and quantization of large language models. It functions as a comprehensive orchestrator for distributed training, enabling users to manage complex workflows across multi-node and multi-GPU environments. By utilizing structured configuration files, the platform streamlines the setup of training parameters, dataset paths, and hardware distribution strategies.

The project distinguishes itself through its support for diverse training methodologies, including full-parameter tuning, parameter-efficient adaptation, and reinforcement learning alignment. It provides specialized capabilities for multimodal model training, allowing for the integration of text, image, and media inputs. Furthermore, the framework includes advanced optimization tools such as quantization-aware training, which simulates precision loss to maintain model accuracy, and dynamic reward signal integration for aligning model behavior with human preferences.

The framework covers a broad capability surface, including data management, performance optimization, and model lifecycle management. It handles data ingestion, preprocessing, and streaming, while offering advanced techniques like sequence packing and replay buffers to improve training efficiency. Performance is managed through distributed parallelism strategies, memory-efficient training pipelines, and custom kernel implementations.

The project provides pre-configured container images to ensure consistent deployment across local and cloud-based compute environments. Users can manage the entire model lifecycle, from initial configuration and training to adapter merging and final inference execution.
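Quantization-aware training typically adds a "fake quantization" step to the forward pass. Values are rounded to the integer grid of the target precision and immediately mapped back to floats, so the model trains against the rounding error it will encounter after quantization. Below is a minimal sketch of symmetric, per-tensor fake quantization. The scaling scheme (maximum absolute value) and the function name are assumptions, and this is not Axolotl's implementation. Real QAT also needs a backward rule for the rounding, commonly the straight-through estimator, which this forward-only sketch omits.

```python
from typing import List


def fake_quantize(values: List[float], num_bits: int = 8) -> List[float]:
    """Quantize to a symmetric signed integer grid, then dequantize.

    Illustrative sketch; per-tensor symmetric scaling is an assumption.
    """
    qmax = 2 ** (num_bits - 1) - 1          # e.g. 127 for 8 bits
    max_abs = max((abs(v) for v in values), default=0.0)
    if max_abs == 0.0:
        return list(values)                  # nothing to scale
    scale = max_abs / qmax                   # float value of one integer step
    out = []
    for v in values:
        q = round(v / scale)                 # snap to the integer grid
        q = max(-qmax, min(qmax, q))         # clamp to the representable range
        out.append(q * scale)                # map back to float ("dequantize")
    return out
```

For example, `fake_quantize([0.03, -0.71, 0.5], num_bits=4)` snaps each value to one of at most 15 levels. The training loss then "sees" the 4-bit error while the stored weights stay in full precision.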
- [etcd-io/etcd](https://awesome-repositories.com/repository/etcd-io-etcd.md) (51,838 ⭐) — etcd is a distributed, strongly consistent key-value store designed to provide reliable storage for critical system metadata and coordination primitives. It functions as a distributed consensus engine, utilizing a replicated log and leader-based state machine to ensure that all nodes in a cluster maintain a synchronized view of data. By providing atomic operations and linearizable reads and writes, it serves as a foundational component for distributed systems requiring high availability and fault tolerance.

The system distinguishes itself through its multi-version concurrency control, which enables non-blocking read operations while maintaining strict consistency for concurrent writes. It supports complex distributed coordination through features like lease-based expiration, which allows for the automatic removal of data based on client activity, and asynchronous key change monitoring, which provides real-time event notifications for data modifications. These capabilities are supported by a persistent B-tree-based storage engine and write-ahead logging to ensure durability across system crashes.

Beyond its core storage functions, the project provides a comprehensive suite of tools for cluster management, including automated peer discovery via DNS or service registries and robust security enforcement. It includes built-in mechanisms for transport layer security, role-based access control, and certificate management to protect data in transit and at rest. Operational reliability is further maintained through snapshot-based disaster recovery, cluster health monitoring, and granular performance tuning for disk and network resources.

The system is configured through structured files or command-line flags, allowing for flexible deployment across diverse infrastructure environments.
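Multi-version concurrency control and lease-based expiration are easier to picture with a toy model. The sketch below records every write as a new entry tagged with a global revision, so a reader can request a key "as of" an older revision. It also attaches keys to time-limited leases, which delete those keys when they expire. It is an in-memory illustration with hypothetical class and method names. It is not etcd's storage engine or client API, and it ignores replication, compaction, watches, and lease keep-alives.

```python
from typing import Dict, List, Optional, Set, Tuple


class ToyMVCCStore:
    """In-memory key-value store with revisions and leases (illustrative only)."""

    def __init__(self) -> None:
        self.revision = 0  # global counter, bumped on every write
        # key -> list of (revision, value); None marks a deletion (tombstone)
        self._history: Dict[str, List[Tuple[int, Optional[str]]]] = {}
        # lease id -> (expiry time, keys attached to the lease)
        self._leases: Dict[int, Tuple[float, Set[str]]] = {}
        self._next_lease = 1

    def grant_lease(self, ttl: float, now: float) -> int:
        lease_id = self._next_lease
        self._next_lease += 1
        self._leases[lease_id] = (now + ttl, set())
        return lease_id

    def put(self, key: str, value: str, lease_id: Optional[int] = None) -> int:
        if lease_id is not None and lease_id not in self._leases:
            raise KeyError(f"unknown lease {lease_id}")
        # In this toy, a key belongs to at most one lease; a new put replaces it.
        for _, keys in self._leases.values():
            keys.discard(key)
        if lease_id is not None:
            self._leases[lease_id][1].add(key)
        return self._append(key, value)

    def delete(self, key: str) -> int:
        for _, keys in self._leases.values():
            keys.discard(key)
        return self._append(key, None)

    def get(self, key: str, revision: Optional[int] = None) -> Optional[str]:
        # Reads only scan history and never mutate it, which is what lets an
        # MVCC engine serve an old revision while writers append new ones.
        at = self.revision if revision is None else revision
        value: Optional[str] = None
        for rev, val in self._history.get(key, []):
            if rev > at:
                break
            value = val
        return value

    def expire_leases(self, now: float) -> List[int]:
        expired = [lid for lid, (deadline, _) in self._leases.items() if deadline <= now]
        for lid in expired:
            _, keys = self._leases.pop(lid)
            for key in sorted(keys):
                self._append(key, None)  # deletion recorded at a new revision
        return expired

    def _append(self, key: str, value: Optional[str]) -> int:
        self.revision += 1
        self._history.setdefault(key, []).append((self.revision, value))
        return self.revision
```

A service that registers itself under a leased key disappears from the store automatically once its lease lapses. Older revisions still show the key, which is what makes point-in-time reads and change history possible.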
- [googlecloudplatform/training-data-analyst](https://awesome-repositories.com/repository/googlecloudplatform-training-data-analyst.md) (8,566 ⭐) — This project is a cloud data analysis sandbox and a collection of courseware designed for learning data analysis techniques on Google Cloud Platform. It serves as a training lab containing technical demonstrations and practical exercises for skill development and cloud certification.

The repository provides guided labs and demonstrations focused on Google Cloud data analysis, encompassing technical training for the platform's specific data services. It enables the practice of cloud data engineering and the use of big data tooling to perform queries and data transformations.

The environment supports hands-on exercises through a cloud-based lab setup with virtual machine orchestration and scripted environment configuration. These workflows include scenario-based dataset provisioning and the integration of native cloud console interfaces and command line tools.
- [tensorflow/tensorflow](https://awesome-repositories.com/repository/tensorflow-tensorflow.md) (195,697 ⭐) — TensorFlow is a comprehensive machine learning framework designed for the construction, training, and deployment of complex mathematical models. It utilizes a graph-based execution model that represents operations as directed acyclic graphs, enabling automatic differentiation and efficient parallel processing. The system provides high-level interfaces for defining neural network architectures, alongside a robust engine for managing multidimensional array structures and tensor mathematics.

The framework distinguishes itself through a scalable distributed runtime that orchestrates workloads across heterogeneous hardware accelerators and decentralized network nodes. It employs deferred-execution symbolic graphs to perform graph-level optimizations, fusion, and ahead-of-time kernel compilation for specific hardware architectures. To ensure consistent performance across production environments, it features a standardized serialization format for model graphs and specialized tools for model serving, quantization, and compression.

Beyond core training capabilities, the platform includes a high-throughput data ingestion engine that supports asynchronous, multi-threaded pipelines to prevent bottlenecks. It also offers extensive support for hardware abstraction, allowing for pluggable device integration and containerized acceleration. The ecosystem is rounded out by utilities for data validation, federated learning, and specialized modeling tasks, providing a complete toolchain for moving models from research into high-availability production environments.
- [module-federation/core](https://awesome-repositories.com/repository/module-federation-core.md) (2,551 ⭐) — Module Federation is a concept that allows developers to share code and resources across multiple JavaScript applications.
- [cmusatyalab/openface](https://awesome-repositories.com/repository/cmusatyalab-openface.md) (15,398 ⭐) — Openface is a deep learning toolkit designed for facial recognition and identity verification. It provides a comprehensive pipeline for detecting faces, aligning landmarks, and transforming facial images into compact numerical vectors. By utilizing these embeddings, the system enables identity classification and similarity comparison through geometric distance calculations.

The project distinguishes itself by integrating research-oriented diagnostic tools alongside its core recognition capabilities. It includes utilities for visualizing high-dimensional feature clusters, inspecting internal convolutional network activations, and evaluating model performance through standard accuracy metrics. These features allow for the analysis of how specific facial regions contribute to recognition decisions and how models converge during training.

The framework supports end-to-end workflows, ranging from training support vector machines for classification to executing real-time identification across video streams. It includes utilities for tracking faces across frames to maintain consistency and provides a containerized environment to manage the complex dependencies required for deep learning tasks.
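The "geometric distance" comparison described above can be sketched as follows. Embeddings are L2-normalised and compared with squared Euclidean distance, and a query is assigned to its nearest known identity only when that distance falls below a threshold. The threshold value and the helper names are assumptions for illustration, not OpenFace's API. In practice the threshold is tuned on labelled validation pairs.

```python
import math
from typing import Dict, List, Optional

Embedding = List[float]


def l2_normalize(v: Embedding) -> Embedding:
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0:
        raise ValueError("cannot normalise a zero vector")
    return [x / norm for x in v]


def squared_distance(a: Embedding, b: Embedding) -> float:
    # For unit vectors this equals 2 - 2*cos(angle), so it lies in [0, 4].
    return sum((x - y) ** 2 for x, y in zip(a, b))


def identify(query: Embedding, gallery: Dict[str, Embedding],
             threshold: float = 1.0) -> Optional[str]:
    """Return the closest known identity, or None if nobody is close enough.

    The default threshold is a hypothetical placeholder.
    """
    q = l2_normalize(query)
    best_name, best_dist = None, float("inf")
    for name, emb in gallery.items():
        d = squared_distance(q, l2_normalize(emb))
        if d < best_dist:
            best_name, best_dist = name, d
    return best_name if best_dist < threshold else None
```

The same distance supports both verification ("are these two faces the same person?") and identification against a gallery. The threshold trades false accepts against false rejects.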
- [dmlc/xgboost](https://awesome-repositories.com/repository/dmlc-xgboost.md) (28,471 ⭐) — XGBoost is a distributed machine learning library for implementing scalable gradient boosting decision trees used for regression, classification, and ranking. It functions as a predictive model framework and a cross-language toolkit, providing a core implementation with native bindings for Python, R, Java, Scala, and C++.

The system is designed as a GPU-accelerated library that utilizes CUDA and NCCL to speed up the training of decision tree ensembles. It operates as a distributed framework capable of scaling training and prediction across multi-node clusters and GPU environments to process massive datasets.

The library covers a wide range of modeling tasks, including random forests, learning to rank, and survival or quantile regression. It includes capabilities for model optimization through custom loss functions, model interpretability via SHAP value computation and feature importance analysis, and data management techniques like external memory handling for datasets that exceed available system memory.
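A custom loss function in gradient boosting comes down to supplying, for every sample, the first and second derivatives (gradient and hessian) of the loss with respect to the current prediction. In the formulation from the XGBoost paper, a leaf's optimal weight under L2 regularisation λ is w* = -G / (H + λ), where G and H are the sums of gradients and hessians over the samples in that leaf. The plain-Python sketch below computes these quantities for squared error (taken as ½(p − y)²) and logistic loss. The function names are hypothetical, and the code does not call the xgboost package.

```python
import math
from typing import List, Tuple


def squared_error_grad_hess(preds: List[float], labels: List[float]) -> Tuple[List[float], List[float]]:
    # Loss 0.5 * (p - y)^2: first derivative p - y, second derivative 1.
    grad = [p - y for p, y in zip(preds, labels)]
    hess = [1.0] * len(preds)
    return grad, hess


def logistic_grad_hess(margins: List[float], labels: List[float]) -> Tuple[List[float], List[float]]:
    # Log loss on raw margins: with p = sigmoid(m), grad = p - y, hess = p * (1 - p).
    probs = [1.0 / (1.0 + math.exp(-m)) for m in margins]
    grad = [p - y for p, y in zip(probs, labels)]
    hess = [p * (1.0 - p) for p in probs]
    return grad, hess


def leaf_weight(grad: List[float], hess: List[float], reg_lambda: float = 1.0) -> float:
    # Newton step for one leaf, shrunk by the L2 penalty reg_lambda.
    return -sum(grad) / (sum(hess) + reg_lambda)
```

Swapping in a different loss only changes the gradient and hessian functions. The tree-building step consumes the same two arrays regardless of the objective.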
- [shizhediao/post-training-data-flywheel](https://awesome-repositories.com/repository/shizhediao-post-training-data-flywheel.md) (65 ⭐) — We aim to provide the best references to search, select, and synthesize high-quality and large-quantity data for post-training your LLMs.
- [dusty-nv/jetson-inference](https://awesome-repositories.com/repository/dusty-nv-jetson-inference.md) (8,734 ⭐) — jetson-inference is a set of libraries and tools for executing optimized deep learning models on embedded GPU hardware. Its primary purpose is to enable real-time computer vision and AI inference at the edge with low latency and high throughput.

The project distinguishes itself through high-performance streaming analytics and the ability to execute concurrent AI pipelines on auto-grade silicon. It provides specialized support for multi-sensor stream processing, utilizing zero-copy data transport to load camera frames directly into GPU memory.

The codebase covers a broad surface of capabilities, including real-time video analytics, object detection and tracking, and image segmentation. It also integrates hardware-accelerated decoding and TensorRT-based inference to optimize model execution on embedded platforms.

The project provides a TensorRT inference wrapper and an embedded vision SDK to facilitate the deployment of neural network primitives.
- [cube-js/cube](https://awesome-repositories.com/repository/cube-js-cube.md) (20,251 ⭐) — Cube is a semantic data layer that provides a unified framework for defining business metrics, dimensions, and relationships across diverse data sources. By acting as a headless business intelligence engine, it transforms raw data into a governed model that can be queried via SQL, REST, and GraphQL interfaces. This architecture ensures consistent data definitions and logic across all downstream analytical applications and reporting tools.

The platform distinguishes itself through its integrated conversational AI capabilities, which allow users to explore data using natural language. It orchestrates these interactions by mapping questions to the underlying semantic model, ensuring that AI-generated insights remain accurate and context-aware. Furthermore, Cube is designed for multi-tenant environments, offering robust infrastructure isolation, row-level security, and dynamic context injection to ensure that data access is strictly governed and personalized for every user or tenant.

Beyond its core modeling and AI features, the platform includes a comprehensive suite of tools for performance optimization, including automated pre-aggregation caching and asynchronous query queuing. It supports a wide range of data sources and deployment models, from self-hosted containers to managed cloud environments. The system also provides extensive programmatic control over report management, dashboard publishing, and user identity synchronization, making it suitable for embedding interactive analytics directly into custom software applications.
- [parvardegr/sharing](https://awesome-repositories.com/repository/parvardegr-sharing.md) (1,834 ⭐) — Sharing is a command-line tool for sharing directories and files with iOS and Android devices without needing an extra client app.
- [federatedai/fate](https://awesome-repositories.com/repository/federatedai-fate.md) (6,048 ⭐) — FATE is an open-source federated learning platform that enables multiple organizations to collaboratively train machine learning models without exposing raw data to any party. It provides a complete framework for private data collaboration, allowing participants to jointly compute on sensitive information while maintaining data privacy and security guarantees through secure multi-party computation protocols.

The platform distinguishes itself through its comprehensive infrastructure management capabilities, supporting automated deployment of multi-party clusters using Ansible-driven provisioning and cloud-native technologies like containers and Kubernetes. FATE includes a DAG-based pipeline scheduler for orchestrating federated tasks, the Eggroll engine for distributed data processing, and a federated model-serving proxy that routes inference requests with privacy-preserving transformations. The system implements privacy-preserving intersection and aggregation protocols, along with a party-role-based topology that assigns each participant a role (guest, host, or arbiter) defining its data access and computation permissions.

Beyond core training and serving, FATE offers capabilities for deploying standalone instances for local development, running on ARM architecture, and managing federated infrastructure through release artifacts and Docker containers. The platform also provides visualization tools for exploring model behavior and performance.
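To make "training without exposing raw data" concrete, the sketch below shows a generic secure-aggregation technique based on pairwise additive masking. It is not FATE's protocol. FATE's own components rely on their own cryptographic building blocks, and the role names here are borrowed from the description above purely for flavour. Each pair of parties shares a random mask that one adds and the other subtracts, so an arbiter-style aggregator sees only masked vectors, yet the masks cancel in the sum. Seeding the pair masks from party names is a stand-in for the key agreement a real protocol would use, and dropout handling is omitted. The fixed-point scale and all function names are assumptions.

```python
import random
from typing import Dict, List

MODULUS = 2 ** 32   # all arithmetic is on integers modulo 2^32
SCALE = 10_000      # fixed-point scale for encoding floats (assumption)


def encode(vec: List[float]) -> List[int]:
    return [round(x * SCALE) % MODULUS for x in vec]


def decode(vec: List[int]) -> List[float]:
    # Values above MODULUS/2 represent negative numbers.
    return [(v if v < MODULUS // 2 else v - MODULUS) / SCALE for v in vec]


def pairwise_masks(parties: List[str], dim: int, round_id: str) -> Dict[str, List[int]]:
    """Each pair (i, j) shares a random mask: i adds it, j subtracts it.

    Seeding from party names stands in for real key agreement (illustration only).
    """
    masks = {p: [0] * dim for p in parties}
    for idx, i in enumerate(parties):
        for j in parties[idx + 1:]:
            rng = random.Random(f"{round_id}:{i}:{j}")
            shared = [rng.randrange(MODULUS) for _ in range(dim)]
            masks[i] = [(m + s) % MODULUS for m, s in zip(masks[i], shared)]
            masks[j] = [(m - s) % MODULUS for m, s in zip(masks[j], shared)]
    return masks


def mask_update(update: List[float], mask: List[int]) -> List[int]:
    # What a party sends: its encoded update hidden under its mask.
    return [(u + m) % MODULUS for u, m in zip(encode(update), mask)]


def aggregate(masked: List[List[int]]) -> List[float]:
    # The aggregator only ever sees masked vectors; the pairwise masks
    # cancel in the modular sum, leaving the plain total of all updates.
    total = [0] * len(masked[0])
    for vec in masked:
        total = [(t + v) % MODULUS for t, v in zip(total, vec)]
    return decode(total)
```

With updates `{"guest": [0.1, -0.2], "host-1": [0.3, 0.4], "host-2": [-0.05, 0.0]}`, the aggregator recovers the sum `[0.35, 0.2]` without ever seeing any single party's vector in the clear.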
- [datahub-project/datahub](https://awesome-repositories.com/repository/datahub-project-datahub.md) (12,141 ⭐) — DataHub is a metadata management platform designed to unify technical, operational, and business context across diverse data ecosystems. By utilizing a graph-based metadata model and an event-driven ingestion architecture, it creates a centralized source of truth that maps complex data relationships, lineage, and ownership. This foundational framework enables organizations to maintain a synchronized view of their data landscape, supporting both human-led discovery and automated data operations.

The platform distinguishes itself through its focus on grounding artificial intelligence and autonomous agents in verified enterprise context. It provides specialized capabilities to inject provenance-aware lineage, business definitions, and quality signals into AI prompts, ensuring that generated insights are accurate and trustworthy. Through a policy-as-code governance engine, it enforces access controls and compliance rules directly within the metadata graph, allowing for programmatic oversight of data assets across hybrid environments.

Beyond its core identity, the project offers a comprehensive suite of tools for data discovery, observability, and lifecycle management. It includes features for automated lineage extraction, impact analysis, and semantic search, enabling users to navigate data dependencies and resolve quality issues efficiently. The platform also supports collaborative workflows, allowing teams to manage business glossaries, certify data assets, and automate access requests through integrated communication channels.

DataHub is built to scale, utilizing a distributed architecture that allows storage, search, and graph processing layers to operate independently. It provides standardized interfaces and a bridge-based connector framework to facilitate integration with heterogeneous data sources and external AI agent frameworks.
- [geldata/gel](https://awesome-repositories.com/repository/geldata-gel.md) (14,065 ⭐) — Gel is an object-relational database system that models data as a graph of interconnected objects. By utilizing a strongly typed schema, it enables complex relational queries and polymorphic data structures without the need for traditional join tables. The system integrates native vector storage and similarity search operators, allowing it to function as both a relational and a vector database for semantic data retrieval.

The platform distinguishes itself through a comprehensive suite of developer-centric automation tools. It features a declarative migration system that tracks and versions schema changes, supporting advanced workflows like schema branching and merging. To ensure application-level reliability, the database introspects its own schema to generate type-safe client libraries and query builders, providing consistent data structures across application code.

Beyond core storage, the system provides extensive capabilities for data modeling, including computed properties, custom scalar types, and complex constraints. It supports versatile query execution, ranging from hierarchical nested data retrieval and atomic transactions to integrated retrieval-augmented generation workflows that connect directly to external language models.

The project is managed through a command-line interface that handles the full lifecycle of database instances, including provisioning, monitoring, and automated backup restoration. It offers flexible connectivity options, supporting both native language-specific drivers and a standardized HTTP-based query protocol.
- [project-monai/monai](https://awesome-repositories.com/repository/project-monai-monai.md) (7,869 ⭐) — MONAI is a PyTorch-based deep learning framework and library specifically designed for healthcare imaging. It provides a suite of domain-specific neural network architectures, specialized loss functions, and preprocessing pipelines tailored for analyzing multi-dimensional medical data.

The project distinguishes itself through a decentralized federated learning system that allows models to learn from datasets across multiple institutions without exchanging raw patient images. It also features AI-assisted medical image annotation tools and a standardized model bundling system to ensure consistent inference and reproducibility across clinical workstations and cloud environments.

The framework covers the full medical AI lifecycle, including data engineering via spatial resampling and normalization, distributed training across multi-GPU nodes, and model evaluation using specialized imaging metrics and result visualization.

The library is implemented in Python.
- [josephmisiti/awesome-machine-learning](https://awesome-repositories.com/repository/josephmisiti-awesome-machine-learning.md) (72,867 ⭐) — This project is a comprehensive, community-driven directory of machine learning resources, software libraries, and educational materials. It serves as a centralized knowledge base for developers and researchers, organizing tools and frameworks by their primary programming language and technical domain to simplify discovery across the artificial intelligence ecosystem.

The collection distinguishes itself by providing a cross-language development index that spans diverse programming environments, including C, C++, Rust, Clojure, and Python. It covers a wide range of specialized capabilities, from neural network implementation and deep learning frameworks to computer vision, natural language processing, and reinforcement learning. The repository also highlights hardware-accelerated compute kernels and neurosymbolic architectures, offering a broad view of both established and emerging machine learning technologies.

Beyond software libraries, the directory includes a curated roadmap of foundational learning materials, such as textbooks and documentation on linear algebra, probability, statistics, and distributed machine learning patterns. This structured approach provides a technical reference for those seeking to understand both the theoretical underpinnings and the practical implementation of modern computational intelligence.
- [99designs/gqlgen](https://awesome-repositories.com/repository/99designs-gqlgen.md) (10,729 ⭐) — gqlgen is a schema-first Go library designed to build type-safe GraphQL servers. It functions as a code generation engine that transforms declarative GraphQL schema definitions into strongly-typed Go source code, ensuring strict alignment between the API contract and the underlying implementation.

The framework distinguishes itself through its deep integration with the Go type system and its highly extensible build pipeline. By using schema-first development, it automates the creation of server boilerplate and resolver stubs, allowing developers to map schema fields directly to Go structs and methods. It supports advanced architectural patterns such as distributed federation, custom middleware for cross-cutting concerns, and directive-based metadata injection to influence generated code and runtime behavior.

Beyond core generation, the toolkit provides a comprehensive suite of features for managing complex API lifecycles. This includes performance-oriented capabilities like database request batching, deferred field resolution, and query complexity analysis to protect server resources. It also handles real-time data streaming via subscriptions, multipart file uploads, and robust error propagation, all while maintaining observability through integrated tracing and logging hooks.

The project is distributed as a Go module, with documentation and installation instructions available in the primary repository.
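Request batching in GraphQL servers usually follows the "dataloader" pattern. Resolvers register the keys they need, and one batched fetch retrieves them all, which avoids a separate database round-trip per field (the N+1 problem). gqlgen itself is a Go library. The Python sketch below only illustrates the pattern, and the `BatchLoader` class and its methods are hypothetical, not part of gqlgen or any dataloader package.

```python
from typing import Callable, Dict, Hashable, List, Sequence


class BatchLoader:
    """Collects keys from many resolvers, then fetches them in one batch."""

    def __init__(self, batch_fn: Callable[[List[Hashable]], Sequence[object]]):
        self._batch_fn = batch_fn            # must return values in key order
        self._pending: List[Hashable] = []   # keys requested but not yet fetched
        self._cache: Dict[Hashable, object] = {}

    def load(self, key: Hashable) -> Callable[[], object]:
        # Queue the key once; duplicates and cached keys are not re-fetched.
        if key not in self._cache and key not in self._pending:
            self._pending.append(key)
        # Return a thunk the resolver calls after dispatch() has run.
        return lambda: self._cache[key]

    def dispatch(self) -> None:
        if not self._pending:
            return
        keys, self._pending = self._pending, []
        values = self._batch_fn(keys)
        if len(values) != len(keys):
            raise ValueError("batch function must return one value per key")
        self._cache.update(zip(keys, values))
```

If four resolvers ask for users `1, 2, 1, 3`, the batch function runs once with `[1, 2, 3]`. A real server would trigger `dispatch()` automatically at the end of each execution tick rather than by hand.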
- [tensorflow/federated](https://awesome-repositories.com/repository/tensorflow-federated.md) (2,441 ⭐) — An open-source framework for machine learning and other computations on decentralized data.
- [data-creative/next-train-api](https://awesome-repositories.com/repository/data-creative-next-train-api.md) (0 ⭐) — The Next Train API provides a JSON web service for any GTFS feed. Deploy this source code to your own Heroku server to set up an API for your own agency's feed. Let me know how it goes. I'm happy to support you!
- [adap/flower](https://awesome-repositories.com/repository/adap-flower.md) (6,971 ⭐) — Flower is a federated learning framework and distributed machine learning orchestrator designed to train models across decentralized devices. It functions as a privacy-preserving toolkit that enables model training and data analysis on local hardware, ensuring raw data remains on the device while contributing to a synchronized global model.

The system is framework-agnostic: a wrapper layer connects diverse machine learning libraries, allowing models built with different frameworks to operate within the same training loop. A remote procedure call (RPC) layer manages the exchange of model weights and metadata between a central server and remote workers.

The framework covers model aggregation management through interchangeable strategies and supports a custom message bus for transmitting non-standard data packets. It also provides capabilities for performing federated analytics across separate datasets without centralizing the raw information.
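Aggregation strategies decide how client updates are combined. The classic example is federated averaging (FedAvg), which Flower provides as a built-in strategy. The plain-Python sketch below simulates FedAvg on a one-parameter linear model to show the data flow: each client trains on its own samples and returns only its weights and a sample count, and the server takes a sample-weighted mean. It does not use Flower's API. All names, the toy model, and the learning-rate settings are assumptions for illustration.

```python
from typing import List, Sequence, Tuple

Weights = List[float]
Sample = Tuple[float, float]  # (x, y) pair that never leaves its client


def local_train(w: float, data: Sequence[Sample], lr: float = 0.1, epochs: int = 5) -> float:
    # Gradient descent on mean squared error for the one-parameter model y ~ w * x.
    for _ in range(epochs):
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w


def fedavg(results: Sequence[Tuple[Weights, int]]) -> Weights:
    # Weighted mean of client weights, weighted by each client's sample count.
    total = sum(n for _, n in results)
    if total == 0:
        raise ValueError("no training samples reported")
    agg = [0.0] * len(results[0][0])
    for weights, n in results:
        for i, w in enumerate(weights):
            agg[i] += w * n / total
    return agg


def run_round(global_w: Weights, clients: Sequence[Sequence[Sample]]) -> Weights:
    # Each client trains locally and reports only (weights, sample count).
    results = [([local_train(global_w[0], data)], len(data)) for data in clients]
    return fedavg(results)
```

With two clients whose private data both follow `y = 3x`, repeated calls to `run_round` move the shared weight toward 3, although the server never receives a single `(x, y)` pair. Weighting by sample count keeps a client with little data from pulling the average as hard as a client with a lot.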
- [clearml/clearml](https://awesome-repositories.com/repository/clearml-clearml.md) (6,740 ⭐) — ClearML is a comprehensive MLOps platform designed to manage the end-to-end machine learning lifecycle, from initial experimentation to production deployment. It provides a suite of integrated tools including a pipeline orchestrator for automating workflows, an experiment tracking tool for logging hyperparameters and metrics, and a metadata-driven data versioning system for managing large-scale datasets and model artifacts.

The platform is distinguished by its advanced compute management and serving capabilities. It features a GPU compute manager that supports fractional resource slicing and priority scheduling across hybrid cloud environments. Additionally, it includes a dedicated serving framework for hosting large language models and agentic workflows through secure APIs with integrated autoscaling.

The system covers a broad range of operational capabilities, including real-time infrastructure cost tracking, multi-tenant resource isolation, and automated execution environment reproduction. It also provides observability tools for monitoring inference endpoints, auditing AI workflows, and analyzing system-level hardware utilization.

The orchestration engine can be deployed via containerized or cloud-image based installations to host the platform's lifecycle infrastructure.
- [karpathy/nanochat](https://awesome-repositories.com/repository/karpathy-nanochat.md) (55,103 ⭐) — Nanochat is a lightweight execution environment designed for training and running language models on standard consumer hardware. It functions as both a neural network training framework and an inference engine, enabling users to perform backpropagation-based training and model execution directly on general-purpose processors without the need for dedicated graphics hardware.

The project distinguishes itself through a suite of optimization tools that prioritize efficiency on local machines. By utilizing memory-mapped weight loading and CPU-optimized vector math, it maximizes throughput for interactive sessions. Furthermore, the framework includes a quantization toolkit that allows users to adjust the numerical precision of weights and activations, effectively balancing memory consumption against computational speed.

The platform supports a range of capabilities for transformer architecture experimentation, including the configuration of training parameters and the management of local data pipelines. It employs a stateless generation loop to process tokens through self-contained execution cycles, facilitating the development and fine-tuning of custom models in a private, local environment.
- [allegroai/clearml](https://awesome-repositories.com/repository/allegroai-clearml.md) (6,733 ⭐) — ClearML is a comprehensive MLOps platform designed to manage the entire machine learning lifecycle. It functions as an experiment tracking tool, a data versioning system, and a pipeline orchestrator, while providing infrastructure for GPU cluster management and model serving.

The platform is distinguished by its ability to handle hybrid-cloud compute scheduling and fractional GPU allocation, allowing multiple workloads to share a single hardware accelerator. It employs a metadata-based approach to data versioning, using virtual views to track large datasets and artifacts without duplicating raw files.

The system covers a broad range of capabilities including automated machine learning pipeline orchestration via task-graph dependencies, hyperparameter optimization, and distributed model training. It also provides an integrated AI workbench for remote development and a centralized control plane for tracking models from training through to production deployment.

Governance and observability are integrated through multi-tenant resource isolation, role-based access control, and real-time monitoring of compute resources and model performance.
- [module-federation/vite](https://awesome-repositories.com/repository/module-federation-vite.md) (789 ⭐) — Vite Plugin for Module Federation
- [eugeneyan/applied-ml](https://awesome-repositories.com/repository/eugeneyan-applied-ml.md) (29,783 ⭐) — This project is a comprehensive, curated knowledge base designed to support the development and maintenance of production-grade machine learning systems. It serves as a centralized repository of industry-standard technical literature, engineering case studies, and research papers, providing a structured reference for practitioners navigating the complexities of modern data science and machine learning engineering.

The resource distinguishes itself through a cross-domain approach that bridges the gap between academic research and practical implementation. By synthesizing proven industry architectures and operational strategies, it offers a unified framework for managing the entire machine learning lifecycle, from initial data infrastructure and pipeline development to model deployment, versioning, and continuous monitoring.

The collection covers a broad spectrum of technical domains, including data quality management, feature engineering, and the application of various machine learning tasks such as natural language processing, computer vision, and reinforcement learning. It also addresses critical operational concerns like system efficiency, privacy-preserving techniques, and the ethical considerations inherent in automated decision-making systems.

The repository is maintained through a community-driven model, ensuring that the documentation remains aligned with evolving industry standards. All content is delivered via static markdown files, providing a highly accessible and version-controlled format for long-form technical research.
- [apollographql/federation-jvm](https://awesome-repositories.com/repository/apollographql-federation-jvm.md) (273 ⭐) — JVM support for Apollo Federation
- [anthropics/claude-code](https://awesome-repositories.com/repository/anthropics-claude-code.md) (132,728 ⭐) — Anthropic's terminal-native AI coding agent.
- [crewaiinc/crewai](https://awesome-repositories.com/repository/crewaiinc-crewai.md) (53,687 ⭐) — CrewAI is a multi-agent orchestration framework designed for building autonomous systems that execute complex, multi-step workflows. It provides a development platform where specialized agents are defined with specific roles, goals, and tool sets to perform tasks collaboratively. By leveraging a declarative workflow engine, the system manages task dependencies, state transitions, and execution logic, allowing for the creation of structured, stateful sequences of operations.

The framework distinguishes itself through its hierarchical management capabilities, which use manager agents to coordinate specialist teams, delegate tasks, and oversee project execution. It incorporates a persistent memory architecture that enables agents to retain context and perform semantic searches across long-running operations. The system also supports production use by enforcing schema-based output validation and providing execution checkpointing, which lets interrupted runs resume mid-flight and allows specific tasks to be replayed for debugging or refinement.

Beyond its core orchestration, the project offers a comprehensive suite of developer utilities for managing agent performance and workflow reliability. This includes tools for training agents through iterative cycles, monitoring system events via a central execution bus, and visualizing workflow structures. The platform also features a provider-agnostic interface for integrating external APIs and utilities, ensuring that agents can interact with diverse real-world services while maintaining consistent data structures throughout the execution lifecycle.
- [trainindata/deploying-machine-learning-models](https://awesome-repositories.com/repository/trainindata-deploying-machine-learning-models.md) (0 ⭐) — Accompanying repo for the online course Deployment of Machine Learning Models.
- [secretflow/secretflow](https://awesome-repositories.com/repository/secretflow-secretflow.md) (2,629 ⭐) — SecretFlow is a privacy computing framework and platform designed for secure multi-party computation, federated learning, and privacy-preserving data analysis across independent nodes. It provides a management system to coordinate secure workloads and cryptographic tasks across a distributed cluster.

The project enables joint data analysis and machine learning on partitioned datasets using cryptographic protocols. It allows models to be trained and analytical queries to be executed across multiple parties without exposing raw source data to any single participant.

The framework covers a broad surface of privacy-preserving capabilities, including secure distributed analytics, encrypted data processing, and distributed model development. It incorporates orchestration tools for managing private workflows and coordinating the sequence of computation steps across isolated environments.
- [ammar64/sharing](https://awesome-repositories.com/repository/ammar64-sharing.md) (0 ⭐) — Share files and apps over HTTP. The other device must be connected to the same network. Just toggle on the server and scan the QR code on the other device, and you're good to go. Files sent from the browser to the app can be found in the Sharing/ folder in your internal storage. You can always disable…
- [lukasmasuch/best-of-ml-python](https://awesome-repositories.com/repository/lukasmasuch-best-of-ml-python.md) (23,236 ⭐) — This project serves as a comprehensive, community-driven directory of high-quality open-source Python libraries and tools for machine learning, data science, and artificial intelligence. It functions as a centralized resource for developers to discover, evaluate, and track the maintenance status of software packages across the entire machine learning ecosystem.

The platform distinguishes itself through automated popularity tracking and data-driven content curation, which programmatically validate and rank projects based on community activity and development velocity. By organizing these tools into a hierarchical, metadata-driven structure, it simplifies the navigation of complex technical domains, ranging from foundational model development and experiment tracking to specialized fields like reinforcement learning, computer vision, and natural language processing.

The directory covers a broad capability surface, including infrastructure for distributed computing, hardware acceleration, and model deployment. It also catalogs specialized tools for processing diverse data types such as audio, geospatial, medical, and graph-structured information, as well as frameworks for statistical analysis, privacy-preserving machine learning, and adversarial robustness.

All project information is maintained within a version-controlled repository, which powers a static site generation process to provide a searchable and transparent knowledge base for the community.
- [dotnet/efcore](https://awesome-repositories.com/repository/dotnet-efcore.md) (14,587 ⭐) — Entity Framework Core is an object-relational mapper that enables developers to interact with database systems using strongly-typed code. It serves as a comprehensive data access framework, providing a unified interface for mapping application objects to relational and non-relational database schemas while managing the lifecycle of data operations through a central context.

The project distinguishes itself through a provider-based architecture that decouples core data access logic from specific database engines, allowing for consistent interaction across diverse storage systems. It features a sophisticated query translation engine that converts language-integrated queries into optimized, database-specific commands, alongside a robust migration toolset that automates schema evolution by synchronizing the physical database structure with the application model.

Beyond its core mapping and query capabilities, the framework provides extensive tooling for database scaffolding, reverse engineering, and automated code generation. It supports complex data modeling requirements, including inheritance hierarchies, owned entity relationships, and custom mapping configurations, while offering built-in mechanisms for transaction management, concurrency control, and connection resiliency.

The framework includes comprehensive observability and testing utilities, such as command interception, operation logging, and in-memory database simulation for isolated testing. It is designed for integration with standard dependency injection containers and provides configuration hooks to customize scaffolding and migration logic.
- [eleutherai/gpt-neo](https://awesome-repositories.com/repository/eleutherai-gpt-neo.md) (8,275 ⭐) — GPT-Neo is an open-source distributed training framework designed for scaling GPT-2 and GPT-3-style language models across multiple devices using mesh-tensorflow for model parallelism. It provides the infrastructure to train transformer-based language models with billions of parameters across distributed computing environments, making large-scale language model research accessible outside of proprietary systems.

The framework supports training both autoregressive GPT-style models and masked language models like BERT or RoBERTa, with configurable masking strategies and token handling. It includes capabilities for fine-tuning models through reinforcement learning from human feedback, enabling alignment of model outputs with human preferences. For evaluation, GPT-Neo provides standardized benchmarking tools with contamination detection to ensure reproducible and transparent assessment of language model performance.

Beyond training and evaluation, the project encompasses interpretability research tools for analyzing internal representations across transformer layers, including techniques for behavior attribution, concept erasure, and latent knowledge elicitation. It also supports multimodal data processing to extend language model research into image and audio domains. The framework implements memory-efficient training techniques such as gradient checkpointing, mixed-precision arithmetic, and dynamic batching to maximize hardware utilization during large-scale training runs.
- [beavailable/share](https://awesome-repositories.com/repository/beavailable-share.md) (49 ⭐) — Share and receive files effortlessly over HTTP
- [lexfridman/mit-deep-learning](https://awesome-repositories.com/repository/lexfridman-mit-deep-learning.md) (10,417 ⭐) — This project is a collection of deep learning courseware and instructional materials. It provides a structured curriculum and practical demonstrations covering the fundamentals of neural network architectures and artificial intelligence.

The materials include specialized tutorials and guides on generative adversarial networks for synthetic data generation, as well as reinforcement learning resources focused on decision-making and motion planning for autonomous robotics.

The content covers broad capability areas including computer vision development, the implementation of feed-forward and convolutional networks, and the analysis of autonomous vehicle systems. It also addresses advanced research topics such as privacy-preserving computation and semantic video frame segmentation.

The project is delivered primarily through Jupyter Notebooks.
- [edumserrano/webpack-module-federation-with-angular](https://awesome-repositories.com/repository/edumserrano-webpack-module-federation-with-angular.md) (31 ⭐) — Guide to Webpack Module Federation with several Angular code demos
- [nvidia/isaac-gr00t](https://awesome-repositories.com/repository/nvidia-isaac-gr00t.md) (6,222 ⭐)
- [originjs/vite-plugin-federation](https://awesome-repositories.com/repository/originjs-vite-plugin-federation.md) (3,026 ⭐) — Module Federation for vite & rollup
- [openmined/pysyft](https://awesome-repositories.com/repository/openmined-pysyft.md) (9,907 ⭐) — PySyft is a privacy-preserving machine learning framework and remote computation engine. It functions as a decentralized data analysis orchestrator that allows for the execution of data science workflows on remote servers without requiring the transfer of raw private data from the host device.

The platform provides a secure collaboration environment where data owners manage permissions and authorize specific collaborators to run computations. A distinguishing feature of its workflow is the use of mock data for local development and validation before final analysis jobs are submitted to private remote servers.

The system covers a broad range of secure computation capabilities, including the use of sandboxed job execution to isolate computations from the underlying system and a cloud-storage transport layer for exchanging requests between peers. It also includes mechanisms for asynchronous state synchronization to maintain consistency across offline or cloud-connected nodes.
- [datajuicer/data-juicer](https://awesome-repositories.com/repository/datajuicer-data-juicer.md) (6,574 ⭐) — Data-Juicer is an open-source framework for cleaning, filtering, deduplicating, and transforming multimodal datasets to prepare them for training large language and vision models. It functions as a distributed data pipeline engine that runs processing jobs across Ray clusters, handling billions of samples with automatic operator fusion and adaptive parallelism. The framework provides a library of operators that leverage large language models for semantic extraction, filtering, and data synthesis within processing pipelines.

The project distinguishes itself through a YAML-based data recipe system for composing reproducible, version-controlled data workflows that can be shared and reused across environments. It includes a configurable quality gate system, lazy dependency injection for operator-specific packages, and a multimodal operator registry that provides a unified interface for text, image, audio, and video operators within a single pipeline. The operator-fusion pipeline compiler automatically merges adjacent data operators into fused execution units to reduce I/O and scheduling overhead, while sample-level lineage tracing records the origin and transformation history of each sample for auditability.

The framework covers data cleaning and deduplication across distributed clusters, including image, line-level, record-level, text, and video deduplication methods. It provides data filtering and selection based on audio, image, LLM, multimodal, quality, sample-selection, and text criteria. Data processing and transformation capabilities span agent data preparation, audio processing, batch aggregation, dataset enhancement, mixing, repartitioning, domain-specific processing, field transformation, foundation model curation, image processing, language splitting, LLM operators, multimodal processing, question-answer calibration, synthetic data generation, text processing, and video data processing for embodied AI. The project also includes data quality and analysis tools for dataset profiling, visualization, and model evaluation, as well as RAG index building, which extracts, normalizes, chunks, deduplicates, and profiles content for retrieval-augmented generation systems.

Documentation and support are available through a Q&A copilot integrated into documentation and chat platforms.
- [src-d/models](https://awesome-repositories.com/repository/src-d-models.md) (19 ⭐) — Machine learning models for MLonCode trained using the source{d} stack
- [google-research/google-research](https://awesome-repositories.com/repository/google-research-google-research.md) (38,139 ⭐) — This repository serves as a comprehensive research platform and toolkit for advancing machine learning, quantum computing, and large-scale scientific data analysis. It provides foundational frameworks for developing complex algorithmic systems, offering the necessary infrastructure for distributed training, computational graph execution, and high-performance model development.

The project distinguishes itself by integrating specialized research domains with robust, privacy-preserving methodologies. It supports diverse scientific discovery through tools for quantum simulation, physics-informed neural modeling, and secure data aggregation. Beyond core machine learning, the platform facilitates advanced research in fields such as genomics, environmental forecasting, and clinical health diagnostics, enabling researchers to apply deep learning to complex, real-world datasets.

The repository encompasses a broad capability surface, including automated research tooling, natural language processing, and machine perception. It provides infrastructure for monitoring model performance, benchmarking factuality, and ensuring responsible artificial intelligence through fairness and robustness evaluations. These tools are designed to support experimental workflows, from hypothesis generation and scientific code synthesis to the deployment of energy-efficient models on edge hardware.
- [anomalyco/opencode](https://awesome-repositories.com/repository/anomalyco-opencode.md) (175,152 ⭐) — OpenCode is a framework for orchestrating autonomous AI agents within development environments. It provides a multi-tiered architecture where primary assistants manage user interaction while specialized subagents handle specific tasks like planning, research, and code generation. The system includes a comprehensive command-line interface for managing these workflows, configuring agent behavior, and defining custom tools or commands through metadata-rich files.

The platform features a modular plugin system and extensive integration support, including standardized protocols for connecting local and remote tool servers. It incorporates a security-focused architecture with granular permission controls, allowing users to define access policies for file operations, shell commands, and web access. These security measures are complemented by enterprise-grade infrastructure options, such as centralized authentication and private registry integration.

For developers, the project offers a type-safe SDK for building custom integrations and a RESTful API for programmatic system management. Configuration is handled through a schema-validated system that supports variable injection and multi-file organization. The interface is fully customizable, featuring a theme system for terminal displays and interactive commands for managing model selection and session history.
- [vmware/data-annotator-for-machine-learning](https://awesome-repositories.com/repository/vmware-data-annotator-for-machine-learning.md) (0 ⭐) — Data Annotator for Machine Learning
- [zhaochenyang20/awesome-ml-sys-tutorial](https://awesome-repositories.com/repository/zhaochenyang20-awesome-ml-sys-tutorial.md) (5,371 ⭐) — This project provides a comprehensive technical guide and framework for engineering large-scale machine learning systems. It covers the full lifecycle of model development, focusing on the infrastructure and computational principles required to build, train, and serve generative AI models across distributed GPU clusters.

The repository distinguishes itself by offering deep-dive tutorials and implementation strategies for complex system challenges. It emphasizes high-performance architectural primitives, such as collective communication orchestration, distributed tensor sharding, and static graph kernel capture. These capabilities are complemented by advanced inference optimizations, including speculative decoding, memory-efficient activation offloading, and tree-structured key-value cache prefix sharing, which collectively enable efficient model execution and resource management.

Beyond core training and inference, the project details a broad capability surface for managing agentic workflows and multimodal architectures. This includes automated reinforcement learning pipelines, structured grammar-based decoding for constrained output, and sophisticated traffic management for distributed request scheduling. The framework also provides extensive tooling for system observability, performance profiling, and hardware-aware resource allocation to ensure stability and efficiency in production environments.
- [facebookresearch/audiocraft](https://awesome-repositories.com/repository/facebookresearch-audiocraft.md) (23,379 ⭐) — Audiocraft is a deep learning audio library and machine learning framework designed for training, fine-tuning, and evaluating generative models for music and sound effects. It includes text-to-music generative models and a neural audio codec, providing the tools needed to compress audio signals into discrete representations and synthesize high-fidelity waveforms from textual descriptions.

The framework is distinguished by its ability to combine multiple conditioning signals, allowing for the generation of audio based on text prompts, melodic excerpts, or style-based audio clips. It also includes a specialized audio watermarking tool for embedding and detecting invisible markers within signals to protect ownership and track content origins.

The project covers a broad range of capabilities, including neural audio compression, audio data augmentation, and the execution of complex training pipelines for diffusion and masked audio models. It provides utilities for model lifecycle management, such as checkpoint exporting and experiment tracking, alongside evaluation metrics for measuring signal fidelity and perceptual quality.
- [fushuhao6/attack-resistant-federated-learning](https://awesome-repositories.com/repository/fushuhao6-attack-resistant-federated-learning.md) (0 ⭐) — This repository is implemented by Shuhao Fu and Chulin Xie.