30 open-source projects similar to handcraftsman/geneticalgorithmswithpython, ranked by how many features they have in common. Compare stars, activity and what each one does to find the best GeneticAlgorithmsWithPython alternative.
This repository provides a curated collection of weekly datasets designed for data visualization practice, data science education, and statistical analysis. It serves as a central source for cleaned and structured real-world data, allowing practitioners to focus on analysis and visualization without the need to scrape or clean raw files. The project facilitates a community learning workflow where users can explore a wide variety of topics, ranging from global health spending and energy datasets to maritime logs and baby name popularity. Participants are encouraged to share their resulting vis
🐍 Quick reference guide to common patterns & functions in PySpark.
This project is a curated collection of technical reference materials and study guides designed for machine learning interview preparation. It provides comprehensive resources for candidates pursuing engineering roles, focusing on deep learning, production infrastructure, and large-scale system design. The repository distinguishes itself through an architecture that combines theoretical research with industrial case studies. It utilizes a pattern-based approach to system design, breaking down complex deployments—such as recommendation engines, search ranking, and ad click prediction—into reus
Slides, scripts and materials for the Machine Learning in Finance Course at NYU Tandon, 2022
Ways of doing Data Science Engineering and Machine Learning in R and Python
splearn: package for signal processing and machine learning with Python. Contains tutorials on understanding and applying signal processing.
Towhee is a framework that is dedicated to making neural data processing pipelines simple and fast.
Featuretools is an automated feature engineering library and data transformation framework written in Python. It automatically generates machine learning feature vectors from multi-table datasets by applying synthesis patterns to relational and timestamped data. The system functions as a distributed feature synthesis engine, allowing the process of creating feature vectors to scale across multiple cores or clusters to handle large-scale datasets. The library supports the synthesis of multi-table datasets, time series feature generation, and the creation of custom machine learning primitives
A list of colleges and universities offering degrees in data science.
Move fast from data science prototype to pipeline. Capture, analyze, and transform messy notebooks into data pipelines with just two lines of code.
Train and Deploy an ML REST API to predict crypto prices, in 10 steps
Cleanlab is a data-centric AI library and toolkit designed to improve machine learning model performance by detecting label errors and increasing overall dataset quality. It implements a confident learning framework that iteratively refines label noise estimates by comparing model predictions with estimated label probabilities to identify mislabeled examples. The project provides specialized utilities for active learning optimization, allowing for the selection of the most impactful examples for labeling or re-labeling. It also includes an outlier detection tool to identify atypical data poin
Little Ball of Fur - A graph sampling extension library for NetworKit and NetworkX (CIKM 2020)
A general purpose recommender metrics library for fair evaluation.
This project is a data science curriculum and instructional syllabus designed to teach the fundamental principles and tools of the field. It provides a structured set of learning materials, including R programming courseware and guides for statistical learning. The materials focus on the practical application of data science, covering data cleaning, visualization, and exploratory data analysis. It includes resources for mastering specific techniques such as linear regression, classification, and unsupervised learning. The curriculum is organized into a sequence of educational modules
A PyTorch and TorchDrug based deep learning library for drug pair scoring. (KDD 2022)
Albumentations is a computer vision image augmentation library designed to increase training data diversity for deep learning models. It provides a toolset for applying geometric and color transformations to images and annotations, including a specialized collection of 3D operations for volumetric data used in medical and scientific imaging. The library functions as an image mask and bounding box transformer, automatically updating masks, bounding boxes, and keypoints when images undergo geometric changes. This ensures that spatial alterations remain synchronized across images and their assoc
🚚 Agile Data Preparation Workflows made easy with Pandas, Dask, cuDF, Dask-cuDF, Vaex and PySpark
Karate Club: An API Oriented Open-source Python Framework for Unsupervised Learning on Graphs (CIKM 2020)
This project is a Python education repository and programming tutorial designed to teach language fundamentals, from basic syntax and variables to advanced concepts. It serves as a data science starter kit and a guide for REST API integration. The repository provides instructional scripts and sample code covering object-oriented programming patterns and asynchronous programming. It includes practical demonstrations for fetching and processing JSON data from external web services using HTTP requests. The materials cover a broad capability surface including data analysis workflows with interac
This project is an infrastructure platform designed to provide secure, isolated, and ephemeral cloud-based Linux environments for AI agents and automated code execution. It functions as an orchestrator that provisions on-demand virtual machines, allowing developers to run arbitrary code generated by large language models within hardware-level security boundaries. The platform distinguishes itself through its ability to manage stateful, long-lived sessions that persist across multiple execution calls, enabling complex, multi-step workflows. It supports high-concurrency scaling, allowing for th
Examples of Machine Learning code using Comet.ml
DVC is a data versioning tool and pipeline orchestrator designed to track large datasets and machine learning models. It functions as a system for managing large data artifacts by storing lightweight metadata in version control while keeping the actual binaries in a separate cache. The project serves as an experiment tracker and remote storage synchronizer, enabling the execution and comparison of machine learning iterations based on hyperparameters and performance metrics. It provides a bridge for pushing and pulling these large data artifacts between local environments and cloud or on-premi
🐶 A tool to package, serve, and deploy any ML model on any platform. Archived to be resurrected one day🤞
CML is a pipeline automation tool for training and evaluating machine learning models, functioning as a CI/CD system for machine learning. It serves as a cloud compute orchestrator and Git-based workflow manager that automates model training cycles through branch management, automated commits, and integrated reporting. The project distinguishes itself by provisioning ephemeral cloud instances or Kubernetes nodes to provide specialized hardware for compute-heavy tasks. It also manages remote compute runners, allowing the connection of self-hosted GPU clusters or on-premise machines to execute