30 open-source projects similar to asavinov/lambdo, ranked by how many features they have in common. Compare stars, activity and what each one does to find the best Lambdo alternative.
Curated set of transformers that make your work with steppy faster and more effective 🔭
🛠 All-in-one web-based IDE specialized for machine learning and data science.
🏕️ Reproducible development environment for humans and agents
The Data Science Lifecycle Process is a process for taking data science teams from Idea to Value repeatedly and sustainably. The process is documented in this repo.
Template repository for a data science lifecycle project
♾️ CML - Continuous Machine Learning | CI/CD for ML
DVC is a data versioning tool and pipeline orchestrator designed to track large datasets and machine learning models. It functions as a system for managing large data artifacts by storing lightweight metadata in version control while keeping the actual binaries in a separate cache. The project serves as an experiment tracker and remote storage synchronizer, enabling the execution and comparison of machine learning iterations based on hyperparameters and performance metrics. It provides a bridge for pushing and pulling these large data artifacts between local environments and cloud or on-premi
🐶 A tool to package, serve, and deploy any ML model on any platform. Archived to be resurrected one day🤞
Comet LLM is an observability platform and evaluation framework designed for large language model applications and agentic workflows. It functions as a system for tracing, monitoring, and debugging execution flows while providing tools for prompt optimization and the enforcement of AI safety guardrails. The platform distinguishes itself through a combination of model-based scoring and heuristic metrics to quantify output quality and detect hallucinations. It includes a dedicated prompt and agent optimizer with an interactive playground for refining templates and tool configurations. For retri
Examples of Machine Learning code using Comet.ml
Lightweight Python library for fast and reproducible experimentation 🔬
🚚 Agile Data Preparation Workflows made easy with Pandas, Dask, cuDF, Dask-cuDF, Vaex and PySpark
Towhee is a framework that is dedicated to making neural data processing pipelines simple and fast.
A PyTorch and TorchDrug based deep learning library for drug pair scoring. (KDD 2022)
A general purpose recommender metrics library for fair evaluation.
AutoGluon is an automated machine learning framework designed to optimize model selection and hyperparameter tuning across tabular, text, image, and time series data. It functions as an ensemble learning library and a tabular data prediction engine, aiming to build high-accuracy predictive models without manual algorithm selection. The framework integrates multimodal machine learning pipelines that combine disparate data types into a single representation using specialized encoders. It also includes a probabilistic time series forecaster that fits multiple statistical and deep learning models
Karate Club: An API Oriented Open-source Python Framework for Unsupervised Learning on Graphs (CIKM 2020)
Albumentations is a computer vision image augmentation library designed to increase training data diversity for deep learning models. It provides a toolset for applying geometric and color transformations to images and annotations, including a specialized collection of 3D operations for volumetric data used in medical and scientific imaging. The library functions as an image mask and bounding box transformer, automatically updating masks, bounding boxes, and keypoints when images undergo geometric changes. This ensures that spatial alterations remain synchronized across images and their assoc
Little Ball of Fur - A graph sampling extension library for NetworKit and NetworkX (CIKM 2020)
The official implementation of "The Shapley Value of Classifiers in Ensemble Games" (CIKM 2021).
Move fast from data science prototype to pipeline. Capture, analyze, and transform messy notebooks into data pipelines with just two lines of code.
Hopsworks - Data-Intensive AI platform with a Feature Store
Cleanlab is a data-centric AI library and toolkit designed to improve machine learning model performance by detecting label errors and increasing overall dataset quality. It implements a confident learning framework that iteratively refines label noise estimates by comparing model predictions with estimated label probabilities to identify mislabeled examples. The project provides specialized utilities for active learning optimization, allowing for the selection of the most impactful examples for labeling or re-labeling. It also includes an outlier detection tool to identify atypical data poin
Featuretools is an automated feature engineering library and data transformation framework written in Python. It automatically generates machine learning feature vectors from multi-table datasets by applying synthesis patterns to relational and timestamped data. The system functions as a distributed feature synthesis engine, allowing the process of creating feature vectors to scale across multiple cores or clusters to handle large-scale datasets. The library supports the synthesis of multi-table datasets, time series feature generation, and the creation of custom machine learning primitives
Feast is an open-source feature store for machine learning that provides a central platform for defining, storing, and serving features across both training and inference workflows. It operates as a declarative system where feature definitions are written as code in Python files, synchronized to a central registry, and made available for low-latency online retrieval or point-in-time correct historical joins for training datasets. The project abstracts storage behind a pluggable architecture, allowing offline and online backends to be swapped without changing retrieval logic, and coordinates ma
Gridstudio is a web-based data science integrated development environment that combines a programmatic spreadsheet interface with an interactive Python environment. It functions as a system for organizing and deploying isolated data workspaces to handle data science tasks and storage. The platform merges spreadsheet data management with an execution engine for formulas and Python code, allowing for programmatic spreadsheet manipulation. It enables users to run interactive scripts and terminal sessions to clean, transform, and manage datasets within a browser. The environment supports Linux s
Library for animated data visualizations and data stories.