AutoGluon

AutoGluon is an automated machine learning (AutoML) framework and multimodal library that automates the end-to-end pipeline, from data preprocessing to training and validating high-accuracy models. It trains models on tabular, image, text, and time series data, and also supports time series forecasting and finetuning of foundation models.

The project is distinguished by its ability to jointly process and fuse different data types, allowing for the construction of multimodal neural networks that integrate images, text, and structured tables. It supports zero-shot inference and forecasting using pretrained foundation models, alongside parameter-efficient finetuning techniques to adapt large models to specific tasks.

Its broader capabilities include automated model selection and ensembling via bagging and stacking, as well as comprehensive computer vision pipelines for object detection and semantic segmentation. The framework also covers probabilistic time series forecasting, named entity recognition for natural language processing, and semantic search based on embedding extraction.

The system provides utilities for deploying trained predictors as cloud endpoints or serverless functions, and accelerates inference through ONNX export and TensorRT.

Features

End-to-End Training Pipelines - Automates the end-to-end machine learning pipeline to build high-accuracy models for diverse data types.

Automated ML (AutoML) - Provides an end-to-end automated pipeline for selecting, tuning, and deploying high-accuracy models across multiple data types.

Time Series Forecasting - Provides a complete framework for predicting future values and probabilistic trends for multiple univariate sequences.

Automated Machine Learning - Automates the end-to-end pipeline from data preprocessing to high-accuracy model training and validation.

Pretrained Model Integrations - Provides utilities to execute pretrained foundation models for immediate zero-shot predictions without custom training.

Ensemble Methods - Implements multi-layer stacking to combine predictions from base models as features for higher-level meta-models.

Feature Extraction - Converts raw image or text data into high-dimensional feature vectors with pretrained models, without task-specific training (zero-shot), for use in downstream tasks.

Regression Models - Predicts continuous numerical values from text, images, and tabular inputs and evaluates them using error metrics.

Computer Vision Pipelines - Provides automated workflows for object detection, semantic segmentation, and image classification.

Data Preprocessing - Internally handles preprocessing for text and image data to prepare raw inputs for model training.

Model Fine-Tuning - Supports adapting pre-trained models to specific downstream tasks and datasets to improve domain-specific accuracy.

Model Architecture Selection - Automatically chooses appropriate pretrained backbones or MLP types for each data modality.

Model Ensembling - Combines user-defined models with automated models to build multi-layer stack ensembles for improved performance.

Finetuning Workflows - Adapts large pretrained foundation models to specific tasks using parameter-efficient finetuning.

Model Performance Selection - Automatically selects the best local, global, and ensemble models based on quality presets or time limits.

Multimodal Embedding Models - Converts raw multimodal samples into shared numerical embedding vectors for downstream predictive tasks.

Multimodal Integration Libraries - Provides a library for training and deploying models that integrate tabular, image, text, and time series data.

Multimodal Machine Learning - Predicts discrete labels or class probabilities for a set of combined multimodal input features.

Multimodal Models - Integrates diverse data types by embedding images and text into a shared vector space for joint predictive modeling.

Multimodal Training - Implements training workflows for neural networks that embed and fuse different data modalities into a single model.

Parameter Efficient Fine-Tuning - Implements memory-efficient adaptation techniques such as LoRA, IA3, and BitFit by updating only a small subset of model parameters.
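
For intuition on why such methods are cheap, here is the standard LoRA update as formulated in the original LoRA paper (AutoGluon's exact rank and scaling defaults are not stated here and may differ). The pretrained weight W_0 stays frozen and only the low-rank factors B and A are trained, so the number of trainable parameters grows with r(d + k) rather than dk:

```latex
\begin{aligned}
W &= W_0 + \Delta W = W_0 + \tfrac{\alpha}{r}\, B A,
  \qquad B \in \mathbb{R}^{d \times r},\ A \in \mathbb{R}^{r \times k},\ r \ll \min(d, k) \\
\#\text{trainable} &= r(d + k) \quad \text{instead of} \quad dk \\
d = k = 1024,\ r = 8: &\quad 8 \cdot 2048 = 16{,}384 \ \text{vs.}\ 1{,}048{,}576 \ (\approx 1.6\%)
\end{aligned}
```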

Cross-Modal Similarity Scoring - Determines semantic similarity between different data modalities, including text-to-text, image-to-image, and image-to-text.

Tabular Feature Engineering - Converts raw data into model-ready formats by encoding categories and extracting datetime components.
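
A minimal sketch of the two transformations named above, written in plain Python rather than with AutoGluon's own feature generators (whose internals this sketch does not reproduce). The function names are hypothetical; category codes are assigned in order of first appearance, and a date is split into numeric parts a tabular model can use:

```python
from datetime import datetime


def encode_categories(values):
    """Map each distinct category to an integer code in order of first appearance."""
    codes = {}
    encoded = []
    for v in values:
        if v not in codes:
            codes[v] = len(codes)
        encoded.append(codes[v])
    return encoded, codes


def datetime_components(timestamp):
    """Split an ISO-8601 date/time string into numeric components."""
    dt = datetime.fromisoformat(timestamp)
    # weekday(): Monday is 0, Sunday is 6
    return {"year": dt.year, "month": dt.month, "day": dt.day, "dayofweek": dt.weekday()}


encoded, mapping = encode_categories(["red", "blue", "red", "green"])
print(encoded)                                # [0, 1, 0, 2]
print(datetime_components("2024-03-15"))      # {'year': 2024, 'month': 3, 'day': 15, 'dayofweek': 4}
```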

Text Model Training - Trains high-quality models for text data using various backbones and supports cross-lingual transfer.

Automated Model Selection - Fits, tunes, and selects the best forecasting models for a panel of time series based on a prediction horizon.

Probabilistic Forecasting - Produces multi-step ahead probabilistic predictions providing value ranges and uncertainty intervals for univariate time series.
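
Quantile forecasts like these are commonly scored with the quantile (pinball) loss. The sketch below is a generic, hypothetical helper and not AutoGluon's metric implementation; it shows how a high quantile (here q = 0.9) is penalized more for under-predicting than for over-predicting:

```python
def pinball_loss(y_true, y_pred, q):
    """Mean quantile (pinball) loss of predictions y_pred for quantile level q in (0, 1)."""
    total = 0.0
    for y, p in zip(y_true, y_pred):
        diff = y - p
        # Under-prediction (diff > 0) costs q per unit; over-prediction costs (1 - q) per unit.
        total += max(q * diff, (q - 1) * diff)
    return total / len(y_true)


# Over-predicting the first point by 2 costs 0.2; under-predicting the second by 5 costs 4.5.
print(pinball_loss([10, 20], [12, 15], q=0.9))  # approximately 2.35
```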

Zero-Shot Forecasting - Uses pretrained large-scale time series models to generate forecasts on new data without requiring training.

Vector Embeddings - Generates high-dimensional vector representations of text and images to compute semantic similarity.

Zero-Shot Inference - Executes predictions across various modalities using pretrained foundation models without requiring task-specific training data.

Foundation Model Adaptation - Provides methods to adapt large pretrained foundation models using parameter-efficient techniques such as IA3 and BitFit.

Tabular Predictive Models - Implements automated predictive modeling for structured tabular data to forecast values or categories.

Batch Inference Engines - Uses a dependency graph to coordinate predictions across multiple models, eliminating redundant computations and increasing throughput.

Batch Prediction Processing - Produces prediction probabilities from multiple models simultaneously using a dependency graph.

Class Probability Estimation - Calculates the likelihood of each possible class for a given input for granular analysis.

Image Augmentation - Applies geometric and color transformations to image data to improve model generalization.

Image Classification Models - Develops image classification and regression models, including support for zero-shot classification using pretrained models.

Object Detection - Identifies and localizes objects within images using bounding boxes via both zero-shot and trained models.

Object Detection Training - Automates the end-to-end training of high-quality object detection models on COCO-formatted data.

Segmentation Model Training - Trains models to perform pixel-level semantic image segmentation for detailed visual understanding.

Continuous Model Training - Updates models with new data over time to prevent performance decay and adapt to changing patterns.

Custom Model Integrations - Wraps external time series models into a compatible interface for use within the automated forecasting pipeline.

Forecasting Input Preparation - Formats dataset splits into historical data and future covariates to prepare for forecasting.

Decision Threshold Calibration - Implements techniques for converting continuous probability estimates into discrete class labels using optimized thresholds for binary classification.
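
A minimal sketch of one such technique, assuming F1 is the metric to optimize and that the candidate thresholds are simply the observed validation probabilities. The helper names are hypothetical and this is not AutoGluon's calibration routine:

```python
def f1_at_threshold(y_true, probs, threshold):
    """F1 score when predicting the positive class for probabilities >= threshold."""
    tp = fp = fn = 0
    for y, p in zip(y_true, probs):
        predicted_positive = p >= threshold
        if predicted_positive and y == 1:
            tp += 1
        elif predicted_positive:
            fp += 1
        elif y == 1:
            fn += 1
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0


def calibrate_threshold(y_true, probs):
    """Return the observed probability that, used as a threshold, gives the best F1."""
    candidates = sorted(set(probs))
    return max(candidates, key=lambda t: f1_at_threshold(y_true, probs, t))


y_val = [0, 0, 1, 1, 1]
p_val = [0.1, 0.4, 0.35, 0.8, 0.9]
best = calibrate_threshold(y_val, p_val)
print(best, f1_at_threshold(y_val, p_val, best), f1_at_threshold(y_val, p_val, 0.5))
```

On this toy validation set the default 0.5 cut-off gives F1 = 0.8, while the calibrated threshold 0.35 gives F1 = 6/7 ≈ 0.857. In practice the threshold should be chosen on held-out data rather than on the data used to fit the model.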

Detection Accuracy Metrics - Calculates object detection accuracy using standard COCO and VOC mean Average Precision (mAP) metrics.
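
Both mAP variants rest on intersection over union (IoU): a predicted box counts as a true positive only if its IoU with a matching ground-truth box reaches a threshold (0.5 for VOC-style mAP, while COCO averages over thresholds from 0.5 to 0.95). A minimal sketch of IoU alone, assuming boxes are given as (x_min, y_min, x_max, y_max); the rest of the mAP computation (matching, precision-recall curves, averaging) is omitted:

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes (x_min, y_min, x_max, y_max)."""
    ix_min, iy_min = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix_max, iy_max = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


pred, truth = (0, 0, 2, 2), (1, 1, 3, 3)
overlap = iou(pred, truth)
print(overlap, overlap >= 0.5)  # IoU of 1/7, so not a match at the VOC 0.5 threshold
```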

Distributed Training - Executes training across multiple GPUs or nodes using data parallel and distributed strategies.

Bagging Ensembles - Fits base models across multiple data splits using bagging to improve model stability and reduce variance.

Stacking Ensembles - Implements multi-layer stacking by using base model predictions as features for subsequent model layers.

Weighted Ensembles - Combines multiple models using optimized weights via ensemble selection for final predictions.
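
Ensemble selection is typically greedy: repeatedly add (with replacement) whichever model most improves the validation loss of the averaged ensemble, then use each model's selection frequency as its weight. The sketch below is a simplified, hypothetical version of that idea with a mean-squared-error loss, not AutoGluon's own ensembling code:

```python
def mse(y_true, y_pred):
    return sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true)


def ensemble_selection(model_preds, y_true, loss, n_iterations=10):
    """Greedy forward selection with replacement; returns {model name: weight}."""
    names = list(model_preds)
    ensemble_sum = [0.0] * len(y_true)
    counts = {name: 0 for name in names}
    for step in range(1, n_iterations + 1):
        best_name, best_loss = None, float("inf")
        for name in names:
            # Average of the models picked so far plus this candidate.
            candidate = [(s + p) / step for s, p in zip(ensemble_sum, model_preds[name])]
            candidate_loss = loss(y_true, candidate)
            if candidate_loss < best_loss:
                best_name, best_loss = name, candidate_loss
        counts[best_name] += 1
        ensemble_sum = [s + p for s, p in zip(ensemble_sum, model_preds[best_name])]
    return {name: c / n_iterations for name, c in counts.items() if c > 0}


def weighted_prediction(model_preds, weights):
    n = len(next(iter(model_preds.values())))
    return [sum(w * model_preds[name][i] for name, w in weights.items()) for i in range(n)]


y_val = [1.0, 2.0, 3.0]
preds = {"A": [1.5, 2.5, 3.5], "B": [0.5, 1.5, 2.5]}  # biased up and down by 0.5
weights = ensemble_selection(preds, y_val, mse, n_iterations=2)
print(weights, weighted_prediction(preds, weights))
```

Neither toy model is accurate alone, but their opposite biases cancel, so the selected 50/50 blend matches the validation targets exactly.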

Feature Contribution Analysis - Identifies key model drivers by randomly shuffling (permuting) each feature's values and measuring how much predictive accuracy drops.
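
A minimal, model-agnostic sketch of permutation importance, assuming a higher-is-better score and rows given as plain Python lists. The function is hypothetical and not AutoGluon's own implementation: each feature's importance is the average score drop after its column is shuffled.

```python
import random


def permutation_importance(predict, X, y, score, n_repeats=5, seed=0):
    """Mean drop in score when each feature column is shuffled independently.

    predict: maps one row (a list of feature values) to a prediction.
    score: (y_true, y_pred) -> float, higher is better.
    """
    rng = random.Random(seed)
    baseline = score(y, [predict(row) for row in X])
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between feature j and the target
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(baseline - score(y, [predict(row) for row in X_perm]))
        importances.append(sum(drops) / n_repeats)
    return importances


def neg_mse(y_true, y_pred):
    return -sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true)


# Toy model that only looks at feature 0; feature 1 is ignored.
X = [[i, i % 3] for i in range(10)]
y = [float(i) for i in range(10)]
importances = permutation_importance(lambda row: row[0], X, y, neg_mse)
print(importances)
```

A feature the model ignores (column 1 here) gets an importance of exactly 0, while shuffling the feature the model depends on lowers the score.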

Feature Engineering - Implements manual data transformations and custom feature generators that execute during both training and inference.

Custom Feature Pipelines - Defines custom sequences of feature generators to control how specific data types are handled and transformed.

Text Embedding Extraction - Converts groups of sentences into vector representations for similarity and retrieval tasks.

Feature Importance Attribution - Quantifies the impact of input features on predictive accuracy using value shuffling.

Few-Shot Learning Frameworks - Provides a framework for training classification models on multimodal data using minimal examples.

Few-Shot Learning Mechanisms - Combines foundation models with support vector machines to implement few-shot learning.

Tabular Foundation Model Application - Leverages pre-trained foundation models to improve predictive performance on small tabular datasets.

Image-Text Ranking - Measures and ranks visual-semantic similarity between images and text to identify the most relevant matches.

Knowledge Distillation - Trains small student models to mimic complex teacher ensembles for more efficient production deployment.

Covariate Integration - Improves forecast accuracy by integrating static features and known future covariates into the models.

Image Embedding Extraction - Converts images into feature vectors that can be used to compute semantic similarity scores.

GPU Training Accelerators - Accelerates model training by utilizing parallelization strategies across GPUs.

Model Inference Accelerators - Integrates TensorRT as a model inference accelerator to increase prediction speed and reduce deployment latency.

ONNX Model Exporters - Provides the ability to convert machine learning models into the standardized ONNX format to increase prediction throughput.

Model Comparison Interfaces - Provides summary tables comparing trained models across validation scores, training times, and inference speeds.

Model Evaluation Metrics - Allows the definition of specific evaluation criteria to guide model selection and optimization.

Inference Optimization - Improves prediction speed by persisting models in memory or refitting ensembles into single models.

Text Classification - Trains models to perform binary classification, multiclass classification, or regression on datasets with text fields.

Model Interpretability - Constructs interpretable glass-box generalized additive models with automatic interaction detection.

Foundation Model Ensembles - Combines multiple foundation models in a single predictor through stacking and ensembling.

Deployment Optimizations - Refines predictor artifacts to improve inference speed and resource efficiency for production environments.

Model Prediction Evaluation - Compares predicted probabilities and labels against ground truth data to generate comprehensive performance reports.

Image Labeling Engines - Generates class predictions or probability distributions for images provided as paths or byte arrays.

Model Quality Presets - Provides predefined quality presets to balance prediction accuracy against training speed and inference latency.

Forecast Accuracy Validation - Evaluates time series forecast accuracy by comparing predictions against held-out historical data using error metrics.

Model Training Optimizers - Tunes hyperparameters, early-stopping, and ensemble weights to maximize evaluation metrics.

Model Exporting - Saves model weights and configuration files to local directories for later reloading and deployment.

Training Resumption - Allows resuming model training from an existing checkpoint to improve performance without restarting.

Out-of-Fold Predictions - Implements out-of-fold prediction strategies to ensure unbiased performance estimation during training.
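
A minimal sketch of out-of-fold prediction with k folds, using a one-feature least-squares model as a stand-in base learner. All names are hypothetical, and the folds are contiguous for readability, whereas real pipelines usually shuffle or stratify. Because every row is predicted by a model that did not train on it, the resulting predictions can serve as unbiased validation scores and as input features for the next stacking layer:

```python
def kfold_indices(n, k):
    """Split indices 0..n-1 into k contiguous folds of near-equal size (no shuffling)."""
    folds, start = [], 0
    for i in range(k):
        size = n // k + (1 if i < n % k else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def fit_linear(xs, ys):
    """Ordinary least squares for a single feature; returns a prediction function."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    var = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / var if var else 0.0
    intercept = my - slope * mx
    return lambda x: intercept + slope * x


def out_of_fold_predictions(xs, ys, k, fit):
    """Predict each row with a model trained on the other k - 1 folds."""
    oof = [None] * len(xs)
    for fold in kfold_indices(len(xs), k):
        held_out = set(fold)
        model = fit([x for i, x in enumerate(xs) if i not in held_out],
                    [y for i, y in enumerate(ys) if i not in held_out])
        for i in fold:
            oof[i] = model(xs[i])
    return oof


xs = [0, 1, 2, 3, 4, 5]
ys = [1, 3, 5, 7, 9, 11]
oof = out_of_fold_predictions(xs, ys, k=3, fit=fit_linear)
# A second-layer (stacker) model would be trained on features such as [xs[i], oof[i]].
print(oof)
```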

Multi-Target Learning - Trains models to predict several target variables simultaneously instead of a single output.

Multilingual NLP - Finetunes transformer models and uses multilingual presets to process text across different languages.

Multimodal Entity Extraction - Identifies and classifies specific entities within datasets containing both text and image information.

Named Entity Recognition - Identifies and classifies key entities such as people and organizations within text strings across multiple languages.

NER Performance Evaluators - Measures the accuracy of named entity recognition using standard recall, precision, and F1 scores on test datasets.
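
A minimal sketch of entity-level scoring, assuming entities are represented as hypothetical (start, end, label) tuples and a prediction counts only if both span and label match exactly (a common strict convention; other tools may also give partial credit):

```python
def entity_prf(true_entities, pred_entities):
    """Strict entity-level precision, recall and F1 over (start, end, label) tuples."""
    true_set, pred_set = set(true_entities), set(pred_entities)
    tp = len(true_set & pred_set)  # exact span and label matches
    precision = tp / len(pred_set) if pred_set else 0.0
    recall = tp / len(true_set) if true_set else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


gold = [(0, 5, "PER"), (10, 16, "ORG")]
pred = [(0, 5, "PER"), (10, 16, "LOC"), (20, 25, "ORG")]
print(entity_prf(gold, pred))  # precision 1/3, recall 1/2, F1 0.4
```

The mislabeled span (ORG predicted as LOC) counts as both a false positive and a false negative under this strict rule.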

Model Finetuning - Adjusts pretrained object detection models using custom COCO-format datasets to improve task-specific accuracy.

Predictor Snapshotting - Allows saving the predictor state to a directory to enable loading without retraining.

Semantic Similarity Calculation - Determines conceptual similarity between pairs of text, images, or mixed-modal inputs to enhance search ranking.

Semantic Search - Ranks documents by calculating cosine similarities between query and document embeddings.
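
A minimal sketch of the ranking step, assuming the query and document embeddings have already been produced by some pretrained encoder (the toy 2-D vectors below are made up for illustration, and the helper names are hypothetical):

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_documents(query_emb, doc_embs):
    """Return document indices ordered from most to least similar to the query."""
    scores = [(cosine_similarity(query_emb, d), i) for i, d in enumerate(doc_embs)]
    return [i for _, i in sorted(scores, key=lambda s: s[0], reverse=True)]


query = [1.0, 0.0]
documents = [[0.0, 1.0], [1.0, 1.0], [2.0, 0.0]]
print(rank_documents(query, documents))  # [2, 1, 0]
```

Cosine similarity ignores vector length, so document 2 ranks first even though it is not the closest point in Euclidean distance terms; only its direction matters.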

Semantic Segmentation - Identifies and outlines specific objects within images by predicting a class mask for every pixel.

Temporal Train-Test Splitting - Divides datasets into training and testing sets based on a specified prediction length for evaluation.
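
A minimal sketch of such a split for a panel of series stored as a hypothetical dict of item id to list of values: the last prediction_length points of each series are held out for evaluation, and everything before them is used for training. Unlike a random split, this never lets the model see values from the period it is asked to forecast.

```python
def temporal_train_test_split(series_by_item, prediction_length):
    """Hold out the final prediction_length points of every series."""
    train, holdout = {}, {}
    for item_id, values in series_by_item.items():
        if len(values) <= prediction_length:
            raise ValueError(f"series {item_id!r} is too short for the horizon")
        train[item_id] = values[:-prediction_length]
        holdout[item_id] = values[-prediction_length:]
    return train, holdout


panel = {"A": [1, 2, 3, 4, 5], "B": [10, 20, 30, 40]}
train, holdout = temporal_train_test_split(panel, prediction_length=2)
print(train)    # {'A': [1, 2, 3], 'B': [10, 20]}
print(holdout)  # {'A': [4, 5], 'B': [30, 40]}
```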

Time Series Backtesting - Estimates forecast quality by evaluating models across multiple historical cutoff points to simulate real-world deployment scenarios.

Dynamic Covariate Integration - Uses external dynamic features, including known future events, to improve forecast accuracy.

Forecast Aggregation - Aggregates predictions from multiple base models using weighted averages or gradient-based optimization.

Forecast Evaluation - Measures time series prediction accuracy using built-in metrics like MAE and RMSE or custom scoring logic.
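
A minimal sketch of MAE, RMSE and a user-defined metric sharing one signature, so built-in and custom scoring can be applied uniformly. The evaluate helper and the max_abs_error metric are hypothetical, not AutoGluon APIs:

```python
import math


def mae(y_true, y_pred):
    return sum(abs(a - b) for a, b in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true))


def max_abs_error(y_true, y_pred):
    """Example custom metric: worst single-step error."""
    return max(abs(a - b) for a, b in zip(y_true, y_pred))


def evaluate(y_true, y_pred, metrics):
    """Apply each metric function (built-in or custom) to the same forecast."""
    return {name: fn(y_true, y_pred) for name, fn in metrics.items()}


scores = evaluate([3, 5, 2], [2, 5, 4], {"MAE": mae, "RMSE": rmse, "MaxAE": max_abs_error})
print(scores)
```

RMSE (about 1.29 here) exceeds MAE (1.0) because squaring weights the single larger error more heavily.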

Statistical Forecasting - Captures trends and seasonality using automated ARIMA and exponential smoothing models.
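
For intuition, simple exponential smoothing is the most basic member of that family: it tracks only a level, updated as a weighted average of the newest observation and the previous level, and produces a flat forecast. Trend and seasonal variants build on the same recursion. This hypothetical helper is a teaching sketch, not the implementation AutoGluon uses:

```python
def simple_exponential_smoothing(values, alpha, prediction_length):
    """Flat forecast from level_t = alpha * y_t + (1 - alpha) * level_{t-1}."""
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    level = values[0]  # initialize the level with the first observation
    for y in values[1:]:
        level = alpha * y + (1 - alpha) * level
    return [level] * prediction_length


print(simple_exponential_smoothing([10, 20, 30], alpha=0.5, prediction_length=2))  # [22.5, 22.5]
```

A larger alpha reacts faster to recent values; with alpha = 1 the forecast reduces to repeating the last observation.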

Multi-Window Validation - Generates forecasts across multiple sliding or expanding validation windows to evaluate model performance over time.
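
A minimal sketch of expanding-window backtesting, assuming a single series, a naive last-value forecaster as a stand-in model, and MAE as the score (all helper names are hypothetical). Each window trains on all data up to its cutoff and is evaluated on the next prediction_length points, with the final window ending at the last observation:

```python
def expanding_window_splits(n, prediction_length, num_windows):
    """(train_end, test_end) cutoffs; windows are spaced prediction_length apart."""
    splits = []
    for w in range(num_windows):
        test_end = n - (num_windows - 1 - w) * prediction_length
        train_end = test_end - prediction_length
        if train_end <= 0:
            raise ValueError("not enough history for the requested number of windows")
        splits.append((train_end, test_end))
    return splits


def naive_forecast(history, prediction_length):
    """Stand-in model: repeat the last observed value."""
    return [history[-1]] * prediction_length


def backtest_mae(values, prediction_length, num_windows):
    """MAE of the naive forecast in each window, oldest window first."""
    scores = []
    for train_end, test_end in expanding_window_splits(len(values), prediction_length, num_windows):
        forecast = naive_forecast(values[:train_end], prediction_length)
        actual = values[train_end:test_end]
        scores.append(sum(abs(a - f) for a, f in zip(actual, forecast)) / prediction_length)
    return scores


series = list(range(1, 11))
print(expanding_window_splits(len(series), 2, 3))  # [(4, 6), (6, 8), (8, 10)]
print(backtest_mae(series, 2, 3))                  # [1.5, 1.5, 1.5]
```

A sliding-window variant would also move the start of the training range forward, keeping the history length fixed instead of letting it grow.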

Training Curve Analysis - Visualizes performance metrics recorded during training to diagnose model behavior and accuracy improvement over time.

Training Optimizations - Controls total training time via limits, presets, and data subsampling to optimize resource consumption.

Vector Similarity Search - Determines semantic similarity between images by projecting them into high-dimensional vector embeddings.

Zero and Few-Shot Learning - Classifies new categories using no labeled examples (zero-shot) or only a handful of examples per class (few-shot).

Zero-Shot Classification Models - Categorizes images into previously unseen classes by leveraging pretrained vision-language models.

Zero-Shot Classification Systems - Identifies image contents by comparing visual features against natural language text categories without training data.

Model Serving & Deployment - Packages trained predictors into containers or serverless functions for real-time or batch inference.

Text Matching - Trains models to determine if two sentences share the same meaning by projecting them into high-dimensional vectors.

Multi-Label Classifiers - Predicts multiple target labels for a single input, supporting both mutually exclusive and non-exclusive tags.

Data Cleaning Procedures - Removes redundant information by dropping duplicate columns or columns with a single unique value.
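
A minimal sketch of both cleaning rules, assuming a table stored as a hypothetical dict of column name to list of hashable values (this is not AutoGluon's internal cleaning code):

```python
def drop_useless_columns(table):
    """Drop constant columns and columns identical to an earlier kept column."""
    kept = {}
    seen = []
    for name, values in table.items():
        if len(set(values)) <= 1:
            continue  # a single unique value carries no information
        if values in seen:
            continue  # exact duplicate of a column already kept
        kept[name] = values
        seen.append(values)
    return kept


table = {"a": [1, 2, 3], "b": [1, 2, 3], "c": [7, 7, 7], "d": ["x", "y", "x"]}
cleaned = drop_useless_columns(table)
print(list(cleaned))  # ['a', 'd']
```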

Foundation Model Training - Trains specialized pretrained foundation models like Mitra and TabPFNv2 for tabular classification and regression.

Static Metadata Integration - Adds time-independent attributes like location or product IDs to help models group similar entities.

Feature Generation - Transforms forecasting tasks into tabular regression problems by generating lag and time features.
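
A minimal sketch of the lag-feature transformation with a hypothetical helper: each row's features are the series values a given number of steps back, and the target is the current value, so any tabular regressor can be trained on the result. Time features such as day of week could be appended to each row in the same way.

```python
def make_lag_features(values, lags):
    """Build (feature rows, targets) where row t holds values[t - lag] for each lag."""
    max_lag = max(lags)
    rows, targets = [], []
    for t in range(max_lag, len(values)):
        rows.append([values[t - lag] for lag in lags])
        targets.append(values[t])
    return rows, targets


rows, targets = make_lag_features([10, 20, 30, 40, 50], lags=[1, 2])
print(rows)     # [[20, 10], [30, 20], [40, 30]]
print(targets)  # [30, 40, 50]
```

The first max(lags) time steps are dropped because their lagged values would fall before the start of the series.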

Cloud Deployment - Hosts trained models as persistent endpoints or batch transform jobs in cloud environments.

Serverless Deployment - Packages trained models into container images for execution in serverless environments like AWS Lambda.

Large Dataset Optimizations - Performs inference on files larger than system memory by splitting the data into smaller chunks.
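
A minimal sketch of chunked inference using only the standard library, assuming rows arrive from a lazy iterable (for example a csv.reader over an open file) and that predict_batch is a placeholder for a real model's batch prediction call:

```python
from itertools import islice


def iter_chunks(rows, chunk_size):
    """Yield lists of at most chunk_size rows, reading the input lazily."""
    iterator = iter(rows)
    while True:
        chunk = list(islice(iterator, chunk_size))
        if not chunk:
            return
        yield chunk


def predict_in_chunks(rows, predict_batch, chunk_size):
    """Stream predictions so only one chunk of input is held in memory at a time."""
    for chunk in iter_chunks(rows, chunk_size):
        yield from predict_batch(chunk)


chunk_sizes = []


def predict_batch(chunk):
    # Placeholder model: doubles each input and records the chunk size it received.
    chunk_sizes.append(len(chunk))
    return [x * 2 for x in chunk]


predictions = list(predict_in_chunks(range(7), predict_batch, chunk_size=3))
print(predictions, chunk_sizes)  # [0, 2, 4, 6, 8, 10, 12] [3, 3, 1]
```

Collecting everything with list() is only for the demonstration; on truly large inputs each prediction would be written out as it is produced.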

Tabular Data Type Inference - Automatically detects if columns are numerical, categorical, boolean, datetime, or text based on values.
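
A rough, hypothetical heuristic in the same spirit, assuming string-valued columns; AutoGluon's actual inference rules and thresholds are more involved and are not reproduced here. The max_categories cut-off is an arbitrary assumption:

```python
from datetime import datetime


def _all_parse(values, parser):
    """True if every value can be parsed by parser without raising ValueError."""
    try:
        for v in values:
            parser(v)
    except ValueError:
        return False
    return True


def infer_column_type(values, max_categories=20):
    """Guess a column type from its string values (hypothetical heuristic)."""
    distinct = set(values)
    if len(distinct) == 2:
        return "boolean"  # exactly two distinct values, e.g. yes/no or 0/1
    if _all_parse(values, float):
        return "numerical"
    if _all_parse(values, datetime.fromisoformat):
        return "datetime"
    if len(distinct) <= max_categories:
        return "categorical"
    return "text"  # many distinct free-form strings


print(infer_column_type(["1.5", "2", "3.25"]))                         # numerical
print(infer_column_type(["yes", "no", "yes"]))                         # boolean
print(infer_column_type(["2024-01-01", "2024-02-01", "2024-03-01"]))   # datetime
print(infer_column_type(["red", "green", "blue"]))                     # categorical
```

The order of the checks matters: testing for numbers before dates keeps plain numeric strings from being read as timestamps.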

Machine Learning Libraries - Automated machine learning for tabular and multimodal data.

Automated Machine Learning - Automated feature and model selection for tabular, image, and text data.
