LimiX

LimiX is a tabular foundation model and a suite of tools for structured data. At its core is a transformer-based model for classification, regression, and data generation. The suite also includes a causal inference engine for identifying cause-and-effect relationships, a synthetic data generator, and an imputation framework that fills missing values by predicting them from the context of the observed features.
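The idea of filling a missing cell from the context of the observed features can be illustrated with a much simpler stand-in. The sketch below is not LimiX's transformer-based method: it swaps in plain k-nearest-neighbour imputation, measuring similarity only on the features that are observed in the incomplete row. The function name `impute_from_context` and its `k` parameter are hypothetical.

```python
import math
from typing import List, Optional

Row = List[Optional[float]]  # None marks a missing cell


def impute_from_context(rows: List[Row], k: int = 2) -> List[List[float]]:
    """Fill each missing cell with the column mean of the k most similar
    fully observed rows. Similarity uses only the features observed in the
    incomplete row. Hypothetical k-NN stand-in, not the LimiX algorithm."""
    complete = [r for r in rows if all(v is not None for v in r)]
    if not complete:
        raise ValueError("need at least one fully observed row as context")
    filled = []
    for row in rows:
        observed = [j for j, v in enumerate(row) if v is not None]

        def dist(other: Row) -> float:
            # Euclidean distance restricted to the observed features
            return math.sqrt(sum((row[j] - other[j]) ** 2 for j in observed))

        neighbours = sorted(complete, key=dist)[:k]
        filled.append([
            v if v is not None else sum(n[j] for n in neighbours) / len(neighbours)
            for j, v in enumerate(row)
        ])
    return filled


data = [[1.0, 2.0, 3.0], [1.1, 2.1, 3.2], [5.0, 6.0, 7.0], [1.05, None, 3.1]]
print(impute_from_context(data, k=2)[3])  # missing middle cell filled from the two closest rows
```

A learned model can capture non-linear dependencies that a distance-based neighbour average misses; this sketch only shows the "predict the hidden cell from its observed context" framing.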

To optimize tabular inference, the project uses ensemble-based sample retrieval, which increases both prediction speed and accuracy on high-specification hardware. The model learns data distributions through transformer-based encoding combined with masked-feature pretraining.
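The retrieval mechanism is not specified in detail here, so the following is a generic sketch under stated assumptions: each ensemble member picks a random subset of features, retrieves the k training rows closest to the query on that subset, and averages their targets; the members' predictions are then averaged. The function names and the parameters `n_members`, `k`, and `subset_size` are hypothetical and not part of the LimiX API.

```python
import math
import random
from typing import List, Sequence


def retrieve_neighbours(X: Sequence[Sequence[float]], query: Sequence[float],
                        features: List[int], k: int) -> List[int]:
    """Indices of the k training rows closest to `query` on `features`."""
    def dist(i: int) -> float:
        return math.sqrt(sum((X[i][j] - query[j]) ** 2 for j in features))
    return sorted(range(len(X)), key=dist)[:k]


def ensemble_retrieval_predict(X: Sequence[Sequence[float]], y: Sequence[float],
                               query: Sequence[float], n_members: int = 5,
                               k: int = 3, subset_size: int = 2,
                               seed: int = 0) -> float:
    """Hypothetical retrieval ensemble for regression-style targets."""
    rng = random.Random(seed)
    member_preds = []
    for _ in range(n_members):
        # each member looks at a different random subset of the features
        features = rng.sample(range(len(query)), subset_size)
        idx = retrieve_neighbours(X, query, features, k)
        member_preds.append(sum(y[i] for i in idx) / k)
    # combine the members by simple averaging
    return sum(member_preds) / n_members


X = [[0, 0, 0], [0.1, 0, 0.2], [0, 0.1, 0.1],
     [10, 10, 10], [9.9, 10.1, 10], [10.2, 9.8, 10.1]]
y = [0, 0, 0, 1, 1, 1]
print(ensemble_retrieval_predict(X, y, [0.05, 0.05, 0.05]))
```

Random feature subsets give the members different views of similarity, which is one common way to make an ensemble diverse; a production system would use an approximate nearest-neighbour index rather than sorting the whole training set.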

The system covers a broad range of analytical capabilities, including extracting high-dimensional embeddings that separate categories and generating synthetic samples from causal graphs. Example applications include electricity market price forecasting and the analysis of molecular properties of organic molecules.
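Causal-graph data generation can be sketched with a linear-Gaussian structural causal model: each variable is computed from its parents in topological order, plus noise, so every sample respects the graph. This is an illustrative assumption, not LimiX's generator; the variable names (`temperature`, `demand`, `price`), edge weights, and the function `sample_linear_scm` are hypothetical.

```python
import random
from graphlib import TopologicalSorter
from typing import Dict, List


def sample_linear_scm(parents: Dict[str, Dict[str, float]], n: int,
                      noise_sd: float = 0.1, seed: int = 0) -> List[Dict[str, float]]:
    """Draw n rows from a hypothetical linear-Gaussian SCM.

    parents[child] maps each parent to its edge weight. Every value is
    generated after all of its parents, so samples follow the DAG."""
    rng = random.Random(seed)
    # static_order() lists parents before children; raises CycleError if not a DAG
    order = list(TopologicalSorter(parents).static_order())
    rows = []
    for _ in range(n):
        row: Dict[str, float] = {}
        for node in order:
            value = sum(w * row[p] for p, w in parents.get(node, {}).items())
            row[node] = value + rng.gauss(0.0, noise_sd)
        rows.append(row)
    return rows


graph = {"demand": {"temperature": -0.5}, "price": {"demand": 2.0}}
for r in sample_linear_scm(graph, n=3, seed=1):
    print(r)
```

Because generation follows the topological order, intervening on a variable (for example fixing `demand`) changes only its descendants, which is what lets DAG-based generators encode known causal relationships.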

Features

Tabular Foundation Model Application - Provides a pre-trained transformer foundation model specifically designed for tabular classification and regression tasks.

Tabular Predictive Models - Provides a transformer-based foundation model for classification and regression tasks across diverse structured datasets.

Causal Inference Tools - Implements a system for determining cause-and-effect relationships and encoding dependencies between structured variables.

Tabular Inference Runtimes - Uses an ensemble retrieval approach to speed up prediction and improve accuracy on high-specification hardware.

Masked-Feature Pretraining - Learns data distributions by training the model to reconstruct randomly hidden feature values.

Tabular Feature Embeddings - Maps categorical and numerical features into a shared latent space to enable effective separation of data classes.

Tabular Tokenization Encoders - Converts structured data rows into embeddings by treating features as tokens within a transformer architecture.

Ensemble Retrieval Accelerators - Uses ensemble retrieval to increase prediction speed and make full use of high-specification hardware.

Ensemble Retrieval Optimizers - Accelerates tabular inference by retrieving and combining similar historical samples to refine final predictions.

Tabular Embedding Extraction - Extracts high-dimensional vector representations from tabular data models for downstream categorical analysis.

Tabular Inference Optimizers - Reduces prediction time and increases accuracy using ensemble retrieval methods on high-performance hardware.

Sample Retrieval Optimizers - Increases prediction precision using an optimized retrieval system that identifies salient patterns in key samples.

Causally Constrained Data Generators - Creates synthetic samples using directed acyclic graphs (DAGs) to ensure generated data respects known causal relationships.

Synthetic Data Generation - Generates synthetic data samples using feature masking or causal dependency graphs to preserve distributions.

Tabular - Produces realistic synthetic tabular samples by learning the underlying distribution of the input dataset.

Missing Data Imputation - Fills missing dataset cells by predicting hidden entries based on the observed context of other features.
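Masked-feature pretraining and context-based imputation, as listed above, both amount to hiding some cells and predicting them from the rest. A minimal sketch of the data side of such an objective follows, assuming `None` as the mask token and a mean-squared error computed on hidden cells only; the model is omitted, and `mask_features` and `masked_mse` are hypothetical helpers, not LimiX functions.

```python
import random
from typing import List, Optional, Tuple

MASK = None  # assumed placeholder token for a hidden cell


def mask_features(rows: List[List[float]], mask_rate: float = 0.3,
                  seed: int = 0) -> Tuple[List[List[Optional[float]]], List[List[bool]]]:
    """Randomly hide cells; return model inputs and the boolean mask."""
    rng = random.Random(seed)
    inputs, masks = [], []
    for row in rows:
        m = [rng.random() < mask_rate for _ in row]
        if not any(m):  # guarantee at least one reconstruction target per row
            m[rng.randrange(len(row))] = True
        inputs.append([MASK if hidden else v for v, hidden in zip(row, m)])
        masks.append(m)
    return inputs, masks


def masked_mse(preds: List[List[float]], rows: List[List[float]],
               masks: List[List[bool]]) -> float:
    """Mean squared reconstruction error over the hidden cells only."""
    errs = [(p - t) ** 2
            for pr, tr, mr in zip(preds, rows, masks)
            for p, t, hidden in zip(pr, tr, mr) if hidden]
    return sum(errs) / len(errs)


rows = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
inputs, masks = mask_features(rows, mask_rate=0.5, seed=1)
print(inputs)
```

Scoring only the hidden cells keeps the model from being rewarded for copying visible inputs; at inference time, genuinely missing cells could take the place of the randomly masked ones.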

limix-ldm-ai/LimiX
