Lance is a versioned columnar data format and storage engine designed for multimodal AI lakehouses. It can serve as the storage engine behind a vector database or as a dataset manager on cloud object stores, organizing images, video, audio, and embeddings in a single format optimized for machine learning workflows.
The project distinguishes itself by combining a columnar layout for structured data with a dedicated blob store for large multimodal tensors. Its hybrid search engine integrates vector similarity search, full-text search, and SQL analytics over a single dataset. Its storage model supports fast random access to individual records without scanning entire files.
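To make the random-access claim concrete, here is a minimal Python sketch of how a columnar file can locate individual rows without a full scan. Per-page row counts are kept as metadata, a binary search finds the page that holds a row, and fixed-width values let the byte offset be computed directly. The Page layout and the write_column and take helpers are hypothetical names for illustration, not Lance's API or on-disk format, and the sketch assumes fixed-width float32 values; real encodings and variable-width data are more involved.

```python
import bisect
import struct
from dataclasses import dataclass

# Hypothetical page metadata for illustration; NOT Lance's on-disk format.
@dataclass
class Page:
    byte_offset: int  # where the page starts in the column buffer
    num_rows: int     # number of rows stored in the page

VALUE_FMT = "<f"  # assumption: fixed-width little-endian float32 values
VALUE_SIZE = struct.calcsize(VALUE_FMT)

def write_column(values, rows_per_page):
    """Serialize values into fixed-size pages; return (buffer, page metadata)."""
    buf = bytearray()
    pages = []
    for start in range(0, len(values), rows_per_page):
        chunk = values[start:start + rows_per_page]
        pages.append(Page(byte_offset=len(buf), num_rows=len(chunk)))
        for v in chunk:
            buf += struct.pack(VALUE_FMT, v)
    return bytes(buf), pages

def take(buf, pages, row_ids):
    """Hypothetical helper: read rows by computing byte offsets, no full scan."""
    # First row id of each page, so a binary search can find the owning page.
    starts = []
    total = 0
    for p in pages:
        starts.append(total)
        total += p.num_rows
    out = []
    for rid in row_ids:
        if not 0 <= rid < total:
            raise IndexError(rid)
        page_idx = bisect.bisect_right(starts, rid) - 1
        local = rid - starts[page_idx]  # position of the row inside its page
        offset = pages[page_idx].byte_offset + local * VALUE_SIZE
        (value,) = struct.unpack_from(VALUE_FMT, buf, offset)
        out.append(value)
    return out

buf, pages = write_column([float(i) for i in range(10)], rows_per_page=4)
print(take(buf, pages, [0, 5, 9]))  # [0.0, 5.0, 9.0]
```

Only the page metadata and the requested bytes are touched, which is why the cost of a point lookup does not grow with file size in this model.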
Its capabilities include ACID data versioning with time travel and branching, metadata-driven schema evolution, and distributed writes. To accelerate data retrieval, it offers several index types: inverted file (IVF) indexes for vectors, BTree indexes for range queries, and roaring-bitmap scalar indexes.
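An inverted file (IVF) index partitions vectors around k-means centroids so that a query only scans the partitions whose centroids are nearest to it. The pure-Python sketch below assumes Euclidean distance, a deterministic first-k centroid initialization, and hypothetical names (IVFIndex, nprobe). It omits the quantization and storage layout a production index would use and is not Lance's implementation.

```python
import math

class IVFIndex:
    """Toy inverted-file vector index; a conceptual sketch, not Lance's code."""

    def __init__(self, num_partitions, iterations=10):
        self.k = num_partitions
        self.iterations = iterations
        self.centroids = []
        self.lists = []  # lists[i] holds (row_id, vector) pairs for partition i

    def _nearest(self, vec, n=1):
        # Indices of the n centroids closest to vec (Euclidean distance).
        order = sorted(range(len(self.centroids)),
                       key=lambda i: math.dist(vec, self.centroids[i]))
        return order[:n]

    def train(self, vectors):
        # Plain k-means; assumption: the first k vectors seed the centroids.
        self.centroids = [list(v) for v in vectors[: self.k]]
        for _ in range(self.iterations):
            buckets = [[] for _ in range(self.k)]
            for v in vectors:
                buckets[self._nearest(v)[0]].append(v)
            for i, bucket in enumerate(buckets):
                if bucket:  # keep the old centroid if a partition is empty
                    dim = len(bucket[0])
                    self.centroids[i] = [sum(v[d] for v in bucket) / len(bucket)
                                         for d in range(dim)]

    def add(self, vectors):
        # Assign every vector (identified by its position) to one partition.
        self.lists = [[] for _ in range(self.k)]
        for row_id, v in enumerate(vectors):
            self.lists[self._nearest(v)[0]].append((row_id, v))

    def search(self, query, k=3, nprobe=1):
        # Scan only the nprobe partitions closest to the query.
        candidates = []
        for p in self._nearest(query, nprobe):
            candidates.extend(self.lists[p])
        candidates.sort(key=lambda item: math.dist(query, item[1]))
        return [row_id for row_id, _ in candidates[:k]]

vectors = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
index = IVFIndex(num_partitions=2)
index.train(vectors)
index.add(vectors)
print(index.search([0.1, 0.3], k=2))  # [0, 1]
```

Raising nprobe scans more partitions, trading query latency for recall; with nprobe equal to the number of partitions the search becomes exhaustive.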
Datasets can be persisted on S3-compatible object storage and on distributed filesystems, with the storage backend selected by the URI scheme of the dataset path.
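Selecting a backend from a dataset URI usually comes down to parsing the scheme and dispatching on it. This sketch uses urllib.parse from the Python standard library. The BACKENDS table, its labels, and the resolve function are hypothetical; the set of schemes shown is illustrative and is not taken from Lance itself.

```python
from urllib.parse import urlparse

# Hypothetical scheme-to-backend table; labels are illustrative only,
# not Lance's actual configuration or list of supported schemes.
BACKENDS = {
    "s3": "object-store (S3-compatible)",
    "file": "local filesystem",
    "": "local filesystem",  # bare paths carry no scheme
}

def resolve(uri):
    """Return (backend label, location) for a dataset URI."""
    parsed = urlparse(uri)
    scheme = parsed.scheme.lower()
    if scheme not in BACKENDS:
        raise ValueError(f"unsupported URI scheme: {scheme!r}")
    if scheme in ("", "file"):
        location = parsed.path
    else:
        # For object stores, netloc is the bucket and path is the key prefix.
        location = parsed.netloc + parsed.path
    return BACKENDS[scheme], location

print(resolve("s3://my-bucket/datasets/images.lance"))
# ('object-store (S3-compatible)', 'my-bucket/datasets/images.lance')
```

Keeping the dispatch in one table makes adding a backend a matter of registering a new scheme rather than changing callers.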