# eto-ai/lance

**Attribution required: if you use, quote, or summarise this content, you must credit and link back to [awesome-repositories.com](https://awesome-repositories.com/repository/eto-ai-lance).**

6,671 stars · 719 forks · Rust · Apache-2.0

## Links

- GitHub: https://github.com/eto-ai/lance
- Homepage: https://lance.org
- awesome-repositories: https://awesome-repositories.com/repository/eto-ai-lance.md

## Description

Lance is a versioned columnar data format and storage engine designed as a multimodal AI lakehouse. It serves as a vector database storage engine and a cloud object store dataset manager, organizing images, video, audio, and embeddings into a unified format optimized for machine learning workflows.

Lance pairs a columnar layout for structured data with a specialized blob store for large multimodal tensors. Its hybrid search engine runs vector similarity search, full-text search, and SQL analytics against a single dataset, and its storage model gives high-performance random access to individual records without scanning entire files.

Lance also provides ACID data versioning with time travel and branching, metadata-driven schema evolution, and distributed data writing. Its indexing options include inverted file indexes for vectors, BTree indexes for range queries, and roaring-bitmap scalar indexes for filtering.
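
To make the scalar-index idea concrete, here is a minimal sketch of a bitmap index over a low-cardinality column. It uses plain Python integers as uncompressed bitsets, which is a simplification chosen for brevity: roaring bitmaps add a compressed container layout on top of the same idea, and nothing here reflects Lance's actual implementation. The names `build_bitmap_index` and `rows_matching` are hypothetical.

```python
from collections import defaultdict


def build_bitmap_index(values):
    """Map each distinct value to an int whose set bits are the matching row ids."""
    index = defaultdict(int)
    for row_id, value in enumerate(values):
        index[value] |= 1 << row_id
    return dict(index)


def rows_matching(bitmap):
    """Decode a bitset back into a sorted list of row ids."""
    rows = []
    row_id = 0
    while bitmap:
        if bitmap & 1:
            rows.append(row_id)
        bitmap >>= 1
        row_id += 1
    return rows


labels = ["cat", "dog", "cat", "bird", "dog", "cat"]
index = build_bitmap_index(labels)

# Filters combine with bitwise operators instead of a row-by-row scan.
cat_or_bird = index["cat"] | index["bird"]
print(rows_matching(cat_or_bird))
```

Because each predicate reduces to a bitwise operation, combining several filters on low-cardinality columns stays cheap regardless of how many conditions are chained.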

Datasets persist to S3-compatible object storage and distributed filesystems; the URI scheme of the dataset path selects the storage backend.
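
As an illustration of scheme-based backend selection, the sketch below parses a dataset URI with the standard library and looks the scheme up in a table. The `BACKENDS` table, the `resolve_backend` helper, and the example bucket and paths are all hypothetical; the set of schemes a real installation accepts may differ.

```python
from urllib.parse import urlparse

# Hypothetical scheme-to-backend table, for illustration only.
BACKENDS = {
    "": "local filesystem",
    "file": "local filesystem",
    "s3": "S3-compatible object store",
    "hdfs": "distributed filesystem",
}


def resolve_backend(uri):
    """Return (backend name, bucket or host, path) for a dataset URI."""
    parsed = urlparse(uri)
    try:
        backend = BACKENDS[parsed.scheme]
    except KeyError:
        raise ValueError(f"unsupported URI scheme: {parsed.scheme!r}")
    return backend, parsed.netloc, parsed.path


print(resolve_backend("s3://my-bucket/datasets/images.lance"))
print(resolve_backend("/tmp/images.lance"))
```

Keeping the backend choice inside the URI means the same read and write code can target local disk during development and object storage in production.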

## Tags

### Artificial Intelligence & ML

- [Hybrid Search Methods](https://awesome-repositories.com/f/artificial-intelligence-ml/vector-similarity-search/hybrid-search-methods.md) — Integrates vector similarity search, full-text search, and SQL relational filters into a single hybrid retrieval system. ([source](https://cdn.jsdelivr.net/gh/eto-ai/lance@main/README.md))
- [Image Tensor Storage](https://awesome-repositories.com/f/artificial-intelligence-ml/image-tensor-storage.md) — Manages images as multi-dimensional numeric arrays for direct use in machine learning models and tensor operations. ([source](https://lance.org/guide/arrays/))
- [Streaming Dataset Loaders](https://awesome-repositories.com/f/artificial-intelligence-ml/dataset-batch-loading/streaming-dataset-loaders.md) — Reads large datasets in incremental batches to process data that exceeds available system memory. ([source](https://lance.org/guide/read_and_write/))
- [Random-Access Dataset Streaming](https://awesome-repositories.com/f/artificial-intelligence-ml/random-access-dataset-streaming.md) — Implements storage-level random access to rows by index to optimize sampling and shuffling for machine learning training. ([source](https://lance.org/guide/read_and_write/))
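
The last two entries describe complementary access patterns: streaming in bounded batches and jumping directly to rows by index. The sketch below contrasts them on a toy in-memory file of fixed-width rows. The row layout and the `take` and `iter_batches` helpers are assumptions made for illustration and are not the Lance file format or its API.

```python
import io
import random
import struct

ROW = struct.Struct("<if")  # hypothetical fixed-width row: int32 id, float32 score


def write_rows(rows):
    """Serialize rows back to back into an in-memory file."""
    buf = io.BytesIO()
    for row in rows:
        buf.write(ROW.pack(*row))
    buf.seek(0)
    return buf


def take(buf, indices):
    """Random access: seek straight to each requested row."""
    out = []
    for i in indices:
        buf.seek(i * ROW.size)
        out.append(ROW.unpack(buf.read(ROW.size)))
    return out


def iter_batches(buf, batch_size):
    """Streaming: yield rows in batches so memory use stays bounded."""
    buf.seek(0)
    while True:
        chunk = buf.read(batch_size * ROW.size)
        if not chunk:
            return
        yield [ROW.unpack_from(chunk, off) for off in range(0, len(chunk), ROW.size)]


data = write_rows([(i, i * 0.5) for i in range(10)])

# Shuffled sampling for training only touches the rows it needs.
sample_ids = random.Random(0).sample(range(10), k=4)
print(take(data, sample_ids))

# A full pass reads at most batch_size rows at a time.
print([len(batch) for batch in iter_batches(data, batch_size=4)])
```

With fixed-width rows the offset of row `i` is simply `i * ROW.size`; variable-width data needs an offset table or similar structure to get the same effect.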

### Data & Databases

- [Lakehouse Table Formats](https://awesome-repositories.com/f/data-databases/lakehouse-table-formats.md) — Implements a storage format for large-scale AI datasets that brings ACID transactions and high-performance random access to multimodal data.
- [Multimodal Data Storage](https://awesome-repositories.com/f/data-databases/multimodal-data-storage.md) — Organizes large-scale multimodal datasets into a file and table format optimized for high-performance vector search. ([source](https://lance.org/format))
- [Columnar-Blob Hybrid Storage](https://awesome-repositories.com/f/data-databases/columnar-blob-hybrid-storage.md) — Combines a columnar layout for structured data with a specialized blob store for large multimodal tensors.
- [Columnar Storage Engines](https://awesome-repositories.com/f/data-databases/columnar-storage-engines.md) — Provides a storage layout organized by column with ACID transactions for efficient analytical workloads and versioning.
- [Dataset Creation](https://awesome-repositories.com/f/data-databases/data-collections-datasets/dataset-creation.md) — Writes data from tables, dataframes, or streams into a high-performance columnar format. ([source](https://lance.org/guide/read_and_write/))
- [S3-Compatible Cloud Storage](https://awesome-repositories.com/f/data-databases/data-engineering-infrastructure/data-persistence-storage/data-storage/file-based-storage/local-file-storage/s3-compatible-cloud-storage.md) — Integrates with S3-compatible object storage providers for persisting large-scale datasets and multimodal assets. ([source](https://lance.org/guide/object_store/))
- [Schema Evolution](https://awesome-repositories.com/f/data-databases/data-type-schemas/schema-evolution.md) — Updates table structures by modifying manifest metadata to add or remove columns without rewriting data files.
- [ACID Dataset Versioning](https://awesome-repositories.com/f/data-databases/federated-data-query-engines/lakehouse-engines/acid-dataset-versioning.md) — Tracks dataset changes via ACID transactions, supporting time travel, branching, and tagging for reproducible AI experiments.
- [Blob Dataset Management](https://awesome-repositories.com/f/data-databases/file-storage-systems/cloud-object-storage/blob-dataset-management.md) — Provides a management layer for S3-compatible storage that handles large binary blobs and tensors with lazy loading.
- [Hybrid Search Engines](https://awesome-repositories.com/f/data-databases/hybrid-search-engines.md) — Combines vector-based semantic retrieval, traditional keyword indexing, and SQL analytics in one system.
- [Lazy-Loading Blob Storage](https://awesome-repositories.com/f/data-databases/lazy-loading-blob-storage.md) — Stores multimedia content using a specialized type that supports lazy loading to prevent memory overload. ([source](https://lance.org/guide/blob/))
- [Manifest-Based Versioning](https://awesome-repositories.com/f/data-databases/manifest-based-versioning.md) — Tracks dataset state via immutable manifests to enable ACID transactions, time travel, and dataset branching.
- [Object-Storage Persistence](https://awesome-repositories.com/f/data-databases/object-storage-persistence.md) — Persists datasets across cloud providers and distributed filesystems using URI schemes to specify storage backends. ([source](https://lance.org/guide/object_store/))
- [Random Access Data Retrieval](https://awesome-repositories.com/f/data-databases/random-access-data-retrieval.md) — Provides high-speed retrieval of specific records from massive columnar datasets without requiring sequential scans. ([source](https://cdn.jsdelivr.net/gh/eto-ai/lance@main/README.md))
- [Schema Evolutions](https://awesome-repositories.com/f/data-databases/schema-evolutions.md) — Adds new columns with backfilled values to existing tables without requiring a full dataset rewrite. ([source](https://cdn.jsdelivr.net/gh/eto-ai/lance@main/README.md))
- [ACID-Compliant](https://awesome-repositories.com/f/data-databases/storage-engines/acid-compliant.md) — Tracks changes via automatic versioning and ACID transactions to support time travel, tagging, and branching. ([source](https://cdn.jsdelivr.net/gh/eto-ai/lance@main/README.md))
- [Vector Databases](https://awesome-repositories.com/f/data-databases/vector-databases.md) — Indexes and stores high-dimensional embeddings to enable fast similarity search across large datasets.
- [Inverted File Indexes](https://awesome-repositories.com/f/data-databases/vector-indexing/inverted-file-indexes.md) — Uses inverted file indexes with product quantization to enable fast approximate nearest neighbor search on embeddings.
- [Vector Similarity Search](https://awesome-repositories.com/f/data-databases/vector-similarity-search.md) — Implements high-performance nearest neighbor search for vector embeddings using IVF-PQ indexing. ([source](https://lance.org/guide/data_types/))
- [Vector Storage](https://awesome-repositories.com/f/data-databases/vector-storage.md) — Provides optimized storage for high-dimensional numerical embeddings to facilitate SIMD-accelerated distance computations. ([source](https://lance.org/guide/data_types/))
- [Column Projection](https://awesome-repositories.com/f/data-databases/wide-column-stores/column-oriented-disk-storage/column-projection.md) — Retrieves specific columns and rows using projection and SQL predicate push-down to minimize I/O. ([source](https://lance.org/guide/read_and_write/))
- [JSON Array Functions](https://awesome-repositories.com/f/data-databases/array-column-operations/json-array-functions.md) — Extracts values, checks existence, or measures array lengths within JSON columns using JSONPath syntax. ([source](https://lance.org/guide/json/))
- [Binary Blob Retrieval](https://awesome-repositories.com/f/data-databases/binary-blob-retrieval.md) — Fetches binary objects using positional indices, logical row IDs, or physical addresses to locate specific data. ([source](https://lance.org/guide/blob/))
- [Column Renamers](https://awesome-repositories.com/f/data-databases/column-transformation/column-renamers.md) — Implements utilities for changing the names of top-level or nested columns within a dataset. ([source](https://lance.org/guide/data_evolution/))
- [Nested Columnar Storage](https://awesome-repositories.com/f/data-databases/data-engineering-infrastructure/data-persistence-storage/data-storage/nested-columnar-storage.md) — Saves nested records and dynamic key-value maps within single columns of the columnar format. ([source](https://lance.org/guide/data_types/))
- [Data File Compaction](https://awesome-repositories.com/f/data-databases/data-file-compaction.md) — Rewrites small data fragments into larger files and removes deleted rows to improve storage layout and scan performance. ([source](https://lance.org/guide/read_and_write/))
- [Table Metadata Updates](https://awesome-repositories.com/f/data-databases/database-management-systems/database-systems-management/database-management/schema-designers/table-schemas/table-metadata-inspection/table-metadata-updates.md) — Updates table structures and backfills data via metadata modifications to avoid costly full-dataset rewrites. ([source](https://lance.org/format))
- [Data Value Updates](https://awesome-repositories.com/f/data-databases/dataset-joins/dataset-configuration-updating/data-value-updates.md) — Modifies existing column values using a left-outer-hash-join between dataset fragments and update data. ([source](https://lance.org/guide/distributed_write/))
- [In-Memory Caching](https://awesome-repositories.com/f/data-databases/dataset-management-tools/in-memory-caching.md) — Maintains vector and scalar indices in a memory cache to improve query response times and minimize disk I/O. ([source](https://lance.org/guide/performance/))
- [Roaring Bitmaps](https://awesome-repositories.com/f/data-databases/field-scoped-indexing/inverted-scalar-indexes/roaring-bitmaps.md) — Implements compressed inverted lookup tables using roaring bitmaps to accelerate filtering on low-cardinality columns.
- [Sorted Indexing](https://awesome-repositories.com/f/data-databases/hierarchical-index-sorting/sorted-indexing.md) — Provides BTree-based sorted indexing to facilitate efficient range queries and ordered data retrieval. ([source](https://lance.org/guide/performance/))
- [Incremental Data Appending](https://awesome-repositories.com/f/data-databases/incremental-data-appending.md) — Inserts new records into existing tables to grow datasets via bulk writes or append modes. ([source](https://lance.org/guide/read_and_write/))
- [Index Construction](https://awesome-repositories.com/f/data-databases/index-construction.md) — Constructs specialized indices to enable high-performance similarity search across large-scale multimodal datasets. ([source](https://lance.org/guide/performance/))
- [Parallel Construction](https://awesome-repositories.com/f/data-databases/index-construction/parallel-construction.md) — Coordinates multiple workers to build index segments in parallel, optimizing the construction process across CPU cores. ([source](https://lance.org/guide/distributed_indexing/))
- [Hybrid Relational-JSON Storage](https://awesome-repositories.com/f/data-databases/json-document-stores/hybrid-relational-json-storage.md) — Encodes JSON as binary JSONB for efficient storage and fast query performance for nested fields. ([source](https://lance.org/guide/json/))
- [Full-Text JSON Indexes](https://awesome-repositories.com/f/data-databases/json-document-stores/json-search-indices/full-text-json-indexes.md) — Builds inverted indices over JSON documents to enable comprehensive text-based search across all contained content. ([source](https://lance.org/guide/json/))
- [Metadata Caching](https://awesome-repositories.com/f/data-databases/metadata-caching.md) — Caches manifests, transactions, and file metadata in memory to accelerate data access and reduce repeated lookups. ([source](https://lance.org/guide/performance/))
- [Optimistic Concurrency Control](https://awesome-repositories.com/f/data-databases/optimistic-concurrency-control.md) — Manages simultaneous writes by validating manifest versions before commit to ensure data consistency without locking.
- [Parallel Storage Writing](https://awesome-repositories.com/f/data-databases/parallel-storage-writing.md) — Generates data fragments in parallel across multiple nodes and commits them into a single dataset. ([source](https://lance.org/guide/distributed_write/))
- [Row Deletions](https://awesome-repositories.com/f/data-databases/row-deletions.md) — Removes specific records from a dataset using SQL expressions to mark rows as deleted. ([source](https://lance.org/guide/read_and_write/))
- [Search Index Management](https://awesome-repositories.com/f/data-databases/search-index-management.md) — Treats search indices as versioned table objects decoupled from file encoding to support independent evolution. ([source](https://lance.org/format))
- [SQL-Based Row Updates](https://awesome-repositories.com/f/data-databases/sql-based-row-updates.md) — Modifies existing record values using SQL expressions to apply changes to specific rows. ([source](https://lance.org/guide/read_and_write/))
- [Binary Data Streaming](https://awesome-repositories.com/f/data-databases/storage-abstraction/file-storage-services/binary-data-streaming.md) — Retrieves large objects as lazy handles to stream bytes on demand without loading the entire object. ([source](https://lance.org/guide/blob/))
- [Column Deletions](https://awesome-repositories.com/f/data-databases/table-definitions/table-deletion/column-deletions.md) — Provides the ability to remove individual columns from a table schema via metadata updates. ([source](https://lance.org/guide/data_evolution/))
- [Bitmap Indexes](https://awesome-repositories.com/f/data-databases/table-indexing-systems/database-indexes/index-accelerated-querying/column-indexing/bitmap-indexes.md) — Uses Roaring Bitmaps to generate compressed inverted lookup tables for high-performance filtering on low-cardinality data. ([source](https://lance.org/guide/performance/))
- [Upsert Operations](https://awesome-repositories.com/f/data-databases/upsert-operations.md) — Adds new data in bulk while matching against existing records to handle updates and insertions in one operation. ([source](https://lance.org/guide/read_and_write/))
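
Several entries above rely on the same mechanism: immutable, numbered manifests plus a version check at commit time. The sketch below models that idea in memory. `ManifestStore`, `CommitConflict`, and `append_with_retry` are hypothetical names, the store is single-threaded, and a real system would need an atomic publish step on the storage backend and may resolve compatible commits instead of rejecting them.

```python
class CommitConflict(Exception):
    """Raised when another writer committed after this writer's read."""


class ManifestStore:
    """In-memory stand-in for a directory of immutable, numbered manifests."""

    def __init__(self):
        self._manifests = {0: ()}  # version -> tuple of fragment names

    @property
    def latest(self):
        return max(self._manifests)

    def checkout(self, version):
        """Time travel: every committed version stays readable."""
        return self._manifests[version]

    def commit(self, read_version, new_fragments):
        """Publish read_version + 1 only if nobody committed since the read."""
        if read_version != self.latest:
            raise CommitConflict(f"read v{read_version}, latest is v{self.latest}")
        version = read_version + 1
        self._manifests[version] = self._manifests[read_version] + tuple(new_fragments)
        return version


def append_with_retry(store, fragments, max_attempts=3):
    """Re-read the latest version and retry when a commit loses the race."""
    for _ in range(max_attempts):
        try:
            return store.commit(store.latest, fragments)
        except CommitConflict:
            continue
    raise CommitConflict("gave up after repeated conflicts")


store = ManifestStore()
seen_by_a = store.latest  # writer A reads v0
seen_by_b = store.latest  # writer B also reads v0

store.commit(seen_by_a, ["frag-a"])  # A wins and publishes v1
try:
    store.commit(seen_by_b, ["frag-b"])  # B's read is stale
except CommitConflict:
    append_with_retry(store, ["frag-b"])  # B retries on top of v1

print(store.checkout(1))
print(store.checkout(2))
```

No writer ever holds a lock; a stale writer simply discovers the conflict at commit time, and old manifests remain available for time travel.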

### Development Tools & Productivity

- [Dataset Version Tagging](https://awesome-repositories.com/f/development-tools-productivity/version-tag-management/dataset-version-tagging.md) — Assigns named tags to specific dataset versions to track evolution and protect important snapshots. ([source](https://lance.org/guide/tags_and_branches/))
- [Dataset Branching](https://awesome-repositories.com/f/development-tools-productivity/version-tag-management/dataset-version-tagging/dataset-branching.md) — Creates independent branches from specific versions or tags to allow separate lines of development and writes. ([source](https://lance.org/guide/tags_and_branches/))

### Scientific & Mathematical Computing

- [BTree Range Indexes](https://awesome-repositories.com/f/scientific-mathematical-computing/range-query-structures/btree-range-indexes.md) — Implements a two-level sorted BTree structure to enable efficient range queries and ordered data retrieval.
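
A two-level sorted structure can be sketched with the standard `bisect` module: a small top level holds the minimum key of each page, and each page holds sorted `(key, row_id)` pairs. The `TwoLevelIndex` class and its page size are hypothetical and only illustrate the lookup pattern, not Lance's on-disk BTree layout.

```python
from bisect import bisect_left, bisect_right


class TwoLevelIndex:
    def __init__(self, pairs, page_size=4):
        ordered = sorted(pairs)
        # Bottom level: fixed-size pages of sorted (key, row_id) pairs.
        self.pages = [ordered[i:i + page_size] for i in range(0, len(ordered), page_size)]
        # Top level: the smallest key in each page.
        self.page_min = [page[0][0] for page in self.pages]

    def range(self, lo, hi):
        """Return row ids whose key lies in the closed interval [lo, hi]."""
        # Begin one page before the first page whose minimum is >= lo, because
        # that earlier page may still end with keys equal to or above lo.
        start = max(bisect_left(self.page_min, lo) - 1, 0)
        rows = []
        for page in self.pages[start:]:
            if page[0][0] > hi:
                break  # every later page is out of range too
            keys = [key for key, _ in page]
            rows.extend(row for _, row in page[bisect_left(keys, lo):bisect_right(keys, hi)])
        return rows


keys = [15, 3, 42, 8, 23, 16, 4, 8]
index = TwoLevelIndex(zip(keys, range(len(keys))), page_size=4)
print(index.range(8, 16))
```

Only the top level has to stay resident; a range query binary-searches it and then reads just the pages that can overlap the requested interval.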

### Part of an Awesome List

- [Data Management](https://awesome-repositories.com/f/awesome-lists/data/data-management.md) — Implements a modern columnar data format for ML.
