3 dépôts
Capabilities for modifying the structure of existing tables, such as adding or removing columns, without rewriting the entire dataset.
Distinct from External Column Merges: The feature covers both SQL-based derived columns and external merges, whereas [f0_mt1] only covers external merges.
Explore 3 awesome GitHub repositories matching data & databases · Schema Evolutions. Refine with filters or upvote what's useful.
LanceDB is a vector database and columnar data store designed to function as a versioned dataset manager and vector search engine. It serves as a high-performance backend for indexing and retrieving high-dimensional embeddings, providing the foundation for machine learning data pipelines. The system distinguishes itself through a combination of cloud-native object storage and immutable version tracking, allowing for data time-travel and reproducible AI experiments. It integrates hybrid search capabilities, merging dense vector similarity with BM25 full-text search and SQL-like scalar filters
LanceDB adds new data fields to an existing table via SQL expressions or external table merges.
Noms is a distributed version control database and content-addressable data store. It identifies data by cryptographic hashes to ensure integrity and deduplication, while tracking dataset state changes through a sequence of immutable commits to enable branching, forking, and historical recovery. The system functions as a peer-to-peer data synchronizer, reconciling state between disconnected database instances to ensure all nodes converge on the same data. It distinguishes itself as a schema-flexible document store that supports self-describing types, allowing schemas to evolve and widen as ne
Supports schema-less type evolution, allowing container types to widen implicitly as new data is added.
Lance is a versioned columnar data format and storage engine designed as a multimodal AI lakehouse. It serves as a vector database storage engine and a cloud object store dataset manager, organizing images, video, audio, and embeddings into a unified format optimized for machine learning workflows. The project distinguishes itself by combining a columnar layout for structured data with a specialized blob store for large multimodal tensors. It implements a hybrid search engine that integrates vector similarity search, full-text search, and SQL analytics on a single dataset, supported by a stor
Adds new columns with backfilled values to existing tables without requiring a full dataset rewrite.