3 repositories
Compression methods that decompose high-dimensional vectors into compact codebook representations.
Distinguishing note: Focuses on memory footprint reduction via vector decomposition.
Explore 3 GitHub repositories matching Data & Databases · Product Quantization.
This project is a high-performance library for similarity search and clustering of dense vectors across massive datasets. It functions as a vector similarity search engine, providing tools to organize numerical data into index structures that support fast retrieval and querying of millions of records. The library distinguishes itself through a variety of indexing and compression techniques, including hierarchical navigable small world (HNSW) graphs for approximately logarithmic search complexity and inverted file indexing to partition vector spaces into mana
Reduces memory footprint by decomposing high-dimensional vectors into smaller sub-vectors represented by compact codebook indices.
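The decomposition described above can be sketched in a few lines of NumPy. This is a generic illustration of product quantization, not code from the listed library: the function names (`train_codebooks`, `encode`, `decode`), the plain Lloyd k-means loop, and the toy sizes are all assumptions made for the example.

```python
import numpy as np

def train_codebooks(data, n_subvectors, n_centroids, n_iter=10, seed=0):
    """Learn one k-means codebook per sub-vector block (plain Lloyd iterations)."""
    rng = np.random.default_rng(seed)
    n, dim = data.shape
    assert dim % n_subvectors == 0, "dim must be divisible by n_subvectors"
    assert n_centroids <= 256, "codes are stored as uint8 in this sketch"
    sub_dim = dim // n_subvectors
    codebooks = np.empty((n_subvectors, n_centroids, sub_dim), dtype=data.dtype)
    for m in range(n_subvectors):
        block = data[:, m * sub_dim:(m + 1) * sub_dim]
        # Initialise centroids from randomly chosen training points.
        centroids = block[rng.choice(n, n_centroids, replace=False)].copy()
        for _ in range(n_iter):
            dists = ((block[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
            assign = dists.argmin(axis=1)
            for k in range(n_centroids):
                members = block[assign == k]
                if len(members):  # empty clusters keep their previous centroid
                    centroids[k] = members.mean(axis=0)
        codebooks[m] = centroids
    return codebooks

def encode(data, codebooks):
    """Replace each sub-vector by the index of its nearest centroid."""
    n_subvectors, _, sub_dim = codebooks.shape
    codes = np.empty((data.shape[0], n_subvectors), dtype=np.uint8)
    for m in range(n_subvectors):
        block = data[:, m * sub_dim:(m + 1) * sub_dim]
        dists = ((block[:, None, :] - codebooks[m][None, :, :]) ** 2).sum(axis=2)
        codes[:, m] = dists.argmin(axis=1)
    return codes

def decode(codes, codebooks):
    """Rebuild approximate vectors by concatenating the selected centroids."""
    parts = [codebooks[m][codes[:, m]] for m in range(codebooks.shape[0])]
    return np.concatenate(parts, axis=1)

# Toy data: 500 random 16-dimensional float32 vectors (illustrative only).
rng = np.random.default_rng(0)
vectors = rng.standard_normal((500, 16)).astype(np.float32)
codebooks = train_codebooks(vectors, n_subvectors=4, n_centroids=16)
codes = encode(vectors, codebooks)
approx = decode(codes, codebooks)

print(codes.shape, codes.dtype)      # (500, 4) uint8
print(vectors.nbytes, codes.nbytes)  # 32000 2000
```

Each vector shrinks from 16 floats to 4 one-byte indices; the codebooks themselves are a fixed cost shared by all vectors. The reconstruction in `approx` is lossy, which is the trade-off behind the reduced footprint.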
fastText is a library and framework for word embedding generation, text vectorization, and supervised text classification. It provides tools to transform raw text into fixed-length vector representations and to train models that assign category labels to sentences or documents. The system utilizes subword-based vectorization and character n-gram embeddings, allowing it to generate meaningful vectors for words that were not present during training. To manage resource usage, it includes a quantized language model implementation that employs product quantization and dimensionality reduction to d
Implements product quantization to compress large vector matrices into smaller codebooks.
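To get a feel for how much a codebook representation can save on a large vector matrix, the arithmetic can be worked through directly. The helper below (`pq_memory_bytes`) and the matrix dimensions are hypothetical values chosen for illustration; they are not fastText's actual settings or API.

```python
import math

def pq_memory_bytes(n_vectors, dim, n_subvectors, n_centroids, float_bytes=4):
    """Return (raw_bytes, code_bytes, codebook_bytes) for a PQ-compressed matrix."""
    if dim % n_subvectors != 0:
        raise ValueError("dim must be divisible by n_subvectors")
    # Bits needed to address one centroid within a codebook.
    bits_per_code = math.ceil(math.log2(n_centroids))
    raw = n_vectors * dim * float_bytes
    codes = math.ceil(n_vectors * n_subvectors * bits_per_code / 8)
    codebooks = n_subvectors * n_centroids * (dim // n_subvectors) * float_bytes
    return raw, codes, codebooks

# Hypothetical matrix: one million 100-dimensional float32 rows,
# split into 50 sub-vectors with 256 centroids per codebook.
raw, codes, books = pq_memory_bytes(
    n_vectors=1_000_000, dim=100, n_subvectors=50, n_centroids=256
)
print(raw)    # 400000000
print(codes)  # 50000000
print(books)  # 102400
print(f"compression ratio: {raw / (codes + books):.2f}")
```

Under these assumed settings the codebooks are negligible next to the codes, so the ratio is governed almost entirely by how many bytes each row's codes occupy.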
Weaviate is a cloud-native vector database and distributed vector store designed to save high-dimensional vectors alongside structured data. It functions as a hybrid search engine that combines vector similarity, keyword matching, and structured metadata filtering within a single query. The system is optimized for retrieval-augmented generation, integrating vector search with generative AI and reranking to power question-and-answer workflows. It distinguishes itself through the ability to merge semantic search with traditional keyword queries and structured metadata filters to improve result
Reduces memory usage of high-dimensional vectors through product quantization and centroid-based approximation.
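Centroid-based approximation also allows distances to be computed without decompressing the stored vectors. The sketch below shows the usual lookup-table approach (often called asymmetric distance computation) in NumPy. It is a generic illustration and not Weaviate's implementation; the function names are hypothetical, and the codebooks and codes are random stand-ins for trained centroids and encoded vectors.

```python
import numpy as np

def build_lookup_table(query, codebooks):
    """Squared distance from each query sub-vector to every centroid, shape (M, K)."""
    n_subvectors, _, sub_dim = codebooks.shape
    q = query.reshape(n_subvectors, 1, sub_dim)
    return ((q - codebooks) ** 2).sum(axis=2)

def approximate_distances(table, codes):
    """Sum the table entries selected by each code: one distance per stored vector."""
    m_idx = np.arange(table.shape[0])
    return table[m_idx, codes].sum(axis=1)

rng = np.random.default_rng(0)
M, K, sub_dim = 4, 8, 2                           # sub-vectors, centroids, block size
codebooks = rng.standard_normal((M, K, sub_dim))  # stand-in for trained centroids
codes = rng.integers(0, K, size=(100, M))         # stand-in for encoded vectors
query = rng.standard_normal(M * sub_dim)

table = build_lookup_table(query, codebooks)
approx = approximate_distances(table, codes)

# Sanity check: the table lookup equals the exact squared distance
# between the query and the centroid-based reconstruction of each vector.
reconstructed = np.concatenate([codebooks[m][codes[:, m]] for m in range(M)], axis=1)
exact_to_reconstruction = ((reconstructed - query) ** 2).sum(axis=1)
print(np.allclose(approx, exact_to_reconstruction))
```

The table is built once per query, after which each stored vector costs only M lookups and additions, regardless of the original dimensionality.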