1 repo
Modular systems for generating vector representations from data using external machine learning models.
Distinguishing note: Focuses on the decoupling of embedding generation from storage.
Explore 1 awesome GitHub repository matching data & databases · Embedding Pipelines. Refine with filters or upvote what's useful.
Chroma is a specialized vector database designed to index and retrieve high-dimensional data representations for semantic similarity search. It functions as a comprehensive platform for information retrieval, enabling the storage and management of unstructured documents alongside structured metadata. By mapping data into numerical representations, the system facilitates rapid similarity lookups across large datasets. The platform distinguishes itself through a hybrid search infrastructure that combines dense vector embeddings with sparse keyword and regular expression matching to balance sema
Decouples the vector generation process from the storage layer to support diverse third-party machine learning models.