This project is a PostgreSQL extension for storing, indexing, and querying high-dimensional vector embeddings directly within relational tables. It acts as a vector similarity search engine: users can run nearest neighbor searches with standard distance metrics such as cosine distance, inner product, and L2 (Euclidean) distance. Because these capabilities are built into the database engine, vector operations run alongside ordinary relational data management rather than in a separate system.
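To make the three metrics concrete, the following is a minimal pure-Python sketch of what each one computes, plus an exact (brute-force) nearest neighbor search that scores every row. It is illustrative only and is not the extension's implementation. The function names are hypothetical, and ranking by the *negated* inner product (so that smaller values sort as "closer") is a convention assumed here.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]


def _check_dims(a: Vector, b: Vector) -> None:
    # Distances are only defined between vectors of the same dimension.
    if len(a) != len(b):
        raise ValueError(f"dimension mismatch: {len(a)} != {len(b)}")


def l2_distance(a: Vector, b: Vector) -> float:
    """Euclidean distance: sqrt(sum_i (a_i - b_i)^2)."""
    _check_dims(a, b)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def inner_product(a: Vector, b: Vector) -> float:
    """Dot product: sum_i a_i * b_i (larger means more similar)."""
    _check_dims(a, b)
    return sum(x * y for x, y in zip(a, b))


def negative_inner_product(a: Vector, b: Vector) -> float:
    """Negated dot product, so that an ascending sort puts the most similar first."""
    return -inner_product(a, b)


def cosine_distance(a: Vector, b: Vector) -> float:
    """1 - cosine similarity; undefined when either vector is all zeros."""
    _check_dims(a, b)
    norm_a = math.sqrt(inner_product(a, a))
    norm_b = math.sqrt(inner_product(b, b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine distance is undefined for zero vectors")
    return 1.0 - inner_product(a, b) / (norm_a * norm_b)


def nearest_neighbors(
    query: Vector,
    rows: List[Tuple[str, Vector]],
    k: int,
    distance: Callable[[Vector, Vector], float],
) -> List[Tuple[str, Vector]]:
    """Exact k-NN: compute the distance to every row and keep the k smallest."""
    return sorted(rows, key=lambda row: distance(query, row[1]))[:k]


if __name__ == "__main__":
    rows = [("a", [1.0, 0.0]), ("b", [0.0, 1.0]), ("c", [1.0, 1.0])]
    query = [1.0, 0.1]
    print([name for name, _ in nearest_neighbors(query, rows, 2, l2_distance)])  # ['a', 'c']
    print([name for name, _ in nearest_neighbors(query, rows, 2, negative_inner_product)])  # ['c', 'a']
```

The two orderings differ because inner product rewards vector magnitude while L2 and cosine distance do not, which is why the choice of metric should match how the embeddings were trained.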
The extension supports hybrid search workflows, in which vector similarity results are combined with relational filters or full-text search criteria within a single query plan. For large datasets it provides approximate nearest neighbor indexes, including graph-based and cluster-based structures, that avoid comparing the query against every row; graph-based indexes in particular aim for roughly logarithmic growth in query cost as the dataset grows. These indexes are created and queried through standard database syntax and operators, so vector-based machine learning workflows fit into existing SQL.
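The cluster-based approach and its interaction with relational filters can be illustrated with a toy inverted-file (IVF) index in pure Python. This is a sketch under stated assumptions, not the extension's index: centroids are supplied by the caller instead of being trained (for example with k-means), only L2 distance is used, and the `Row`, `IVFIndex`, `probes`, and `predicate` names are hypothetical. Each row is stored in the list of its nearest centroid; a search scans only the `probes` closest lists, applies the filter, and ranks the survivors by distance.

```python
import math
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence

Vector = Sequence[float]


def l2(a: Vector, b: Vector) -> float:
    # Euclidean distance between two equal-length vectors.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


@dataclass
class Row:
    row_id: int
    embedding: List[float]
    category: str  # stands in for an ordinary relational column


class IVFIndex:
    """Toy inverted-file index: one list of rows per (pre-chosen) centroid."""

    def __init__(self, centroids: Sequence[Vector]) -> None:
        self.centroids = [list(c) for c in centroids]
        self.lists: Dict[int, List[Row]] = {i: [] for i in range(len(self.centroids))}

    def _closest_centroids(self, v: Vector, n: int) -> List[int]:
        # Indices of the n centroids nearest to v.
        order = sorted(range(len(self.centroids)), key=lambda i: l2(v, self.centroids[i]))
        return order[:n]

    def add(self, row: Row) -> None:
        # Assign the row to the list of its single nearest centroid.
        cid = self._closest_centroids(row.embedding, 1)[0]
        self.lists[cid].append(row)

    def search(
        self,
        query: Vector,
        k: int,
        probes: int = 1,
        predicate: Callable[[Row], bool] = lambda row: True,
    ) -> List[Row]:
        """Scan only the `probes` closest lists, filter, then rank by distance."""
        candidates: List[Row] = []
        for cid in self._closest_centroids(query, probes):
            candidates.extend(r for r in self.lists[cid] if predicate(r))
        candidates.sort(key=lambda r: l2(query, r.embedding))
        return candidates[:k]


if __name__ == "__main__":
    index = IVFIndex([[0.0, 0.0], [10.0, 10.0]])
    for row in [
        Row(1, [0.5, 0.5], "news"),
        Row(2, [1.0, 0.0], "blog"),
        Row(3, [9.0, 9.5], "news"),
        Row(4, [10.5, 10.0], "news"),
    ]:
        index.add(row)
    is_news = lambda r: r.category == "news"
    print([r.row_id for r in index.search([0.0, 0.0], k=2, probes=1, predicate=is_news)])  # [1]
    print([r.row_id for r in index.search([0.0, 0.0], k=2, probes=2, predicate=is_news)])  # [1, 3]
```

The example also shows the usual trade-off of filtering after an approximate index scan: with `probes=1` the filter leaves fewer than `k` results, and scanning more lists recovers them at the cost of more distance computations.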
Beyond core search, the project provides tools for working with high-dimensional data, including vector aggregation, mathematical transformations, and format conversion. It supports memory-optimized storage formats that shrink the footprint of embeddings, and it computes distances inside the database itself, avoiding the latency of moving vectors to an external service. The extension installs as a standard PostgreSQL module and provides native vector data types with query optimization support.
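Two of these utilities are easy to picture in isolation: an element-wise average (the kind of result a vector aggregate produces) and a lower-precision encoding that halves storage. The sketch below uses only Python's standard `struct` module, whose `f` and `e` format characters pack 32-bit and 16-bit (half-precision) floats. The function names are hypothetical, and half precision is used here only as an example of a memory-optimized format; the source does not say which formats the extension uses internally.

```python
import struct
from typing import List, Sequence


def vector_mean(vectors: Sequence[Sequence[float]]) -> List[float]:
    """Element-wise average of equal-length vectors (an AVG-style aggregate)."""
    if not vectors:
        raise ValueError("cannot average an empty set of vectors")
    dim = len(vectors[0])
    if any(len(v) != dim for v in vectors):
        raise ValueError("all vectors must have the same dimension")
    # zip(*vectors) yields one tuple per dimension.
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def pack_float32(v: Sequence[float]) -> bytes:
    # 4 bytes per dimension, little-endian single precision.
    return struct.pack(f"<{len(v)}f", *v)


def pack_float16(v: Sequence[float]) -> bytes:
    # 2 bytes per dimension, little-endian half precision (lossy).
    return struct.pack(f"<{len(v)}e", *v)


def unpack_float16(buf: bytes) -> List[float]:
    # Decode a buffer produced by pack_float16.
    return list(struct.unpack(f"<{len(buf) // 2}e", buf))


if __name__ == "__main__":
    print(vector_mean([[1.0, 2.0], [3.0, 4.0]]))  # [2.0, 3.0]
    embedding = [0.1] * 768
    print(len(pack_float32(embedding)), len(pack_float16(embedding)))  # 3072 1536
    print(unpack_float16(pack_float16([0.1]))[0])  # close to, but not exactly, 0.1
```

Halving the byte count is only worthwhile when the precision loss (roughly three significant decimal digits for half precision) does not noticeably change which neighbors a search returns.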