awesome-repositories.com
Vector Databases · Awesome GitHub Repositories

9 repos


Storage engines and infrastructure designed to index, store, and retrieve high-dimensional embeddings for semantic search.

Explore 9 awesome GitHub repositories matching data & databases · Vector Databases. Refine with filters or upvote what's useful.


Awesome Vector Databases GitHub Repositories

  • nomic-ai/gpt4all

    77,146 · GitHub

    GPT4All is a cross-platform runtime environment designed to execute large language models directly on local consumer hardware. By leveraging an optimized C++ inference backend, it enables private, offline AI interactions without requiring an internet connection or external cloud services. The project provides a compreh

    Generates vector embeddings on-device to facilitate semantic search and document retrieval.

    C++ · ai-chat · llm-inference
  • mlabonne/llm-course

    75,340 · GitHub

    This project is a comprehensive educational curriculum and engineering handbook focused on the lifecycle of large language models. It serves as a structured knowledge base for machine learning practitioners, covering the fundamental mathematical and architectural principles of transformer-based sequence modeling, as we

    Provides practical patterns for building vector storage solutions essential for effective retrieval-augmented generation pipelines.

    course · large-language-models · llm
  • redis/redis

    73,096 · GitHub

    Redis is an in-memory, key-value database designed to provide sub-millisecond latency for read and write operations. It functions as a versatile data platform, serving as a distributed cache, a message broker, a NoSQL document store, and a vector database. The system utilizes an event-driven, single-threaded loop to pr

    Indexes high-dimensional embeddings to facilitate efficient semantic search and machine learning workflows.

    C · cache · caching · database
  • twitter/the-algorithm

    72,764 · GitHub

    The algorithm is a distributed recommendation engine pipeline designed to construct and serve personalized content timelines. It functions as a multi-stage orchestration layer that aggregates candidate content from diverse social graphs and high-dimensional embedding spaces, processing user interaction data to deliver

    Calculates geometric proximity between user and item representations in high-dimensional vector space to identify relevant content.

    Scala
  • pathwaycom/pathway

    59,684 · GitHub

    Pathway is a high-performance data processing framework designed for building unified batch and streaming pipelines. It functions as an orchestrator for complex data transformations, utilizing a differential dataflow engine to process updates incrementally. By treating static datasets and continuous event streams with

    Integrates external vector database clients directly into data ingestion workflows to automate real-time document indexing.

    Python · batch-processing · data-analytics · data-pipelines
  • zylon-ai/private-gpt

    57,116 · GitHub

    This project is a privacy-first backend service designed to facilitate retrieval-augmented generation by processing local documents into searchable vector representations. It provides a modular architecture that allows users to ingest diverse file formats, manage document metadata, and perform semantic searches to prov

    Connects applications to external vector stores by configuring host, port, and authentication details.

    Python
  • pathwaycom/llm-app

    56,311 · GitHub

    This project is a data processing engine and AI application platform designed for building production-grade machine learning workflows. It provides a unified programming model that handles both historical batch data and live stream ingestion, enabling the development of real-time ETL pipelines and scalable data transfo

    Supports low-latency retrieval of evolving knowledge bases for retrieval-augmented generation applications.

    Jupyter Notebook · chatbot · hugging-face · llm
  • appwrite/appwrite

    54,884 · GitHub

    Appwrite is a backend-as-a-service platform that provides a unified development environment for building full-stack applications. It integrates essential infrastructure components—including authentication, databases, storage, and serverless functions—into a single, centralized interface to simplify application developm

    Integrates with external vector stores to enable similarity searching and efficient retrieval of unstructured data.

    TypeScript · android · appwrite · backend
  • Mintplex-Labs/anything-llm

    54,751 · GitHub

    This platform serves as a comprehensive environment for managing private language models, document knowledge bases, and automated agent workflows within secure local infrastructure. It functions as a document-aware workspace that enables users to ingest diverse file formats into searchable repositories, ensuring that a

    Utilizes local vector indices to perform semantic similarity searches for context-aware language model generation.

    JavaScript · ai-agents · custom-ai-agents · deepseek

Explore sub-tags

  • Chroma Integrations: Support for local disk-based vector storage using Chroma.
  • Local Embedding Providers: Services that generate vector embeddings from text locally on the host machine.
  • Milvus Integrations: Configuration and connectivity for Milvus vector stores.
  • PostgreSQL Vector Stores: Configurations for using PostgreSQL with vector extensions as a knowledge base.
  • Similarity Search Engines: Mechanisms for retrieving data based on geometric proximity in vector space.
  • Vector Database Integrations: Tools and configurations for connecting applications to vector stores to enable similarity search and data retrieval.
  • Vector Document Indexing: Automated workflows for indexing documents into vector databases to support real-time search and retrieval.
  • Vector Search Frameworks: Specialized tools for low-latency retrieval of vector data in AI and RAG applications.
  • Vector Storage Implementations: Educational resources or code patterns for building custom vector storage engines.
  • Vector-Database-Backed Retrievals: Systems that use vector indices to perform semantic similarity searches for context retrieval.
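
Several of the sub-tags above rest on the same idea: retrieving items by geometric proximity in vector space. As a rough illustration only, the sketch below ranks a few stored vectors by cosine similarity to a query. It is not taken from any repository listed here; the function names (`cosine_similarity`, `top_k`), the in-memory dictionary index, and the toy three-dimensional vectors are all assumptions made for the example. Real vector databases typically use approximate nearest-neighbor indexes rather than this brute-force scan.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query, index, k=2):
    """Brute-force search: score every stored vector, return the k best (id, score) pairs."""
    scored = [(doc_id, cosine_similarity(query, vec)) for doc_id, vec in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical toy index: document id -> embedding (real embeddings have hundreds of dimensions).
index = {
    "doc-a": [1.0, 0.0, 0.0],
    "doc-b": [0.9, 0.1, 0.0],
    "doc-c": [0.0, 1.0, 0.0],
}

results = top_k([1.0, 0.0, 0.0], index, k=2)
for doc_id, score in results:
    print(doc_id, round(score, 4))
```

With this toy data, the query is identical to `doc-a` and nearly parallel to `doc-b`, so those two rank first, while the orthogonal `doc-c` is left out.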