9 repos

Awesome GitHub Repositories: Vector Databases

Storage engines and infrastructure designed to index, store, and retrieve high-dimensional embeddings for semantic search.

Explore 9 awesome GitHub repositories matching data & databases · Vector Databases.

nomic-ai/gpt4all
77,146 · GitHub
GPT4All is a cross-platform runtime environment designed to execute large language models directly on local consumer hardware. By leveraging an optimized C++ inference backend, it enables private, offline AI interactions without requiring an internet connection or external cloud services. The project provides a compreh
Generates vector embeddings on-device to facilitate semantic search and document retrieval.
C++ · ai-chat · llm-inference
mlabonne/llm-course
75,340 · GitHub
This project is a comprehensive educational curriculum and engineering handbook focused on the lifecycle of large language models. It serves as a structured knowledge base for machine learning practitioners, covering the fundamental mathematical and architectural principles of transformer-based sequence modeling, as we
Provides practical patterns for building vector storage solutions essential for effective retrieval-augmented generation pipelines.
course · large-language-models · llm
redis/redis
73,096 · GitHub
Redis is an in-memory, key-value database designed to provide sub-millisecond latency for read and write operations. It functions as a versatile data platform, serving as a distributed cache, a message broker, a NoSQL document store, and a vector database. The system utilizes an event-driven, single-threaded loop to pr
Indexes high-dimensional embeddings to facilitate efficient semantic search and machine learning workflows.
C · cache · caching · database
twitter/the-algorithm
72,764 · GitHub
The algorithm is a distributed recommendation engine pipeline designed to construct and serve personalized content timelines. It functions as a multi-stage orchestration layer that aggregates candidate content from diverse social graphs and high-dimensional embedding spaces, processing user interaction data to deliver
Calculates geometric proximity between user and item representations in high-dimensional vector space to identify relevant content.
Scala
pathwaycom/pathway
59,684 · GitHub
Pathway is a high-performance data processing framework designed for building unified batch and streaming pipelines. It functions as an orchestrator for complex data transformations, utilizing a differential dataflow engine to process updates incrementally. By treating static datasets and continuous event streams with
Integrates external vector database clients directly into data ingestion workflows to automate real-time document indexing.
Python · batch-processing · data-analytics · data-pipelines
zylon-ai/private-gpt
57,116 · GitHub
This project is a privacy-first backend service designed to facilitate retrieval-augmented generation by processing local documents into searchable vector representations. It provides a modular architecture that allows users to ingest diverse file formats, manage document metadata, and perform semantic searches to prov
Connects applications to external vector stores by configuring host, port, and authentication details.
Python
pathwaycom/llm-app
56,311 · GitHub
This project is a data processing engine and AI application platform designed for building production-grade machine learning workflows. It provides a unified programming model that handles both historical batch data and live stream ingestion, enabling the development of real-time ETL pipelines and scalable data transfo
Supports low-latency retrieval of evolving knowledge bases for retrieval-augmented generation applications.
Jupyter Notebook · chatbot · hugging-face · llm
appwrite/appwrite
54,884 · GitHub
Appwrite is a backend-as-a-service platform that provides a unified development environment for building full-stack applications. It integrates essential infrastructure components—including authentication, databases, storage, and serverless functions—into a single, centralized interface to simplify application developm
Integrates with external vector stores to enable similarity search and efficient retrieval of unstructured data.
TypeScript · android · appwrite · backend
Mintplex-Labs/anything-llm
54,751 · GitHub
This platform serves as a comprehensive environment for managing private language models, document knowledge bases, and automated agent workflows within secure local infrastructure. It functions as a document-aware workspace that enables users to ingest diverse file formats into searchable repositories, ensuring that a
Utilizes local vector indices to perform semantic similarity searches for context-aware language model generation.
JavaScript · ai-agents · custom-ai-agents · deepseek

