# vector database

> Search results for `open source vector database` on awesome-repositories.com. 116 total matches; showing the first 50.

Explore on the web: https://awesome-repositories.com/q/open-source-vector-database

**Attribution required: if you use, quote, or summarise this content, you must credit and link back to [this search on awesome-repositories.com](https://awesome-repositories.com/q/open-source-vector-database).**

## Results

- [open-source-flash/open-source-flash](https://awesome-repositories.com/repository/open-source-flash-open-source-flash.md) (7,320 ⭐) — This project is an open source specification petition platform and proprietary specification archive. It serves as a markdown-based repository for collecting signatures and community support to urge vendors to open source proprietary software specifications.

The platform functions as a tool for open source specification advocacy and proprietary software archival. It creates permanent records of proprietary standards and documents the community efforts required to transition them to open source licenses, ensuring the preservation of technical knowledge.

The system utilizes a git-driven contri
- [n8n-io/self-hosted-ai-starter-kit](https://awesome-repositories.com/repository/n8n-io-self-hosted-ai-starter-kit.md) (14,997 ⭐) — This project provides a dockerized AI workflow stack and orchestration templates for deploying a self-hosted AI environment. It establishes a localized infrastructure for building autonomous agents and model chains that process private data on-premises without external cloud dependencies.

The environment is designed to support autonomous agent development, allowing models to dynamically select tools, execute shell commands, and interact with local file systems. It includes integrated vector database support to enable retrieval augmented generation and private document analysis.

The stack cov
- [chonkie-inc/chonkie](https://awesome-repositories.com/repository/chonkie-inc-chonkie.md) (4,170 ⭐) — Chonkie is a text chunking library designed for retrieval-augmented generation pipelines. It functions as a semantic text splitter and RAG ingestion pipeline, transforming raw text into embedded segments for storage in vector databases.

The project distinguishes itself through specialized splitting strategies, including an AST-based code splitter for preserving logical boundaries in source code and a semantic text splitter that uses embedding models to determine boundaries based on meaning. It also provides a vector database ingestor to automate the generation of embeddings and their export t
- [dkhamsing/open-source-ios-apps](https://awesome-repositories.com/repository/dkhamsing-open-source-ios-apps.md) (50,744 ⭐) — This project is a comprehensive directory of open-source iOS applications designed to serve as a technical reference for developers and learners. It functions as a curated index of mobile software, categorizing projects by their functionality, implementation language, and architectural design to provide a clear view of how professional applications are structured.

The repository distinguishes itself by offering a deep dive into mobile app architecture, allowing users to study real-world codebases that utilize patterns such as Model-View-ViewModel, VIPER, and Clean Architecture. It highlights
- [greenrobot/eventbus](https://awesome-repositories.com/repository/greenrobot-eventbus.md) (24,760 ⭐) — EventBus is a publish-subscribe messaging library designed to facilitate decoupled communication between components in Java applications. It functions as a central hub where producers dispatch events that are routed to subscribers based on the class type of the payload. By using annotation-based markers, the system maps event handlers to specific data types, allowing different parts of an application to exchange information without requiring direct references between classes.

The library distinguishes itself through a focus on performance and execution control. It utilizes a compile-time inde
- [zylon-ai/private-gpt](https://awesome-repositories.com/repository/zylon-ai-private-gpt.md) (57,278 ⭐) — This project is a privacy-first backend service designed to facilitate retrieval-augmented generation by processing local documents into searchable vector representations. It provides a modular architecture that allows users to ingest diverse file formats, manage document metadata, and perform semantic searches to provide context-aware responses for chat and completion requests.

The system distinguishes itself through a database-agnostic abstraction layer that supports various storage backends, ranging from local disk storage to enterprise-grade vector databases. It offers flexible deployment
- [openfilamentcollective/open-filament-database](https://awesome-repositories.com/repository/openfilamentcollective-open-filament-database.md) (43 ⭐) — An open, community-maintained database of 3D-printing filaments — brands, materials, filament product lines, colour variants, spool sizes, and the stores that sell them. Hosted by the Open Filament Collective, currently facilitated by SimplyPrint.
- [microsoft/agent-framework](https://awesome-repositories.com/repository/microsoft-agent-framework.md) (7,277 ⭐) — The agent-framework is an LLM agent orchestration framework and multi-agent workflow engine designed for building autonomous AI agents. It provides a tool integration layer for binding external functions, APIs, and sandboxed code as executable tools for language models.

The framework distinguishes itself through a graph-based system for designing sequential and parallel task flows, featuring state management and checkpointing for long-running processes. It implements comprehensive conversational state management and an observability suite that uses telemetry to trace execution flows and monit
- [redis/redis](https://awesome-repositories.com/repository/redis-redis.md) (74,906 ⭐) — Redis is an in-memory, key-value database designed to provide sub-millisecond latency for read and write operations. It functions as a versatile data platform, serving as a distributed cache, a message broker, a NoSQL document store, and a vector database. The system utilizes an event-driven, single-threaded loop to process requests efficiently, while maintaining data durability through append-only persistence logs and asynchronous snapshotting mechanisms.

What distinguishes Redis is its ability to handle complex data structures—including strings, hashes, lists, sets, and sorted sets—alongsid
- [superlinked/superlinked](https://awesome-repositories.com/repository/superlinked-superlinked.md) (40 ⭐) — Superlinked is a development framework designed for building semantic search and retrieval pipelines. It functions as a machine learning data pipeline and semantic retrieval engine, providing the tools necessary to unify data schema definition, embedding generation, and vector database integration within a single application.

The framework distinguishes itself by acting as a vector database orchestrator that manages the lifecycle of machine learning models alongside complex search logic. It enables developers to construct structured data models that map raw content and metadata into unified r
- [ellerbrock/open-source-badges](https://awesome-repositories.com/repository/ellerbrock-open-source-badges.md) (548 ⭐) — Open Source & Licence Badges
- [mongodb/mongo](https://awesome-repositories.com/repository/mongodb-mongo.md) (28,158 ⭐) — This project is a distributed, document-oriented database system designed to store information in flexible, hierarchical structures. It supports horizontal scaling through automated sharding and maintains high availability across global clusters using a multi-node replication protocol. By executing multi-document operations as atomic units, the system ensures data integrity and consistency across distributed environments.

The platform distinguishes itself by integrating advanced vector-based indexing, which enables semantic similarity searches alongside traditional geospatial and lexical quer
- [superagent-ai/superagent](https://awesome-repositories.com/repository/superagent-ai-superagent.md) (6,631 ⭐) — Superagent is an AI safety platform that protects applications from prompt injections, data leaks, and harmful outputs through built-in guardrails. It functions as a prompt injection detection system, data redaction tool, and red team testing tool, automatically removing personally identifiable information and protected health data from AI inputs and outputs while scanning image uploads with vision AI to detect visual prompt injection attacks before processing.

The platform routes every prompt through a sequential pipeline of safety checks including injection detection, data redaction, and co
- [tapaswenipathak/open-source-programs](https://awesome-repositories.com/repository/tapaswenipathak-open-source-programs.md) (3,856 ⭐) — A list of open source programs.
- [swift-open-source/ultratabsaver](https://awesome-repositories.com/repository/swift-open-source-ultratabsaver.md) (290 ⭐) — The open source Tab Manager Extension for Safari.
- [crewaiinc/crewai](https://awesome-repositories.com/repository/crewaiinc-crewai.md) (53,687 ⭐) — CrewAI is a multi-agent orchestration framework designed for building autonomous systems that execute complex, multi-step workflows. It provides a development platform where specialized agents are defined with specific roles, goals, and tool sets to perform tasks collaboratively. By leveraging a declarative workflow engine, the system manages task dependencies, state transitions, and execution logic, allowing for the creation of structured, stateful sequences of operations.

The framework distinguishes itself through its hierarchical management capabilities, which utilize manager agents to coo
- [mastra-ai/mastra](https://awesome-repositories.com/repository/mastra-ai-mastra.md) (21,221 ⭐) — Mastra is an orchestration framework designed for building, deploying, and managing autonomous AI agents and multi-agent systems. It provides a comprehensive suite of primitives for creating resilient AI applications, including durable workflow orchestration, event-driven agent loops, and semantic memory management. By integrating these core components, the platform enables developers to build complex, multi-step processes that can reason about goals and execute tasks without manual intervention.

The framework distinguishes itself through its focus on observability and secure, isolated execut
- [langchain4j/langchain4j](https://awesome-repositories.com/repository/langchain4j-langchain4j.md) (12,346 ⭐) — LangChain4j is a framework and library for building applications powered by large language models on the JVM. It provides a unified API for developing AI agents, implementing retrieval augmented generation, and integrating generative AI capabilities into professional software built with frameworks like Spring Boot or Quarkus.

The project enables the creation of autonomous agents that can reason through tasks, manage memory, and execute external tools to achieve specific goals. It differentiates itself through a unified model interface that allows developers to switch between multiple model pr
- [afonsopacifer/open-source-checklist](https://awesome-repositories.com/repository/afonsopacifer-open-source-checklist.md) (215 ⭐) — A guide to help you remember important things when creating an open source project ;D
- [n8n-io/n8n](https://awesome-repositories.com/repository/n8n-io-n8n.md) (192,772 ⭐) — n8n is a workflow automation platform that combines a visual interface with code-based extensibility to design, orchestrate, and manage automated processes. It provides a comprehensive suite of tools for data transformation, filtering, and storage, allowing users to build complex logic through conditional branching, looping, and sub-workflow execution. The platform supports both pre-built integration nodes and custom code execution in JavaScript or Python, enabling connectivity with a wide range of external services and APIs.

The platform includes a suite of generative AI capabilities, such a
- [qdrant/qdrant](https://awesome-repositories.com/repository/qdrant-qdrant.md) (32,372 ⭐) — Qdrant is a high-performance vector similarity database designed to store, index, and search high-dimensional vectors alongside structured metadata. It functions as a distributed search engine that manages large-scale data clusters, providing low-latency retrieval and complex filtering capabilities. The system is built to serve as a specialized middleware layer, connecting machine learning pipelines and AI agents to persistent storage for intelligent information retrieval and recommendation tasks.

The platform distinguishes itself through advanced retrieval techniques, including support for h
- [stangirard/quiver](https://awesome-repositories.com/repository/stangirard-quiver.md) (39,167 ⭐) — Quiver is a framework for integrating retrieval augmented generation into applications. It provides a generative AI integration layer that connects large language models with vector stores to produce context-aware responses based on custom data.

The project features a knowledge base pipeline that parses diverse file types into searchable embeddings and a vector database orchestrator to manage data across different storage implementations. It utilizes a provider-agnostic model interface, allowing users to switch between various external AI providers or local models through a single unified sys
- [arpit456jain/open-source-programs](https://awesome-repositories.com/repository/arpit456jain-open-source-programs.md) (126 ⭐) — I am planning to list some good, beginner-friendly open source programs and their timelines.
- [pubkey/rxdb](https://awesome-repositories.com/repository/pubkey-rxdb.md) (23,048 ⭐) — This project is a reactive, offline-first NoSQL database engine designed for JavaScript applications. It provides a robust framework for managing application state by synchronizing data across browsers, mobile devices, and server-side runtimes. By treating local storage as the primary source of truth, it enables applications to remain functional without network connectivity, automatically reconciling changes with remote backends once a connection is restored.

The database distinguishes itself through a modular architecture that supports cross-environment synchronization and high-performance d
- [mlabonne/llm-course](https://awesome-repositories.com/repository/mlabonne-llm-course.md) (80,178 ⭐) — This project is a comprehensive educational curriculum and engineering handbook focused on the lifecycle of large language models. It serves as a structured knowledge base for machine learning practitioners, covering the fundamental mathematical and architectural principles of transformer-based sequence modeling, as well as the practical implementation of supervised instruction fine-tuning and preference-based model alignment.

The repository distinguishes itself by providing a deep dive into advanced model composition and optimization techniques. It details methodologies for weight-space mode
- [cockroachlabs/open-sourced-interview-process](https://awesome-repositories.com/repository/cockroachlabs-open-sourced-interview-process.md) (425 ⭐) — Open Sourced Interview Process
- [truefoundry/cognita](https://awesome-repositories.com/repository/truefoundry-cognita.md) (4,317 ⭐) — Cognita is a retrieval augmented generation orchestration framework used to build pipelines that connect document stores and language models to provide grounded answers. It functions as a document ingestion pipeline and a vector database integrator, managing the process of loading, parsing, and indexing files into a searchable knowledge base.

The system includes a language model gateway proxy that provides a unified API to interact with multiple different model providers. This routing layer decouples the application from specific vendors, allowing requests to be proxied through a provider-agn
- [pradumnasaraf/open-source-with-pradumna](https://awesome-repositories.com/repository/pradumnasaraf-open-source-with-pradumna.md) (833 ⭐) — Open Source guide - Contains resources and materials to learn and get yourself started with Open Source, Git, and GitHub.
- [wcoder/open-source-xamarin-apps](https://awesome-repositories.com/repository/wcoder-open-source-xamarin-apps.md) (465 ⭐) — A collaborative list of open source Xamarin & MAUI apps.
- [milvus-io/milvus](https://awesome-repositories.com/repository/milvus-io-milvus.md) (44,804 ⭐) — Milvus is a specialized vector database engine designed for the indexing, management, and high-speed similarity retrieval of high-dimensional vector embeddings. It functions as a similarity search engine capable of identifying nearest neighbors within large-scale vector spaces, supporting the storage and retrieval of billions of data points while maintaining consistent performance.

The system utilizes a distributed architecture that decouples storage, query, and coordination into independent services, allowing for horizontal scaling across clusters. It employs a global indexing mechanism that
- [53ai/53aihub](https://awesome-repositories.com/repository/53ai-53aihub.md) (9,025 ⭐) — 53AIHub is a centralized orchestration platform for deploying and managing AI agents and prompts across multiple large language model providers. It functions as a multi-model AI gateway and an operation portal for AI services, providing a unified interface to coordinate agents and prompts from various external platforms.

The project distinguishes itself as a white-label AI portal designed for self-hosted infrastructure, allowing for full control over operational data on private servers or containers. It includes a comprehensive AI SaaS administration layer with a multi-tenant subscription eng
- [zalando/zalando-howto-open-source](https://awesome-repositories.com/repository/zalando-zalando-howto-open-source.md) (805 ⭐) — Open Source guidance from Zalando, Europe's largest online fashion platform
- [superduperdb/superduperdb](https://awesome-repositories.com/repository/superduperdb-superduperdb.md) (5,298 ⭐) — SuperduperDB is an AI agent orchestrator and database-integrated machine learning platform. It serves as a framework for building stateful AI agents and retrieval-augmented generation applications by integrating large language models directly with database backends.

The project enables the deployment of self-hosted AI infrastructure and the management of language models on private hardware using local checkpoints. It distinguishes itself by allowing users to attach AI components directly to data fields, triggering model execution and automated transformations based on database insertions and
- [dragonflydb/dragonfly](https://awesome-repositories.com/repository/dragonflydb-dragonfly.md) (30,688 ⭐) — Dragonfly is a high-performance, multi-model in-memory data store designed to serve as a drop-in replacement for existing database infrastructures. By utilizing a multi-threaded, shared-nothing architecture and a fiber-based concurrency model, it maximizes CPU utilization and minimizes latency for read and write operations. The system supports a wide range of data structures, including strings, hashes, lists, sets, sorted sets, and JSON documents, while maintaining full compatibility with standard industry wire protocols and client libraries.

What distinguishes Dragonfly is its focus on effic
- [keygraphhq/shannon](https://awesome-repositories.com/repository/keygraphhq-shannon.md) (44,672 ⭐) — Shannon is an integrated security platform designed for autonomous penetration testing, static and dynamic analysis, and automated vulnerability remediation within self-hosted, private infrastructure. It functions as a unified security suite that orchestrates the entire lifecycle of vulnerability management, from initial discovery and reachability prioritization to the generation and verification of code-level patches.

The platform distinguishes itself through its agentic approach to security, deploying autonomous agents to execute both black-box and white-box exploits against running applica
- [weaviate/verba](https://awesome-repositories.com/repository/weaviate-verba.md) (7,715 ⭐) — Verba is a retrieval-augmented generation interface and chatbot that uses Weaviate to provide factual answers based on private datasets. It functions as a vector database knowledge base, combining a hybrid search engine with an orchestration interface to connect various large language model providers and embedding services.

The system differentiates itself through a RAG pipeline manager for adjusting text chunking rules and retrieval settings, alongside a 3D vector space visualization tool for analyzing the spatial organization and clustering of high-dimensional embeddings. It employs a modul
- [cfpb/open-source-project-template](https://awesome-repositories.com/repository/cfpb-open-source-project-template.md) (214 ⭐) — A project template containing default open source files for new projects
- [open-source-ideas/ideas](https://awesome-repositories.com/repository/open-source-ideas-ideas.md) (6,793 ⭐) — This project is a crowdsourced registry and ideation hub for open source software concepts. It serves as a public database where users submit project requirements and implementation details to attract contributors and recruit collaborators.

The platform distinguishes itself by mapping project ideas to existing software repositories to prevent duplicate development and maintain registry accuracy. It utilizes a categorization engine that allows developers to filter ideas by specific technology stacks and estimated development effort.

The system provides a collaboration layer using threaded dis
- [maiot-io/zenml](https://awesome-repositories.com/repository/maiot-io-zenml.md) (5,452 ⭐) — ZenML is an extensible machine learning orchestration framework designed to manage the end-to-end lifecycle of data pipelines and AI agent workflows. It functions as a durable orchestrator that executes machine learning tasks as directed acyclic graphs, ensuring that every step is containerized for consistent performance across local, cloud, and hybrid infrastructure. By decoupling pipeline code from underlying compute and storage backends, the platform allows developers to define infrastructure-agnostic stacks that remain portable across diverse environments.

The project distinguishes itself
- [tmc/langchaingo](https://awesome-repositories.com/repository/tmc-langchaingo.md) (9,416 ⭐) — langchaingo is an LLM application framework for Go designed for building language model-powered applications and autonomous agents. It serves as an orchestration library and tool integration framework that allows developers to link prompt sequences and model calls into complex, multi-step workflows.

The project provides a toolkit for implementing retrieval-augmented generation pipelines by processing unstructured documents and retrieving relevant context via vector search. It includes a dedicated integration layer for indexing high-dimensional embeddings and performing similarity searches acr
- [zachflower/awesome-open-source-supporters](https://awesome-repositories.com/repository/zachflower-awesome-open-source-supporters.md) (681 ⭐) — ⭐️ A curated list of companies that offer their services for free to Open Source projects
- [reflex-dev/reflex](https://awesome-repositories.com/repository/reflex-dev-reflex.md) (28,136 ⭐) — Reflex is a full-stack web framework that enables the development of complete web applications using only Python. It provides a unified environment where server-side logic and client-side interfaces are synchronized through a shared, event-driven architecture. By using a declarative component language, the framework compiles code into reactive frontend elements and backend event handlers, allowing developers to manage the entire application lifecycle within a single codebase.

The framework distinguishes itself through its reactive state management and integrated AI-assisted development tools.
- [activeloopai/deeplake](https://awesome-repositories.com/repository/activeloopai-deeplake.md) (9,175 ⭐) — DeepLake is AI data infrastructure consisting of a multimodal data lake, a hybrid search engine, and a serverless vector database. It provides a PostgreSQL-based AI data runtime that combines multimodal storage with streaming pipelines to load and shuffle datasets from cloud storage directly into deep learning training pipelines.

The system utilizes lazy indexing to store and slice images, audio, and video without loading entire files into memory. It enables retrieval-augmented generation by persisting high-dimensional embeddings in a serverless vector store and implementing hybrid search tha
- [open-source-legal/opencontracts](https://awesome-repositories.com/repository/open-source-legal-opencontracts.md) (1,356 ⭐) — The open document intelligence platform for builders and hackers - DMS for the agentic world
- [mongodb-developer/genai-showcase](https://awesome-repositories.com/repository/mongodb-developer-genai-showcase.md) (4,236 ⭐) — This project is a collection of generative AI implementations focused on the development of AI agents, retrieval-augmented generation pipelines, and vector search integration. It provides a framework for connecting managed cloud databases to language models to create context-aware applications.

The project covers the orchestration of autonomous agents that use multi-step reasoning and external tools to complete tasks. It includes implementations for semantic retrieval using high-dimensional embeddings and the use of model-agnostic prompting to ensure consistent outputs across different large
- [formbricks/formbricks](https://awesome-repositories.com/repository/formbricks-formbricks.md) (12,391 ⭐) — Formbricks is an open-source survey and feedback platform designed to help teams capture and analyze user insights through targeted, in-app, and website-based interactions. It functions as a comprehensive customer experience analytics system that allows organizations to maintain full control over their data, user attributes, and survey workflows.

The platform distinguishes itself through its event-driven architecture, which enables precise behavioral targeting by triggering surveys based on specific user actions or application events. It supports deep integration with external ecosystems by a
- [vectorize-io/vectorize-mcp-server](https://awesome-repositories.com/repository/vectorize-io-vectorize-mcp-server.md) (108 ⭐) — Official Vectorize MCP Server
- [danthareja/contribute-to-open-source](https://awesome-repositories.com/repository/danthareja-contribute-to-open-source.md) (1,495 ⭐) — The goal of this project is to empower you to contribute code to open source projects on GitHub by teaching you the mechanics of the process in an interactive experience.
- [helicone/helicone](https://awesome-repositories.com/repository/helicone-helicone.md) (5,830 ⭐) — Helicone is an AI gateway and observability platform designed to intercept, manage, and monitor interactions with large language models. By acting as a reverse-proxy, it provides a centralized layer for routing requests across multiple AI providers, allowing developers to maintain consistent application logic while gaining deep visibility into model performance, usage, and costs.

The platform distinguishes itself through a robust suite of traffic management and prompt engineering tools. It enables policy-driven control, including automatic failover between providers, rate limiting, and edge-b
- [dlt-hub/dlt](https://awesome-repositories.com/repository/dlt-hub-dlt.md) (5,472 ⭐) — dlt is a Python data ingestion tool and ETL pipeline framework designed to fetch data from diverse sources and persist it into structured destinations. It functions as a schema inference engine that automatically detects data types and flattens nested JSON structures into relational tables, moving data from sources to lakehouses, warehouses, or vector databases.

The project distinguishes itself through AI-powered pipeline generation, using large language models to scaffold extraction code and connectors for REST APIs. It also supports multimodal vector storage and specialized population of ve
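
Several entries above (for example Qdrant and Milvus) describe storing high-dimensional vectors alongside structured metadata and retrieving the nearest neighbors of a query vector. The sketch below illustrates that general idea with a brute-force cosine-similarity scan in plain Python. It is not taken from any listed project: the `index` layout, the `search` function, and the `where` filter are hypothetical names chosen for illustration, and production engines typically rely on approximate nearest-neighbor indexes rather than a full scan.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_a * norm_b)


def search(index, query, top_k=2, where=None):
    """Brute-force nearest-neighbor search with an optional metadata filter.

    index: dict mapping an id to a (vector, metadata) pair (hypothetical layout)
    where: optional predicate applied to the metadata before scoring
    """
    hits = []
    for doc_id, (vector, metadata) in index.items():
        if where is not None and not where(metadata):
            continue  # skip records that fail the metadata filter
        hits.append((doc_id, cosine_similarity(query, vector)))
    # Highest similarity first, then keep only the top_k results.
    hits.sort(key=lambda hit: hit[1], reverse=True)
    return hits[:top_k]


# Toy data: three 3-dimensional "embeddings" with a language tag.
index = {
    "doc-a": ([1.0, 0.0, 0.0], {"lang": "en"}),
    "doc-b": ([0.9, 0.1, 0.0], {"lang": "de"}),
    "doc-c": ([0.0, 1.0, 0.0], {"lang": "en"}),
}

query = [1.0, 0.0, 0.0]
nearest = search(index, query, top_k=2)
english_only = search(index, query, top_k=2, where=lambda m: m["lang"] == "en")
```

With this toy data, `nearest` ranks `doc-a` ahead of `doc-b`, while `english_only` drops `doc-b` before scoring and returns `doc-a` followed by `doc-c`. Filtering before scoring is only one possible design; whether a given engine filters before, during, or after the vector search is specific to that engine.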