59 repositories
Tools for transforming raw data into structured, searchable knowledge graphs.
Distinguishing note: Focuses on the indexing phase of knowledge graph creation.
59 GitHub repositories in Data & Databases · Knowledge Graph Indexers.
Codegraph is a local codebase indexer and static analysis graph database that serves as a context provider for AI agents. It parses multiple programming languages into a searchable knowledge graph of symbols and dependencies, exposing these relationships to AI tools through the Model Context Protocol. The project distinguishes itself by aggregating relevant code snippets and symbol flows to reduce token usage for large language models. It automates the configuration of server settings and steering instructions across various AI agent platforms and command line editors to enable automatic code
Parses source files into a searchable knowledge graph of symbols and edges using incremental updates.
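As a rough illustration of incremental symbol indexing (not Codegraph's actual implementation), the sketch below re-parses a file only when its content hash changes. It uses Python's standard `ast` module; the `SymbolIndex` class, its fields, and the `app.py` sample are hypothetical names chosen for this example.

```python
import ast
import hashlib


class SymbolIndex:
    """Toy in-memory index of symbols and call edges, refreshed per file on change."""

    def __init__(self):
        self.hashes = {}   # path -> content hash of the last indexed version
        self.symbols = {}  # path -> set of defined function/class names
        self.edges = {}    # path -> set of (caller, callee) pairs

    def update(self, path, source):
        """Re-index `path` only if its content changed; return True when work was done."""
        digest = hashlib.sha256(source.encode("utf-8")).hexdigest()
        if self.hashes.get(path) == digest:
            return False  # unchanged file: skip parsing entirely

        tree = ast.parse(source)
        symbols, edges = set(), set()
        for node in ast.walk(tree):
            if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
                symbols.add(node.name)
            if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
                # Record direct calls to plain names made inside this function.
                for inner in ast.walk(node):
                    if isinstance(inner, ast.Call) and isinstance(inner.func, ast.Name):
                        edges.add((node.name, inner.func.id))

        self.hashes[path] = digest
        self.symbols[path] = symbols
        self.edges[path] = edges
        return True


index = SymbolIndex()
src = "def helper():\n    return 1\n\ndef main():\n    return helper()\n"
first = index.update("app.py", src)   # parses and indexes the file
second = index.update("app.py", src)  # same content, so nothing is re-parsed
```

Hashing content rather than comparing timestamps keeps the check independent of the file system, at the cost of reading each file on every pass.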
GraphRAG is a data processing pipeline and retrieval engine designed to transform unstructured text into interconnected knowledge graphs. By utilizing language models to extract entities and relationships, it builds structured representations of information that enable context-aware retrieval for downstream applications. The system distinguishes itself through hierarchical graph clustering and large-scale data synthesis, which organize massive document corpora into multi-level structures. This approach allows for both vector-based semantic searches and graph-based traversals, providing a comp
Transforms raw data into structured knowledge graphs to create a searchable and interconnected format.
gbrain is an agent framework and retrieval-augmented generation system that combines a durable task queue, a git-synced vector store, and a knowledge graph engine. It provides a foundation for building AI agents that interact with structured knowledge bases using the Model Context Protocol. The system synchronizes markdown files from a git repository into a database for high-performance semantic retrieval and creates typed edges between data pages by extracting entity references and wikilinks. It uses a database-backed queue to execute persistent background jobs and tool loops, ensuring relia
Automatically creates typed edges between pages by extracting entity references from markdown and wikilinks.
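A minimal sketch of how typed edges might be derived from markdown, assuming `[[Target]]`, `[[Target|label]]`, and `[[Target#section]]` wikilink forms. The `@name` entity syntax, the edge type names, and the `extract_edges` helper are assumptions made for illustration and are not taken from gbrain.

```python
import re

# [[Target]], [[Target|label]] and [[Target#section]] all resolve to "Target".
WIKILINK = re.compile(r"\[\[([^\]|#]+)(?:#[^\]|]*)?(?:\|[^\]]*)?\]\]")
# Hypothetical entity-reference syntax: @name
MENTION = re.compile(r"@([A-Za-z][\w-]*)")


def extract_edges(page, markdown):
    """Return (source, edge_type, target) triples found in one markdown page."""
    edges = []
    for match in WIKILINK.finditer(markdown):
        edges.append((page, "links_to", match.group(1).strip()))
    for match in MENTION.finditer(markdown):
        edges.append((page, "mentions", match.group(1)))
    return edges


text = "Met @alice about [[Project Atlas|the Atlas plan]] and [[Roadmap#Q3]]."
edges = extract_edges("notes/2024-01-05", text)
```

Each triple can then be stored as a row in an edge table keyed by source page, which makes re-indexing a single page a delete-and-insert for that key.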
This project is a recommendation system framework designed for building, evaluating, and operationalizing personalized item suggestion engines. It provides a comprehensive toolkit for implementing collaborative filtering and content-based algorithms, supported by an end-to-end machine learning pipeline for preparing datasets and deploying predictive models. The framework distinguishes itself through the integration of knowledge graphs to provide richer context for recommendations and the use of industry-specific patterns to accelerate system deployment. It also includes a specialized model ev
Generates recommendations by leveraging structured knowledge graph data to explore entity relationships.
Sourcetrail is an interactive source code explorer and visualizer designed for indexing and navigating relationships between symbols and structures across large, multi-language codebases. It functions as a static analysis indexer and code dependency visualizer that maps calls and dependencies between source files to help reveal project architecture. The tool enables multi-language project analysis by using a language-agnostic indexing system to track symbols across different programming languages within a single interface. It allows for the discovery of software architecture and the explorati
Uses a language-agnostic system to track symbols and relationships across different programming languages.
Memori is an AI agent memory middleware platform designed to provide persistent, context-aware recall for language models. It functions as a non-intrusive layer that intercepts outbound model requests to automatically capture interaction history and execution traces, ensuring that agents maintain continuity across sessions without requiring modifications to existing application logic. The platform distinguishes itself through a dual-model storage architecture that maintains information as both structured relational primitives for precise fact retrieval and rolling narrative summaries for situ
Extracts structured facts and semantic relationships from interaction data to build a queryable knowledge base for AI models.
Hermes-webui is a self-hosted AI orchestrator and web interface for managing autonomous agents. It serves as a multi-provider gateway that connects cloud and local large language models, providing a central hub to execute scheduled background jobs, run shell commands, and manage agent memory on private hardware. The system distinguishes itself through a persistent memory manager that utilizes knowledge graphs and markdown files for long-term context across sessions. It features a model context protocol host for extending agent capabilities with standardized tools and supports the orchestratio
Builds a structured knowledge graph of facts that accumulates across different projects and sessions for long-term recall.
OpenMetadata is an enterprise data catalog, metadata platform, and governance suite that functions as a knowledge graph for data assets. It serves as an AI-ready metadata layer, providing governed context and organizational memory to large language model agents via the Model Context Protocol. The platform distinguishes itself by capturing institutional knowledge, linking conversations, decisions, and remediation notes directly to data assets to preserve tribal knowledge. It integrates AI agents to automate metadata governance, such as suggesting descriptions and identifying sensitive data thr
Constructs a knowledge graph connecting technical assets, people, and business concepts.
The Language Server Protocol is a vendor-neutral communication framework that provides a standardized interface for code intelligence. It decouples language-specific analysis from the editor interface, allowing development tools to exchange structured data with external language servers to power features such as autocomplete, diagnostics, and symbol navigation. By utilizing a universal protocol schema, the framework enables cross-editor plugin development and ensures interoperability across different programming environments. It employs a capability negotiation handshake to establish a shared
Indexes code symbols and references into structured formats to facilitate efficient analysis and navigation.
Nebula is a distributed graph database designed for storing and querying massive volumes of interconnected vertices and edges across a horizontally scalable cluster. It functions as a Kubernetes-native database and a distributed graph analytics engine, utilizing a Raft-based distributed store to ensure strong consistency and high availability. The system features an OpenCypher query engine for performing complex graph traversals and pattern matching. It distinguishes itself with a decoupled compute-storage architecture and a shared-nothing distributed design, allowing query processing and dat
Supports creating exact-match and range indexes on vertex and edge properties to accelerate graph lookups and filtering.
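To illustrate the difference between the two index kinds in general terms, the sketch below keeps a hash map for exact-match lookups and a sorted list for range scans. It is a toy in-memory model, not Nebula's storage engine; the `PropertyIndex` class and the sample vertex ids are hypothetical.

```python
import bisect
from collections import defaultdict


class PropertyIndex:
    """Toy index over one numeric vertex property."""

    def __init__(self):
        self.exact = defaultdict(set)  # value -> vertex ids with that value
        self.sorted_pairs = []         # (value, vertex_id) kept in sorted order

    def add(self, vertex_id, value):
        self.exact[value].add(vertex_id)
        bisect.insort(self.sorted_pairs, (value, vertex_id))

    def lookup(self, value):
        """Exact match: a single hash lookup."""
        return sorted(self.exact.get(value, ()))

    def range(self, low, high):
        """Range scan: vertex ids with low <= value <= high, ordered by value."""
        # (low,) sorts before every (low, vertex_id), so this finds the first candidate.
        start = bisect.bisect_left(self.sorted_pairs, (low,))
        result = []
        for value, vertex_id in self.sorted_pairs[start:]:
            if value > high:
                break
            result.append(vertex_id)
        return result


ages = PropertyIndex()
for vid, age in [("v1", 30), ("v2", 25), ("v3", 41), ("v4", 30)]:
    ages.add(vid, age)

same_age = ages.lookup(30)    # ['v1', 'v4']
in_range = ages.range(26, 40)  # ['v1', 'v4']
```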
Blinko is a personal knowledge management system and an LLM-powered knowledge base that enables users to capture and organize thoughts through a bi-directional knowledge graph. It functions as a RAG-enabled note-taking application and a self-hosted Markdown editor, allowing for the creation of permanent documentation and fleeting notes. The project distinguishes itself by integrating retrieval-augmented generation to provide conversational querying and AI-powered analysis of private document libraries. It supports both cloud-based and local AI model integration, enabling users to perform sema
Connects ideas through reciprocal links to build a structured knowledge graph.
This project is a structured catalog of server-side development questions and advanced Node.js concepts designed for senior-level interview preparation. It focuses on backend engineering topics including architecture, performance, and system design, while also covering Node.js internals, async patterns, and production debugging. The resource organizes interview topics into a navigable knowledge graph of interconnected concepts and subtopics, with explicit cross-references linking related ideas together. Content is presented through a question-driven learning path that guides the learner from
Organises interview topics into a navigable graph of interconnected concepts and subtopics.
Bloop is an AI code analysis tool and semantic search engine designed for understanding and querying large-scale codebases. It utilizes a high-performance indexing system written in Rust to enable fast symbol and text retrieval across multiple programming languages. The project differentiates itself by using on-device embeddings for semantic code search, allowing users to locate logic based on meaning and intent rather than exact keywords. It combines a language model with a retrieval-augmented generation approach to provide a natural language interface for conversational querying and the gen
Implements a unified map that resolves symbols and references across different programming languages in a single codebase.
Spring AI is an application framework for Java that provides a portable, fluent API for integrating AI models, tools, and vector stores into applications. It wraps multiple AI providers behind a common interface, allowing developers to switch between chat, embedding, image, and speech models without changing application code. The framework includes a chainable chat client API similar to WebClient or RestClient, supports both synchronous and streaming interactions, and offers structured output conversion that transforms unstructured AI responses into strongly-typed Java objects. The framework
Checks if claims in AI responses are factually supported by the given context to catch hallucinations.
This project is a scientific agent framework and workflow orchestrator designed to extend large language models with specialized tools for genomic, chemical, and biological research. It provides a system for planning research hypotheses and executing automated workflows by integrating scientific databases with dynamic code execution. The framework includes a cheminformatics modeling suite for predicting molecular bioactivity and performing virtual screening, alongside a bioinformatics analysis toolkit for processing genomic sequences and single-cell data. It also features an academic document
Maps protein interactions and biological pathways by linking gene lists to structured scientific databases.
Intel XPU LLM Acceleration Library is a toolkit designed to accelerate large language model inference and finetuning on Intel CPUs, GPUs, and NPUs. It provides a distributed inference engine for scaling models across multiple accelerators, a multimodal model runtime for vision and speech tasks, and a low-bit model quantization tool for converting weights into INT4, FP8, and GGUF formats. The project features a parameter-efficient finetuning framework that enables model adaptation using QLoRA and DPO on Intel hardware. It distinguishes itself by providing specialized optimizations for Intel XP
Provides capabilities for retrieving comprehensive answers from unstructured data using indexed knowledge graphs.
KAG is a graph-augmented retrieval-augmented generation system and knowledge graph engine. It functions as a framework that integrates large language models with graph retrieval and numerical calculation to resolve natural language queries. The system creates unified knowledge representations by aligning unstructured data and expert rules through semantic mapping. It maintains mutual indexing between graph structures and original text blocks to ensure that reasoning processes remain linked to verifiable source data. The project provides capabilities for semantic information integration, grap
Links graph structures to original text blocks to enable fast retrieval of source data during reasoning.
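Mutual indexing between graph nodes and text blocks can be pictured as two maps kept in step, one for each direction. The sketch below is a simplified illustration under that assumption; the `MutualIndex` class, its method names, and the sample data are hypothetical and do not reflect KAG's API.

```python
from collections import defaultdict


class MutualIndex:
    """Two-way mapping between graph node ids and the text chunks that mention them."""

    def __init__(self):
        self.chunks = {}                        # chunk id -> original text
        self.node_to_chunks = defaultdict(set)  # node id -> chunk ids
        self.chunk_to_nodes = defaultdict(set)  # chunk id -> node ids

    def add_chunk(self, chunk_id, text, nodes):
        """Store a chunk and link it to every node extracted from it."""
        self.chunks[chunk_id] = text
        for node in nodes:
            self.node_to_chunks[node].add(chunk_id)
            self.chunk_to_nodes[chunk_id].add(node)

    def sources_for(self, node):
        """Graph to text: the original passages that support a node."""
        return [self.chunks[c] for c in sorted(self.node_to_chunks.get(node, ()))]

    def nodes_in(self, chunk_id):
        """Text to graph: the nodes extracted from a passage."""
        return sorted(self.chunk_to_nodes.get(chunk_id, ()))


idx = MutualIndex()
idx.add_chunk("c1", "Aspirin inhibits COX-1.", ["Aspirin", "COX-1"])
idx.add_chunk("c2", "Aspirin reduces fever.", ["Aspirin", "Fever"])

evidence = idx.sources_for("Aspirin")
entities = idx.nodes_in("c1")
```

Keeping both directions lets a reasoning step over the graph cite the passages behind each node, and lets a retrieved passage be expanded into its neighbouring entities.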
ChatLaw is a specialized large language model legal assistant designed to provide automated consulting and question answering within Chinese legal frameworks. It functions as a system for legal knowledge management, processing complex legal texts to deliver accurate statutory answers and advisory services. The system utilizes a mixture-of-experts modeling approach and multi-agent coordination to research information and generate professional consultation reports. To ensure factual reliability and minimize hallucinations, it integrates a legal knowledge graph and a standardized operating proce
Integrates a legal knowledge graph to ground unstructured model outputs in structured legal concepts for factual reliability.
This project is a comprehensive Lisp AI implementation library that provides reference implementations for various artificial intelligence paradigms and symbolic algorithms. It functions as a multi-purpose toolkit containing a logic programming engine, a natural language processing suite, and a symbolic mathematics toolkit. The library is distinguished by its diverse architectural frameworks, including a Prolog-style execution engine that uses unification and goal-driven backtracking, and a system for simulating human decision-making through expert system shells and certainty factors. It also
Implements production rules that map clinical observations and patient attributes to probabilities of specific identities.
Dendron is a markdown knowledge management system designed for organizing linked files into a hierarchical personal knowledge base. It functions as a git-backed note manager that stores data as plaintext markdown files to ensure data persistence and ownership. The system distinguishes itself through schema-based organization, which applies structural templates and autocomplete hints to maintain consistency across large sets of documents. It also provides bi-directional linking and an interactive graph view to visualize relationships between notes, alongside a static site generator that export
Creates reciprocal links between documents to enable navigation via backlinks and graph visualizations.