21 Repos
Frameworks for building interconnected data structures from unstructured text.
Distinguishing note: Focuses on the end-to-end construction of knowledge graphs.
Explore 21 awesome GitHub repositories matching data & databases · Knowledge Graph Construction Tools.
This repository serves as a comprehensive library of architectural blueprints and code examples for integrating large language models into software applications. It functions as a developer learning resource, providing structured tutorials and implementation patterns that demonstrate how to build intelligent features using advanced prompting and data processing techniques. The collection distinguishes itself by focusing on complex reasoning and data-grounding workflows. It provides practical guidance on implementing retrieval-augmented generation pipelines, which connect language models to …
Extract entities and relationships from raw text to build structured graphs that represent complex information for improved data analysis and visualization.
GraphRAG is a data processing pipeline and retrieval engine designed to transform unstructured text into interconnected knowledge graphs. By using language models to extract entities and relationships, it builds structured representations of information that enable context-aware retrieval for downstream applications. The system distinguishes itself through hierarchical graph clustering and large-scale data synthesis, which organize massive document corpora into multi-level structures. This approach allows for both vector-based semantic searches and graph-based traversals, providing a …
Transforms unstructured text collections into interconnected data structures to enable deep semantic analysis.
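Entity and relationship extraction yields (subject, relation, object) triples, and graph-based traversal then gathers context around an entity. The sketch below is a minimal illustration under stated assumptions: the triples are taken as already extracted (the LLM call is omitted), the helpers `build_graph` and `k_hop_context` are hypothetical names rather than part of GraphRAG's API, and GraphRAG's hierarchical clustering is not reproduced.

```python
from collections import defaultdict, deque

Triple = tuple[str, str, str]  # (subject, relation, object), e.g. from an LLM extraction step


def build_graph(triples: list[Triple]) -> dict[str, list[tuple[str, str]]]:
    """Build an undirected adjacency list; each neighbour entry keeps its relation label."""
    graph: dict[str, list[tuple[str, str]]] = defaultdict(list)
    for subj, rel, obj in triples:
        graph[subj].append((rel, obj))
        graph[obj].append((rel, subj))
    return graph


def k_hop_context(graph: dict[str, list[tuple[str, str]]], start: str, k: int) -> set[str]:
    """Breadth-first search collecting every entity within k hops of `start`."""
    seen = {start}
    frontier = deque([(start, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth == k:
            continue  # do not expand beyond the hop limit
        for _rel, neighbour in graph.get(node, []):
            if neighbour not in seen:
                seen.add(neighbour)
                frontier.append((neighbour, depth + 1))
    return seen


if __name__ == "__main__":
    # Hypothetical toy triples.
    triples = [
        ("Alice", "works_at", "Acme"),
        ("Bob", "works_at", "Acme"),
        ("Acme", "located_in", "Berlin"),
        ("Bob", "knows", "Carol"),
    ]
    graph = build_graph(triples)
    print(sorted(k_hop_context(graph, "Alice", 2)))  # ['Acme', 'Alice', 'Berlin', 'Bob']
```

The returned entity set (or the triples among those entities) is what a retrieval step would hand to the language model as context; raising `k` trades precision for recall.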
Supermemory is an artificial intelligence memory management platform designed to provide autonomous agents with persistent, long-term knowledge bases. It functions as a centralized repository that synchronizes multimodal data, enabling agents to maintain context and historical information across complex, multi-session workflows. By serving as a knowledge graph engine and vector database orchestrator, the platform ensures that information remains accessible and relevant for automated tasks. The system distinguishes itself through its hybrid indexing approach, which combines vector similarity …
Constructs semantic relationships between data points to enable complex retrieval across historical project context.
Graphiti is a backend framework and memory server designed to provide artificial intelligence agents with persistent, time-aware knowledge graph storage. It functions as a memory layer that enables agents to maintain context across long-term interactions by recording and evolving structured data over time. The system distinguishes itself through a specialized temporal graph database that tracks how entities and relationships change using validity windows. By combining semantic vector similarity, keyword matching, and graph topology traversal, the engine performs hybrid retrieval to locate …
Builds and maintains evolving temporal knowledge graphs using validity windows and incremental updates.
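A validity window can be pictured as a start and end timestamp on each edge, where an incremental update closes the window of the fact it supersedes instead of deleting it. The following is a minimal sketch, assuming a simple hypothetical rule that a new fact with the same source and relation replaces the previous one; the `TemporalGraph` class is illustrative only, and Graphiti's own invalidation logic is not reproduced here.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class TemporalEdge:
    """A relationship that holds during [valid_from, valid_to); valid_to=None means still valid."""
    source: str
    relation: str
    target: str
    valid_from: datetime
    valid_to: Optional[datetime] = None


class TemporalGraph:
    """Hypothetical append-only store of edges with validity windows."""

    def __init__(self) -> None:
        self.edges: list[TemporalEdge] = []

    def assert_fact(self, source: str, relation: str, target: str, at: datetime) -> None:
        """Incremental update: close any open edge with the same source and relation, then add the new edge."""
        for edge in self.edges:
            if edge.source == source and edge.relation == relation and edge.valid_to is None:
                edge.valid_to = at  # invalidate instead of deleting, so history stays queryable
        self.edges.append(TemporalEdge(source, relation, target, valid_from=at))

    def as_of(self, at: datetime) -> list[TemporalEdge]:
        """Return the edges whose validity window contains the instant `at`."""
        return [
            e for e in self.edges
            if e.valid_from <= at and (e.valid_to is None or at < e.valid_to)
        ]


if __name__ == "__main__":
    g = TemporalGraph()
    g.assert_fact("Alice", "works_at", "Acme", datetime(2020, 1, 1))
    g.assert_fact("Alice", "works_at", "Globex", datetime(2023, 6, 1))
    print([e.target for e in g.as_of(datetime(2021, 1, 1))])  # ['Acme']
    print([e.target for e in g.as_of(datetime(2024, 1, 1))])  # ['Globex']
```

Keeping superseded edges lets a point-in-time query answer "what was true then" as well as "what is true now".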
This project is a recommendation system framework designed for building, evaluating, and operationalizing personalized item suggestion engines. It provides a comprehensive toolkit for implementing collaborative filtering and content-based algorithms, supported by an end-to-end machine learning pipeline for preparing datasets and deploying predictive models. The framework distinguishes itself through the integration of knowledge graphs to provide richer context for recommendations and the use of industry-specific patterns to accelerate system deployment. It also includes a specialized model …
Includes frameworks for constructing structured knowledge graphs from external data to enrich recommendation context.
Cognee is an agentic memory management platform designed to provide autonomous agents with long-term semantic recall and structured knowledge. It functions as a framework for building persistent memory systems that connect large language models to graph-based knowledge and vector storage, enabling agents to maintain context across complex tasks and multiple sessions. The platform distinguishes itself through a hybrid approach that combines semantic similarity search with structural graph traversal, allowing for context-aware information retrieval. It features a modular architecture that …
Transforms unstructured data into interconnected, queryable knowledge representations for improved semantic understanding.
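Combining semantic similarity search with structural graph traversal can be sketched in two stages: vector search picks seed nodes, then graph edges pull in connected nodes that the embedding alone would miss. This is a minimal illustration, not Cognee's API; the two-dimensional vectors are toy stand-ins for model embeddings, and `hybrid_retrieve` is a hypothetical helper.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def hybrid_retrieve(
    query_vec: list[float],
    node_vecs: dict[str, list[float]],
    edges: list[tuple[str, str]],
    top_k: int = 1,
) -> list[str]:
    """Stage 1: rank embedded nodes by similarity. Stage 2: add graph neighbours of the top-k seeds."""
    ranked = sorted(node_vecs, key=lambda n: cosine(query_vec, node_vecs[n]), reverse=True)
    seeds = ranked[:top_k]
    results = list(seeds)
    for seed in seeds:
        for a, b in edges:
            if seed in (a, b):
                neighbour = b if a == seed else a
                if neighbour not in results:
                    results.append(neighbour)
    return results


if __name__ == "__main__":
    # Toy embeddings; "pandas" has no embedding and is reachable only through the graph.
    node_vecs = {"python": [1.0, 0.0], "rust": [0.8, 0.6], "baking": [0.0, 1.0]}
    edges = [("python", "pandas"), ("baking", "sourdough"), ("rust", "cargo")]
    print(hybrid_retrieve([1.0, 0.0], node_vecs, edges, top_k=1))  # ['python', 'pandas']
```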
This project is a comprehensive framework for developing, orchestrating, and deploying autonomous agents. It provides a structured environment for building agents that utilize reasoning loops to perform multi-step tasks, manage state through graph-based workflows, and interact with external tools. By mapping unstructured model outputs into typed schemas, the framework ensures reliable integration with downstream application logic. The platform distinguishes itself through a focus on production-grade reliability and security. It incorporates hybrid memory systems that combine vector embeddings …
Constructs knowledge graphs from disparate data sources to map relationships between information.
This project is a comprehensive framework for building and managing autonomous agent systems. It provides a unified architecture for orchestrating multi-agent societies, where specialized agents collaborate through roleplay to decompose and solve complex tasks. The system integrates language models with external environments, enabling agents to perform real-world actions through a standardized tool-calling abstraction layer. The framework distinguishes itself through its focus on iterative reasoning and data reliability. It employs automated feedback loops to refine agent outputs and …
Implements automated construction of knowledge graphs from unstructured documents to enable complex data querying.
This project is a multi-model database system designed to store and manage information as documents, graphs, and key-value pairs within a single engine. It functions as a graph database and knowledge graph platform, providing the infrastructure to build, query, and visualize structured data models. By integrating vector search capabilities, the system serves as a vector database that supports retrieval-augmented generation for artificial intelligence applications. The platform distinguishes itself through a unified query language that allows users to perform document lookups, graph traversals …
Builds structured maps of entities and relationships from raw data to provide reliable context for intelligent systems.
Ragas is an evaluation framework designed to measure the performance of retrieval-augmented generation pipelines and autonomous agent workflows. It provides a comprehensive suite of tools for benchmarking system outputs, utilizing language models as automated judges to score performance against defined rubrics and reference data. By standardizing inputs, retrieved contexts, and generated responses into a unified schema, the project enables consistent analysis across complex AI applications. The framework distinguishes itself through its ability to generate synthetic test datasets from …
Structures information from documents into graph formats to represent relationships for advanced analysis.
KAG is a graph-augmented retrieval-augmented generation (RAG) system and knowledge graph engine. It functions as a framework that integrates large language models with graph retrieval and numerical calculation to resolve natural language queries. The system creates unified knowledge representations by aligning unstructured data and expert rules through semantic mapping. It maintains mutual indexing between graph structures and original text blocks to ensure that reasoning processes remain linked to verifiable source data. The project provides capabilities for semantic information integration, …
Integrates unstructured data and expert rules via semantic alignment to build comprehensive knowledge bases.
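Mutual indexing between graph structures and original text blocks can be pictured as two maps kept in sync: entity to source chunks, and chunk to entities. The sketch below assumes entities have already been extracted for each chunk; `MutualIndex` and its methods are hypothetical names, not KAG's interfaces.

```python
from collections import defaultdict


class MutualIndex:
    """Links graph entities to the text chunks they came from, and chunks back to their entities."""

    def __init__(self) -> None:
        self.chunks: dict[str, str] = {}
        self.entity_to_chunks: dict[str, set[str]] = defaultdict(set)
        self.chunk_to_entities: dict[str, set[str]] = defaultdict(set)

    def add_chunk(self, chunk_id: str, text: str, entities: list[str]) -> None:
        """Register a chunk and update both directions of the index."""
        self.chunks[chunk_id] = text
        for entity in entities:
            self.entity_to_chunks[entity].add(chunk_id)
            self.chunk_to_entities[chunk_id].add(entity)

    def evidence_for(self, entity: str) -> list[str]:
        """Return the source passages that mention an entity, so a reasoning step can cite them."""
        return [self.chunks[c] for c in sorted(self.entity_to_chunks.get(entity, ()))]


if __name__ == "__main__":
    idx = MutualIndex()
    idx.add_chunk("c1", "Acme was founded in Berlin.", ["Acme", "Berlin"])
    idx.add_chunk("c2", "Acme acquired Globex.", ["Acme", "Globex"])
    print(idx.evidence_for("Acme"))  # ['Acme was founded in Berlin.', 'Acme acquired Globex.']
```

Walking the graph answers "which entities are related", and the reverse map answers "which passage supports this", which is what keeps a reasoning chain traceable to source text.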
QASystemOnMedicalKG is a medical knowledge graph question-answering system designed to retrieve disease-centered information from a structured data store. It functions as both a constructor for building medical knowledge graphs and a retrieval system that extracts answers regarding symptoms, causes, and treatments. The system employs a pipeline that converts unstructured medical web data into a graph database using dictionary-based entity segmentation. It utilizes query-based intent classification to parse natural language inputs and maps these queries to specific nodes and edges within the …
Offers a framework for building interconnected medical data structures from unstructured web text.
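The question-answering flow described for QASystemOnMedicalKG (dictionary-based entity matching, keyword-driven intent classification, then a lookup of the matching graph edges) can be sketched as follows. Everything here is a hypothetical stand-in: the English toy vocabularies and intent names are invented for illustration, plain substring matching replaces the project's dictionary-based segmentation, and an in-memory dictionary replaces the graph database.

```python
# Hypothetical miniature vocabularies standing in for the project's medical dictionaries.
DISEASES = ["diabetes", "influenza"]
INTENT_KEYWORDS = {
    "has_symptom": ["symptom", "sign"],
    "cause": ["cause", "why"],
    "treatment": ["treat", "cure"],
}
# Toy stand-in for the graph database: (disease, relation) -> connected nodes.
GRAPH = {
    ("influenza", "has_symptom"): ["fever", "cough"],
    ("influenza", "treatment"): ["rest", "fluids"],
    ("diabetes", "has_symptom"): ["thirst", "fatigue"],
}


def answer(question: str) -> list[str]:
    text = question.lower()
    # 1. Dictionary-based entity recognition: keep every known disease mentioned in the question.
    diseases = [d for d in DISEASES if d in text]
    # 2. Keyword-based intent classification: an intent fires if any of its keywords appears.
    intents = [name for name, words in INTENT_KEYWORDS.items() if any(w in text for w in words)]
    # 3. Map each (entity, intent) pair onto the matching edges of the graph.
    results: list[str] = []
    for disease in diseases:
        for intent in intents:
            results.extend(GRAPH.get((disease, intent), []))
    return results


if __name__ == "__main__":
    print(answer("What are the symptoms of influenza?"))  # ['fever', 'cough']
```

A question whose entity and intent both match but have no edge in the graph simply returns an empty list, which is where a real system would fall back to a "no answer" reply.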
Cocoindex is an incremental data processing engine that builds and maintains live indexes for AI agents, with a core focus on codebase indexing and knowledge graph extraction. The engine uses a function-graph execution model where user-defined Python functions are composed into a directed acyclic graph, and it processes data incrementally so only changed source records or code paths are recomputed, avoiding full recomputation at any scale. It supports automatic schema inference from transformation pipeline type annotations and provides full data lineage tracing, tagging every output record …
Builds structured knowledge graphs from documents for AI agent reasoning.
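Incremental processing hinges on detecting which source records changed since the last run. The sketch below shows only that idea, using a SHA-256 content fingerprint per record; Cocoindex's function-graph execution and lineage tracking are not modeled, and the `IncrementalIndexer` class is hypothetical.

```python
import hashlib
from typing import Callable


class IncrementalIndexer:
    """Re-runs a transform only for source records whose content changed since the last update."""

    def __init__(self, transform: Callable[[str], str]) -> None:
        self.transform = transform
        self.fingerprints: dict[str, str] = {}  # record key -> content hash seen last time
        self.outputs: dict[str, str] = {}       # record key -> transformed output

    def update(self, sources: dict[str, str]) -> list[str]:
        """Process new or changed records, drop deleted ones, and return the keys recomputed."""
        recomputed = []
        for key, content in sources.items():
            digest = hashlib.sha256(content.encode("utf-8")).hexdigest()
            if self.fingerprints.get(key) != digest:
                self.outputs[key] = self.transform(content)
                self.fingerprints[key] = digest
                recomputed.append(key)
        # Remove outputs whose source record no longer exists.
        for key in list(self.outputs):
            if key not in sources:
                del self.outputs[key]
                del self.fingerprints[key]
        return recomputed


if __name__ == "__main__":
    indexer = IncrementalIndexer(str.upper)  # toy transform in place of real extraction
    print(indexer.update({"a.py": "x = 1", "b.py": "y = 2"}))  # ['a.py', 'b.py']
    print(indexer.update({"a.py": "x = 1", "b.py": "y = 3"}))  # ['b.py']
```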
llm-graph-builder is a tool that uses large language models to transform unstructured data into structured Neo4j graph databases. It acts as a graph orchestrator that automates the construction of nodes and relationships from raw text based on user-defined schemas. The project provides a visualizer for exploring relational data as interactive networks and a token monitor that tracks daily and monthly API usage per user. It also includes a vector embedding generator that uses configurable model providers to enable semantic search and retrieval-augmented generation. The system covers unstructured data analysis, conversational data querying through natural language interfaces, and semantic indexing with vector embeddings.
Transforms unstructured data into structured knowledge graphs using large language models and custom schemas.
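Schema-guided construction can be approximated by filtering the relationships an LLM proposes against a user-defined schema before they are written to the graph. This is a minimal sketch, assuming the LLM output has already been parsed into `Relationship` records; the schema contents and the `apply_schema` helper are hypothetical, not llm-graph-builder's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Relationship:
    """One LLM-proposed edge, already parsed into labelled endpoints."""
    source: str
    source_label: str
    rel_type: str
    target: str
    target_label: str


# Hypothetical user-defined schema: allowed (source label, relationship type, target label) patterns.
ALLOWED_PATTERNS = {
    ("Person", "WORKS_AT", "Company"),
    ("Company", "ACQUIRED", "Company"),
}


def apply_schema(candidates: list[Relationship]) -> list[Relationship]:
    """Keep only relationships that match the schema, dropping exact duplicates."""
    kept: list[Relationship] = []
    for rel in candidates:
        pattern = (rel.source_label, rel.rel_type, rel.target_label)
        if pattern in ALLOWED_PATTERNS and rel not in kept:
            kept.append(rel)
    return kept


if __name__ == "__main__":
    candidates = [
        Relationship("Alice", "Person", "WORKS_AT", "Acme", "Company"),
        Relationship("Alice", "Person", "LIKES", "Jazz", "Genre"),  # not in the schema
        Relationship("Alice", "Person", "WORKS_AT", "Acme", "Company"),  # duplicate
    ]
    print(apply_schema(candidates))
```

The surviving records could then be written with Cypher `MERGE` statements so that re-running extraction does not duplicate nodes; that step is left out of this sketch.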
OpenNRE is an NLP library and framework for neural relation extraction, designed to convert unstructured text into structured relational data. It serves as a toolkit for identifying the relation types between entities and generating entity-relation-entity triples that populate and extend knowledge bases. The framework provides tools for both supervised and distantly supervised relation extraction, so neural models can be trained either on labeled datasets or through automated pipelines that align knowledge base triples with raw text. The project covers a complete information extraction pipeline, including Transformer-based text encoding, relation inference, and the output of structured triples for knowledge graph construction.
Turns raw text into a network of entities and relations to build structured knowledge graphs.
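Distant supervision labels a sentence with a knowledge base relation whenever the sentence mentions both entities of a stored triple. The sketch below implements that heuristic in its simplest form (exact substring matching) and is not OpenNRE's data pipeline; the `distant_supervision` function, the tuple output format, and the example entities are hypothetical. The second example sentence shows the label noise the heuristic is known for.

```python
def distant_supervision(
    kb_triples: list[tuple[str, str, str]], sentences: list[str]
) -> list[tuple[str, str, str, str]]:
    """Label each sentence with relation r for every KB triple (head, r, tail)
    whose head and tail both occur in the sentence. Labels are noisy by construction."""
    labelled = []
    for sentence in sentences:
        for head, relation, tail in kb_triples:
            if head in sentence and tail in sentence:
                labelled.append((sentence, head, relation, tail))
    return labelled


if __name__ == "__main__":
    kb = [("Acme", "headquartered_in", "Berlin")]
    sentences = [
        "Acme is headquartered in Berlin.",
        "Acme opened a new store in Berlin.",  # both entities, but a different relation: noise
        "Globex is based in Paris.",
    ]
    for example in distant_supervision(kb, sentences):
        print(example)
```

Labels produced this way are usually fed to a model trained at the bag level (all sentences sharing an entity pair), which is one common way to dampen this noise.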
Tiny Universe is an educational monorepo that delivers multiple independent implementations of core AI subsystems as self-contained Jupyter notebooks. It provides from-scratch constructions of foundational architectures, including a complete Transformer model built from the original paper specification, a denoising diffusion probabilistic model for image generation, and a ReAct-style autonomous agent framework that equips an LLM with tools for planning and multi-step task execution. The project distinguishes itself by covering the full lifecycle of modern AI systems through hands-on …
Extracts entities and relationships from unstructured documents using LLMs and stores them as graph structures.
DeepKE is a knowledge extraction toolkit and framework designed to transform unstructured text into structured knowledge graphs. It offers a pipeline for identifying and classifying named entities, semantic relations, and events, converting raw datasets into structured triples. The project uses large language models as tool callers through a standardized context protocol to drive automated data extraction. It supports schema-guided extraction across multiple domains and bilingual text, and uses joint entity and relation extraction to identify components in a single structured output. The toolkit includes features for model training and fine-tuning, hyperparameter optimization, and data preparation via distant supervision and automated relation labeling. It also offers distributed GPU training, model memory optimization through quantization, and the ability to serve trained models as inference services via API endpoints.
Provides a comprehensive framework for building structured knowledge graphs by extracting entities, relations, and events from text.
Agriculture Knowledge Graph is a structured triple-store system and decision support platform designed to transform raw agricultural documents into a machine-readable graph. It functions as a domain information retrieval system that extracts and queries agricultural data to provide intelligent answers and planning support. The project implements a full pipeline for knowledge graph construction, featuring a relation extraction framework and named entity recognition tools. It uses distant supervision and machine learning to identify and classify relationships between entities, converting …
Transforms unstructured agricultural text into a structured knowledge graph by extracting entities and relationships.
Memgraph is an in-memory, distributed graph database designed for high-performance labeled property graph management. It utilizes a Cypher query engine for declarative data retrieval and manipulation, providing a scalable knowledge graph backend that integrates vector search and graph traversals. The system distinguishes itself as a real-time graph analytics platform, employing native C++ and CUDA implementations to execute complex network analysis and dynamic community detection on streaming data. It provides specialized support for AI integration, including GraphRAG capabilities, …
Transforms raw documents into connected knowledge graphs by extracting entities and generating embeddings.
This project is a library of reference implementations and blueprints for deploying large language models and generative AI workflows. It provides a collection of practical examples designed to guide the deployment of generative systems. The repository features architectural patterns for autonomous agentic workflows that utilize reasoning and tool integration to execute multi-step tasks. It also includes frameworks and templates for building retrieval-augmented generation pipelines that connect language models to vector databases and external data sources. The codebase covers several …
Provides frameworks for building interconnected relational data structures to enhance information retrieval accuracy.