21 repositories
Frameworks for building interconnected data structures from unstructured text.
Distinguishing note: Focuses on the end-to-end construction of knowledge graphs.
Explore 21 GitHub repositories matching data & databases · Knowledge Graph Construction Tools.
This repository serves as a comprehensive library of architectural blueprints and code examples for integrating large language models into software applications. It functions as a developer learning resource, providing structured tutorials and implementation patterns that demonstrate how to build intelligent features using advanced prompting and data processing techniques. The collection distinguishes itself by focusing on complex reasoning and data-grounding workflows. It provides practical guidance on implementing retrieval-augmented generation pipelines, which connect language models to pr
Extract entities and relationships from raw text to build structured graphs that represent complex information for improved data analysis and visualization.
GraphRAG is a data processing pipeline and retrieval engine designed to transform unstructured text into interconnected knowledge graphs. By utilizing language models to extract entities and relationships, it builds structured representations of information that enable context-aware retrieval for downstream applications. The system distinguishes itself through hierarchical graph clustering and large-scale data synthesis, which organize massive document corpora into multi-level structures. This approach allows for both vector-based semantic searches and graph-based traversals, providing a comp
Transforms unstructured text collections into interconnected data structures to enable deep semantic analysis.
Supermemory is an artificial intelligence memory management platform designed to provide autonomous agents with persistent, long-term knowledge bases. It functions as a centralized repository that synchronizes multimodal data, enabling agents to maintain context and historical information across complex, multi-session workflows. By serving as a knowledge graph engine and vector database orchestrator, the platform ensures that information remains accessible and relevant for automated tasks. The system distinguishes itself through its hybrid indexing approach, which combines vector similarity s
Constructs semantic relationships between data points to enable complex retrieval across historical project context.
Graphiti is a backend framework and memory server designed to provide artificial intelligence agents with persistent, time-aware knowledge graph storage. It functions as a memory layer that enables agents to maintain context across long-term interactions by recording and evolving structured data over time. The system distinguishes itself through a specialized temporal graph database that tracks how entities and relationships change using validity windows. By combining semantic vector similarity, keyword matching, and graph topology traversal, the engine performs hybrid retrieval to locate rel
Builds and maintains evolving temporal knowledge graphs using validity windows and incremental updates.
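The validity-window idea mentioned above can be illustrated with a minimal sketch. This is not Graphiti's API: the names `TemporalEdge`, `TemporalGraph`, `assert_fact`, and `facts_at` are hypothetical, and the sketch assumes each (source, relation) pair holds one current value, so a new fact closes the previous window instead of overwriting it.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional, Tuple


@dataclass
class TemporalEdge:
    """A relationship that is only true inside [valid_from, valid_to)."""
    source: str
    relation: str
    target: str
    valid_from: datetime
    valid_to: Optional[datetime] = None  # None means "still valid"

    def is_valid_at(self, when: datetime) -> bool:
        started = self.valid_from <= when
        not_ended = self.valid_to is None or when < self.valid_to
        return started and not_ended


class TemporalGraph:
    """Hypothetical in-memory store; a real system would use a graph database."""

    def __init__(self) -> None:
        self.edges: List[TemporalEdge] = []

    def assert_fact(self, source: str, relation: str, target: str,
                    when: datetime) -> None:
        # Incremental update: close the open window for this
        # (source, relation) pair rather than deleting the old edge.
        for edge in self.edges:
            if (edge.source == source and edge.relation == relation
                    and edge.valid_to is None):
                edge.valid_to = when
        self.edges.append(TemporalEdge(source, relation, target, when))

    def facts_at(self, when: datetime) -> List[Tuple[str, str, str]]:
        # Return only the edges whose validity window contains `when`.
        return [(e.source, e.relation, e.target)
                for e in self.edges if e.is_valid_at(when)]


graph = TemporalGraph()
graph.assert_fact("alice", "works_at", "Acme", datetime(2020, 1, 1))
graph.assert_fact("alice", "works_at", "Globex", datetime(2023, 1, 1))
```

Because superseded edges are kept with a closed window, a query for mid-2021 still returns the Acme fact, while a query for 2024 returns only the Globex fact.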
This project is a recommendation system framework designed for building, evaluating, and operationalizing personalized item suggestion engines. It provides a comprehensive toolkit for implementing collaborative filtering and content-based algorithms, supported by an end-to-end machine learning pipeline for preparing datasets and deploying predictive models. The framework distinguishes itself through the integration of knowledge graphs to provide richer context for recommendations and the use of industry-specific patterns to accelerate system deployment. It also includes a specialized model ev
Includes frameworks for constructing structured knowledge graphs from external data to enrich recommendation context.
Cognee is an agentic memory management platform designed to provide autonomous agents with long-term semantic recall and structured knowledge. It functions as a framework for building persistent memory systems that connect large language models to graph-based knowledge and vector storage, enabling agents to maintain context across complex tasks and multiple sessions. The platform distinguishes itself through a hybrid approach that combines semantic similarity search with structural graph traversal, allowing for context-aware information retrieval. It features a modular architecture that orche
Transforms unstructured data into interconnected, queryable knowledge representations for improved semantic understanding.
This project is a comprehensive framework for developing, orchestrating, and deploying autonomous agents. It provides a structured environment for building agents that utilize reasoning loops to perform multi-step tasks, manage state through graph-based workflows, and interact with external tools. By mapping unstructured model outputs into typed schemas, the framework ensures reliable integration with downstream application logic. The platform distinguishes itself through a focus on production-grade reliability and security. It incorporates hybrid memory systems that combine vector embeddings
Constructs knowledge graphs from disparate data sources to map relationships between information.
This project is a comprehensive framework for building and managing autonomous agent systems. It provides a unified architecture for orchestrating multi-agent societies, where specialized agents collaborate through roleplay to decompose and solve complex tasks. The system integrates language models with external environments, enabling agents to perform real-world actions through a standardized tool-calling abstraction layer. The framework distinguishes itself through its focus on iterative reasoning and data reliability. It employs automated feedback loops to refine agent outputs and self-eva
Implements automated construction of knowledge graphs from unstructured documents to enable complex data querying.
This project is a multi-model database system designed to store and manage information as documents, graphs, and key-value pairs within a single engine. It functions as a graph database and knowledge graph platform, providing the infrastructure to build, query, and visualize structured data models. By integrating vector search capabilities, the system serves as a vector database that supports retrieval-augmented generation for artificial intelligence applications. The platform distinguishes itself through a unified query language that allows users to perform document lookups, graph traversals
Builds structured maps of entities and relationships from raw data to provide reliable context for intelligent systems.
Ragas is an evaluation framework designed to measure the performance of retrieval-augmented generation pipelines and autonomous agent workflows. It provides a comprehensive suite of tools for benchmarking system outputs, utilizing language models as automated judges to score performance against defined rubrics and reference data. By standardizing inputs, retrieved contexts, and generated responses into a unified schema, the project enables consistent analysis across complex AI applications. The framework distinguishes itself through its ability to generate synthetic test datasets from existin
Structures information from documents into graph formats to represent relationships for advanced analysis.
KAG is a graph-augmented, retrieval-augmented generation system and knowledge graph engine. It functions as a framework that integrates large language models with graph retrieval and numerical calculation to resolve natural language queries. The system creates unified knowledge representations by aligning unstructured data and expert rules through semantic mapping. It maintains mutual indexing between graph structures and original text blocks to ensure that reasoning processes remain linked to verifiable source data. The project provides capabilities for semantic information integration, grap
Integrates unstructured data and expert rules via semantic alignment to build comprehensive knowledge bases.
QASystemOnMedicalKG is a medical knowledge graph question answering system designed to retrieve disease-centered information from a structured data store. It functions as both a constructor for building medical knowledge graphs and a retrieval system that extracts answers regarding symptoms, causes, and treatments. The system employs a pipeline that converts unstructured medical web data into a graph database using dictionary-based entity segmentation. It utilizes query-based intent classification to parse natural language inputs and maps these queries to specific nodes and edges within the g
Offers a framework for building interconnected medical data structures from unstructured web text.
Cocoindex is an incremental data processing engine that builds and maintains live indexes for AI agents, with a core focus on codebase indexing and knowledge graph extraction. The engine uses a function-graph execution model where user-defined Python functions are composed into a directed acyclic graph, and it processes data incrementally so only changed source records or code paths are re-computed, avoiding full recomputation at any scale. It supports automatic schema inference from transformation pipeline type annotations and provides full data lineage tracing, tagging every output record wi
Builds structured knowledge graphs from documents for AI agent reasoning.
llm-graph-builder is a tool for transforming unstructured data into structured Neo4j graph databases using large language models. It functions as a graph orchestrator that automates the construction of nodes and relationships from raw text based on custom schemas. The project provides a visualizer for analyzing relational data as interactive networks and a token monitor for tracking daily and monthly API consumption per user. It also includes a vector embedding generator that uses configurable model providers to enable semantic search and retrieval-augmented generation (RAG). The system covers capabilities for unstructured data analysis, conversational data retrieval through natural language interfaces, and semantic indexing through vector embeddings.
Transforms unstructured data into structured knowledge graphs using large language models and custom schemas.
OpenNRE is a natural language processing library and neural relation extraction framework designed to transform unstructured text into structured relational data. It serves as a toolkit for identifying relation types between entities and generating entity-relation-entity triples to populate and expand knowledge bases. The framework provides tools for both supervised and distantly supervised relation extraction, allowing neural models to be trained on labeled datasets or through automated pipelines that align knowledge base triples with raw text. The project covers a complete information extraction pipeline, including transformer-based text encoding, relation inference, and the output of structured triples for knowledge graph construction.
Turns raw text into a network of entities and relations to build structured knowledge graphs.
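As a rough illustration of distant supervision (aligning knowledge base triples with raw text to produce training labels automatically), here is a minimal sketch. It does not use OpenNRE's API: the function `distant_label`, the `Triple` alias, and the sample data are hypothetical, and entity matching is reduced to case-insensitive substring search.

```python
from typing import Dict, List, Tuple

# (head entity, relation, tail entity); a hypothetical toy knowledge base format.
Triple = Tuple[str, str, str]


def distant_label(sentences: List[str], kb: List[Triple]) -> List[Dict[str, str]]:
    """Label a sentence with a KB relation whenever it mentions both entities.

    This is the core distant-supervision assumption. It yields noisy labels,
    since a sentence can mention both entities without expressing the relation.
    """
    examples = []
    for sentence in sentences:
        lowered = sentence.lower()
        for head, relation, tail in kb:
            if head.lower() in lowered and tail.lower() in lowered:
                examples.append({
                    "text": sentence,
                    "head": head,
                    "tail": tail,
                    "relation": relation,
                })
    return examples


kb = [("Paris", "capital_of", "France")]
sentences = [
    "Paris is the capital of France.",
    "Paris hosted the Olympics.",
    "France borders Spain.",
]
training_examples = distant_label(sentences, kb)
```

Only the first sentence mentions both entities, so it is the only one that receives the `capital_of` label. The labeled examples could then be used to train a relation classifier.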
Tiny Universe is an educational monorepo that delivers multiple independent implementations of core AI subsystems as self-contained Jupyter notebooks. It provides from-scratch constructions of foundational architectures including a complete Transformer model built from the original paper specification, a denoising diffusion probabilistic model for image generation, and a ReAct-style autonomous agent framework that equips an LLM with tools for planning and multi-step task execution. The project distinguishes itself by covering the full lifecycle of modern AI systems through hands-on implementa
Extracts entities and relationships from unstructured documents using LLMs and stores them as graph structures.
DeepKE is a knowledge extraction toolkit and framework designed to transform unstructured text into structured knowledge graphs. It provides a pipeline for identifying and classifying named entities, semantic relations, and events, converting raw datasets into structured triples. The project uses large language models as tool callers through a standardized context protocol to drive automated data extraction processes. It supports schema-based extraction across multiple domains and bilingual text, employing joint entity and relation extraction to identify components in a single structured output. The toolkit includes capabilities for model training and fine-tuning, hyperparameter optimization, and data preparation through distant supervision and automatic relation labeling. It also features distributed GPU training, model memory optimization through quantization, and the ability to deploy trained models as inference services through API endpoints.
Provides a comprehensive framework for building structured knowledge graphs by extracting entities, relations, and events from text.
Agriculture Knowledge Graph is a structured triple-store system and decision support platform designed to transform raw agricultural documents into a machine-readable graph. It functions as a domain information retrieval system that extracts and queries agricultural data to provide intelligent answers and planning support. The project implements a complete pipeline for knowledge graph construction, featuring a relation extraction framework and named entity recognition tools. It uses distant supervision and machine learning to identify and classify relations between entities, converting unstructured text into a network of facts and dependencies. The system provides capabilities for agricultural domain information retrieval through graph-based path analysis and hierarchical taxonomy mapping. It allows users to identify subject-specific entities, extract domain relations, and query the knowledge graph to discover connections between nodes.
Transforms unstructured agricultural text into a structured knowledge graph by extracting entities and relationships.
Memgraph is an in-memory, distributed graph database designed for high-performance labeled property graph management. It utilizes a Cypher query engine for declarative data retrieval and manipulation, providing a scalable knowledge graph backend that integrates vector search and graph traversals. The system distinguishes itself as a real-time graph analytics platform, employing native C++ and CUDA implementations to execute complex network analysis and dynamic community detection on streaming data. It provides specialized support for AI integration, including GraphRAG capabilities, the constr
Transforms raw documents into connected knowledge graphs by extracting entities and generating embeddings.
This project is a library of reference implementations and blueprints for deploying large language models and generative AI workflows. It provides a collection of practical examples designed to guide the deployment of generative systems. The repository features architectural patterns for autonomous agentic workflows that utilize reasoning and tool integration to execute multi-step tasks. It also includes frameworks and templates for building retrieval-augmented generation pipelines that connect language models to vector databases and external data sources. The codebase covers several functio
Provides frameworks for building interconnected relational data structures to enhance information retrieval accuracy.