21 repositories
Frameworks for building interconnected data structures from unstructured text.
Distinguishing note: Focuses on the end-to-end construction of knowledge graphs.
21 GitHub repositories matching data & databases · Knowledge Graph Construction Tools.
This repository serves as a comprehensive library of architectural blueprints and code examples for integrating large language models into software applications. It functions as a developer learning resource, providing structured tutorials and implementation patterns that demonstrate how to build intelligent features using advanced prompting and data processing techniques. The collection distinguishes itself by focusing on complex reasoning and data-grounding workflows. It provides practical guidance on implementing retrieval-augmented generation pipelines, which connect language models to pr
Extract entities and relationships from raw text to build structured graphs that represent complex information for improved data analysis and visualization.
GraphRAG is a data processing pipeline and retrieval engine designed to transform unstructured text into interconnected knowledge graphs. By utilizing language models to extract entities and relationships, it builds structured representations of information that enable context-aware retrieval for downstream applications. The system distinguishes itself through hierarchical graph clustering and large-scale data synthesis, which organize massive document corpora into multi-level structures. This approach allows for both vector-based semantic searches and graph-based traversals, providing a comp
Transforms unstructured text collections into interconnected data structures to enable deep semantic analysis.
Supermemory is an artificial intelligence memory management platform designed to provide autonomous agents with persistent, long-term knowledge bases. It functions as a centralized repository that synchronizes multimodal data, enabling agents to maintain context and historical information across complex, multi-session workflows. By serving as a knowledge graph engine and vector database orchestrator, the platform ensures that information remains accessible and relevant for automated tasks. The system distinguishes itself through its hybrid indexing approach, which combines vector similarity s
Constructs semantic relationships between data points to enable complex retrieval across historical project context.
Graphiti is a backend framework and memory server designed to provide artificial intelligence agents with persistent, time-aware knowledge graph storage. It functions as a memory layer that enables agents to maintain context across long-term interactions by recording and evolving structured data over time. The system distinguishes itself through a specialized temporal graph database that tracks how entities and relationships change using validity windows. By combining semantic vector similarity, keyword matching, and graph topology traversal, the engine performs hybrid retrieval to locate rel
Builds and maintains evolving temporal knowledge graphs using validity windows and incremental updates.
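The validity-window idea mentioned in this entry can be illustrated with a small sketch. This is not Graphiti's API: the `Fact` and `TemporalGraph` names, the integer timestamps, and the assumption that each (subject, relation) pair holds one value at a time are all hypothetical choices made for the example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Fact:
    subject: str
    relation: str
    obj: str
    valid_from: int                 # inclusive start of the validity window
    valid_to: Optional[int] = None  # exclusive end; None means still valid


class TemporalGraph:
    """Hypothetical toy store, assuming one current value per (subject, relation)."""

    def __init__(self) -> None:
        self.facts: List[Fact] = []

    def assert_fact(self, subject: str, relation: str, obj: str, at: int) -> None:
        """Record a fact at time `at`, closing any open fact it supersedes."""
        for fact in self.facts:
            same_slot = fact.subject == subject and fact.relation == relation
            if same_slot and fact.valid_to is None:
                if fact.obj == obj:
                    return  # already recorded; nothing changes
                fact.valid_to = at  # incremental update: close the old window
        self.facts.append(Fact(subject, relation, obj, valid_from=at))

    def as_of(self, subject: str, relation: str, at: int) -> Optional[str]:
        """Return the object that was valid for (subject, relation) at time `at`."""
        for fact in self.facts:
            if fact.subject != subject or fact.relation != relation:
                continue
            if fact.valid_from <= at and (fact.valid_to is None or at < fact.valid_to):
                return fact.obj
        return None


graph = TemporalGraph()
graph.assert_fact("alice", "works_at", "Acme", at=2019)
graph.assert_fact("alice", "works_at", "Globex", at=2023)
```

Because superseded facts are closed rather than deleted, `graph.as_of("alice", "works_at", 2020)` can still answer from the earlier window while a query at 2023 sees the newer fact.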
This project is a recommendation system framework designed for building, evaluating, and operationalizing personalized item suggestion engines. It provides a comprehensive toolkit for implementing collaborative filtering and content-based algorithms, supported by an end-to-end machine learning pipeline for preparing datasets and deploying predictive models. The framework distinguishes itself through the integration of knowledge graphs to provide richer context for recommendations and the use of industry-specific patterns to accelerate system deployment. It also includes a specialized model ev
Includes frameworks for constructing structured knowledge graphs from external data to enrich recommendation context.
Cognee is an agentic memory management platform designed to provide autonomous agents with long-term semantic recall and structured knowledge. It functions as a framework for building persistent memory systems that connect large language models to graph-based knowledge and vector storage, enabling agents to maintain context across complex tasks and multiple sessions. The platform distinguishes itself through a hybrid approach that combines semantic similarity search with structural graph traversal, allowing for context-aware information retrieval. It features a modular architecture that orche
Transforms unstructured data into interconnected, queryable knowledge representations for improved semantic understanding.
This project is a comprehensive framework for developing, orchestrating, and deploying autonomous agents. It provides a structured environment for building agents that utilize reasoning loops to perform multi-step tasks, manage state through graph-based workflows, and interact with external tools. By mapping unstructured model outputs into typed schemas, the framework ensures reliable integration with downstream application logic. The platform distinguishes itself through a focus on production-grade reliability and security. It incorporates hybrid memory systems that combine vector embeddings
Constructs knowledge graphs from disparate data sources to map relationships between information.
This project is a comprehensive framework for building and managing autonomous agent systems. It provides a unified architecture for orchestrating multi-agent societies, where specialized agents collaborate through roleplay to decompose and solve complex tasks. The system integrates language models with external environments, enabling agents to perform real-world actions through a standardized tool-calling abstraction layer. The framework distinguishes itself through its focus on iterative reasoning and data reliability. It employs automated feedback loops to refine agent outputs and self-eva
Implements automated construction of knowledge graphs from unstructured documents to enable complex data querying.
This project is a multi-model database system designed to store and manage information as documents, graphs, and key-value pairs within a single engine. It functions as a graph database and knowledge graph platform, providing the infrastructure to build, query, and visualize structured data models. By integrating vector search capabilities, the system serves as a vector database that supports retrieval-augmented generation for artificial intelligence applications. The platform distinguishes itself through a unified query language that allows users to perform document lookups, graph traversals
Builds structured maps of entities and relationships from raw data to provide reliable context for intelligent systems.
Ragas is an evaluation framework designed to measure the performance of retrieval-augmented generation pipelines and autonomous agent workflows. It provides a comprehensive suite of tools for benchmarking system outputs, utilizing language models as automated judges to score performance against defined rubrics and reference data. By standardizing inputs, retrieved contexts, and generated responses into a unified schema, the project enables consistent analysis across complex AI applications. The framework distinguishes itself through its ability to generate synthetic test datasets from existin
Structures information from documents into graph formats to represent relationships for advanced analysis.
KAG is a graph-augmented retrieval-augmented generation system and knowledge graph engine. It functions as a framework that integrates large language models with graph retrieval and numerical calculation to resolve natural language queries. The system creates unified knowledge representations by aligning unstructured data and expert rules through semantic mapping. It maintains mutual indexing between graph structures and original text blocks to ensure that reasoning processes remain linked to verifiable source data. The project provides capabilities for semantic information integration, grap
Integrates unstructured data and expert rules via semantic alignment to build comprehensive knowledge bases.
QASystemOnMedicalKG is a medical knowledge graph question answering system designed to retrieve disease-centered information from a structured data store. It functions as both a constructor for building medical knowledge graphs and a retrieval system that extracts answers regarding symptoms, causes, and treatments. The system employs a pipeline that converts unstructured medical web data into a graph database using dictionary-based entity segmentation. It utilizes query-based intent classification to parse natural language inputs and maps these queries to specific nodes and edges within the g
Offers a framework for building interconnected medical data structures from unstructured web text.
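The question-answering flow described in this entry (dictionary-based entity matching, intent classification, then a lookup on graph nodes and edges) might look roughly like the sketch below. Everything here is an assumption for illustration: the toy `GRAPH`, the `INTENT_KEYWORDS` lists, and the function names are hypothetical and are not taken from the QASystemOnMedicalKG codebase, which the entry says uses a graph database rather than an in-memory dictionary.

```python
import re
from typing import List, Optional

# Hypothetical toy graph: disease -> relation -> list of values.
GRAPH = {
    "influenza": {
        "has_symptom": ["fever", "cough"],
        "treated_by": ["rest", "fluids"],
    },
    "migraine": {
        "has_symptom": ["headache", "nausea"],
        "treated_by": ["analgesics"],
    },
}

# Hypothetical keyword lists mapping question wording to an edge type.
INTENT_KEYWORDS = {
    "has_symptom": ["symptom", "sign"],
    "treated_by": ["treat", "cure", "therapy"],
}


def find_entities(question: str) -> List[str]:
    """Dictionary-based matching: keep tokens that are known disease names."""
    tokens = re.findall(r"[a-z]+", question.lower())
    return [token for token in tokens if token in GRAPH]


def classify_intent(question: str) -> Optional[str]:
    """Return the first relation whose keywords appear in the question."""
    text = question.lower()
    for relation, keywords in INTENT_KEYWORDS.items():
        if any(keyword in text for keyword in keywords):
            return relation
    return None


def answer(question: str) -> List[str]:
    """Map (entity, intent) to the matching edges in the toy graph."""
    relation = classify_intent(question)
    if relation is None:
        return []
    results: List[str] = []
    for disease in find_entities(question):
        results.extend(GRAPH[disease].get(relation, []))
    return results
```

A question with a recognized disease but no recognized intent returns an empty list here; a fuller system would need a fallback for that case.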
Cocoindex is an incremental data processing engine that builds and maintains live indexes for AI agents, with a core focus on codebase indexing and knowledge graph extraction. The engine uses a function-graph execution model where user-defined Python functions are composed into a directed acyclic graph, and it processes data incrementally so only changed source records or code paths are re-computed, avoiding full recomputation at any scale. It supports automatic schema inference from transformation pipeline type annotations and provides full data lineage tracing, tagging every output record wi
Builds structured knowledge graphs from documents for AI agent reasoning.
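The incremental behavior described in this entry, where only changed source records are re-computed, can be sketched with a content-hash cache. This is a simplified illustration under stated assumptions, not Cocoindex's execution model or API: the `IncrementalIndex` class, its `update` method, and the use of SHA-256 digests to detect changes are all hypothetical.

```python
import hashlib
from typing import Callable, Dict, List


class IncrementalIndex:
    """Hypothetical index that re-runs `transform` only for changed records."""

    def __init__(self, transform: Callable[[str], str]) -> None:
        self.transform = transform
        self.hashes: Dict[str, str] = {}   # record key -> digest of last seen content
        self.outputs: Dict[str, str] = {}  # record key -> cached transform result

    def update(self, records: Dict[str, str]) -> List[str]:
        """Sync the index with `records`; return the keys that were recomputed."""
        recomputed = []
        for key, text in records.items():
            digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
            if self.hashes.get(key) != digest:
                # New or changed content: recompute and refresh the cache.
                self.outputs[key] = self.transform(text)
                self.hashes[key] = digest
                recomputed.append(key)
        # Drop outputs whose source record no longer exists.
        for key in list(self.outputs):
            if key not in records:
                del self.outputs[key]
                del self.hashes[key]
        return recomputed


index = IncrementalIndex(transform=str.upper)
first = index.update({"a.md": "alpha", "b.md": "beta"})
second = index.update({"a.md": "alpha", "b.md": "beta v2"})
```

On the second call only `b.md` is passed through `transform`, since the digest for `a.md` is unchanged.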
llm-graph-builder is a tool for transforming unstructured data into structured Neo4j graph databases using large language models. It functions as a graph orchestrator that automates the construction of nodes and relationships from raw text based on custom schemas. The project provides a visualizer for analyzing relational data as interactive networks and a token monitor to track daily and monthly API consumption per user. It also includes a vector embedding generator that utilizes configurable model providers to enable semantic search and retrieval-augmented generation. The system covers cap
Transforms unstructured data into structured knowledge graphs using large language models and custom schemas.
OpenNRE is a natural language processing library and neural relation extraction framework designed to convert unstructured text into structured relational data. It serves as a toolkit for identifying relation types between entities and generating entity-relation-entity triples to populate and expand knowledge bases. The framework provides tools for supervised and distantly supervised relation extraction, allowing neural models to be trained on labeled datasets or through automated pipelines that align knowledge base triples with raw text. The project covers a complete information extraction pipeline, including Transformer-based text encoding, relation inference, and structured triple output for knowledge graph construction.
Turns raw text into a network of entities and relations to build structured knowledge graphs.
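As a rough illustration of the last step in such a pipeline, the sketch below assembles entity-relation-entity triples into an adjacency structure. It does not call OpenNRE: the triples are hand-written examples standing in for extractor output, and `build_graph` is a hypothetical helper.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Triple = Tuple[str, str, str]  # (head entity, relation, tail entity)


def build_graph(triples: Iterable[Triple]) -> Dict[str, List[Tuple[str, str]]]:
    """Group triples by head entity, skipping exact duplicates."""
    graph: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    seen = set()
    for head, relation, tail in triples:
        if (head, relation, tail) in seen:
            continue  # the same triple can be extracted from several sentences
        seen.add((head, relation, tail))
        graph[head].append((relation, tail))
    return dict(graph)


# Hand-written example triples, not real extractor output.
triples = [
    ("Marie Curie", "born_in", "Warsaw"),
    ("Marie Curie", "field", "physics"),
    ("Warsaw", "capital_of", "Poland"),
    ("Marie Curie", "born_in", "Warsaw"),
]
kg = build_graph(triples)
```

Entities that only ever appear as tails, such as `"Poland"` here, get no key of their own in this representation; a graph database would store them as nodes regardless.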
Tiny Universe is an educational monorepo that delivers multiple independent implementations of core AI subsystems as self-contained Jupyter notebooks. It provides from-scratch constructions of foundational architectures including a complete Transformer model built from the original paper specification, a denoising diffusion probabilistic model for image generation, and a ReAct-style autonomous agent framework that equips an LLM with tools for planning and multi-step task execution. The project distinguishes itself by covering the full lifecycle of modern AI systems through hands-on implementa
Extracts entities and relationships from unstructured documents using LLMs and stores them as graph structures.
DeepKE is a knowledge extraction toolkit and framework designed to convert unstructured text into structured knowledge graphs. It provides a pipeline for identifying and classifying named entities, semantic relations, and events, and for converting raw datasets into structured triples. The project uses large language models as tool callers through a unified context protocol to drive automated data extraction. It supports schema-based extraction across multiple domains and bilingual text, using joint entity and relation extraction to identify components in a single structured output. The toolkit includes capabilities for model training and fine-tuning, hyperparameter optimization, and data preparation via distant supervision and automated relation labeling. It also features distributed GPU training, model memory optimization through quantization, and the ability to deploy trained models as inference services via API endpoints.
Provides a comprehensive framework for building structured knowledge graphs by extracting entities, relations, and events from text.
Agriculture Knowledge Graph is a structured triple store and decision support platform designed to convert raw agricultural documents into a machine-readable graph. It functions as a domain information retrieval system that extracts and queries agricultural data to provide intelligent answers and support planning. The project implements a complete knowledge graph construction pipeline, featuring a relation extraction framework and named entity recognition tools. It uses distant supervision and machine learning to identify and classify relations between entities, converting unstructured text into a network of facts and dependencies. The system provides agricultural domain information retrieval through graph-based path analysis and hierarchical classification mapping. It lets users identify topic-specific entities, extract domain relations, and query the knowledge graph to uncover links between nodes.
Transforms unstructured agricultural text into a structured knowledge graph by extracting entities and relationships.
Memgraph is an in-memory, distributed graph database designed for high-performance labeled property graph management. It utilizes a Cypher query engine for declarative data retrieval and manipulation, providing a scalable knowledge graph backend that integrates vector search and graph traversals. The system distinguishes itself as a real-time graph analytics platform, employing native C++ and CUDA implementations to execute complex network analysis and dynamic community detection on streaming data. It provides specialized support for AI integration, including GraphRAG capabilities, the constr
Transforms raw documents into connected knowledge graphs by extracting entities and generating embeddings.
This project is a library of reference implementations and blueprints for deploying large language models and generative AI workflows. It provides a collection of practical examples designed to guide the deployment of generative systems. The repository features architectural patterns for autonomous agentic workflows that utilize reasoning and tool integration to execute multi-step tasks. It also includes frameworks and templates for building retrieval-augmented generation pipelines that connect language models to vector databases and external data sources. The codebase covers several functio
Provides frameworks for building interconnected relational data structures to enhance information retrieval accuracy.