Open-source platforms for discovering, cataloging, and documenting distributed datasets across complex organizational data infrastructures.
OpenMetadata is an enterprise data catalog, metadata platform, and governance suite that functions as a knowledge graph for data assets. It serves as an AI-ready metadata layer, providing governed context and organizational memory to large language model agents via the Model Context Protocol. The platform distinguishes itself by capturing institutional knowledge, linking conversations, decisions, and remediation notes directly to data assets to preserve tribal knowledge. It integrates AI agents to automate metadata governance, such as suggesting descriptions and identifying sensitive data thr
OpenMetadata is a comprehensive data catalog and governance platform that provides automated metadata extraction, lineage visualization, a business glossary, and data quality monitoring within an API-first architecture.
DataHub is a metadata management platform designed to unify technical, operational, and business context across diverse data ecosystems. By utilizing a graph-based metadata model and an event-driven ingestion architecture, it creates a centralized source of truth that maps complex data relationships, lineage, and ownership. This foundational framework enables organizations to maintain a synchronized view of their data landscape, supporting both human-led discovery and automated data operations. The platform distinguishes itself through its focus on grounding artificial intelligence and autono
DataHub is a metadata management platform that provides automated lineage, a business glossary, data quality monitoring, and role-based access control, which makes it well suited to organizational data governance and discovery.
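To make the idea of lineage in a graph-based metadata model concrete, the sketch below walks a toy lineage graph to find everything a dataset depends on. It is an illustration only: the dataset names, the `UPSTREAMS` mapping, and the `all_upstreams` helper are hypothetical and do not use DataHub's actual API or data model.

```python
from collections import deque

# Hypothetical lineage edges: each key is a dataset, each value lists the
# datasets it is directly derived from (its direct upstreams).
UPSTREAMS = {
    "dashboard.revenue": ["warehouse.orders_daily"],
    "warehouse.orders_daily": ["raw.orders", "raw.customers"],
    "raw.orders": [],
    "raw.customers": [],
}


def all_upstreams(dataset, edges):
    """Return every direct and transitive upstream of `dataset`, breadth-first."""
    seen = []
    queue = deque(edges.get(dataset, []))
    while queue:
        node = queue.popleft()
        if node in seen:
            continue  # already visited via another path
        seen.append(node)
        queue.extend(edges.get(node, []))
    return seen
```

Calling `all_upstreams("dashboard.revenue", UPSTREAMS)` lists the warehouse table first and then the raw tables it is built from, which is the kind of question a lineage view answers.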
Amundsen is a data catalog and discovery platform that provides a centralized directory for indexing tables and dashboards. It functions as a metadata management system and search engine, allowing users to locate and understand available data assets across diverse distributed sources. The platform includes capabilities for data lineage tracking to map the origin and movement of datasets between systems. It also serves as a data profiling tool, calculating distribution and quality statistics for individual table columns to provide automated insights into the nature of the data. The system man
Amundsen is a data catalog and discovery platform that provides automated metadata extraction, lineage visualization, and API-first integration, which makes it well suited to managing organizational data assets.
CKAN is an open-source data management platform that provides the foundation for building data portals. It supports the full lifecycle of datasets—from creation and organization to publishing, cataloging with faceted search, and interactive data visualization—all through a web interface. The platform is built on a modular architecture that includes a plugin-based extensibility system, a harvesting framework for importing metadata from external sources, and a standardized RESTful JSON API for programmatic access to datasets and metadata. The web interface is rendered using the Jinja2 templatin
CKAN is a robust data portal and cataloging platform that provides essential metadata management and discovery features, though it is primarily designed for public data publishing rather than internal enterprise data governance.
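As a rough illustration of programmatic access through CKAN's JSON API, the sketch below builds a request to the `package_search` action and extracts dataset titles from the response. The portal address `https://ckan.example.org` is a placeholder, and the helper names are hypothetical; the sketch assumes the portal exposes the version 3 Action API and wraps results in the usual `success`/`result` envelope.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def package_search_url(base_url, query, rows=5):
    """Build the URL for a dataset search against a CKAN portal."""
    params = urlencode({"q": query, "rows": rows})
    return f"{base_url.rstrip('/')}/api/3/action/package_search?{params}"


def dataset_titles(response_body):
    """Extract dataset titles from a package_search JSON response body."""
    payload = json.loads(response_body)
    if not payload.get("success"):
        raise RuntimeError("CKAN reported an unsuccessful request")
    # Fall back to the dataset's name when no title is set.
    return [pkg.get("title") or pkg["name"] for pkg in payload["result"]["results"]]


def search(base_url, query, rows=5):
    """Query a live portal; base_url must point at a real CKAN instance."""
    with urlopen(package_search_url(base_url, query, rows)) as resp:
        return dataset_titles(resp.read().decode("utf-8"))
```

Separating URL construction and response parsing from the network call keeps the parsing logic easy to check without a live portal.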
This is a role-based access control system for Laravel applications that manages user permissions and roles within a database. It provides a database permissions manager to assign specific abilities to users and roles, utilizing authorization gates to restrict access to routes and interface elements. The project features a wildcard permission system that uses pattern matching to grant broad access across multiple related permissions. It also supports team-scoped access control, allowing users to maintain different roles and permission levels across separate organizational contexts or teams.
This is a role-based access control library for Laravel applications, which provides a single security component rather than a comprehensive data catalog and metadata management platform.
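The combination of wildcard permissions and team-scoped access described above can be sketched in a few lines. This is a language-neutral illustration written in Python, not the Laravel library's own code or matching rules: the `GRANTS` table, the user and team names, and the `can` helper are all hypothetical, and the glob-style matching from Python's `fnmatch` module stands in for whatever pattern syntax the library actually uses.

```python
from fnmatch import fnmatchcase

# Hypothetical grants: permission patterns keyed by (user, team), so the same
# user can hold different permissions in different teams.
GRANTS = {
    ("alice", "team-a"): ["posts.*"],     # wildcard: every posts permission
    ("alice", "team-b"): ["posts.view"],  # a single, exact permission
}


def can(user, team, permission, grants=GRANTS):
    """Return True if any pattern granted to `user` in `team` matches `permission`."""
    patterns = grants.get((user, team), [])
    return any(fnmatchcase(permission, pattern) for pattern in patterns)
```

Here `can("alice", "team-a", "posts.edit")` is true because `posts.*` covers it, while the same check in `team-b` fails because only `posts.view` was granted there.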
Cube is a semantic layer data platform that maps raw SQL databases to standardized business metrics and dimensions. It functions as a SQL dialect translator, converting abstract semantic queries into optimized SQL statements for various cloud data warehouses. The platform operates as a multi-tenant data gateway, isolating information and security permissions for different customers within a single deployment. It includes a relational caching engine that stores pre-aggregated query results to reduce latency and decrease the load on primary data warehouses. The system provides a REST-based int
Cube is a semantic layer and metrics API designed for building data applications, rather than a data governance platform for cataloging and documenting organizational metadata.
This project is a business intelligence suite and SQL data visualization platform used for data analysis, reporting, and monitoring. It provides a web application for exploring datasets and building interactive dashboards, complemented by a web-based SQL query editor for analyzing raw data from connected stores. The platform features a semantic data layer to define standardized metrics and dimensions, ensuring consistent data interpretation across reports. It includes a security framework with role-based access control to manage user permissions and authentication across shared dashboards. T
This is a business intelligence and data visualization platform designed for reporting and analysis rather than a dedicated data catalog for indexing and documenting organizational metadata.
ERPNext is a comprehensive enterprise resource planning suite designed to integrate core organizational functions, including accounting, inventory, human resources, and project management, into a single unified platform. It operates as a metadata-driven business application, where data structures and application logic are defined through configuration rather than hard-coded programming to facilitate rapid customization. The system distinguishes itself through a robust security and governance framework that enforces granular, role-based access control across all document operations. It feature
ERPNext is an enterprise resource planning suite for managing business operations rather than a dedicated data cataloging platform for indexing and documenting metadata across external organizational datasets.