21 open-source projects similar to magda-io/magda, ranked by how many features they have in common. Compare stars, activity, and what each one does to find the best Magda alternative.
DataHub is a metadata management system and data catalog platform designed to provide a centralized directory for discovering, managing, and documenting datasets across a diverse data stack. It serves as a comprehensive framework for metadata management, incorporating a data governance framework to classify sensitive information and assign ownership for organizational accountability. The platform distinguishes itself through AI-enabled data discovery, which connects large language models to a metadata graph to allow for natural language search and exploration of data assets. It also provides…
CKAN is an open-source data management platform that provides the foundation for building data portals. It supports the full lifecycle of datasets, from creation and organization to publishing, cataloging with faceted search, and interactive data visualization, all through a web interface. The platform is built on a modular architecture that includes a plugin-based extensibility system, a harvesting framework for importing metadata from external sources, and a standardized RESTful JSON API for programmatic access to datasets and metadata. The web interface is rendered using the Jinja2 templatin…
Gravitino is a federated metadata lake and unified data catalog designed to manage tables, files, and AI models across diverse data sources and cloud storage. It serves as a centralized interface for governing schemas, access controls, and tagging across relational databases, messaging queues, and object stores. The project distinguishes itself by unifying the management of AI assets, such as machine learning models and their version lineages, alongside traditional tabular data. It also implements the Iceberg REST specification to provide a standardized metadata server and proxy for lakehouse…
Apache Hamilton — portable & expressive data transformation DAGs
Apollo is a microservice configuration management system and dynamic configuration center. It serves as a centralized platform for storing, distributing, and syncing application settings across distributed environments to maintain consistency across various clusters. The system distinguishes itself through a dynamic configuration orchestrator that supports real-time updates to connected applications, eliminating the need for manual service restarts. It features a grayscale configuration deployment tool for rolling out changes to a small subset of service instances and a version control system…
DataHub is a metadata management platform designed to unify technical, operational, and business context across diverse data ecosystems. By utilizing a graph-based metadata model and an event-driven ingestion architecture, it creates a centralized source of truth that maps complex data relationships, lineage, and ownership. This foundational framework enables organizations to maintain a synchronized view of their data landscape, supporting both human-led discovery and automated data operations. The platform distinguishes itself through its focus on grounding artificial intelligence and autono…
Elementary OSS: dbt-native data observability
Amundsen is a data catalog and discovery platform that provides a centralized directory for indexing tables and dashboards. It functions as a metadata management system and search engine, allowing users to locate and understand available data assets across diverse distributed sources. The platform includes capabilities for data lineage tracking to map the origin and movement of datasets between systems. It also serves as a data profiling tool, calculating distribution and quality statistics for individual table columns to provide automated insights into the nature of the data. The system man…
Apache Atlas - Open Metadata Management and Governance capabilities across the Hadoop platform and beyond
Collect, aggregate, and visualize a data ecosystem's metadata
Metacat is a unified metadata exploration API service for Hive, RDS, Teradata, Redshift, S3, and Cassandra. It tells you what data you have, where it resides, and how to process it. Metadata, in the end, is really data about the data. So the primary purpose of…
A Microservice Toolkit from The New York Times
Egeria provides the Apache-2.0 licensed open metadata and governance type system, frameworks, APIs, event payloads and interchange protocols to enable tools, engines and platforms to exchange metadata in order to get the best value from data, whilst ensuring it is properly governed.
OpenMetadata is an enterprise data catalog, metadata platform, and governance suite that functions as a knowledge graph for data assets. It serves as an AI-ready metadata layer, providing governed context and organizational memory to large language model agents via the Model Context Protocol. The platform distinguishes itself by capturing institutional knowledge, linking conversations, decisions, and remediation notes directly to data assets to preserve tribal knowledge. It integrates AI agents to automate metadata governance, such as suggesting descriptions and identifying sensitive data thr…
Next-Gen Data Discovery and Data Observability Platform
This project is a comprehensive reference guide and cheat sheet for the Docker CLI. It provides a structured collection of commands and documentation to help users manage container lifecycles, build images, and handle registries. The documentation specifically covers the orchestration of multi-container applications using Docker Compose and the management of scalable services across multiple nodes via Docker Swarm. It also includes detailed guides for configuring virtual networks, bridges, and ports to control container communication. The reference surface extends to container image administ…