Why is eugeneyan/applied-ml a recommended Data Discovery Tools repository on GitHub?

Locate and explore available datasets using specialized tools designed to catalog, index, and search for information across distributed storage systems.

Why is cube-js/cube a recommended Data Discovery Tools repository on GitHub?

Exposes data model structures through a programmatic interface to simplify client-side integration.

Why is oxnr/awesome-bigdata a recommended Data Discovery Tools repository on GitHub?

Helps identify and evaluate databases and processing frameworks for large-scale data infrastructure.

Why is linkedin/datahub a recommended Data Discovery Tools repository on GitHub?

Integrates large language models with a metadata graph to enable natural language search and discovery.

Why is datahub-project/datahub a recommended Data Discovery Tools repository on GitHub?

Provides a centralized interface for users to search and access data assets, accelerating the time required to derive insights.

Why is microsoft/Mastering-GitHub-Copilot-for-Paired-Programming a recommended Data Discovery Tools repository on GitHub?

Provides workflows for using large language models to discover and explore data assets via natural language.

Why is amundsen-io/amundsen a recommended Data Discovery Tools repository on GitHub?

Implements a searchable interface and index for locating specific datasets across diverse distributed data sources.

Why is samapriya/awesome-gee-community-datasets a recommended Data Discovery Tools repository on GitHub?

Functions as a searchable repository of geographic information systems data for environmental analysis.

8 repositories

Awesome GitHub Repositories: Data Discovery Tools

Software for cataloging, indexing, and searching datasets across distributed systems.

Distinguishing note: Focuses on the discovery and exploration of data assets rather than database management or storage.

Explore 8 awesome GitHub repositories matching data & databases · Data Discovery Tools. Refine with filters or upvote what's useful.


eugeneyan/applied-ml
29,783 stars
This project is a comprehensive, curated knowledge base designed to support the development and maintenance of production-grade machine learning systems. It serves as a centralized repository of industry-standard technical literature, engineering case studies, and research papers, providing a structured reference for practitioners navigating the complexities of modern data science and machine learning engineering. The resource distinguishes itself through a cross-domain approach that bridges the gap between academic research and practical implementation. By synthesizing proven industry archit
Locate and explore available datasets using specialized tools designed to catalog, index, and search for information across distributed storage systems.
applied-data-science · applied-machine-learning · computer-vision
cube-js/cube
20,251 stars
Cube is a semantic data layer that provides a unified framework for defining business metrics, dimensions, and relationships across diverse data sources. By acting as a headless business intelligence engine, it transforms raw data into a governed model that can be queried via SQL, REST, and GraphQL interfaces. This architecture ensures consistent data definitions and logic across all downstream analytical applications and reporting tools. The platform distinguishes itself through its integrated conversational AI capabilities, which allow users to explore data using natural language. It orches
Exposes data model structures through a programmatic interface to simplify client-side integration.
Rust · agentic-analytics · agents · ai
oxnr/awesome-bigdata
14,454 stars
This project is a curated directory of software, frameworks, and educational resources designed for building, scaling, and maintaining distributed data processing and storage architectures. It serves as a comprehensive index for the distributed computing ecosystem, helping users identify the appropriate tools for managing large-scale information systems. The repository functions as a central hub for data engineering, offering categorized access to technologies that support batch and stream processing, machine learning, and interactive querying. By organizing these resources, it assists in the
Helps identify and evaluate databases and processing frameworks for large-scale data infrastructure.
awesome · awesome-list · bigdata
linkedin/datahub
12,106 stars
DataHub is a metadata management system and data catalog platform designed to provide a centralized directory for discovering, managing, and documenting datasets across a diverse data stack. It serves as a comprehensive framework for metadata management, incorporating a data governance framework to classify sensitive information and assign ownership for organizational accountability. The platform distinguishes itself through AI-enabled data discovery, which connects large language models to a metadata graph to allow for natural language search and exploration of data assets. It also provides
Integrates large language models with a metadata graph to enable natural language search and discovery.
Python
datahub-project/datahub
12,141 stars
DataHub is a metadata management platform designed to unify technical, operational, and business context across diverse data ecosystems. By utilizing a graph-based metadata model and an event-driven ingestion architecture, it creates a centralized source of truth that maps complex data relationships, lineage, and ownership. This foundational framework enables organizations to maintain a synchronized view of their data landscape, supporting both human-led discovery and automated data operations. The platform distinguishes itself through its focus on grounding artificial intelligence and autono
Provides a centralized interface for users to search and access data assets, accelerating the time required to derive insights.
Python · data-catalog · data-discovery · data-governance
microsoft/Mastering-GitHub-Copilot-for-Paired-Programming
7,976 stars
This project is a collection of educational resources and curricula designed for mastering AI pair programming and prompt engineering. It provides a structured training course and instructional materials for integrating AI assistants into the software development lifecycle. The materials cover the use of large language models to modernize legacy code and translate applications between programming languages. It includes a specific guide for crafting natural language queries to generate code and automate development workflows. The content addresses a broad range of capabilities, including AI-a
Provides workflows for using large language models to discover and explore data assets via natural language.
Python · copilot · csharp · dotnet
amundsen-io/amundsen
4,737 stars
Amundsen is a data catalog and discovery platform that provides a centralized directory for indexing tables and dashboards. It functions as a metadata management system and search engine, allowing users to locate and understand available data assets across diverse distributed sources. The platform includes capabilities for data lineage tracking to map the origin and movement of datasets between systems. It also serves as a data profiling tool, calculating distribution and quality statistics for individual table columns to provide automated insights into the nature of the data. The system man
Implements a searchable interface and index for locating specific datasets across diverse distributed data sources.
Python · amundsen · data-catalog · data-discovery
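
The Amundsen description above mentions calculating distribution and quality statistics for individual table columns. As a rough illustration of what that kind of column profiling involves, here is a minimal Python sketch. It is not Amundsen's implementation or API; the function name `profile_column` and the chosen statistics are assumptions made for this example.

```python
from collections import Counter
from statistics import mean


def profile_column(values):
    """Summarize one column: row count, nulls, distinct values, numeric range.

    Hypothetical helper for illustration only; not part of Amundsen.
    """
    non_null = [v for v in values if v is not None]
    stats = {
        "rows": len(values),
        "nulls": len(values) - len(non_null),
        "distinct": len(set(non_null)),
        # Most frequent non-null value, or None for an empty column.
        "most_common": Counter(non_null).most_common(1)[0][0] if non_null else None,
    }
    # Add range statistics only when every non-null value is numeric.
    numeric = [
        v for v in non_null
        if isinstance(v, (int, float)) and not isinstance(v, bool)
    ]
    if numeric and len(numeric) == len(non_null):
        stats.update(min=min(numeric), max=max(numeric), mean=mean(numeric))
    return stats


# Example column with one missing value.
ages = [34, 29, None, 41, 29]
age_stats = profile_column(ages)
```

A catalog would typically store a summary like `age_stats` next to the column's metadata so that users can judge completeness and value ranges before querying the table.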
samapriya/awesome-gee-community-datasets
1,183 stars
This project provides a curated catalog of community-contributed geospatial datasets designed for environmental analysis and mapping workflows. It functions as a centralized repository for discovering and retrieving geographic information, facilitating access to earth observation data without the need for manual preprocessing. Beyond its role as a data catalog, the project includes automation utilities for maintaining project documentation and monitoring repository health. It uses marker-based text injection to dynamically update documentation files and aggregates public engagement metrics, s
Functions as a searchable repository of geographic information systems data for environmental analysis.
HTML · awesome-list · catalog · community-catalog
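
The description above says this project uses marker-based text injection to update documentation files. The general technique can be sketched in a few lines of Python: find a pair of markers in a file and replace whatever sits between them. The marker strings and the function name `inject_between_markers` below are hypothetical, chosen for illustration; the project's own markers and tooling may differ.

```python
import re

# Hypothetical marker names; the real project may use different ones.
START = "<!-- STATS:START -->"
END = "<!-- STATS:END -->"


def inject_between_markers(document, new_body, start=START, end=END):
    """Replace the text between the start and end markers with new_body."""
    pattern = re.compile(
        re.escape(start) + r".*?" + re.escape(end),
        flags=re.DOTALL,  # let "." span line breaks between the markers
    )
    if not pattern.search(document):
        raise ValueError("markers not found in document")
    replacement = f"{start}\n{new_body}\n{end}"
    # A function replacement keeps backslashes in new_body from being
    # interpreted as regex escape sequences.
    return pattern.sub(lambda _match: replacement, document, count=1)


readme = "# Catalog\n<!-- STATS:START -->\nold numbers\n<!-- STATS:END -->\nFooter\n"
updated = inject_between_markers(readme, "Datasets: 3")
```

Because the markers are written back along with the new body, the same function can be run again on `updated` without duplicating content, which is what makes this approach suitable for scheduled documentation refreshes.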

