# awesomedata/awesome-public-datasets

**Attribution required: if you use, quote, or summarise this content, you must credit and link back to [awesome-repositories.com](https://awesome-repositories.com/repository/awesomedata-awesome-public-datasets).**

75,979 stars · 11,521 forks · MIT

## Links

- GitHub: https://github.com/awesomedata/awesome-public-datasets
- Homepage: https://awesomedataworld.slack.com
- awesome-repositories: https://awesome-repositories.com/repository/awesomedata-awesome-public-datasets.md

## Topics

`aaron-swartz` `awesome-public-datasets` `datasets` `opendata`

## Description

This project is a community-maintained, open-access directory of high-quality public datasets. It serves as a centralized reference point for researchers, developers, and data scientists to locate reliable information sources across a wide spectrum of industries and scientific fields. By providing a structured index, the repository facilitates the discovery of data necessary for exploratory analysis, machine learning model training, and the development of data-intensive applications.

The directory takes a lightweight, platform-agnostic approach to indexing and needs no backend infrastructure. Content is organized in a topic-centric hierarchy, which simplifies navigation across domains ranging from climate science and economics to healthcare and computer networks. The community maintains the list collaboratively, with peer review and version-controlled updates helping to keep the curated links accurate and relevant.

The collection also includes specialized datasets for fields such as physics, geographic information systems, natural language processing, and time-series analysis. The repository is documented entirely in human-readable markdown files, which keeps contributions transparent and the index easy to browse.
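Because the index is plain markdown grouped by topic, it can be read programmatically as well as by eye. The sketch below is illustrative only: it assumes a layout in which each topic is a `##` (or deeper) heading followed by bullet lines containing `[name](url)` links, and the sample text, the `example.org` URLs, and the `index_by_topic` helper are all hypothetical. The real README may be laid out differently.

```python
import re

# Assumed layout: "## Topic" headings followed by "- [name](url)" bullets.
HEADING = re.compile(r"^#{2,6}\s+(.*\S)\s*$")
LINK = re.compile(r"\[([^\]]+)\]\((https?://[^)\s]+)\)")


def index_by_topic(markdown_text):
    """Group (name, url) pairs under the most recent heading seen."""
    topics = {}
    current = None
    for line in markdown_text.splitlines():
        heading = HEADING.match(line)
        if heading:
            current = heading.group(1)
            continue
        if current is None:
            continue  # ignore links that appear before the first topic
        for name, url in LINK.findall(line):
            topics.setdefault(current, []).append((name, url))
    return topics


# Hypothetical sample text, not copied from the repository.
SAMPLE = """\
# Example index

## Climate
- [Example weather archive](https://example.org/weather) - hourly observations

## Economics
- [Example trade tables](https://example.org/trade)
- [Example price index](https://example.org/prices)
"""

index = index_by_topic(SAMPLE)
for topic, links in index.items():
    print(topic, len(links))
```

A dictionary keyed by topic mirrors the heading hierarchy directly. Nested subtopics would need the heading depth tracked as well, which this sketch does not do.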

## Tags

### Repository Format

- [Awesome List](https://awesome-repositories.com/f/repository-format/awesome-list.md) — A community-curated directory that catalogs and links out to other open-source projects, rather than a standalone tool you run yourself.

### Artificial Intelligence & ML

- [Model Training Pipelines](https://awesome-repositories.com/f/artificial-intelligence-ml/machine-learning/infrastructure/model-training-and-tuning/training-frameworks/model-training-pipelines.md) — Supplies a diverse collection of labeled datasets essential for training, validating, and benchmarking predictive models.

### Data & Databases

- [Public Datasets](https://awesome-repositories.com/f/data-databases/data-engineering/public-datasets.md) — Aggregates high-quality, open-access datasets to help developers populate prototypes and test data-intensive applications.
- [Knowledge Discovery Resources](https://awesome-repositories.com/f/data-databases/data-collections-datasets/knowledge-discovery-resources.md) — Acts as a centralized reference point for locating domain-specific datasets across government, scientific, and technological sectors.
- [Static Resource Directories](https://awesome-repositories.com/f/data-databases/search-indexing-technologies/search-indexing/search-information-retrieval/domain-specific-data-discovery/static-resource-directories.md) — Organizes external data assets into a human-readable, searchable format that remains platform-agnostic.

### Development Tools & Productivity

- [Curated Resource Lists](https://awesome-repositories.com/f/development-tools-productivity/documentation-discovery-metadata/developer-discovery-platforms/developer-discovery-portals/curated-resource-lists.md) — Curates a topic-centric list of open datasets specifically for research and development workflows. ([source](https://github.com/awesomedata/awesome-public-datasets))

### Software Engineering & Architecture

- [Curated Data Repositories](https://awesome-repositories.com/f/software-engineering-architecture/project-management-governance/repository-maintenance/repository-identities/curated-data-repositories.md) — Maintains a community-vetted collection of open-access data, structured by domain to support research efforts.
- [Project Governance](https://awesome-repositories.com/f/software-engineering-architecture/project-management-governance/project-governance.md) — Utilizes distributed peer review and pull requests to maintain the accuracy and relevance of curated external links.

### Scientific & Mathematical Computing

- [Data Science Research Resources](https://awesome-repositories.com/f/scientific-mathematical-computing/research-analysis-workflows/data-science-research-resources.md) — Offers a wide array of public data sources for performing exploratory analysis and testing scientific hypotheses.

### Part of an Awesome List

- [Curated Research Lists](https://awesome-repositories.com/f/awesome-lists/ai/curated-research-lists.md) — Large-scale datasets for various machine learning tasks.
- [Big Data](https://awesome-repositories.com/f/awesome-lists/data/big-data.md) — Listed in the “Big Data” section of the Awesome awesome list.
- [Data Analytics](https://awesome-repositories.com/f/awesome-lists/data/data-analytics.md) — Repository of open datasets for research and analysis.
- [Data Engineering](https://awesome-repositories.com/f/awesome-lists/data/data-engineering.md) — Curated collections of open public data.
- [Databases & Data](https://awesome-repositories.com/f/awesome-lists/data/databases-data.md) — Large-scale public data repositories.
- [Geospatial Data Sources](https://awesome-repositories.com/f/awesome-lists/data/geospatial-data-sources.md) — Curated list of open datasets across various domains.
- [Neuroscience Data](https://awesome-repositories.com/f/awesome-lists/data/neuroscience-data.md) — Collection of high-quality open neuroscience datasets.
- [Public Data APIs](https://awesome-repositories.com/f/awesome-lists/data/public-data-apis.md) — Curated list of open data sources.
- [Curated Knowledge Bases](https://awesome-repositories.com/f/awesome-lists/learning/curated-knowledge-bases.md) — A directory of publicly available data sources.
- [Curated Resource Lists](https://awesome-repositories.com/f/awesome-lists/learning/curated-resource-lists.md) — Directory of high-quality public data sources.
- [Educational Resources](https://awesome-repositories.com/f/awesome-lists/learning/educational-resources.md) — Publicly available datasets for data-driven art.
- [Curated Lists](https://awesome-repositories.com/f/awesome-lists/more/curated-lists.md) — Listed in the “Curated Lists” section of The Book Of Secret Knowledge awesome list.
- [Curated Resource Lists](https://awesome-repositories.com/f/awesome-lists/more/curated-resource-lists.md) — A directory of open and publicly available datasets.
- [Public Data Sources](https://awesome-repositories.com/f/awesome-lists/more/public-data-sources.md) — Curated list of public datasets across various research domains.
- [Related Awesome Lists](https://awesome-repositories.com/f/awesome-lists/more/related-awesome-lists.md) — Curated list of public datasets.
- [Specifications](https://awesome-repositories.com/f/awesome-lists/more/specifications.md) — Listed in the “Specifications” section of the Awesome ArcGIS Developers awesome list.

### Content Management & Publishing

- [Markdown and Markup Tools](https://awesome-repositories.com/f/content-management-publishing/content-processing-transformation/markdown-markup-tools.md) — Employs human-readable text files to simplify community contributions and version-controlled updates.

### Game Development

- [Physics Engines](https://awesome-repositories.com/f/game-development/physics-engines.md) — Lists datasets containing physical measurements and simulation parameters for scientific modeling. ([source](https://github.com/awesomedata/awesome-public-datasets))
