awesome-repositories.com
© 2026 Bringes Technology SRL·VAT RO45896025·hello@bringes.io
Data & Databases · Awesome GitHub Repositories

168 repos


This category covers data storage, management, processing, analysis, and various database technologies and their operations.

Explore 168 awesome GitHub repositories matching Data & Databases. Refine with filters or upvote what's useful.

Awesome Data & Databases GitHub Repositories

  • Developer-Y/cs-video-courses

    74,064 · GitHub

    This project is a community-driven educational repository that serves as a comprehensive directory of university-level computer science video lectures. It provides a structured learning path for students and professionals, aggregating high-quality academic resources to facilitate self-paced study across a wide range of

    algorithms · bioinformatics · computational-biology
  • infiniflow/ragflow

    73,425 · GitHub

    This project is a comprehensive retrieval-augmented generation platform designed for building, managing, and deploying knowledge-based AI applications. It provides a unified environment for organizing datasets, configuring conversational chat assistants, and developing autonomous agents that execute multi-step reasonin

    Python · agent · agentic · agentic-ai
  • redis/redis

    73,096 · GitHub

    Redis is an in-memory, key-value database designed to provide sub-millisecond latency for read and write operations. It functions as a versatile data platform, serving as a distributed cache, a message broker, a NoSQL document store, and a vector database. The system utilizes an event-driven, single-threaded loop to pr

    C · cache · caching · database
  • awesomedata/awesome-public-datasets

    72,846 · GitHub

    This project is a community-maintained, open-access directory of high-quality public datasets. It serves as a centralized reference point for researchers, developers, and data scientists to locate reliable information sources across a wide spectrum of industries and scientific fields. By providing a structured index, t

    aaron-swartz · awesome-public-datasets · datasets
  • twitter/the-algorithm

    72,764 · GitHub

    The algorithm is a distributed recommendation engine pipeline designed to construct and serve personalized content timelines. It functions as a multi-stage orchestration layer that aggregates candidate content from diverse social graphs and high-dimensional embedding spaces, processing user interaction data to deliver

    Scala
  • tesseract-ocr/tesseract

    72,460 · GitHub

    Tesseract is a neural network-based optical character recognition engine designed to convert scanned images and digital documents into machine-readable, searchable text. It functions as both a command-line utility for automating large-scale digitization workflows and a cross-platform library that can be embedded into d

    C++ · hacktoberfest · lstm · machine-learning
  • lobehub/lobehub

    72,403 · GitHub

    LobeHub is a comprehensive multi-agent orchestration platform designed for building, configuring, and deploying specialized AI agents. It provides a unified chat-based gateway that allows users to manage autonomous agent teams across web, desktop, and mobile environments. By utilizing a framework that supports persiste

    TypeScript · agent · agent-collaboration · agent-harness
  • grafana/grafana

    72,295 · GitHub

    Grafana is an observability data platform designed to aggregate metrics, logs, and traces from diverse sources into a unified environment. It functions as a centralized interface for visualizing complex telemetry data, transforming raw streams into interactive dashboards that support real-time system health tracking an

    TypeScript · alerting · analytics · business-intelligence
  • abi/screenshot-to-code

    71,707 · GitHub

    This project is an artificial intelligence-powered frontend generator that translates visual design inputs into functional source code. It functions as a workflow engine that interprets graphical user interfaces, mapping layout structures and styling rules to structured markup and programming language syntax. The tool

    TypeScript
  • josephmisiti/awesome-machine-learning

    71,702 · GitHub

    This project is a comprehensive, community-driven directory of machine learning resources, software libraries, and educational materials. It serves as a centralized knowledge base for developers and researchers, organizing tools and frameworks by their primary programming language and technical domain to simplify disco

    Python
  • pallets/flask

    71,240 · GitHub

    Flask is a micro web framework designed for building web services with a flexible, lightweight structure. It functions as a standard-compliant WSGI application server, providing the essential tools required to register URL routes, handle incoming HTTP requests, and construct responses. By utilizing a central applicatio

    Python · flask · jinja · pallets
  • protocolbuffers/protobuf

    70,695 · GitHub

    Protocol Buffers is a language-neutral, platform-agnostic mechanism for serializing structured data. It provides a schema-driven toolchain that compiles declarative data definitions into type-safe source code, enabling consistent communication and strongly typed API contracts across services written in different progra

    C++ · marshalling · protobuf · protobuf-runtime
  • apache/superset

    70,587 · GitHub

    Superset is a web-based business intelligence platform designed for data exploration, visualization, and interactive dashboarding. It functions as a query-driven analytics engine that connects to various SQL databases, allowing users to perform ad-hoc analysis, define virtual metrics, and build complex data visualizati

    TypeScript · analytics · apache · apache-superset
  • dair-ai/Prompt-Engineering-Guide

    70,526 · GitHub

    This project is a comprehensive educational resource and knowledge base dedicated to the development and application of large language models and autonomous agentic systems. It provides a structured framework for understanding prompt engineering, context management, and the architectural patterns required to build task

    MDX · agent · agents · ai-agents
  • caddyserver/caddy

    70,190 · GitHub

    Caddy is an extensible, modular web server platform designed for high-performance traffic management and automated security. At its core, it functions as a dynamic HTTP gateway that handles request routing, static asset delivery, and reverse proxying through a chain of configurable handler modules. The system is built

    Go · acme · automatic-https · caddy
  • fffaraz/awesome-cpp

    69,832 · GitHub

    This project is a comprehensive, curated directory of high-quality libraries, tools, and educational resources for C and C++ development. It serves as an ecosystem discovery index, helping developers navigate the vast landscape of third-party components, frameworks, and technical documentation available for the languag

    awesome · awesome-list · c
  • Eugeny/tabby

    68,976 · GitHub

    Tabby is a cross-platform terminal emulator and desktop application suite designed for managing command-line workflows and remote infrastructure. It provides a comprehensive environment for terminal session orchestration, allowing users to organize multiple active sessions through split panes and custom layouts. The ap

    TypeScript · serial · ssh-client · telnet-client
  • danielmiessler/SecLists

    68,943 · GitHub

    SecLists is a comprehensive repository of security testing assets, functioning as a centralized knowledge base and collection of wordlists for professionals conducting vulnerability assessments and penetration testing. It provides a vast array of usernames, passwords, and payloads designed for brute-force and fuzzing a

    PHP
  • binhnguyennus/awesome-scalability

    68,707 · GitHub

    This project is a curated knowledge repository that aggregates high-quality resources, technical documentation, and expert insights focused on distributed systems engineering. It serves as a community-driven learning resource designed to help developers navigate the complexities of building and maintaining large-scale

    architecture · awesome · awesome-list
  • AppFlowy-IO/AppFlowy

    68,167 · GitHub

    AppFlowy is a local-first knowledge base and collaborative workspace platform designed for structured information management. It functions as a modular productivity suite where users organize content through a block-based document model, allowing for flexible nesting and granular manipulation of data. The system priori

    Dart · blog · confluence-alternative · content-management

Browse tags

  • API Data Management (1 sub-tag): Mechanisms for filtering or selecting specific data fields returned by an application programming interface.
  • API Data Retrieval (1 sub-tag): Tools and logic for managing how large datasets are requested and broken into manageable chunks from remote services.
  • API Layers (1 sub-tag): Middleware and frameworks that provide an interface layer between client applications and underlying data sources.
  • Asynchronous Data Handling (2 sub-tags): Utilities for managing non-blocking data operations and background tasks to maintain application responsiveness.
  • Automation Scripting APIs (2 sub-tags): Programming interfaces designed to automate the manipulation and management of external system data and user records.
  • Cloud Storage Integrations (1 sub-tag): Connectors and drivers that enable applications to interact with remote object storage and cloud-based file systems.
  • Community Analytics (1 sub-tag): Tools for measuring and visualizing the activity, engagement, and contributions of members within a community.
  • Community Data Platforms (1 sub-tag): Platforms that centralize and synchronize data contributed by multiple users or sources into a unified repository.
  • Data Abstraction Layers (5 sub-tags): Software layers that provide a unified interface for interacting with diverse storage backends and data structures.
  • Data Access Patterns (1 sub-tag): Methodologies and low-level techniques for reading from and writing to data storage systems efficiently.
  • Data Access and Querying (8 sub-tags): Interfaces, query languages, and abstraction layers used to interact with and retrieve data from storage systems.
  • Data Analysis & Visualization (12 sub-tags): This group focuses on tools and techniques for analyzing, interpreting, and visually representing data.
  • Data Architectures (2 sub-tags): Structural designs and organizational patterns for managing, partitioning, and modeling complex data systems.
  • Data Categories (1 sub-tag): Collections of structured information categorized by specific themes or temporal characteristics.
  • Data Collection (2 sub-tags): Systems and automated processes designed to gather, harvest, and ingest information from external sources.
  • Data Collection Infrastructure (1 sub-tag): Scalable frameworks and distributed systems built to support large-scale data gathering and web crawling operations.
  • Data Collections & Datasets (12 sub-tags): This group comprises various types of data collections and datasets, including domain-specific and open data.
  • Data Compression (2 sub-tags): Algorithms and utilities that reduce the size of data for efficient storage and transmission.
  • Data Consistency Models (1 sub-tag): Frameworks defining how data updates are propagated and synchronized across distributed nodes.
  • Data Containers (1 sub-tag): Foundational structures and base classes used to encapsulate and organize data for application use.
  • Data Conversion (1 sub-tag): Utilities for transforming data from one representation or encoding to another.
  • Data Deduplication (2 sub-tags): Tools that identify and remove redundant information to optimize storage space and data integrity.
  • Data Distribution Patterns (1 sub-tag): Standardized formats and protocols for sharing and distributing data across different systems and languages.
  • Data Domains (1 sub-tag): Specialized datasets focused on specific industry sectors or subject matter areas.
  • Data Engineering and Infrastructure (5 sub-tags): Foundational tools for large-scale data collection, ingestion, storage management, and reliability.
  • Data Engines (1 sub-tag): Core processing engines that manage data storage, retrieval, and synchronization, often optimized for local environments.
  • Data Export (1 sub-tag): Tools for extracting and formatting data from internal systems for external use or archival.
  • Data Export Formats (1 sub-tag): Specific file types and schemas used for outputting data, including specialized formats like OCR results.
  • Data Extensions (1 sub-tag): Add-ons and plugins that extend the functionality of database systems to support bulk operations.
  • Data Filtering Strategies (1 sub-tag): Logic and rulesets for excluding or including specific data points based on defined criteria.
  • Data Filtering Utilities (1 sub-tag): Functional utilities for processing and refining tabular data or lists based on user-defined filters.
  • Data Formatting (1 sub-tag): Tools that transform raw data into human-readable formats or standardized visual representations.
  • Data Framing (1 sub-tag): Mechanisms for structuring and delimiting data streams to ensure correct parsing during transmission.
  • Data Governance and Modeling (6 sub-tags): Frameworks for defining schemas, ensuring standardization, and managing data assets and sovereignty.
  • Data Handling (1 sub-tag): General-purpose libraries and tools for managing, serializing, and processing data within an application.
  • Data Inspection (1 sub-tag): Utilities for viewing, debugging, and formatting raw data for easier human analysis.
  • Data Integration & Synchronization (12 sub-tags): This group covers tools and strategies for integrating and synchronizing data across different systems.
  • Data Integration Architectures (1 sub-tag): Frameworks and patterns for moving and transforming data between disparate systems and storage environments.
  • Data Interoperability (1 sub-tag): Standards and protocols that enable different software systems to exchange and interpret shared data structures.
  • Data Management (11 sub-tags): Tools and utilities for maintaining, organizing, protecting, and migrating data throughout its operational lifecycle.
  • Data Management Interfaces (1 sub-tag): Graphical or programmatic interfaces designed for viewing, editing, and managing tabular data sets.
  • Data Operations (1 sub-tag): Systems and workflows focused on the routine maintenance and manipulation of individual data records.
  • Data Organization Tools (1 sub-tag): Software designed to categorize, index, and structure information for improved accessibility and retrieval.
  • Data Platforms (3 sub-tags): Comprehensive environments that provide specialized infrastructure for storing, analyzing, and monitoring specific types of data.
  • Data Preparation (1 sub-tag): Tools that clean, format, and segment raw data to prepare it for downstream analysis or ingestion.
  • Data Processing Extensions (1 sub-tag): Add-on components that enhance database functionality by performing specialized data cleaning or refinement tasks.
  • Data Processing Models (1 sub-tag): Architectural approaches for processing data streams, such as handling information in discrete packets.
  • Data Processing Patterns (1 sub-tag): Standardized methods and techniques for converting data structures into formats suitable for storage or transmission.
  • Data Processing Pipelines (18 sub-tags): Systems and workflows for ingesting, transforming, and orchestrating high-throughput data processing tasks.
  • Data Processing Services (1 sub-tag): Managed services that automate the delivery and ingestion of data from external sources.
  • Data Processing Utilities (4 sub-tags): Libraries and algorithms used to perform specific data manipulation tasks like deduplication, streaming, or reduction.
  • Data Recovery Tools (1 sub-tag): Specialized utilities designed to reconstruct or recover corrupted or misaligned data files.
  • Data Redundancy (1 sub-tag): Techniques and algorithms that ensure data availability and fault tolerance through redundant storage methods.
  • Data Resources (1 sub-tag): Datasets and reference materials used to support knowledge discovery and information research.
  • Data Serialization Formats (8 sub-tags): Libraries and protocols that define how data is encoded, structured, and serialized for storage or network transport.
  • Data Sharing (2 sub-tags): Mechanisms that allow controlled access to data sets by sharing specific views or base data structures.
  • Data Stores (1 sub-tag): Storage systems engineered to maintain strict data consistency across distributed environments.
  • Data Synchronization Engines (1 sub-tag): Engines that maintain consistency between multiple data sources by propagating changes in real time.
  • Data Templating (1 sub-tag): Tools for defining and applying patterns to format data, particularly for temporal or string-based values.
  • Data Transfer (1 sub-tag): Infrastructure components designed to move large volumes of data across network boundaries efficiently.
  • Database Access Patterns (1 sub-tag): Standardized methods for retrieving and iterating through database records using cursors or similar mechanisms.
  • Database Concepts (3 sub-tags): Fundamental principles and architectural components that define how databases operate, store, and manage data integrity.
  • Database Design Patterns (2 sub-tags): Best practices for modeling data structures and enforcing attribute validation within database schemas.
  • Database Extensions (1 sub-tag): Plugins and add-ons that provide additional functionality or features for specific database management systems.
  • Database Infrastructure (2 sub-tags): Middleware and routing components that manage connections and traffic between applications and database clusters.
  • Database Management Systems (8 sub-tags): Core engines, storage architectures, and operational configurations for persistent data management.
  • Database Resources (1 sub-tag): Reference materials and documentation specifically focused on relational database systems.
  • Database Services (1 sub-tag): Managed cloud-based offerings that provide database hosting, maintenance, and operational support.
  • Dataset Management (4 sub-tags): Collections of annotated media and structured data specifically curated for training and evaluating machine learning computer vision models.
  • Enterprise Data Platforms (1 sub-tag): Centralized systems that provide organizational access to large-scale data repositories and internal information discovery tools.
  • File Processing (1 sub-tag): Tools designed to transform, convert, or manipulate the structure and format of digital files.
  • Geospatial Data & Services (9 sub-tags): This group includes services, tools, and data related to geographical information and location.
  • Graph Computing Systems (3 sub-tags): Technologies for modeling, processing, and analyzing data based on graph theory and relational connections.
  • Processor Utilities (1 sub-tag): Specialized software components that perform specific data transformations based on the input media or data type.
  • Public Data APIs (1 sub-tag): Interfaces that provide programmatic access to publicly available datasets and government or institutional information services.
  • Public Welfare APIs (1 sub-tag): Programming interfaces that facilitate access to data regarding social services, community support, and charitable initiatives.
  • SQL Development (1 sub-tag): Software environments and utilities that assist developers in writing, testing, and refining structured query language code.
  • Search and Indexing Technologies (3 sub-tags): Specialized tools for indexing, searching, and retrieving information across diverse data stores.
  • Storage Abstraction (3 sub-tags): Middleware layers that provide a unified interface for interacting with diverse underlying storage backends and hardware.
  • Storage Adapters (2 sub-tags): Software connectors that enable applications to interface with specific cloud or local storage systems.
  • Storage Architectures (2 sub-tags): Structural patterns and methodologies for organizing, indexing, and retrieving data within a storage system.
  • Storage Integrations (1 sub-tag): Tools and utilities that connect storage systems to external authentication, security, or management workflows.
  • Storage Management Tools (1 sub-tag): Administrative utilities that allow users to configure, monitor, and maintain storage resources via command-line interfaces.
  • Storage Services (2 sub-tags): Managed infrastructure solutions that provide persistent storage capabilities for files and data objects.
  • Text Processing Utilities (3 sub-tags): Libraries and tools specifically designed for extracting, inspecting, and manipulating textual data.
  • Vector Embeddings: Algorithms and services that convert unstructured data into numerical representations for machine learning applications.
  • Visual Data Management (1 sub-tag): Interfaces and dashboards designed to visualize, inspect, and manage complex data structures.