awesome-repositories.com
© 2026 Bringes Technology SRL·VAT RO45896025·hello@bringes.io
Data Engineering and Infrastructure · Awesome GitHub Repositories

61 repos


Foundational tools for large-scale data collection, ingestion, storage management, and reliability.

Explore 61 awesome GitHub repositories matching data & databases · Data Engineering and Infrastructure. Refine with filters or upvote what's useful.


Awesome Data Engineering and Infrastructure GitHub Repositories

  • sindresorhus/awesome

    438,690 · GitHub

    This project is a community-curated knowledge base that organizes vast technical ecosystems into a hierarchical, human-readable directory. It serves as a comprehensive index of libraries, frameworks, and methodologies, designed to facilitate discovery and professional development across the entire spectrum of software

    awesome · awesome-list · lists
  • vinta/awesome-python

    283,687 · GitHub

    This project is a comprehensive, community-curated directory that organizes a vast landscape of Python software libraries, frameworks, and tools. It serves as a centralized knowledge base designed to facilitate ecosystem navigation and accelerate developer discovery across the entire software development lifecycle. Th

    Python · awesome · collections · python
  • torvalds/linux

    217,986 · GitHub

    The Linux kernel is a monolithic operating system kernel that serves as the primary interface between computer hardware and software applications. It provides the foundational infrastructure for managing system resources, including memory allocation, process scheduling, and synchronization primitives. The project inclu

    C
  • openclaw/openclaw

    211,971 · GitHub

    Openclaw is a platform for managing agent execution environments, providing the infrastructure to control agent lifecycles, session state, and workspace persistence. It features a centralized gateway that handles model loops, tool invocation, and streaming events, while supporting multi-agent routing and persistent mem

    TypeScript · ai · assistant · crustacean
  • trimstray/the-book-of-secret-knowledge

    206,980 · GitHub

    This project serves as a centralized, community-driven repository of technical knowledge and administrative resources. It provides a structured taxonomy that aggregates disparate information into a searchable framework, supporting continuous learning and rapid problem-solving for system administrators and cybersecurity

    awesome · awesome-list · bsd
  • tensorflow/tensorflow

    193,864 · GitHub

    TensorFlow is a comprehensive machine learning framework designed for the construction, training, and deployment of complex mathematical models. It utilizes a graph-based execution model that represents operations as directed acyclic graphs, enabling automatic differentiation and efficient parallel processing. The syst

    C++ · deep-learning · deep-neural-networks · distributed
  • Significant-Gravitas/AutoGPT

    181,891 · GitHub

    AutoGPT is an orchestration platform designed for building, managing, and deploying autonomous agents. It provides a visual canvas-based environment where users can assemble agents by connecting modular blocks that represent actions, data flows, and conditional logic. The platform supports the entire agent lifecycle, i

    Python · ai · artificial-intelligence · autonomous-agents
  • jackfrued/Python-100-Days

    178,734 · GitHub

    This project is a comprehensive, day-by-day curriculum designed to guide learners through the Python programming language and its professional applications. The content spans from fundamental syntax and object-oriented design to advanced topics including database management, web development, data analysis, and machine

    Jupyter Notebook
  • langchain-ai/langchain

    127,015 · GitHub

    LangChain is an orchestration framework designed for building, managing, and deploying applications powered by large language models. It provides a unified integration layer that normalizes disparate model provider APIs into a consistent set of primitives, enabling developers to build complex, multi-step AI workflows t

    Python · agents · ai · ai-agents
  • kubernetes/kubernetes

    120,673 · GitHub

    Kubernetes is a distributed container orchestration platform that automates the deployment, scaling, and management of containerized applications across clusters of computing nodes. It functions as a declarative infrastructure controller, utilizing a control loop architecture that continuously monitors the current syst

    Go · cncf · containers · go
  • excalidraw/excalidraw

    117,138 · GitHub

    This project is a virtual whiteboard component and vector graphics editor designed for creating diagrams with a hand-drawn aesthetic. It provides a canvas-based drawing engine that can be embedded directly into web applications, allowing users to manipulate shapes, upload images, and export visual data into standard fo

    TypeScript · canvas · collaboration · diagrams
  • d3/d3

    112,379 · GitHub

    D3 is a modular library providing low-level primitives for creating data-driven visualizations. It functions as a flexible framework that allows for direct control over visual presentation by mapping abstract data dimensions to graphical properties, such as position, color, and size, without imposing predefined chart a

    Shell · chart · charts · d3
  • papers-we-love/papers-we-love

    103,417 · GitHub

    Papers We Love is a community-driven repository and learning network dedicated to the study and discussion of foundational computer science literature. It functions as a centralized educational archive, providing a structured environment where software professionals can engage with academic research to bridge the gap b

    Shell · awesome · computer-science · meetup
  • pytorch/pytorch

    97,601 · GitHub

    PyTorch is a machine learning framework centered on a GPU-ready tensor library that supports multi-dimensional array operations across both CPU and accelerator hardware. It provides a foundational infrastructure for mathematical computation and dynamic neural network construction, utilizing a tape-based automatic diffe

    Python · autograd · deep-learning · gpu
  • immich-app/immich

    92,953 · GitHub

    Immich is a self-hosted media management platform designed to provide a centralized, private repository for photos and videos. It functions as a comprehensive system for organizing, backing up, and viewing personal media collections across mobile devices, web browsers, and external storage locations. By maintaining ful

    TypeScript · backup-tool · flutter · google-photos
  • ChatGPTNextWeb/NextChat

    87,317 · GitHub

    NextChat is a self-hosted web application that provides a unified interface for interacting with multiple large language models. It functions as a conversational platform where users can manage and switch between diverse AI providers through configurable API backends, maintaining full control over their data and infras

    TypeScript · calclaude · chatgpt · claude
  • microsoft/markitdown

    87,305 · GitHub

    This project is an AI-powered document processing engine designed to transform diverse file formats into structured Markdown. By leveraging multimodal language models, it performs complex layout analysis and semantic text extraction, allowing for the conversion of both unstructured files and scanned images into machine

    Python · autogen · autogen-extension · langchain
  • firecrawl/firecrawl

    84,034 · GitHub

    Firecrawl is a web data extraction platform designed to convert unstructured web content into clean, LLM-ready formats like markdown or JSON. It functions as an autonomous web crawler and scraper, capable of mapping entire domains, performing recursive navigation, and executing complex data gathering tasks. By leveragi

    TypeScript · ai · ai-agents · ai-crawler
  • macrozheng/mall

    82,926 · GitHub

    This project is an enterprise-grade Java framework designed for building scalable, full-stack e-commerce applications. It provides a comprehensive foundation for microservice-based distributed architectures, enabling the development of complex retail platforms that include product management, order processing, and secu

    Java · docker · elasticsearch · elk
  • DopplerHQ/awesome-interview-questions

    81,035 · GitHub

    This project is a comprehensive, community-sourced repository of technical interview questions and study materials. It serves as a centralized index for software engineers to prepare for technical assessments, benchmark their personal knowledge, and identify gaps in their expertise across a wide range of programming la

    android-interview-questions · angularjs-interview-questions · awesome

Explore sub-tags

  • Backup and Recovery Utilities (4 sub-tags): Utilities for automating database dumps, file storage backups, and managing retention policies or recovery operations.
  • Caching and Performance (2 sub-tags): Techniques and implementations focused on reducing latency and improving system throughput by storing frequently accessed data.
  • Data Engineering (8 sub-tags): Infrastructure and frameworks used to build, manage, and scale complex systems for processing and analyzing large datasets.
  • Data Extraction & Ingestion (11 sub-tags): Tools and processes for gathering, parsing, and importing raw data from various external sources into storage systems.
  • Data Persistence and Storage (10 sub-tags): Technologies and architectures dedicated to the durable storage and long-term management of digital information.