What are the best Awesome Data Transformation GitHub Repositories?

Tools and utilities for modifying, restructuring, or converting raw data into desired formats and schemas. Explore 348 awesome GitHub repositories matching data & databases · Data Transformation. Refine with filters or upvote what's useful. Top picks: jwasham/coding-interview-university, thealgorithms/python, vuejs/vue, tensorflow/tensorflow, n8n-io/n8n, significant-gravitas/autogpt, avelino/awesome-go, yt-dlp/yt-dlp, langchain-ai/langchain, firecrawl/firecrawl.

Why is jwasham/coding-interview-university a recommended Data Transformation repository?

Reduces data footprint using encoding algorithms to enhance storage efficiency and transmission performance.

Why is thealgorithms/python a recommended Data Transformation repository?

Shrinks digital information streams through encoding techniques to improve storage density and transmission speed.

Why is vuejs/vue a recommended Data Transformation repository?

Renders filtered or sorted data sets using computed properties without modifying the original source.

Why is tensorflow/tensorflow a recommended Data Transformation repository?

Applies optimized routines to perform element-wise operations and shape manipulations on multi-dimensional data structures.

Why is n8n-io/n8n a recommended Data Transformation repository?

Eliminates redundant entries within data streams to maintain unique event records throughout automated sequences.

Why is significant-gravitas/autogpt a recommended Data Transformation repository?

Transforms unstructured keyword objects into structured, typed fields for metric analysis.

Why is avelino/awesome-go a recommended Data Transformation repository?

Streamlines reactive programming and data stream transformations using specialized toolkits.

Why is yt-dlp/yt-dlp a recommended Data Transformation repository?

Evaluates stream metadata against defined criteria to transform and restructure raw media into desired file formats.

Why is langchain-ai/langchain a recommended Data Transformation repository?

Processes diverse binary and multimodal data types through unified interfaces designed for complex AI pipelines.

Why is firecrawl/firecrawl a recommended Data Transformation repository?

Leverages language models to intelligently parse and convert raw HTML into clean, semantic data structures.

348 Repos

Awesome GitHub Repositories: Data Transformation

jwasham/coding-interview-university
353,639 stars
This project is a comprehensive educational curriculum designed to guide software engineers through mastering computer-science fundamentals and preparing for technical interviews. It offers a structured, dependency-aware learning path that organizes complex computer-science concepts into a hierarchical syllabus, letting users build a professional engineering foundation through iterative study and hands-on implementation. The curriculum stands out for combining theoretical knowledge with professional development and provides a unified index of cross-referenced resources, including books, research papers, and video tutorials. It emphasizes asymptotic complexity analysis as the standard way to evaluate algorithmic efficiency and breaks topics down into granular, modular units to support focused, incremental learning across broad technical areas. Beyond core algorithms and data structures, the repository covers a wide range of subjects, including system architecture design, distributed systems, computer security, and advanced mathematical modeling. It also offers strategic advice for the entire hiring process, from résumé optimization and behavioral-interview preparation to long-term career growth. The whole knowledge base is maintained as a version-controlled, Markdown-based repository, enabling a platform-independent, collaborative approach to technical education.
Reduces data footprint using encoding algorithms to enhance storage efficiency and transmission performance.
Topics: algorithm, algorithms, coding-interview
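The tagline above describes reducing a data footprint with an encoding algorithm. As a minimal, generic illustration (not code from this repository), run-length encoding replaces each run of repeated characters with a (character, count) pair; the names `rle_encode` and `rle_decode` are hypothetical.

```python
from itertools import groupby


def rle_encode(text: str) -> list[tuple[str, int]]:
    """Collapse each run of identical characters into a (char, count) pair."""
    return [(char, len(list(run))) for char, run in groupby(text)]


def rle_decode(pairs: list[tuple[str, int]]) -> str:
    """Expand (char, count) pairs back into the original string."""
    return "".join(char * count for char, count in pairs)


encoded = rle_encode("AAAABBBCCD")
print(encoded)              # [('A', 4), ('B', 3), ('C', 2), ('D', 1)]
print(rle_decode(encoded))  # AAAABBBCCD
```

Run-length encoding only pays off on inputs with long runs; on text such as `abc` it produces more output than input.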
TheAlgorithms/Python
221,992 stars
This project is a comprehensive repository of verified computational implementations intended as an educational resource for computer science and algorithmic problem solving. It provides a structured collection of code examples covering fundamental data structures, mathematical operations, and core programming concepts, allowing users to study the logic and complexity behind a wide variety of computational methods. The repository follows a modular, reference-style implementation pattern that organizes code into logical namespaces. This approach supports standalone execution and clear teaching, and it lets users trace how computational strategies evolve from naive brute-force approaches to optimized, high-performance solutions. By decoupling data-structure abstractions from algorithmic operations, the project keeps implementations interchangeable and easy to analyze. Its scope spans many technical domains, including machine learning, cryptography, scientific computing, and computer vision. It contains implementations for predictive modeling, neural networks, and statistical analysis, alongside tools for digital signal processing, network flow management, and financial modeling. The collection also covers specialized mathematical needs such as linear algebra, geometric computation, and bit manipulation, providing a broad foundation for research and engineering applications.
Shrinks digital information streams through encoding techniques to improve storage density and transmission speed.
Python · Topics: algorithm, algorithm-competitions, algorithms-implemented
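A classic encoding technique that shrinks data is Huffman coding, which gives frequent symbols shorter bit strings. The sketch below is a self-contained illustration under that assumption, not code taken from TheAlgorithms/Python; for readability the encoded output is a `str` of `'0'`/`'1'` characters rather than packed bytes, and all function names are hypothetical.

```python
import heapq
from collections import Counter
from itertools import count


def huffman_codes(text: str) -> dict[str, str]:
    """Build a prefix-free code table in which frequent symbols get shorter codes."""
    freq = Counter(text)
    if not freq:
        return {}
    if len(freq) == 1:  # a lone symbol still needs a one-bit code
        return {next(iter(freq)): "0"}
    tiebreak = count()  # unique counter so the heap never compares the dicts
    heap = [(weight, next(tiebreak), {sym: ""}) for sym, weight in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        # Merge the two lightest subtrees; the left branch adds "0", the right adds "1".
        merged = {sym: "0" + code for sym, code in left.items()}
        merged.update({sym: "1" + code for sym, code in right.items()})
        heapq.heappush(heap, (w1 + w2, next(tiebreak), merged))
    return heap[0][2]


def huffman_encode(text: str, codes: dict[str, str]) -> str:
    return "".join(codes[ch] for ch in text)


def huffman_decode(bits: str, codes: dict[str, str]) -> str:
    reverse = {code: sym for sym, code in codes.items()}
    out, buf = [], ""
    for bit in bits:
        buf += bit
        if buf in reverse:  # prefix-free, so the first hit is a complete symbol
            out.append(reverse[buf])
            buf = ""
    return "".join(out)


message = "abracadabra"
codes = huffman_codes(message)
bits = huffman_encode(message, codes)
print(len(bits), "bits instead of", 8 * len(message), "with 8-bit characters")
```

The exact codes depend on how ties are broken, but the total encoded length is the same for any optimal Huffman tree.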
vuejs/vue
209,900 stars
Vue is a progressive, component-based JavaScript framework for building reactive user interfaces and single-page applications. It centers on a declarative template system that compiles HTML into efficient render functions, letting developers organize complex interfaces into isolated, reusable units that stay automatically synchronized with application state. The framework is distinguished by a reactivity-based dependency-tracking system that records data access during rendering in order to trigger precise updates. Its flexible architecture supports both incremental adoption and full-scale application development. Developers can use a robust plugin-based extension model to inject global logic, while the framework's virtual DOM reconciliation keeps interface updates efficient by computing minimal mutations. Beyond its core rendering capabilities, the project includes a comprehensive set of tools for application state management, URL-based routing, and server-side rendering. It offers extensive support for component composition, content distribution, and animation management, along with built-in security measures such as automatic content escaping to prevent common vulnerabilities. The framework ships with official type declarations to support static analysis and can be installed through standard package managers or included directly via script tags in browser environments.
Renders filtered or sorted data sets using computed properties without modifying the original source.
TypeScript · Topics: framework, frontend, javascript
tensorflow/tensorflow
195,697 stars
TensorFlow is a comprehensive machine learning framework designed for the construction, training, and deployment of complex mathematical models. It utilizes a graph-based execution model that represents operations as directed acyclic graphs, enabling automatic differentiation and efficient parallel processing. The system provides high-level interfaces for defining neural network architectures, alongside a robust engine for managing multidimensional array structures and tensor mathematics. The framework distinguishes itself through a scalable distributed runtime that orchestrates workloads acr
Applies optimized routines to perform element-wise operations and shape manipulations on multi-dimensional data structures.
C++ · Topics: deep-learning, deep-neural-networks, distributed
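To make "element-wise operations and shape manipulations" concrete, here is a dependency-free Python sketch on nested lists. It only mimics the idea: TensorFlow itself performs these operations on tensors with optimized kernels (for example via `tf.reshape` and `tf.add`), and the helper names below are hypothetical.

```python
from collections.abc import Callable

Matrix = list[list[float]]


def reshape(flat: list[float], rows: int, cols: int) -> Matrix:
    """Lay a flat buffer out as a rows x cols matrix in row-major order."""
    if rows * cols != len(flat):
        raise ValueError(f"cannot reshape {len(flat)} elements into {rows}x{cols}")
    return [flat[r * cols:(r + 1) * cols] for r in range(rows)]


def transpose(m: Matrix) -> Matrix:
    """Swap the row and column axes."""
    return [list(col) for col in zip(*m)]


def elementwise(op: Callable[[float, float], float], a: Matrix, b: Matrix) -> Matrix:
    """Apply a binary op to corresponding elements of two same-shaped matrices."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("shape mismatch")
    return [[op(x, y) for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


m = reshape([1, 2, 3, 4, 5, 6], 2, 3)       # [[1, 2, 3], [4, 5, 6]]
t = transpose(m)                            # [[1, 4], [2, 5], [3, 6]]
s = elementwise(lambda x, y: x + y, m, m)   # [[2, 4, 6], [8, 10, 12]]
```

Unlike real tensor libraries, this sketch does no broadcasting: both operands must have exactly the same shape.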
n8n-io/n8n
192,772 stars
n8n is a workflow automation platform that combines a visual interface with code-based extensibility to design, orchestrate, and manage automated processes. It provides a comprehensive suite of tools for data transformation, filtering, and storage, allowing users to build complex logic through conditional branching, looping, and sub-workflow execution. The platform supports both pre-built integration nodes and custom code execution in JavaScript or Python, enabling connectivity with a wide range of external services and APIs. The platform includes a suite of generative AI capabilities, such a
Eliminates redundant entries within data streams to maintain unique event records throughout automated sequences.
TypeScript · Topics: ai, apis, automation
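Removing redundant entries from an event stream usually comes down to remembering which keys have already been seen. The following is a generic Python sketch of that idea, not n8n's implementation; the event shape and the `dedupe` helper are assumptions made for illustration.

```python
from collections.abc import Callable, Hashable, Iterable, Iterator
from typing import Any


def dedupe(events: Iterable[dict[str, Any]],
           key: Callable[[dict[str, Any]], Hashable]) -> Iterator[dict[str, Any]]:
    """Yield each event the first time its key is seen; drop later repeats."""
    seen: set[Hashable] = set()
    for event in events:
        k = key(event)
        if k not in seen:
            seen.add(k)
            yield event


incoming = [
    {"id": 1, "type": "signup"},
    {"id": 2, "type": "login"},
    {"id": 1, "type": "signup"},  # duplicate delivery
]
unique = list(dedupe(incoming, key=lambda e: e["id"]))
print([e["id"] for e in unique])  # [1, 2]
```

The `seen` set grows without bound, so a long-running workflow would need to persist or expire keys rather than keep them all in memory.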
Significant-Gravitas/AutoGPT
184,973 stars
AutoGPT is an orchestration platform designed for building, managing, and deploying autonomous agents. It provides a visual canvas-based environment where users can assemble agents by connecting modular blocks that represent actions, data flows, and conditional logic. The platform supports the entire agent lifecycle, including task scheduling, execution monitoring, and configuration management, while offering a marketplace for discovering and sharing community-built workflows. The project includes a legacy framework for command-line agent execution and an extensible component system for devel
Transforms unstructured keyword objects into structured, typed fields for metric analysis.
Python · Topics: ai, artificial-intelligence, autonomous-agents
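Turning loosely structured keyword objects into typed fields can be sketched with a dataclass plus per-field conversion. The `KeywordMetric` schema below is entirely hypothetical and is not taken from AutoGPT; it only shows how string-valued input (for example, parsed JSON) can be coerced into typed fields that are safe to aggregate.

```python
from dataclasses import dataclass, fields
from typing import Any


@dataclass
class KeywordMetric:
    # Hypothetical schema, chosen for illustration only.
    keyword: str
    search_volume: int
    cpc: float


def to_typed(raw: dict[str, Any]) -> KeywordMetric:
    """Coerce a loosely typed mapping into a typed record.

    Unknown keys are ignored; a missing or unconvertible field raises an error.
    """
    values = {}
    for f in fields(KeywordMetric):
        if f.name not in raw:
            raise KeyError(f"missing field: {f.name}")
        # f.type is the annotated class here (str, int, float), used as a converter.
        values[f.name] = f.type(raw[f.name])
    return KeywordMetric(**values)


rows = [
    {"keyword": "etl pipeline", "search_volume": "1200", "cpc": "0.85", "note": "ignored"},
    {"keyword": "data cleaning", "search_volume": 640, "cpc": 1.1},
]
metrics = [to_typed(r) for r in rows]
total_volume = sum(m.search_volume for m in metrics)  # 1840
```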
avelino/awesome-go
175,576 stars
This project serves as a comprehensive language ecosystem index, functioning as a centralized, community-curated directory for the Go programming language. It organizes a vast landscape of software components, libraries, and development tools into a structured, navigable hierarchy, enabling developers to efficiently discover resources tailored to specific functional domains. The repository distinguishes itself through a decentralized contribution model, where community-driven updates ensure the index remains current with the rapidly evolving software landscape. Beyond simple resource listing,
Streamlines reactive programming and data stream transformations using specialized toolkits.
Go · Topics: awesome, awesome-list, go
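The stream-transformation toolkits indexed here compose small operators (map, filter, windowing) into pipelines. As a language-neutral illustration written in Python, the sketch below chains lazy generators; note that these are pull-based, whereas Rx-style reactive libraries push values to subscribers. The operator names are hypothetical.

```python
from collections.abc import Callable, Iterable, Iterator
from typing import TypeVar

T = TypeVar("T")
U = TypeVar("U")


def map_stream(fn: Callable[[T], U], source: Iterable[T]) -> Iterator[U]:
    for item in source:
        yield fn(item)


def filter_stream(pred: Callable[[T], bool], source: Iterable[T]) -> Iterator[T]:
    for item in source:
        if pred(item):
            yield item


def window(size: int, source: Iterable[T]) -> Iterator[list[T]]:
    """Group consecutive items into fixed-size batches; the last batch may be shorter."""
    batch: list[T] = []
    for item in source:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


readings = iter([3, -1, 4, 1, -5, 9, 2, 6])
pipeline = window(2, map_stream(lambda x: x * 10, filter_stream(lambda x: x > 0, readings)))
print(list(pipeline))  # [[30, 40], [10, 90], [20, 60]]
```

Nothing is computed until `list(pipeline)` pulls values through, so the same pipeline also works on unbounded sources.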
yt-dlp/yt-dlp
170,963 stars
This project is a command-line media downloader designed for the systematic retrieval and organization of digital content from diverse online platforms. It functions as an extensible extraction engine that utilizes a declarative format-selection pipeline to automate the identification, merging, and downloading of specific audio and video streams based on user-defined criteria. The system distinguishes itself through a modular architecture that supports custom plugins and site-specific scripts, allowing for the bypass of platform restrictions and the handling of complex authentication challeng
Evaluates stream metadata against defined criteria to transform and restructure raw media into desired file formats.
Python · Topics: cli, downloader, python
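yt-dlp expresses format choice declaratively through its `-f`/`--format` option. The Python below does not use yt-dlp's API; it is a hypothetical sketch of the underlying idea, evaluating stream metadata against criteria (a height cap, then bitrate) to pick a video-only and an audio-only stream. The dictionary keys in the sample data are assumptions for illustration.

```python
from typing import Any, Optional

Format = dict[str, Any]  # hypothetical shape of one format entry


def pick_formats(formats: list[Format], max_height: int) -> tuple[Optional[Format], Optional[Format]]:
    """Pick the best video-only and best audio-only formats under a height cap."""
    video = [
        f for f in formats
        if f.get("vcodec", "none") != "none" and f.get("acodec", "none") == "none"
        and (f.get("height") or 0) <= max_height
    ]
    audio = [
        f for f in formats
        if f.get("acodec", "none") != "none" and f.get("vcodec", "none") == "none"
    ]
    # Rank video by resolution, then bitrate; audio by bitrate. Missing values rank lowest.
    best_video = max(video, key=lambda f: (f.get("height") or 0, f.get("tbr") or 0), default=None)
    best_audio = max(audio, key=lambda f: f.get("tbr") or 0, default=None)
    return best_video, best_audio


formats = [
    {"format_id": "v1080", "height": 1080, "vcodec": "avc1", "acodec": "none", "tbr": 4500},
    {"format_id": "v720", "height": 720, "vcodec": "avc1", "acodec": "none", "tbr": 2500},
    {"format_id": "v720hi", "height": 720, "vcodec": "vp9", "acodec": "none", "tbr": 3000},
    {"format_id": "a128", "vcodec": "none", "acodec": "opus", "tbr": 128},
    {"format_id": "a64", "vcodec": "none", "acodec": "opus", "tbr": 64},
]
video, audio = pick_formats(formats, max_height=720)
```

The selected pair would still have to be downloaded and merged into a single container; that step is outside this sketch.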
langchain-ai/langchain
139,458 stars
LangChain is an orchestration framework designed for building, managing, and deploying applications powered by large language models. It provides a unified integration layer that normalizes disparate model provider APIs into a consistent set of primitives, enabling developers to build complex, multi-step AI workflows that manage state, memory, and tool execution. The project distinguishes itself through a durable execution runtime that maintains persistent state across long-running processes by checkpointing progress to external storage. It models agent workflows as directed graphs, allowing
Processes diverse binary and multimodal data types through unified interfaces designed for complex AI pipelines.
Python · Topics: agents, ai, ai-agents
firecrawl/firecrawl
133,479 stars
Firecrawl is a web data extraction platform designed to convert unstructured web content into clean, LLM-ready formats like markdown or JSON. It functions as an autonomous web crawler and scraper, capable of mapping entire domains, performing recursive navigation, and executing complex data gathering tasks. By leveraging headless browser orchestration, the system handles dynamic, JavaScript-heavy pages to ensure comprehensive data capture. The platform distinguishes itself through its focus on agentic workflows, providing a programmatic interface that allows autonomous agents to perform live
Leverages language models to intelligently parse and convert raw HTML into clean, semantic data structures.
TypeScript · Topics: ai, ai-agents, ai-crawler
iptv-org/iptv
127,909 stars
This project is a community-maintained, open-source repository that functions as a centralized directory for streaming metadata. It aggregates publicly available network stream links and organizes them into standardized, machine-readable playlist formats. By acting strictly as a metadata-only index, the platform enables users to access and organize live broadcast content across various third-party media playback applications without hosting or distributing any actual video files. The repository distinguishes itself through a collaborative, crowdsourced workflow where contributors actively mai
Merges distributed community updates into a unified, structured dataset of verified streaming links.
TypeScript · Topics: iptv, m3u, playlist
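The project's topics point to M3U playlists as its machine-readable format. The sketch below is a hypothetical parser and merge step written for illustration, not the project's own tooling: it reads extended-M3U `#EXTINF` entries (attributes, then a comma, then the display name, followed by the stream URL) into records and merges several playlists, keeping the first entry for each stream URL. The sample URLs use the reserved `example.com` domain.

```python
import re
from dataclasses import dataclass, field
from typing import Optional

ATTR_RE = re.compile(r'([\w-]+)="([^"]*)"')


@dataclass
class Channel:
    name: str
    url: str
    attrs: dict[str, str] = field(default_factory=dict)


def parse_m3u(text: str) -> list[Channel]:
    """Pair each #EXTINF metadata line with the stream URL on the next non-comment line."""
    channels: list[Channel] = []
    pending: Optional[tuple[str, dict[str, str]]] = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#EXTM3U"):
            continue
        if line.startswith("#EXTINF:"):
            # The display name follows the first comma after the last quoted attribute.
            comma = line.find(",", line.rfind('"') + 1)
            meta, name = (line[:comma], line[comma + 1:]) if comma != -1 else (line, "")
            pending = (name.strip(), dict(ATTR_RE.findall(meta)))
        elif not line.startswith("#") and pending is not None:
            channels.append(Channel(pending[0], line, pending[1]))
            pending = None
    return channels


def merge_playlists(*playlists: list[Channel]) -> list[Channel]:
    """Combine playlists, keeping the first entry seen for each stream URL."""
    merged: dict[str, Channel] = {}
    for playlist in playlists:
        for channel in playlist:
            merged.setdefault(channel.url, channel)
    return list(merged.values())


a = parse_m3u('#EXTM3U\n#EXTINF:-1 tvg-id="News.us" group-title="News",Example News\nhttps://example.com/news.m3u8\n')
b = parse_m3u('#EXTM3U\n#EXTINF:-1 group-title="Music",Example Music\nhttps://example.com/music.m3u8\n'
              '#EXTINF:-1,Example News (mirror)\nhttps://example.com/news.m3u8\n')
combined = merge_playlists(a, b)
```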
d3/d3
113,118 stars
D3 is a modular library providing low-level primitives for creating data-driven visualizations. It functions as a flexible framework that allows for direct control over visual presentation by mapping abstract data dimensions to graphical properties, such as position, color, and size, without imposing predefined chart abstractions. The library distinguishes itself by offering specialized tools for complex data representation, including algorithmic layouts for hierarchical structures and geographic projection utilities for mapping spherical coordinates. It also includes a comprehensive suite fo
Comprehensive utilities handle the ordering, searching, summarizing, binning, and grouping of complex data sets.
Shell · Topics: chart, charts, d3
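In D3 these tasks are handled by d3-array helpers such as `d3.bisect`, `d3.group`, `d3.bin`, and `d3.sum`. The Python below is not D3 code; it sketches the same operations (ordering, binary search, grouping with a summary, and binning) using only the standard library, and the helper names are hypothetical. D3's own binning may treat the upper edge differently from the half-open bins used here.

```python
import bisect
from collections import defaultdict
from collections.abc import Callable, Hashable, Iterable
from typing import Any

Row = dict[str, Any]


def group_by(rows: Iterable[Row], key: Callable[[Row], Hashable]) -> dict[Hashable, list[Row]]:
    """Group rows by key, preserving first-seen key order."""
    groups: dict[Hashable, list[Row]] = defaultdict(list)
    for row in rows:
        groups[key(row)].append(row)
    return dict(groups)


def bin_counts(values: Iterable[float], thresholds: list[float]) -> list[int]:
    """Count values into half-open bins [t0, t1), [t1, t2), ...; out-of-range values are dropped."""
    counts = [0] * (len(thresholds) - 1)
    for v in values:
        i = bisect.bisect_right(thresholds, v) - 1
        if 0 <= i < len(counts):
            counts[i] += 1
    return counts


sales = [
    {"region": "north", "amount": 12},
    {"region": "south", "amount": 7},
    {"region": "north", "amount": 25},
    {"region": "south", "amount": 18},
]
totals = {k: sum(r["amount"] for r in rows) for k, rows in group_by(sales, lambda r: r["region"]).items()}
ordered = sorted(r["amount"] for r in sales)      # [7, 12, 18, 25]
position = bisect.bisect_left(ordered, 18)        # 2: binary search in the sorted list
histogram = bin_counts(ordered, [0, 10, 20, 30])  # [1, 2, 1]
```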
godotengine/godot
112,618 stars
Godot is a comprehensive, node-based game engine designed for building interactive 2D and 3D applications. It provides an integrated development environment that utilizes a hierarchical scene system to organize objects, propagate spatial transformations, and manage lifecycle events. The engine functions as a cross-platform development suite, allowing developers to author, test, and export software to desktop, mobile, and web environments from a single, unified codebase. The engine distinguishes itself through a modular, component-based architecture that relies on signals-based decoupling for
Implements native data types for vectors, transforms, and arrays to enable high-performance mathematical operations.
C++ · Topics: game-development, game-engine, gamedev
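Godot represents a 2D transform as two basis vectors plus an origin (its `Transform2D` type exposes `x`, `y`, and `origin`). The Python below is a simplified, hypothetical model of that layout, showing how a point is mapped: the basis rotates and scales it, then the origin translates it. It is not Godot's API, and the method names `from_rotation` and `apply` are invented for the sketch.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Vec2:
    x: float
    y: float

    def __add__(self, other: "Vec2") -> "Vec2":
        return Vec2(self.x + other.x, self.y + other.y)

    def __mul__(self, s: float) -> "Vec2":
        return Vec2(self.x * s, self.y * s)


@dataclass(frozen=True)
class Transform2D:
    """2-D affine transform stored as two basis axes plus an origin (translation)."""
    x_axis: Vec2
    y_axis: Vec2
    origin: Vec2

    @staticmethod
    def from_rotation(angle: float, origin: Vec2 = Vec2(0.0, 0.0)) -> "Transform2D":
        c, s = math.cos(angle), math.sin(angle)
        return Transform2D(Vec2(c, s), Vec2(-s, c), origin)

    def apply(self, v: Vec2) -> Vec2:
        """Map a point: combine the basis axes weighted by v, then add the origin."""
        return self.x_axis * v.x + self.y_axis * v.y + self.origin


t = Transform2D.from_rotation(math.pi / 2, origin=Vec2(10.0, 0.0))
p = t.apply(Vec2(1.0, 0.0))  # rotated 90 degrees to (0, 1), then shifted by (10, 0)
```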
axios/axios
109,077 stars
Axios is an isomorphic, promise-based HTTP client designed for making asynchronous network requests across different JavaScript execution environments, including the browser and Node.js. It functions as a JSON API client that serializes JavaScript objects into JSON and parses server responses into structured data. The project features a system for managing reusable client instances with shared configurations, such as base URLs and default settings. It includes a mechanism for intercepting outgoing requests and incoming responses globally, allowing data to be transformed before it reaches the
Converts JavaScript objects into JSON, multipart, or URL-encoded formats for network transmission.
JavaScript · Topics: hacktoberfest, http-client, javascript
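The three body encodings named in the tagline can be illustrated without axios. This Python sketch, using only the standard library, produces compact JSON, an `application/x-www-form-urlencoded` string, and a minimal `multipart/form-data` body for text fields; it is not how axios serializes internally, and the function names are hypothetical. Conventions for encoding arrays in URL-encoded bodies vary between libraries; repeating the key, as `urlencode(..., doseq=True)` does, is just one of them.

```python
import json
from typing import Any
from urllib.parse import urlencode


def to_json(data: dict[str, Any]) -> str:
    """Compact JSON text, without the spaces Python's json.dumps adds by default."""
    return json.dumps(data, separators=(",", ":"))


def to_form_urlencoded(data: dict[str, Any]) -> str:
    """application/x-www-form-urlencoded body; list values repeat the key."""
    return urlencode(data, doseq=True)


def to_multipart(data: dict[str, str], boundary: str) -> bytes:
    """Minimal multipart/form-data body for plain text fields (no file parts)."""
    lines: list[str] = []
    for name, value in data.items():
        lines += [f"--{boundary}", f'Content-Disposition: form-data; name="{name}"', "", value]
    lines += [f"--{boundary}--", ""]
    return "\r\n".join(lines).encode("utf-8")


print(to_json({"name": "Ada Lovelace", "tags": ["math", "poetry"]}))
print(to_form_urlencoded({"name": "Ada Lovelace", "tags": ["math", "poetry"]}))
```

A multipart body is only usable together with a matching `Content-Type: multipart/form-data; boundary=...` header that names the same boundary.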
GrowingGit/GitHub-Chinese-Top-Charts
108,509 stars
This project functions as a curated software directory and developer resource index, providing a centralized platform for discovering and evaluating high-quality open-source repositories. It serves as an aggregator that monitors trending software and educational resources, organizing them by technical domain and programming language to assist developers in identifying tools for their specific technical challenges. The directory distinguishes itself through a community-driven curation workflow, where repository lists are validated and updated based on collective developer consensus. This infor
Retrieves real-time repository metadata and contributor statistics directly from remote service endpoints.
Java
browser-use/browser-use
100,229 stars
Browser-use is a framework for building autonomous agents that navigate, interact with, and extract data from web interfaces using natural language instructions. By acting as an orchestration layer between large language models and browser automation protocols, it enables the execution of complex, multi-step workflows without relying on brittle selectors. The system functions as a headless browser controller, providing a programmatic interface to manage browser instances and execute granular interactions. The project distinguishes itself through its ability to translate high-level intent into
Extracts structured information from complex web pages by parsing raw HTML elements into defined, machine-readable data schemas.
Python · Topics: ai-agents, ai-tools, browser-automation
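Browser-use combines language models with browser automation; the deterministic sketch below only illustrates the final step named in the tagline, parsing raw HTML elements into a fixed, machine-readable schema. It uses Python's standard `html.parser` module to collect `<a href>` elements as records; the `Link` schema and `extract_links` helper are hypothetical and not part of browser-use.

```python
from dataclasses import dataclass
from html.parser import HTMLParser
from typing import Optional


@dataclass
class Link:
    text: str
    href: str


class LinkExtractor(HTMLParser):
    """Collect <a href> elements as Link records, including text from nested tags."""

    def __init__(self) -> None:
        super().__init__()
        self.links: list[Link] = []
        self._href: Optional[str] = None
        self._text: list[str] = []

    def handle_starttag(self, tag: str, attrs: list[tuple[str, Optional[str]]]) -> None:
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data: str) -> None:
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag: str) -> None:
        if tag == "a" and self._href is not None:
            # Normalize whitespace so the record holds clean, single-spaced text.
            self.links.append(Link(" ".join("".join(self._text).split()), self._href))
            self._href = None


def extract_links(html: str) -> list[Link]:
    parser = LinkExtractor()
    parser.feed(html)
    parser.close()
    return parser.links


page = '<ul><li><a href="/docs">Docs</a></li><li><a href="https://example.com">Example <b>site</b></a></li></ul>'
links = extract_links(page)
```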
oven-sh/bun
93,257 stars
Bun is a high-performance runtime environment designed to execute JavaScript and TypeScript applications with minimal latency and high throughput. Built on a native core implemented in Zig, it provides a unified execution engine that leverages JavaScriptCore for efficient memory management and low-latency startup. The project functions as an all-in-one toolchain, integrating a native bundler, transpiler, package manager, and test runner into a single command-line interface. What distinguishes Bun is its focus on native system integration and developer productivity. It features a high-performa
Converts blobs into readable streams to consume binary data asynchronously using standard stream-based processing patterns.
Rust · Topics: bun, bundler, javascript
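In JavaScript runtimes such as Bun, the Web-standard `Blob.stream()` method returns a `ReadableStream` that delivers a blob's bytes in chunks. As a language-neutral illustration of that consumption pattern (not Bun's implementation), the Python sketch below exposes a byte buffer as an asynchronous chunk iterator and reassembles it; the tiny default chunk size is an assumption chosen only to make the chunking visible.

```python
import asyncio
from collections.abc import AsyncIterator


async def blob_stream(blob: bytes, chunk_size: int = 4) -> AsyncIterator[bytes]:
    """Yield a binary blob in fixed-size chunks, yielding to the event loop between chunks."""
    view = memoryview(blob)
    for start in range(0, len(view), chunk_size):
        yield bytes(view[start:start + chunk_size])
        await asyncio.sleep(0)  # let other tasks run, as a real stream reader would


async def consume(blob: bytes) -> bytes:
    """Read the stream chunk by chunk and reassemble the original bytes."""
    received = bytearray()
    async for chunk in blob_stream(blob):
        received.extend(chunk)
    return bytes(received)


result = asyncio.run(consume(b"hello, stream"))
```

Real stream implementations use much larger chunks (typically kilobytes) so that per-chunk overhead stays small.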
junegunn/fzf
81,017 stars
This project is a general-purpose command-line filter that provides an interactive interface for processing standard input streams. It enables real-time fuzzy searching, data selection, and transformation, allowing users to navigate complex information or file systems directly within their terminal. By utilizing a pipe-oriented architecture, it integrates into existing shell pipelines and workflows to facilitate efficient data exploration. What distinguishes this tool is its highly extensible, event-driven design that allows for deep integration with external processes. It supports asynchrono
High-performance filtering logic manages candidate lists to execute complex matching syntax on incoming data streams.
Go · Topics: bash, cli, fish
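At its simplest, fuzzy filtering keeps candidates that contain every query character in order and ranks them by how tightly they match. The Python below is a greedy, simplified sketch of that idea with hypothetical bonus weights; fzf's actual matching and scoring algorithms are more sophisticated and are not reproduced here.

```python
from typing import Optional


def fuzzy_score(query: str, candidate: str) -> Optional[int]:
    """Return a score if every query character appears in order in candidate, else None.

    Matching is greedy (leftmost first). Consecutive matches and matches at the
    start of a word or path segment earn bonuses.
    """
    q, c = query.lower(), candidate.lower()
    score, qi, prev = 0, 0, -2
    for ci, ch in enumerate(c):
        if qi < len(q) and ch == q[qi]:
            score += 1
            if ci == prev + 1:
                score += 2  # consecutive run
            if ci == 0 or not c[ci - 1].isalnum():
                score += 3  # start of a word or path segment
            prev = ci
            qi += 1
    return score if qi == len(q) else None


def fuzzy_filter(query: str, candidates: list[str]) -> list[str]:
    """Keep matching candidates, best score first; shorter candidates win ties."""
    scored = [(s, cand) for cand in candidates if (s := fuzzy_score(query, cand)) is not None]
    return [cand for _, cand in sorted(scored, key=lambda p: (-p[0], len(p[1])))]


files = ["src/main.py", "docs/manual.md", "setup.cfg", "tests/test_main.py"]
print(fuzzy_filter("main", files))  # ['src/main.py', 'tests/test_main.py']
```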
anuraghazra/github-readme-stats
79,661 stars
This project is a serverless service that generates dynamic, themeable visual summaries of software development activity. It functions as an automated metadata visualizer, transforming raw platform logs and repository metrics into resolution-independent vector graphics that can be embedded directly into markdown environments. The service distinguishes itself by offering highly configurable, query-parameter-driven rendering that allows users to customize the visual presentation of their coding patterns, language proficiency, and repository details. It supports both real-time generation via ser
Merges disparate data points from multiple remote endpoints into a unified schema before rendering the final visual output.
JavaScript · Topics: dynamic, profile-readme, readme-generator
hoppscotch/hoppscotch
79,618 stars
Hoppscotch is an open-source API development ecosystem designed for building, testing, and debugging REST, GraphQL, and real-time APIs. It provides a unified platform that functions across web browsers, desktop applications, and command-line interfaces, allowing developers to manage the entire API lifecycle from a single environment. The platform distinguishes itself through a highly interactive, command-driven interface that utilizes a global spotlight palette and keyboard shortcuts to streamline complex workflows. It supports advanced request manipulation and validation by executing JavaScr
Normalizes diverse external API definitions into a standardized internal format for consistent processing.
TypeScript · Topics: api, api-client, api-rest