8 Repos
Techniques and processes for cleaning, transforming, and analyzing raw datasets to derive insights.
Distinct from Python Code Analysis Libraries: that category focuses on code analysis or specific libraries, whereas this one covers the domain of data analysis workflows.
Explore 8 awesome GitHub repositories matching data & databases · Data Analysis Workflows.
This repository is a comprehensive collection of instructional guides and practical examples for Python development, focusing on machine learning, data science, and web scraping. It provides implementations for neural networks, reinforcement learning algorithms, and deep learning architectures using PyTorch, alongside detailed manuals for scientific computing and data visualization. The project distinguishes itself by offering specialized tutorials on concurrent programming to optimize CPU performance and guides for setting up Linux development environments. It covers the implementation of ad
Implements end-to-end workflows for cleaning, transforming, and analyzing tabular datasets.
This project is a Python education repository and programming tutorial designed to teach language fundamentals, from basic syntax and variables to advanced concepts. It serves as a data science starter kit and a guide for REST API integration. The repository provides instructional scripts and sample code covering object-oriented programming patterns and asynchronous programming. It includes practical demonstrations for fetching and processing JSON data from external web services using HTTP requests. The materials cover a broad capability surface including data analysis workflows with interac
Provides a workflow for cleaning, transforming, and analyzing raw datasets using interactive notebooks.
This project is a collection of educational notes and tutorials focused on Python programming, scientific computing, and data analysis. It serves as a reference for learning language basics, advanced techniques, and object-oriented design. The materials include implementation guides for building linear, logistic, and convolutional neural networks using symbolic graph frameworks. It also provides instruction on manipulating and visualizing structured data frames and performing complex mathematical operations through numerical libraries. The repository includes a system for converting interact
Provides a workflow for manipulating and visualizing structured data frames to uncover insights.
dlt is a Python data ingestion tool and ETL pipeline framework designed to fetch data from a variety of sources and store it in structured destinations. It acts as a schema inference engine that automatically detects data types and flattens nested JSON structures into relational tables, moving data from sources into lakehouses, warehouses, or vector databases. The project distinguishes itself through AI-assisted pipeline generation, which uses large language models to produce extraction code and connectors for REST APIs. It also supports multimodal vector storage and specialized population of vector databases to support AI and machine learning applications. The framework covers a broad range of capabilities, including automated schema evolution, incremental data loading via state tracking, and data quality validation through the enforcement of data contracts. It provides tools for relational data normalization, pre- and post-load transformations, and a wide range of destination adapters for SQL databases and cloud object storage. Observability is handled through pipeline execution dashboards, column lineage tracking, and schema version verification using content-based hashes.
Profiles tables and plans charts using query code to uncover trends within a pipeline.
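As a rough illustration of what "flattening nested JSON structures into relational tables" can mean, here is a minimal sketch in plain Python. It does not use dlt's API and is not a description of how dlt implements this; the `flatten` helper, the `__` column separator, and the `_id` / `_parent_id` key names are all hypothetical choices made for the example.

```python
from typing import Any, Optional

Row = dict[str, Any]


def flatten(record: Row, table: str, tables: dict[str, list[Row]],
            parent_id: Optional[int] = None) -> None:
    """Append `record` to `tables[table]`, moving nested lists into child tables."""
    rows = tables.setdefault(table, [])
    row_id = len(rows)  # hypothetical surrogate key: position within the table
    row: Row = {"_id": row_id}
    if parent_id is not None:
        row["_parent_id"] = parent_id  # hypothetical link back to the parent row
    rows.append(row)

    def add_columns(prefix: str, value: Any) -> None:
        if isinstance(value, dict):
            # Nested objects become prefixed columns on the same row.
            for key, inner in value.items():
                add_columns(f"{prefix}__{key}" if prefix else key, inner)
        elif isinstance(value, list):
            # Lists become rows in a child table that points back to this row.
            for item in value:
                child = item if isinstance(item, dict) else {"value": item}
                flatten(child, f"{table}__{prefix}", tables, parent_id=row_id)
        else:
            row[prefix] = value

    add_columns("", record)


# Usage with a made-up order document.
tables: dict[str, list[Row]] = {}
order = {"id": 7, "customer": {"name": "Ada"}, "items": [{"sku": "a"}, {"sku": "b"}]}
flatten(order, "orders", tables)
```

After the call, `tables` holds one row in `orders` (with a `customer__name` column) and two rows in `orders__items`, each carrying a `_parent_id` that refers to the order row.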
This project is a collection of big data frameworks and pipelines, including an Apache Hive analytics framework, a behavioral data analytics platform, a predictive analytics engine, and real-time data pipelines. It provides the infrastructure for building ETL (Extract, Transform, Load) workflows that process large datasets for distributed storage and SQL-based analysis. The system supports a variety of analytical implementations, such as a predictive engine that uses linear regression for forecasting and a real-time architecture that routes data through message brokers for immediate reporting. It includes specialized features for analyzing user behavior, measuring e-commerce performance, and analyzing urban public transit data. The codebase covers a broad range of data engineering and analysis, including data cleaning and transformation, distributed data ingestion, window-based stream processing, and visualization of results through business intelligence tools. It also enables the calculation of specific business metrics such as conversion rates, monetization performance, and user engagement levels.
Provides comprehensive workflows for cleaning, transforming, and querying large datasets to extract business insights.
This project is a comprehensive collection of Python programming education materials, including tutorials, exercises, and curated code samples. It serves as a learning curriculum and software engineering toolkit, utilizing Jupyter Notebooks to combine executable code with descriptive educational text. The repository provides practical implementation guides for building large language model applications, such as retrieval-augmented generation systems, stateful AI agents, and machine learning workflows. It distinguishes itself by offering a structured approach to agentic coding workflows, cover
Provides structured workflows for cleaning and analyzing raw datasets to derive statistical insights.
This project is a structured data science curriculum and Python-based textbook designed to teach the fundamentals of data science through executable scripts and hands-on lessons. It functions as a guided programming tutorial for data manipulation and analysis within the Python ecosystem. The content covers introductory machine learning, including the implementation of basic models and algorithms, alongside Python data analysis for cleaning and processing datasets. The material is delivered via Jupyter Notebooks, combining modular exercises and markdown-driven documentation to map theoretical
Demonstrates how to use Python libraries to clean, process, and analyze datasets.
This is a comprehensive Python programming course and technical curriculum designed to take users from foundational syntax to advanced development patterns. It serves as a multi-disciplinary educational suite covering programming fundamentals, object-oriented design, and data analysis. The project provides specialized guides on professional development techniques, including the use of decorators, generators for memory management, and dunder-method operator overloading. It also includes instructional material on executing parallel tasks through concurrency and multiprocessing to reduce executi
Teaches the entire workflow of cleaning, transforming, and analyzing raw datasets to derive insights.