8 repositories
Techniques and processes for cleaning, transforming, and analyzing raw datasets to derive insights.
Distinct from Python Code Analysis Libraries: that category covers code analysis and specific libraries, whereas this one covers the domain of data analysis workflows.
Explore 8 awesome GitHub repositories matching data & databases · Data Analysis Workflows.
This repository is a comprehensive collection of instructional guides and practical examples for Python development, focusing on machine learning, data science, and web scraping. It provides implementations for neural networks, reinforcement learning algorithms, and deep learning architectures using PyTorch, alongside detailed manuals for scientific computing and data visualization. The project distinguishes itself by offering specialized tutorials on concurrent programming to optimize CPU performance and guides for setting up Linux development environments. It covers the implementation of ad
Implements end-to-end workflows for cleaning, transforming, and analyzing tabular datasets.
This project is a Python education repository and programming tutorial designed to teach language fundamentals, from basic syntax and variables to advanced concepts. It serves as a data science starter kit and a guide for REST API integration. The repository provides instructional scripts and sample code covering object-oriented programming patterns and asynchronous programming. It includes practical demonstrations for fetching and processing JSON data from external web services using HTTP requests. The materials cover a broad capability surface including data analysis workflows with interac
Provides a workflow for cleaning, transforming, and analyzing raw datasets using interactive notebooks.
This project is a collection of educational notes and tutorials focused on Python programming, scientific computing, and data analysis. It serves as a reference for learning language basics, advanced techniques, and object-oriented design. The materials include implementation guides for building linear, logistic, and convolutional neural networks using symbolic graph frameworks. It also provides instruction on manipulating and visualizing structured data frames and performing complex mathematical operations through numerical libraries. The repository includes a system for converting interact
Provides a workflow for manipulating and visualizing structured data frames to uncover insights.
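dlt is a Python data ingestion tool and ETL pipeline framework designed to fetch data from different sources and persist it to structured destinations. Acting as a schema inference engine, it automatically detects data types and flattens nested JSON structures into relational tables, moving data from sources into data lakes, data warehouses, or vector databases. The project stands out for its AI-driven pipeline generation, which uses large language models to build extraction code and connectors for REST APIs. It also supports multimodal vector stores and specialized population of vector databases for AI and machine learning applications. The framework covers a broad range of functionality, including automated schema evolution, incremental data loading through state tracking, and data quality validation through enforced data contracts. It provides tools for relational data normalization and pre- and post-load transformations, along with a variety of destination adapters for SQL databases and cloud object storage. Observability is handled through a pipeline execution dashboard, column lineage tracking, and schema version validation using content-based hashes.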
Profiles tables and plans charts using query code to uncover trends within a pipeline.
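To make the "flattens nested JSON structures into relational tables" idea concrete, here is a minimal standard-library sketch. It does not use dlt and is not dlt's implementation: the function name `flatten_record`, the `__` column separator, and the `_id` / `_parent_id` link columns are illustrative assumptions chosen for this example.

```python
from __future__ import annotations

from typing import Any


def flatten_record(
    record: dict[str, Any],
    table: str,
    tables: dict[str, list[dict[str, Any]]],
    parent_id: int | None = None,
) -> None:
    """Flatten one nested record into rows spread across relational-style tables.

    Hypothetical helper for illustration only; names and conventions are assumptions.
    """
    rows = tables.setdefault(table, [])
    row_id = len(rows)  # simple per-table surrogate key
    row: dict[str, Any] = {"_id": row_id}
    if parent_id is not None:
        row["_parent_id"] = parent_id  # link back to the parent table's row
    rows.append(row)

    def walk(value: dict[str, Any], prefix: str) -> None:
        for key, item in value.items():
            column = f"{prefix}{key}"
            if isinstance(item, dict):
                # Nested objects become prefixed columns on the same row.
                walk(item, f"{column}__")
            elif isinstance(item, list):
                # Lists become rows in a child table named after the parent.
                child_table = f"{table}__{column}"
                for element in item:
                    child = element if isinstance(element, dict) else {"value": element}
                    flatten_record(child, child_table, tables, parent_id=row_id)
            else:
                row[column] = item

    walk(record, "")


# Example input: one event with a nested object and a list of scalars.
tables: dict[str, list[dict[str, Any]]] = {}
event = {
    "id": 1,
    "user": {"name": "Ada", "address": {"city": "London"}},
    "tags": ["a", "b"],
}
flatten_record(event, "events", tables)

for name, rows in tables.items():
    print(name, rows)
```

The nested `user` object ends up as prefixed columns on the `events` row, while the `tags` list becomes a separate `events__tags` table whose rows point back to their parent through `_parent_id`.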
This project is a collection of big data frameworks and pipelines, including an Apache Hive analysis framework, a behavioral data analytics platform, a predictive analytics engine, and real-time data pipelines. It provides the infrastructure for building Extract, Transform, Load (ETL) workflows to process large datasets for distributed storage and SQL-based analysis. The system supports diverse analytical implementations, such as a predictive engine using linear regression for value forecasting and a real-time architecture that moves data through message brokers for immediate reporting. It in
Provides comprehensive workflows for cleaning, transforming, and querying large datasets to extract business insights.
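This project is a comprehensive collection of Python programming educational materials, including tutorials, exercises, and curated code examples. It serves as a learning curriculum and software engineering toolkit, using Jupyter Notebooks to combine executable code with descriptive educational text. The repository provides practical guides for building large language model applications, such as retrieval-augmented generation (RAG) systems, stateful AI agents, and machine learning workflows. It stands out by offering structured agentic coding workflows covering context window distillation, provider-agnostic model routing, and schema-enforced structured outputs. The materials cover a broad range of software engineering capabilities, including asynchronous programming with distributed task queues, web application development with REST APIs, and data analysis workflows. It also includes resources for mastering object-oriented design, implementing CI/CD pipelines, and applying professional linting and formatting standards.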
Provides structured workflows for cleaning and analyzing raw datasets to derive statistical insights.
This project is a structured data science curriculum and Python-based textbook designed to teach the fundamentals of data science through executable scripts and hands-on lessons. It functions as a guided programming tutorial for data manipulation and analysis within the Python ecosystem. The content covers introductory machine learning, including the implementation of basic models and algorithms, alongside Python data analysis for cleaning and processing datasets. The material is delivered via Jupyter Notebooks, combining modular exercises and markdown-driven documentation to map theoretical
Demonstrates how to use Python libraries to clean, process, and analyze datasets.
This is a comprehensive Python programming course and technical curriculum designed to take users from foundational syntax to advanced development patterns. It serves as a multi-disciplinary educational suite covering programming fundamentals, object-oriented design, and data analysis. The project provides specialized guides on professional development techniques, including the use of decorators, generators for memory management, and dunder-method operator overloading. It also includes instructional material on executing parallel tasks through concurrency and multiprocessing to reduce executi
Teaches the entire workflow of cleaning, transforming, and analyzing raw datasets to derive insights.