30 open-source projects similar to fonnesbeck/statistical-analysis-python-tutorial, ranked by how many features they have in common. Compare stars, activity and what each one does to find the best Statistical Analysis Python Tutorial alternative.
This project is an educational resource and a collection of instructional materials for performing data manipulation and statistical analysis using Python. It provides a comprehensive set of guides and code examples for using the Pandas, NumPy, and Matplotlib libraries to analyze structured data. The resource includes a dedicated guide for reshaping, cleaning, and aggregating tabular data and time series via Pandas, alongside a reference for high-performance vectorized operations and linear algebra using NumPy. It also features tutorials for creating publication-quality charts, distribution p
This project is a scientific computing framework for the .NET ecosystem, providing a comprehensive suite of libraries for numerical analysis, statistics, and mathematical optimization. It serves as a foundational toolkit for developing applications in machine learning, digital signal processing, and computer vision. The framework provides specialized toolkits for training and deploying predictive models, including neural networks, support vector machines, and decision trees. It further distinguishes itself with deep integrations for real-time visual analysis, such as object tracking and facia
SciPy is a scientific computing library for Python that provides a comprehensive collection of mathematical algorithms and numerical tools for research and engineering. It functions as a high-performance numerical analysis framework, bridging high-level Python code with compiled C and Fortran routines so that heavy computations run at native-code speed. The library is built on array-based data structures (NumPy arrays) whose strided memory layouts enable efficient slicing and manipulation without copying. By employing vectorized operation dispatch and linking to optimized hardware-specific linear algebra li
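As a rough illustration of the strided-layout idea in the SciPy entry, the sketch below uses NumPy, the array library SciPy builds on, to show that slicing and transposing change only the strides and leave the underlying buffer untouched. The array shape and dtype are arbitrary choices made for this example.

```python
import numpy as np

# A 3x4 array of 8-byte integers stored contiguously in row-major order.
a = np.arange(12, dtype=np.int64).reshape(3, 4)

# Strides are the byte steps needed to move one element along each axis.
row_major_strides = a.strides

# Taking every other column doubles the column stride; no data is copied.
every_other_col = a[:, ::2]
sliced_strides = every_other_col.strides

# Transposing simply swaps the two strides.
transposed_strides = a.T.strides

# The slice is a view onto the original buffer, not a copy.
shares_buffer = np.shares_memory(a, every_other_col)

print(row_major_strides, sliced_strides, transposed_strides, shares_buffer)
```

Because the slice is a view, writing to `every_other_col` would also modify `a`; call `.copy()` when an independent array is needed.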
This project is a collection of educational notes and tutorials focused on Python programming, scientific computing, and data analysis. It serves as a reference for learning language basics, advanced techniques, and object-oriented design. The materials include implementation guides for building linear and logistic regression models and convolutional neural networks using symbolic graph frameworks. It also provides instruction on manipulating and visualizing structured data frames and performing complex mathematical operations through numerical libraries. The repository includes a system for converting interact
ggplot2 is a data visualization library for R based on a formal grammar of graphics. It provides a declarative plotting framework that allows users to create complex graphics by combining geometric objects, statistical summaries, and coordinate systems. The system is distinguished by a layered approach to composition, where visualizations are built incrementally by stacking independent geometric, statistical, and coordinate layers. It utilizes a hierarchical styling engine to manage non-data elements such as backgrounds, fonts, and margins, and includes a multi-panel faceting tool for splitti
DataFrame is a C++ tabular data library and manipulation engine designed for managing heterogeneous data in contiguous memory. It functions as a statistical analysis framework and time series analysis toolkit, providing the means to store, index, and transform multidimensional datasets. The project distinguishes itself through a high-performance execution model that uses column-major storage, SIMD-aligned memory allocation, and a thread pool for parallel computation. It employs a visitor-based algorithm dispatch system and policy-driven transformations to decouple data processing logic f
This project serves as a comprehensive textbook and educational resource for data analysis using the Python ecosystem. It provides a structured guide to manipulating, cleaning, and processing datasets, focusing on the core tools required for numerical computing and statistical analysis. The repository distinguishes itself by offering a collection of practical code examples and workflows that demonstrate how to perform complex data tasks. It covers the application of vectorized numerical computations, the management of time-indexed data, and the creation of statistical visualizations to commun
This project is a machine learning educational curriculum and learning platform delivered through interactive Jupyter Notebooks. It serves as a comprehensive guide for mastering the Python data science toolkit, providing structured tutorials for numerical computing, tabular data manipulation, and statistical visualization. The curriculum includes specific implementation guides for Scikit-Learn and a practical course on TensorFlow for constructing, training, and deploying neural networks and computer vision models. It covers the end-to-end process of building predictive models, from initial pr
This project is a Python machine learning library and data science toolkit designed for building predictive models and analyzing complex datasets. It provides a collection of implementations for common supervised and unsupervised algorithms using the Scikit-Learn framework. The toolkit includes a predictive modeling suite for generating predictions from historical data and a statistical analysis framework for applying Bayesian modeling and causality tests. It also features a data visualization suite based on Matplotlib for rendering static charts and graphs to interpret classifier boundaries
Apache DataFusion is an extensible, columnar SQL query engine that runs embedded within a host application without requiring a separate server process. It processes data in columnar batches using Apache Arrow for memory-efficient analytics, and executes analytic workloads in parallel across partitions and CPU cores (multi-node scaling is provided by the separate Ballista project). The engine supports both SQL and DataFrame queries through a modular, streaming architecture that allows custom operators, data sources, functions, and optimizer rules. The engine distinguishes itself through its modular extension framework, which enables building custom query e
This project is a curated collection of programming exercises designed to build proficiency in numerical computing and data manipulation. It provides a structured learning path for mastering multidimensional array operations, vectorized arithmetic, and statistical analysis. The repository focuses on developing practical expertise in array-based workflows, emphasizing techniques such as memory management, efficient data processing, and the replacement of explicit loops with vectorized operations. Users engage with hands-on challenges that cover the full lifecycle of numerical data, from initia
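To make "replacing explicit loops with vectorized operations" concrete, here is a minimal sketch that assumes the exercises target NumPy (the entry itself does not name the library). Both function names are hypothetical and exist only for this illustration.

```python
import numpy as np


def sum_of_squares_loop(values):
    """Accumulate the sum of squares with an explicit Python loop."""
    total = 0.0
    for v in values:
        total += v * v
    return total


def sum_of_squares_vectorized(values):
    """Compute the same quantity with whole-array operations."""
    arr = np.asarray(values, dtype=float)
    return float(np.sum(arr * arr))


data = [1.0, 2.0, 3.0, 4.0]
loop_result = sum_of_squares_loop(data)
vec_result = sum_of_squares_vectorized(data)
print(loop_result, vec_result)
```

The vectorized version pushes the per-element work into compiled array routines, which is typically where the speedup on large inputs comes from.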
This project is a Python data science curriculum and programming tutorial collection. It provides a structured set of educational notebooks and scripts designed to teach data analysis, machine learning, and deep learning. The repository serves as a learning path for building and tuning predictive models, including regression, decision trees, and neural networks. It includes a data visualization guide for creating financial time-series plots and a multiprocessing reference for implementing parallel task execution and shared memory synchronization. The curriculum covers broader capability area
metrics-graphics is a data visualization library and declarative graphics framework designed to create principled data graphics and layouts. It functions as a statistical graphics engine that maps raw data to geometric shapes and structured objects to render complex, data-driven layouts. The toolkit specializes in rendering time-series data through line charts and scatterplots using a consistent layout system. It also provides capabilities for statistical distribution mapping, including the creation of rug plots to represent one-dimensional data density. The system covers a broad surface of
This library is a collection of machine learning algorithms and neural network components implemented from scratch using only NumPy. It serves as an educational toolkit for constructing and experimenting with machine learning architectures, emphasizing a modular approach where algorithms are organized into self-contained, object-oriented classes. The project distinguishes itself by relying exclusively on array-oriented programming to perform mathematical operations, ensuring that all computations are vectorized for performance. By utilizing a standardized interface for forward and backward pa
Data Formulator is an automated data analysis and visualization platform that uses large language models to interpret natural language instructions for data preparation and reporting. It functions as an interactive workbench where users can clean, filter, and aggregate datasets while simultaneously generating visual representations. By combining conversational interfaces with automated transformation tools, the system enables users to explore data patterns and refine schemas without manual coding. The platform distinguishes itself through an agentic architecture that translates natural langua
This project is a machine learning array framework and tensor computation library designed for high-performance numerical computing. It provides a comprehensive suite of tools for constructing and training neural networks, featuring an automatic differentiation engine that facilitates gradient-based optimization and complex mathematical modeling. The library distinguishes itself through a unified memory architecture that allows data to be shared across CPU and GPU devices without explicit copies, significantly reducing data movement overhead. Its execution model relies on a lazy evaluation en
This project is a comprehensive educational curriculum designed to teach Python programming through the lens of data science and financial analysis. It provides a structured guide for learning how to process complex numerical information, build data models, and perform scientific computing tasks using standard industry libraries. The materials focus on practical applications, enabling users to develop skills in financial data analysis and interactive exploration. By working through these resources, learners gain experience in executing high-performance mathematical operations, transforming ra
Python-Guide-CN is a Chinese translation of a comprehensive guide to idiomatic Python programming and software development. It serves as a curated programming tutorial and ecosystem reference, providing a structured path for learning Python syntax, standard libraries, and professional coding patterns. The project distinguishes itself by offering detailed instructions for setting up development environments across Windows, macOS, and Linux. It specifically focuses on the selection of interpreters and the management of virtual environments to ensure a consistent workspace. The guide covers a b
This project is an educational platform and research toolkit designed to teach deep learning through a combination of mathematical theory, visual diagrams, and executable code. It provides a comprehensive environment for building, training, and evaluating neural networks, grounding complex concepts in interactive computational notebooks that allow for hands-on experimentation. The framework distinguishes itself by interleaving theoretical foundations—including linear algebra, calculus, and probability—with practical implementations across multiple industry-standard libraries. It supports flex
This project is a collection of educational resources and study materials focused on scientific computing and data analysis using Python. It consists of translated notes and Jupyter notebooks designed to guide learners through the Python data ecosystem. The content covers specialized workflows including numerical computation, data cleaning, and time series analysis. These materials provide a reference for performing complex data manipulations and processing sequential data to identify patterns. The resource is organized as a series of static files and markdown documents using a flat-file dir
This project is a comprehensive collection of Python programming education materials, including tutorials, exercises, and curated code samples. It serves as a learning curriculum and software engineering toolkit, utilizing Jupyter Notebooks to combine executable code with descriptive educational text. The repository provides practical implementation guides for building large language model applications, such as retrieval-augmented generation systems, stateful AI agents, and machine learning workflows. It distinguishes itself by offering a structured approach to agentic coding workflows, cover
This project is a comprehensive library of practical Python code examples and patterns. It provides a collection of scripts and snippets designed to demonstrate a wide range of programming tasks, from basic syntax to advanced implementation patterns. The repository focuses on several core domains, including the implementation of concurrency and multithreading examples, data analysis snippets for cleaning and manipulating tabular data, and various data visualization examples. It also covers automation scripts for file system management and a variety of general programming patterns. Additional
This project is a collection of interactive, command-line programming lessons designed for the swirl R package. It provides a structured curriculum for learning R programming and data science through guided, self-paced exercises run in the R console. The content covers language fundamentals, data cleaning and manipulation, statistical analysis, and data visualization. It also includes instructional modules focused on software development practices. These lessons are developed as a modular hierarchy of courses an
Facets is a set of interactive software tools for the statistical analysis, distribution visualization, and multidimensional exploration of machine learning datasets. It provides a visual interface for identifying outliers and missing values in numeric and string data, specifically designed for auditing dataset quality and identifying skews between training and validation sets. The system uses multidimensional facet-based visualization and interactive bucketing to map individual data points across multiple feature axes. It employs synchronized view filtering and animated dimension transitions
Orange3 is a visual data mining platform that provides an interactive canvas for building data analysis workflows without writing code. At its core, it offers a widget-based visual programming environment where users connect configurable components to perform data preprocessing, machine learning model training, statistical evaluation, and interactive visualization. The platform is built on NumPy-backed data tables with domain descriptors that define variable names, types, and roles, and includes a lazy SQL query proxy for working with database tables without loading all data into memory. The
ThinkStats2 is a computational statistics course and educational library designed to teach probability and statistics through a programmatic approach. It provides a framework for studying statistical concepts by writing Python code and running simulations on real-world datasets. The project uses interactive notebooks and a collection of Python modules to deliver guided lessons. It emphasizes the verification of theoretical statistical laws through iterative computational experiments and simulation-driven testing. The resource covers broad capabilities in data analysis and data science traini
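The simulation-driven approach described in the ThinkStats2 entry can be sketched with the standard library alone. The example below is not taken from the ThinkStats2 code; it is an illustrative check of the law of large numbers for a fair six-sided die, and `simulate_die_mean` is a hypothetical helper name.

```python
import random
import statistics


def simulate_die_mean(n_rolls, seed=0):
    """Return the sample mean of n_rolls simulated fair-die rolls."""
    rng = random.Random(seed)  # seeded so the experiment is repeatable
    rolls = [rng.randint(1, 6) for _ in range(n_rolls)]
    return statistics.fmean(rolls)


# Expected value of a fair die: (1 + 2 + ... + 6) / 6.
theoretical_mean = sum(range(1, 7)) / 6

small_sample_mean = simulate_die_mean(10)
large_sample_mean = simulate_die_mean(100_000)

print(theoretical_mean, small_sample_mean, large_sample_mean)
```

With 100,000 rolls the standard error of the mean is roughly 0.005, so the large-sample estimate should sit very close to 3.5, while a 10-roll estimate can wander much further.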
F2 is a cross-platform charting engine and grammar-based visualization tool designed to render interactive data visualizations. It functions as a declarative system that allows users to build complex charts by defining the relationships between data dimensions and visual encoding channels. The framework is specifically optimized for mobile data visualization, providing a toolkit for creating touch-optimized charts. It supports custom data visualization styling, enabling the use of personalized shapes and animations to define a unique visual identity. The engine provides a platform-agnostic r
ggplot2 is an R data visualization library and statistical graphics engine. It implements a grammar of graphics that functions as a declarative plotting framework, allowing users to specify what a plot should contain rather than how to draw it. The system builds visualizations by mapping data variables to visual aesthetics through a structured set of layering rules. This approach enables the composition of complex graphics by stacking independent components, such as geometric objects and scales, on top of a shared coordinate system. The framework supports scientific plotting and exploratory
dplyr is an R data manipulation library that provides a grammar for transforming tabular data frames. It functions as an in-memory data frame processor and a relational data algebra tool, using a consistent set of verbs to filter, select, and summarize data. The project includes a SQL translation engine that converts high-level data manipulation expressions into optimized queries. This allows users to perform transformations directly on remote relational databases and cloud storage without pulling data locally. The library covers a broad range of tabular operations, including column mutation
Plotnine is a data visualization library for Python based on the Grammar of Graphics. It serves as a declarative statistical plotting framework and multi-panel plotting engine, allowing users to create complex charts by mapping data variables to visual properties such as position, color, and size. The project is distinguished by its use of a layered composition model and a statistical transformation engine that performs aggregations and computations before rendering visuals. It features a comprehensive system for multi-panel faceting, which enables the splitting of a single visualization into