30 open-source projects similar to pydata/xarray, ranked by how many features they have in common. Compare stars, activity and what each one does to find the best Xarray alternative.
ndarray is a multidimensional array library for Rust that serves as a linear algebra framework and scientific computing tool. It provides the core infrastructure for creating and manipulating n-dimensional arrays, functioning as both a parallel array processor and a toolkit for numerical data analysis. The library distinguishes itself by providing efficient slicing and memory views, allowing for data sharing without copying. It leverages optimized backend math libraries for high-speed matrix multiplication and distributes heavy mathematical iterations across multiple CPU threads to accelerate
xtensor is a C++ multidimensional array library for numerical computing that provides N-dimensional containers with an interface mirroring the NumPy API. It utilizes a lazy evaluation expression engine to defer numerical computations until assignment, which minimizes memory allocations and intermediate copies. The library features a foreign memory array adaptor that allows it to wrap external buffers, such as NumPy arrays, to perform numerical operations in-place without duplicating data. It further optimizes performance through lazy broadcasting and a system that manages the lifetime of temp
This project is an educational resource and a collection of instructional materials for performing data manipulation and statistical analysis using Python. It provides a comprehensive set of guides and code examples for using the Pandas, NumPy, and Matplotlib libraries to analyze structured data. The resource includes a dedicated guide for reshaping, cleaning, and aggregating tabular data and time series via Pandas, alongside a reference for high-performance vectorized operations and linear algebra using NumPy. It also features tutorials for creating publication-quality charts, distribution p
This project is a comprehensive pandas data analysis tutorial and instructional guide designed for learning data manipulation and analysis. It serves as a tabular data processing guide and a manual for time series analysis, providing a structured approach to cleaning, merging, and transforming datasets. The repository functions as a data feature engineering course, providing tutorials on constructing and selecting dataset features to improve machine learning model performance. It also includes a vectorized data operations guide for performing element-wise mathematical computations and matrix
Modin is a distributed dataframe library and parallel data processing engine designed to handle large datasets that exceed system memory. It functions as a distributed computing framework that parallelizes data manipulation tasks across multiple CPU cores or clusters to increase throughput and avoid memory errors. The project mirrors the Pandas API, allowing for the distribution of data workflows without changing core code logic. It utilizes a pluggable backend interface, which enables users to switch between different distributed execution engines to optimize performance based on available h
Vaex is a high-performance Apache Arrow DataFrame library and out-of-core data processing engine designed to handle billion-row tabular datasets in Python. It functions as a lazy evaluation framework that defers computations and transformations until results are required, enabling the processing of datasets that exceed available system RAM by mapping files directly from disk. The project distinguishes itself as a tool for big data visualization and exploration, specifically integrated for use within interactive notebooks. It provides specialized capabilities for machine learning feature engin
NumPy is a foundational library for scientific computing in Python, providing a comprehensive framework for managing and manipulating large-scale numerical information. It centers on high-performance multidimensional array objects that serve as the primary data structure for complex mathematical operations and data analysis workflows. The library distinguishes itself through specialized mechanisms for handling multidimensional data, including advanced indexing, slicing, and broadcasting techniques that allow for efficient operations across arrays of varying shapes. It utilizes strided metadat
CuPy is a CUDA array computing library that implements a NumPy-compatible interface for executing array operations and numerical computing on NVIDIA GPUs. It serves as a GPU-accelerated numerical library that also implements a subset of SciPy routines on CUDA, offloading heavy calculations to graphics hardware to increase processing speed for scientific and engineering workloads. The library enables multi-framework tensor exchange, allowing data buffers to be shared between different deep learning frameworks using standardized memory layouts to avoid memory copies. It also supports custom GPU kernel integratio
This repository is a comprehensive collection of instructional guides and practical examples for Python development, focusing on machine learning, data science, and web scraping. It provides implementations for neural networks, reinforcement learning algorithms, and deep learning architectures using PyTorch, alongside detailed manuals for scientific computing and data visualization. The project distinguishes itself by offering specialized tutorials on concurrent programming to optimize CPU performance and guides for setting up Linux development environments. It covers the implementation of ad
NumCpp is a C++ framework and numerical computing library that provides a toolkit for multi-dimensional array management and mathematical routines. It functions as a C++ implementation of the NumPy library, offering a scientific computing framework for managing tensors and solving complex algebraic equations. The project enables high-performance array manipulation within a C++ environment without relying on a Python runtime. It distinguishes itself by providing a NumPy-like interface for executing linear algebra, managing multi-dimensional data structures, and performing numerical proces
DataFrame is a C++ tabular data library and manipulation engine designed for managing heterogeneous data in contiguous memory. It functions as a statistical analysis framework and time series analysis toolkit, providing the means to store, index, and transform multidimensional datasets. The project distinguishes itself through a high-performance execution model that utilizes column-major storage, SIMD-aligned memory allocation, and a thread-pool for parallel computations. It employs a visitor-based algorithm dispatch system and policy-driven transformations to decouple data processing logic f
This project is a collection of educational resources and study materials focused on scientific computing and data analysis using Python. It consists of translated notes and Jupyter notebooks designed to guide learners through the Python data ecosystem. The content covers specialized workflows including numerical computation, data cleaning, and time series analysis. These materials provide a reference for performing complex data manipulations and processing sequential data to identify patterns. The resource is organized as a series of static files and markdown documents using a flat-file dir
A Python package for manipulating 2-dimensional tabular data structures
Polars is a high-performance columnar data processing library designed for efficient analytical workflows. It functions as a structured data library that organizes information into typed columns, utilizing the Apache Arrow memory format to enable zero-copy data sharing and cache-friendly, vectorized operations. The engine is built to handle large-scale tabular datasets, providing both local and distributed analytical runtimes that scale from single-machine environments to multi-node clusters. The project distinguishes itself through a sophisticated lazy query engine that constructs abstract e
A package that applies any function to a pandas DataFrame or Series in the fastest available manner
Pandas integration with sklearn
Gonum is a numerical computing library for the Go programming language, providing a collection of packages for scientific computing, linear algebra, statistics, and optimization. It functions as a framework for performing complex numerical computations and solving systems of linear equations. The project includes a dedicated graph analysis framework for modeling network graphs and solving connectivity and pathfinding problems. It also provides a statistical analysis toolkit for computing descriptive and inferential statistics and estimating mixture entropy. The library's capability surface c
SciPy is a scientific computing library for Python that provides a comprehensive collection of mathematical algorithms and numerical tools for research and engineering. It functions as a high-performance numerical analysis framework, bridging high-level Python code with compiled C and Fortran routines to execute complex computations at native speed. The library is built upon array-based data structures that utilize strided memory layouts to enable efficient data manipulation and slicing. By employing vectorized operation dispatch and linking to optimized hardware-specific linear algebra li
Plotly.py is a comprehensive framework for building production-ready data applications and interactive dashboards directly from Python code. It functions as both a high-performance visualization library for browser-based charts and a full-stack tool for transforming analytical scripts into responsive, web-based interfaces. By abstracting away the need for manual HTML or JavaScript, it allows developers to define complex layouts and functional logic using modular, reusable components. The framework distinguishes itself through a robust architecture that handles event orchestration and state sy
This repository serves as a comprehensive research platform and toolkit for advancing machine learning, quantum computing, and large-scale scientific data analysis. It provides foundational frameworks for developing complex algorithmic systems, offering the necessary infrastructure for distributed training, computational graph execution, and high-performance model development. The project distinguishes itself by integrating specialized research domains with robust, privacy-preserving methodologies. It supports diverse scientific discovery through tools for quantum simulation, physics-informed
Math.js is a comprehensive JavaScript library for scientific, complex, and arbitrary precision calculations. It functions as a symbolic computation engine, a linear algebra toolkit, a statistical analysis library, and a unit conversion system. The project distinguishes itself by providing a symbolic engine capable of parsing, simplifying, and manipulating mathematical expressions algebraically without requiring immediate numerical evaluation. It includes a framework for defining and converting physical quantities with units of measure and automatic prefix support. The library covers a broad
pybind11 is a header-only C++ binding library that exposes C++ functions and classes as Python modules. It serves as a language bridge, mapping native types, inheritance hierarchies, and lambda functions into compatible Python objects to enable high-performance native code execution. The library includes specialized integration for NumPy arrays, utilizing buffer protocols to bind native C++ data without copying memory. It provides a toolkit for mapping C++ standard library data structures and smart pointers into the Python environment while maintaining cross-language memory management. The p
PyQtGraph is a scientific plotting and graphics framework built for PyQt and PySide applications, providing fast, interactive 2D and 3D visualizations with GPU-accelerated rendering. It serves as both a real-time signal monitoring system for streaming time-series data and a toolkit for constructing interactive data dashboards with dockable panels, parameter trees, and custom widgets. The library also includes a node-based visual flowchart tool for building data processing pipelines and a scientific graphics export system that saves plots as PNG, SVG, or CSV and converts items to Matplotlib for
This project is a comprehensive library of practical Python code examples and patterns. It provides a collection of scripts and snippets designed to demonstrate a wide range of programming tasks, from basic syntax to advanced implementation patterns. The repository focuses on several core domains, including the implementation of concurrency and multithreading examples, data analysis snippets for cleaning and manipulating tabular data, and various data visualization examples. It also covers automation scripts for file system management and a variety of general programming patterns. Additional
Orange3 is a visual data mining platform that provides an interactive canvas for building data analysis workflows without writing code. At its core, it offers a widget-based visual programming environment where users connect configurable components to perform data preprocessing, machine learning model training, statistical evaluation, and interactive visualization. The platform is built on NumPy-backed data tables with domain descriptors that define variable names, types, and roles, and includes a lazy SQL query proxy for working with database tables without loading all data into memory. The
This project is a Python wrapper for the TA-Lib C library, serving as a financial technical analysis library and quantitative trading tool. It provides a collection of mathematical functions designed to analyze market price movements, identify trading signals, and recognize candlestick patterns within financial data. The library focuses on the computation of trend, momentum, and volume metrics. It includes specialized tools for candlestick pattern recognition to detect recurring price action shapes in both historical and real-time data. The system integrates with NumPy arrays to process cont
This project is a collection of educational notes and tutorials focused on Python programming, scientific computing, and data analysis. It serves as a reference for learning language basics, advanced techniques, and object-oriented design. The materials include implementation guides for building linear, logistic, and convolutional neural networks using symbolic graph frameworks. It also provides instruction on manipulating and visualizing structured data frames and performing complex mathematical operations through numerical libraries. The repository includes a system for converting interact
This project is a collection of pre-configured Docker images that provide ready-to-run environments for interactive computing and data science. It functions as a scientific computing stack and a polyglot notebook server, bundling language interpreters and libraries for Python, R, and Julia within a containerized system to ensure reproducible research environments. The collection uses a layered image hierarchy to provide versioned software dependencies and support for hardware acceleration across different CPU architectures. It allows for the creation of custom images based on a foundation of
This project is a structured learning curriculum and technical reference for mastering deep learning with TensorFlow. It provides a comprehensive guide for building, training, and deploying neural networks, combining theoretical fundamentals with practical implementation examples. The repository distinguishes itself by covering the end-to-end machine learning workflow, from low-level tensor mathematics and linear algebra to the creation of complex model architectures. It includes specific guidance on developing data pipelines for diverse data types, such as images, text, and time-series seque
This project is a community-driven standard library for the Fortran programming language, providing a comprehensive collection of algorithms, data structures, and system utilities. It is designed to extend the language's native capabilities, offering a unified toolkit for scientific computing, numerical analysis, and general-purpose programming. The library distinguishes itself through a modular architecture that utilizes generic interface dispatch and compile-time specialization to ensure high performance across various data types. It provides standardized abstractions for external numerical