30 open-source projects similar to javascriptdata/danfojs, ranked by how many features they have in common. Compare stars, activity and what each one does to find the best Danfojs alternative.
This project is an educational resource and a collection of instructional materials for performing data manipulation and statistical analysis using Python. It provides a comprehensive set of guides and code examples for using the Pandas, NumPy, and Matplotlib libraries to analyze structured data. The resource includes a dedicated guide for reshaping, cleaning, and aggregating tabular data and time series via Pandas, alongside a reference for high-performance vectorized operations and linear algebra using NumPy. It also features tutorials for creating publication-quality charts, distribution p
This is an interactive notebook-based course that teaches machine learning from Python fundamentals through deep learning and natural language processing. It uses real datasets and multiple frameworks within a structured, hands-on curriculum that combines concise explanations with executable code cells, built-in datasets, and embedded exercise checkpoints. Learning progresses through data preparation and exploration, classical machine learning workflows, computer vision with convolutional neural networks, and natural language processing with deep learning, all delivered as a cohesive progressi
DataFrame is a C++ tabular data library and manipulation engine designed for managing heterogeneous data in contiguous memory. It functions as a statistical analysis framework and time series analysis toolkit, providing the means to store, index, and transform multidimensional datasets. The project distinguishes itself through a high-performance execution model that utilizes column-major storage, SIMD-aligned memory allocation, and a thread-pool for parallel computations. It employs a visitor-based algorithm dispatch system and policy-driven transformations to decouple data processing logic f
Pandas is a high-performance data analysis library that provides a comprehensive framework for manipulating, cleaning, and transforming structured datasets. It centers on labeled one-dimensional and two-dimensional data structures, allowing users to construct, filter, and reshape tabular information while performing complex arithmetic and logical operations. The library distinguishes itself through a sophisticated indexing engine that enables automatic data alignment during calculations and relational merges. By utilizing a block-based memory layout, it optimizes cache locality for vectorized
r4ds is a data science curriculum and educational resource designed for mastering the R programming language. It provides a structured learning path for the end-to-end process of importing, tidying, transforming, and visualizing data. The project emphasizes a reproducible data science guide and a comprehensive curriculum for data wrangling. It includes specialized tutorials on the grammar of graphics for layered data visualization and technical publications created with Quarto that blend executable code with narrative prose. The material covers a broad range of analytical capabilities, inclu
Pinot is a distributed, columnar analytical database designed for high-concurrency, low-latency query processing. It functions as a real-time OLAP datastore, enabling interactive, user-facing analytics by ingesting and querying massive datasets from both streaming and batch sources. The system architecture relies on a centralized controller for cluster coordination and a distributed segment-based storage model to ensure horizontal scalability. The platform distinguishes itself through a hybrid ingestion pipeline that unifies real-time event streams and historical batch data into a single quer
This project is a Python data analysis library and exploratory data analysis framework designed for processing raw datasets. It provides a suite of tools for examining data, identifying anomalies, and applying statistical methods to uncover patterns. The repository functions as a machine learning modeling toolkit and a statistical data modeling suite. It includes predictive algorithms and mathematical models used to analyze relationships between data variables and derive insights from complex datasets. The project covers a broad range of capabilities including data science, machine learning
This project is a pandas data analysis cookbook and Python data science guide. It provides a collection of programmatic recipes and examples for cleaning, manipulating, and analyzing structured data. The project focuses on providing a containerized analysis environment to ensure a consistent workspace and reproducible dependencies when executing data processing scripts. It covers a broad range of data science capabilities, including data ingestion from external sources, raw data cleaning, and exploratory data analysis. These recipes demonstrate how to perform structured data analysis through
This project is a machine learning educational curriculum and learning platform delivered through interactive Jupyter Notebooks. It serves as a comprehensive guide for mastering the Python data science toolkit, providing structured tutorials for numerical computing, tabular data manipulation, and statistical visualization. The curriculum includes specific implementation guides for Scikit-Learn and a practical course on TensorFlow for constructing, training, and deploying neural networks and computer vision models. It covers the end-to-end process of building predictive models, from initial pr
VisiData is a terminal-based interactive data analysis tool and browser designed for exploring, filtering, and sorting large tabular datasets. It functions as a structured data inspector that loads and flattens complex formats like JSON, XML, and PCAP into interactive sheets, as well as a terminal file manager for navigating directories and performing staged filesystem operations. The project distinguishes itself by rendering data visualizations, such as scatter plots and histograms, directly in the terminal using Unicode Braille characters. It provides a Python-based data wrangling environme
xtensor is a C++ multidimensional array library for numerical computing that provides N-dimensional containers with an interface mirroring the NumPy API. It utilizes a lazy evaluation expression engine to defer numerical computations until assignment, which minimizes memory allocations and intermediate copies. The library features a foreign memory array adaptor that allows it to wrap external buffers, such as NumPy arrays, to perform numerical operations in-place without duplicating data. It further optimizes performance through lazy broadcasting and a system that manages the lifetime of temp
This project is a pharmaceutical supply chain and public health data analyzer designed to track the historical distribution and quality of medical products across regional jurisdictions. It functions as a monitoring tool for vaccine distribution, analyzing supply patterns and quality variances over time. The system converts pharmaceutical sales records into regional heatmaps and spatial density maps to visualize the geographic concentration of product distribution. It includes a time-series analysis tool to track historical product movement and identify trends in regional supply and demand.
This project is a structured learning curriculum and technical reference for mastering deep learning with TensorFlow. It provides a comprehensive guide for building, training, and deploying neural networks, combining theoretical fundamentals with practical implementation examples. The repository distinguishes itself by covering the end-to-end machine learning workflow, from low-level tensor mathematics and linear algebra to the creation of complex model architectures. It includes specific guidance on developing data pipelines for diverse data types, such as images, text, and time-series seque
Hazelcast is a distributed data platform that combines an in-memory data grid with a stream processing engine to support real-time analytics and event-driven applications. It functions as a partitioned, distributed key-value store that replicates data across cluster nodes to provide low-latency access and high availability. The platform also serves as a distributed SQL query engine, allowing users to execute standard SQL statements against both in-memory datasets and external data sources. What distinguishes Hazelcast is its use of a distributed consensus subsystem to maintain strongly consis
Dask is a parallel computing framework and distributed task scheduler designed to scale Python data science workflows from single machines to large clusters. It functions as a cluster resource manager that orchestrates computational logic by representing tasks and their dependencies as directed acyclic graphs. This architecture allows the system to automate the distribution of workloads across available hardware while managing complex execution requirements. The project distinguishes itself through a lazy evaluation engine that defers data operations until they are explicitly requested, enabl
This project is a comprehensive library of practical Python code examples and patterns. It provides a collection of scripts and snippets designed to demonstrate a wide range of programming tasks, from basic syntax to advanced implementation patterns. The repository focuses on several core domains, including the implementation of concurrency and multithreading examples, data analysis snippets for cleaning and manipulating tabular data, and various data visualization examples. It also covers automation scripts for file system management and a variety of general programming patterns. Additional
AutoGluon is an automated machine learning framework and multimodal library designed to automate the end-to-end pipeline from data preprocessing to high-accuracy model training and validation. It functions as an automated model trainer for tabular, image, text, and time series data, as well as a tool for time series forecasting and foundation model finetuning. The project is distinguished by its ability to jointly process and fuse different data types, allowing for the construction of multimodal neural networks that integrate images, text, and structured tables. It supports zero-shot inferenc
AlaSQL is a JavaScript SQL database engine that allows for the filtering, grouping, and joining of in-memory object arrays and JSON data. It functions as an in-memory SQL database and client-side data processor, enabling the execution of SQL statements against JavaScript arrays and external data sources in both browser and server environments. The project serves as a universal data query tool capable of performing relational joins across diverse sources, such as merging Google Spreadsheets, SQLite files, and remote APIs into a single result set. It also acts as an IndexedDB SQL wrapper, allow
qsv is a high-performance command line toolkit for querying, transforming, and analyzing comma-separated value files. It functions as a data wrangling interface and a tabular data profiler, featuring a query engine capable of executing SQL statements and joins directly on flat files without requiring a database. The project is distinguished by its ability to process massive datasets that exceed available system memory. This is achieved through disk-based external memory processing, including multithreaded merge sorting, on-disk hash tables for deduplication, and lightweight file indexing for
xsv is a suite of high-performance command-line utilities written in Rust for the analysis, manipulation, and statistical processing of large delimited datasets. It provides a toolkit for processing comma-separated value files through a command line interface. The project provides capabilities for statistical analysis, including the computation of column statistics, value frequencies, and descriptive metrics. It also includes data manipulation utilities for joining, slicing, sampling, and reformatting records. The toolkit covers a broad range of data operations including column selection, da
ArrayFire is a hardware-agnostic compute framework and JIT-compiled tensor engine designed for high-performance numerical computing. It serves as a GPU numerical computing library and parallel signal processing toolkit that abstracts hardware backends, allowing the same codebase to execute across various GPU architectures and CPUs. The project distinguishes itself through a JIT engine that uses expression compilation to fuse operations and minimize memory overhead. It employs a deferred execution graph to optimize computation chains and provides interoperability primitives to share data and e
This project is a collection of interactive Python notebooks and educational resources designed for mastering data science, machine learning, and numerical computing. It provides a series of practical guides and tutorials covering deep learning, big data processing, and statistical analysis. The repository features specialized instructional suites for implementing classical machine learning algorithms, building deep learning model architectures, and managing AWS cloud infrastructure. It includes dedicated notebooks for data visualization and numerical computing exercises. The project covers
This project is an educational resource providing practical code examples and implementations of machine learning algorithms using the Python language. It serves as a guide for constructing predictive pipelines, clustering models, and dimensionality reduction within the Scikit-Learn ecosystem. The repository includes comprehensive demonstrations for supervised and unsupervised learning, as well as detailed examples for implementing neural networks and deep architectures. It also provides practical guidance on exporting model parameters to JSON and wrapping trained models in web APIs for produ
NumCpp is a C++ framework and numerical computing library that provides a toolkit for multi-dimensional array management and mathematical routines. It functions as a C++ implementation of the NumPy ecosystem, offering a scientific computing framework for managing tensors and performing complex algebraic equations. The project enables high-performance array manipulation within a C++ environment without relying on a Python runtime. It distinguishes itself by providing a NumPy-like interface for executing linear algebra, managing multi-dimensional data structures, and performing numerical proces
missingno is a Python library for the visualization and analysis of missing data patterns. It provides a set of tools to profile dataset completeness, map data gaps, and quantify the volume of null values across variables. The library differentiates itself through a nullity correlation analyzer and a hierarchical data clustering tool. These components allow for the detection of systemic dependencies and trends by measuring how the absence of one variable relates to the absence of another. The toolset covers broader data quality auditing and exploratory analysis capabilities. It includes feat
Vaex is a high-performance Apache Arrow DataFrame library and out-of-core data processing engine designed to handle billion-row tabular datasets in Python. It functions as a lazy evaluation framework that defers computations and transformations until results are required, enabling the processing of datasets that exceed available system RAM by mapping files directly from disk. The project distinguishes itself as a tool for big data visualization and exploration, specifically integrated for use within interactive notebooks. It provides specialized capabilities for machine learning feature engin
This project is a comprehensive pandas data analysis tutorial and instructional guide designed for learning data manipulation and analysis. It serves as a tabular data processing guide and a manual for time series analysis, providing a structured approach to cleaning, merging, and transforming datasets. The repository functions as a data feature engineering course, providing tutorials on constructing and selecting dataset features to improve machine learning model performance. It also includes a vectorized data operations guide for performing element-wise mathematical computations and matrix
This project is an exploratory data analysis framework and profiling tool designed to generate comprehensive statistical reports from Pandas and Spark DataFrames. It functions as a data quality profiler that identifies missing values, duplicates, and high correlations within tabular datasets. The tool distinguishes itself through specialized capabilities for time-series analysis, extracting temporal statistics, seasonality, and auto-correlation plots. It also includes a dataset comparison utility to identify structural or content changes between different versions of a dataset. The analysis
Mapshaper is a tool for processing, simplifying, and converting geographic vector data, available as a command-line interface, a web browser tool, and a Node.js library. It functions as a coordinate projector, vector data converter, and web map asset optimizer designed to transform spatial datasets between different coordinate reference systems and file formats. The project is distinguished by its topology-preserving geometry simplification, which reduces vertex counts while maintaining shared boundaries to prevent gaps and overlaps. It further optimizes assets for the web through coordinate
This project is a numerical computing library designed for scientific and engineering mathematical operations. It functions as a comprehensive linear algebra framework, a statistical analysis library, and a toolkit for mathematical optimization and numerical integration. The library is distinguished by its provider-based native acceleration, which allows managed code to be swapped for platform-native binary libraries to increase the performance of computationally intensive routines. It also supports a hybrid approach to matrix storage, implementing separate strategies for dense and sparse mat