What are the main features of javascriptdata/danfojs?

The main features of javascriptdata/danfojs are: JavaScript Data Transformations, Tabular Data Structures, Tensor Conversion Utilities, Data Preprocessing for Modeling, Data Analysis Libraries, Data Cleaning Procedures, Storage, Data Concatenations.

Danfojs - manipulate tabular data in JavaScr…

Danfo.js is a data analysis and preprocessing library for JavaScript that provides high-performance labeled data structures. It implements dataframes and series to support complex data analysis, statistical computing, and the manipulation of structured tabular data.

The project serves as a preprocessing library for machine learning, offering utilities for categorical label encoding, one-hot encoding, and the scaling and standardization of numeric features. In particular, it converts labeled data structures into tensors for model training and evaluation.
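
As a rough illustration of what these preprocessing steps compute, here is a minimal plain-JavaScript sketch. It does not use Danfo.js, and every function name in it is hypothetical rather than part of the library's API. It also assumes population standard deviation for standardization, which may differ from what a given library uses.

```javascript
// Plain-JavaScript sketch of the preprocessing steps named above.
// All function names are hypothetical; they are NOT Danfo.js APIs.

// Label encoding: map each distinct category to an integer,
// numbered in order of first appearance.
function labelEncode(values) {
  const classes = [...new Set(values)];
  const index = new Map(classes.map((c, i) => [c, i]));
  return { classes, codes: values.map((v) => index.get(v)) };
}

// One-hot encoding: one 0/1 column per distinct category.
function oneHotEncode(values) {
  const { classes, codes } = labelEncode(values);
  return codes.map((code) => classes.map((_, i) => (i === code ? 1 : 0)));
}

// Min-max scaling: rescale values to [0, 1].
// A constant column maps to all zeros to avoid dividing by zero.
function minMaxScale(values) {
  const min = Math.min(...values);
  const max = Math.max(...values);
  const range = max - min;
  return values.map((v) => (range === 0 ? 0 : (v - min) / range));
}

// Standardization: subtract the mean, divide by the standard deviation.
// Assumption: population standard deviation (divide by n, not n - 1).
function standardize(values) {
  const n = values.length;
  const mean = values.reduce((a, b) => a + b, 0) / n;
  const variance = values.reduce((a, v) => a + (v - mean) ** 2, 0) / n;
  const std = Math.sqrt(variance);
  return values.map((v) => (std === 0 ? 0 : (v - mean) / std));
}

const colors = ["red", "green", "red", "blue"];
const encoded = labelEncode(colors);      // codes: [0, 1, 0, 2]
const oneHot = oneHotEncode(colors);      // [[1,0,0],[0,1,0],[1,0,0],[0,0,1]]
const scaled = minMaxScale([10, 20, 30]); // [0, 0.5, 1]
const zScores = standardize([2, 4, 6]);   // mean 0, standard deviation 1
```

The resulting numeric arrays are the kind of input that would then be converted to tensors for training.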

The library covers a broad set of capabilities, including descriptive statistics, relational operations such as merge and join, and time series processing. It includes tools for cleaning, filtering, and grouping data, as well as a visualization interface for generating interactive charts directly from dataframes.
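
To make the grouping and join operations concrete, the following plain-JavaScript sketch shows a group-by sum and an inner join over arrays of row objects. It is only an illustration of the concepts: the helper names and sample data are hypothetical, and none of this is Danfo.js's own API or implementation.

```javascript
// Hypothetical helpers, NOT Danfo.js APIs. Rows are plain objects.

// Group rows by a key column and sum a value column.
function groupBySum(rows, keyCol, valueCol) {
  const totals = new Map();
  for (const row of rows) {
    totals.set(row[keyCol], (totals.get(row[keyCol]) ?? 0) + row[valueCol]);
  }
  return [...totals].map(([key, sum]) => ({ [keyCol]: key, [valueCol]: sum }));
}

// Inner join: keep only pairs of rows whose key column values match.
function innerJoin(left, right, keyCol) {
  const byKey = new Map();
  for (const r of right) {
    if (!byKey.has(r[keyCol])) byKey.set(r[keyCol], []);
    byKey.get(r[keyCol]).push(r);
  }
  return left.flatMap((l) =>
    (byKey.get(l[keyCol]) ?? []).map((r) => ({ ...l, ...r }))
  );
}

// Made-up sample data for illustration only.
const sales = [
  { region: "north", amount: 10 },
  { region: "south", amount: 5 },
  { region: "north", amount: 7 },
];
const managers = [
  { region: "north", manager: "Ada" },
  { region: "east", manager: "Lin" },
];

const totals = groupBySum(sales, "region", "amount");
// [{ region: "north", amount: 17 }, { region: "south", amount: 5 }]
const joined = innerJoin(totals, managers, "region");
// [{ region: "north", amount: 17, manager: "Ada" }]
```

Rows without a match on either side ("south" and "east" here) are dropped, which is what distinguishes an inner join from left or outer joins.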

The system supports importing and exporting data in CSV, JSON, and Excel formats.

Features

JavaScript Data Transformations - Implements a comprehensive framework for processing and transforming structured tabular data using dataframes and series in JavaScript.
Tabular Data Structures - Implements high-performance labeled data structures for managing relational data with column mutation and label alignment.
Tensor Conversion Utilities - Transforms structured labeled data into tensor formats for use in machine learning workflows.
Data Preprocessing for Modeling - Offers specialized utilities for cleaning, scaling, and encoding datasets to prepare them for machine learning model training.
Data Analysis Libraries - Serves as a high-performance library for cleaning and transforming structured datasets within JavaScript environments.
Data Cleaning Procedures - Provides procedures for cleaning datasets, including the removal of duplicate values and noise filtering.
Storage - Uses TensorFlow.js tensors as the underlying storage mechanism to enable high-performance vectorized operations.
Data Concatenations - Combines multiple data frames or series into a single object along a specified axis.
Data Grouping Utilities - Provides utilities for organizing data into groups based on keys or labels for aggregated calculations.
Data Joins - Merges and joins multiple data structures together based on shared keys or indices.
Tensor Transformations - Provides utilities to transform structured data frames into tensors for compatibility with machine learning frameworks.
Labeled Data Structures - Implements high-performance labeled data structures for complex statistical computing and data analysis.
Data Subsetting and Extraction - Retrieves specific columns, rows, or cross-sections using labels, integer positions, or slice notation.
Exploratory Data Analysis - Provides tools for calculating descriptive statistics and generating charts to discover patterns in datasets.
Group-By Aggregations - Provides core group-by operations to perform summary calculations and extract insights from relational datasets.
Label-Based Data Selection - Implements label-based indexing to allow data retrieval and alignment using descriptive row and column identifiers.
Labeled Series Data Structures - Stores labeled data in a series format with support for custom indices and data types.
Missing Data Removal - Identifies and handles null, undefined, or NaN values to ensure data quality.
Position-Based Data Selection - Retrieves specific data subsets using integer indices, arrays of positions, or slice notation.
Group Extractions - Implements the ability to retrieve a standalone data structure containing only rows associated with a specific group name.
Tabular Data Managers - Supports importing, merging, and joining datasets from CSV, JSON, and Excel into unified relational structures.
Tabular Data Organization - Enables cleaning, filtering, and aggregating of relational datasets using column-based operations.
Tabular Structure Creation - Builds dataframes and series from JSON objects, arrays, tensors, and dictionaries.
Inter-Format Conversions - Converts between arrays, JSON, lists, objects, and tensors to enable interoperability between formats.
Element-wise Comparisons - Evaluates logical relationships between data series using element-wise comparisons and broadcasting.
Element-wise Array Operations - Executes high-performance mathematical functions across entire data series simultaneously using tensor acceleration.
Statistical Analysis Libraries - Calculates essential descriptive statistics including mean, median, variance, and standard deviation.
Feature Scale Normalization - Transforms numeric features to fit within specified ranges to ensure stable model convergence.
One-Hot Encoding - Converts categorical labels into one-hot numeric arrays for machine learning tasks.
Cumulative Aggregate Calculations - Computes running totals, products, and extrema along specific data axes.
Categorical Encodings - Converts categorical text labels into unique integers for use in numerical machine learning models.
Data Alignments - Synchronizes datasets using automatic label-based alignment during mathematical computations.
Statistical Standardization - Rescales numeric data to a mean of zero and a standard deviation of one.
Interactive Data Visualizations - Implements interactive line and box plots allowing users to visually explore data frames and series.
Data Type Casting - Converts column data types to ensure consistency for processing and analysis.
Data Export - Writes structured data frames or series into CSV, Excel, or JSON files for external storage.
Data Visualization Interfaces - Provides a customizable interface for rendering and analyzing structured data via interactive plots.
Dataset Label Management - Provides tools to modify axis labels, reset indices, and remove specified rows or columns from datasets.
Expression-Based Data Querying - Extracts subsets of data by evaluating boolean expressions against the columns of a dataset.
Series Value Mappings - Applies custom functions or mapping correspondences to transform every value within a data series.
Tensor Conversions - Transforms labeled data structures into tensors required for training and evaluating machine learning models.
Grouped Cumulative Metrics - Calculates running totals, products, or extrema for each group to track progress.
Split-Apply-Combine Patterns - Implements the split-apply-combine pattern to segment datasets, apply functions, and reassemble results.
Tabular Data Import - Reads relational data from CSV, Excel, and JSON files into structured formats for analysis.
Tabular Data Sorting - Rearranges data frame rows based on the values of specific columns in ascending or descending order.
Format-Agnostic Converters - Converts diverse input formats such as JSON and CSV into internal typed tabular structures.
Time Series Analysis - Includes analytical methods for processing date-based indexing and managing sequential temporal data.
Unique Value Counting - Identifies unique elements and counts their occurrences to determine the mode of the dataset.
User-Defined Functions - Allows the application of user-defined functions across data axes to transform or analyze values.
Boolean Masking - Creates boolean masks via logical expressions to filter and index specific data subsets.
Chart Generators - Provides utilities to render various graphical representations like line, bar, and scatter plots from structured data.
Descriptive Statistics Summaries - Generates a bundled statistical summary to provide an overview of the dataset's distribution.
Data Trend Visualizations - Generates a variety of charts to identify patterns and trends within tabular datasets.
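
Several of the features listed above (cumulative aggregates, boolean masking, missing data removal, unique value counting) are simple to state in code. The plain-JavaScript sketch below shows what each one computes on a single series. The function names are hypothetical and are not Danfo.js APIs.

```javascript
// Hypothetical helpers, NOT Danfo.js APIs. A "series" is a plain array.

// Cumulative sum: running total along the series.
function cumSum(values) {
  let total = 0;
  return values.map((v) => (total += v));
}

// Element-wise comparison: one true/false per position (a boolean mask).
function greaterThan(values, threshold) {
  return values.map((v) => v > threshold);
}

// Boolean masking: keep only the positions where the mask is true.
function applyMask(values, mask) {
  return values.filter((_, i) => mask[i]);
}

// Missing data removal: drop null, undefined, and NaN entries.
function dropMissing(values) {
  return values.filter((v) => v !== null && v !== undefined && !Number.isNaN(v));
}

// Unique value counting: occurrences of each distinct value.
function valueCounts(values) {
  const counts = new Map();
  for (const v of values) counts.set(v, (counts.get(v) ?? 0) + 1);
  return Object.fromEntries(counts);
}

const series = [3, 1, 4, 1, 5];
const running = cumSum(series);                // [3, 4, 8, 9, 14]
const mask = greaterThan(series, 2);           // [true, false, true, false, true]
const selected = applyMask(series, mask);      // [3, 4, 5]
const cleaned = dropMissing([1, null, NaN, 2, undefined]); // [1, 2]
const counts = valueCounts(["a", "b", "a"]);   // { a: 2, b: 1 }
```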

Open-source alternatives to Danfojs

iamseancheney/python_for_data_analysis_2nd_chinese_version

8,937 · View on GitHub

This project is an educational resource and a collection of instructional materials for performing data manipulation and statistical analysis using Python. It provides a comprehensive set of guides and code examples for using the Pandas, NumPy, and Matplotlib libraries to analyze structured data. The resource includes a dedicated guide for reshaping, cleaning, and aggregating tabular data and time series via Pandas, alongside a reference for high-performance vectorized operations and linear algebra using NumPy. It also features tutorials for creating publication-quality charts, distribution p…

Nyandwi/machine_learning_complete

4,983 · View on GitHub

This is an interactive notebook-based course that teaches machine learning from Python fundamentals through deep learning and natural language processing. It uses real datasets and multiple frameworks within a structured, hands-on curriculum that combines concise explanations with executable code cells, built-in datasets, and embedded exercise checkpoints. Learning progresses through data preparation and exploration, classical machine learning workflows, computer vision with convolutional neural networks, and natural language processing with deep learning, all delivered as a cohesive progressi…

hosseinmoein/DataFrame

2,917 · View on GitHub

DataFrame is a C++ tabular data library and manipulation engine designed for managing heterogeneous data in contiguous memory. It functions as a statistical analysis framework and time series analysis toolkit, providing the means to store, index, and transform multidimensional datasets. The project distinguishes itself through a high-performance execution model that utilizes column-major storage, SIMD-aligned memory allocation, and a thread-pool for parallel computations. It employs a visitor-based algorithm dispatch system and policy-driven transformations to decouple data processing logic f…

pandas-dev/pandas

49,039 · View on GitHub

Pandas is a high-performance data analysis library that provides a comprehensive framework for manipulating, cleaning, and transforming structured datasets. It centers on labeled one-dimensional and two-dimensional data structures, allowing users to construct, filter, and reshape tabular information while performing complex arithmetic and logical operations. The library distinguishes itself through a sophisticated indexing engine that enables automatic data alignment during calculations and relational merges. By utilizing a block-based memory layout, it optimizes cache locality for vectorized…
