22 repositories
Maps continuous or large-range values to sorted ranks for efficient processing.
Distinct from Data Sorting Engines: focuses on rank-mapping for algorithmic efficiency rather than general dataset ordering.
Explore 22 awesome GitHub repositories matching scientific & mathematical computing · Data Discretization.
Gym is a reinforcement learning environment toolkit and agent simulation framework. It provides a standardized API and a universal communication interface that defines how learning agents interact with simulation environments through actions and observations. The project includes a benchmark environment suite and a diverse library of pre-configured simulation worlds, including physics engines and classic control tasks. It enables the creation of custom simulation environments to train agents in specific operational scenarios while ensuring reproducibility across different learning algorithms.
Translates high-level agent decisions into specific numerical values compatible with underlying physics or logic engines.
This project is a comprehensive, community-maintained knowledge base and toolkit designed for competitive programming. It serves as a centralized repository for algorithmic theory, data structures, and mathematical techniques, providing a structured reference for informatics and collegiate programming competitions. The project distinguishes itself by integrating educational content with a robust suite of automation utilities. It provides a complete workflow for competitive programming, including tools for automated test case generation, solution verification, and direct interaction with onlin
Maps values to sorted ranks to facilitate efficient algorithmic processing.
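As a rough illustration of the rank-mapping idea (often called coordinate compression), the sketch below replaces each value with its position among the distinct sorted values. The function name `compress` is hypothetical and is not taken from the repository above.

```python
from bisect import bisect_left

def compress(values):
    """Map each value to its rank among the distinct sorted values."""
    # Sort the distinct values once; a value's rank is its index in this list.
    distinct = sorted(set(values))
    # Binary search gives each lookup in O(log n).
    return [bisect_left(distinct, v) for v in values]

ranks = compress([1000000, 5, 5, 42.5, -3])
print(ranks)  # [3, 1, 1, 2, 0]
```

Equal inputs share a rank, and the ranks fall in the range 0 to k-1 for k distinct values, so they can index a compact array or tree regardless of how large the original values were.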
This project is a machine learning algorithm reference and implementation guide that provides theoretical foundations and code for supervised learning, deep learning, and natural language processing. It serves as a comprehensive toolkit for implementing predictive models and a technical reference for algorithm engineering. The project focuses on ensemble learning frameworks, including the construction of decision trees, random forests, and gradient boosting models. It also functions as a probabilistic graphical model library and an NLP algorithm reference, with specific implementations for se
Converts continuous variables into discrete bins to improve model robustness against outliers and introduce non-linearity.
FramePack is a neural video synthesis engine and generation framework designed to produce long, temporally consistent video sequences. It functions as a diffusion model optimizer, providing a suite of techniques to manage the computational demands of high-parameter video models while maintaining visual stability during extended generation tasks. The system distinguishes itself through a hierarchical approach to frame prediction, which plans distant anchor frames before filling in intermediate content to prevent cumulative temporal drift. By utilizing constant-length context compression and to
Discretizes historical data into tokens to align training distributions with inference patterns.
This project is an educational resource designed to teach the mathematical foundations and core algorithms of reinforcement learning. It provides a structured academic curriculum that combines textbooks, lecture materials, and practical code examples to guide learners through the principles of Markov decision processes and reinforcement learning theory. The repository distinguishes itself by integrating a grid-based simulation framework that allows users to test algorithms within custom environments. This environment supports the analysis of agent performance by rendering state values, polici
Maps continuous environment coordinates to numerical indices to simplify transition probability calculations.
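One way to picture the coordinate-to-index mapping is a uniform grid laid over a bounded 2D region. This is a minimal sketch under that assumption; `state_index` and all of its parameters are hypothetical names, not the repository's API.

```python
def state_index(x, y, x_min, x_max, y_min, y_max, n_cols, n_rows):
    """Map a continuous (x, y) position to a single grid-cell index."""
    # Scale each coordinate to a cell number, then clamp to the valid range
    # so points on or beyond the boundary land in an edge cell.
    col = int((x - x_min) / (x_max - x_min) * n_cols)
    row = int((y - y_min) / (y_max - y_min) * n_rows)
    col = max(0, min(col, n_cols - 1))
    row = max(0, min(row, n_rows - 1))
    # Row-major flattening: one integer per cell.
    return row * n_cols + col

# A 5x5 grid over the unit square.
print(state_index(0.5, 0.5, 0.0, 1.0, 0.0, 1.0, 5, 5))  # 12
print(state_index(1.0, 1.0, 0.0, 1.0, 0.0, 1.0, 5, 5))  # 24
```

With states numbered this way, a transition model can be stored as a table indexed by (state, action) instead of a function over continuous coordinates.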
This project is a model-agnostic interpretability framework and explainability tool designed to provide local interpretable explanations for individual predictions. It functions as a local surrogate model that approximates the behavior of any machine learning classifier or regression model to identify the most influential features for a specific instance. The framework is designed to be model-agnostic, meaning it can explain predictions across tabular, text, and image data regardless of the underlying architecture. It employs local linear approximations and feature importance visualization t
Provides utilities to convert continuous numerical variables into discrete bins to simplify feature influence explanations.
This project is a multimodal translation framework and large language model capable of speech-to-speech, speech-to-text, and text-to-text translation across nearly 100 languages. It provides a real-time speech translation engine and a comprehensive toolkit for converting spoken audio between languages. The system is distinguished by its ability to preserve the original speaker's tone, pace, and prosody during translation. It utilizes a specialized on-device inference toolkit that converts model checkpoints into C-based libraries, enabling low-latency execution on mobile and edge hardware with
Transforms continuous audio waveforms into sequences of discrete units for efficient model processing.
GoLearn is a machine learning library for the Go programming language. It provides a supervised learning framework and a toolkit for building, training, and evaluating predictive models through a standardized interface. The project implements a data frame system that loads CSV files into structured grids for matrix operations. It includes a preprocessing library for discretizing continuous variables and a model evaluation toolkit that utilizes confusion matrices and cross-validation to measure precision and recall. The library covers data engineering and management, including the ability to
Processes discrete data by merging histograms and combining data points to prepare datasets for predictive modeling.
This project is an educational resource and a collection of instructional materials for performing data manipulation and statistical analysis using Python. It provides a comprehensive set of guides and code examples for using the Pandas, NumPy, and Matplotlib libraries to analyze structured data. The resource includes a dedicated guide for reshaping, cleaning, and aggregating tabular data and time series via Pandas, alongside a reference for high-performance vectorized operations and linear algebra using NumPy. It also features tutorials for creating publication-quality charts, distribution p
Converts continuous numerical features into discrete bins or quartiles for distribution analysis.
VisiData is a terminal-based interactive data analysis tool and browser designed for exploring, filtering, and sorting large tabular datasets. It functions as a structured data inspector that loads and flattens complex formats like JSON, XML, and PCAP into interactive sheets, as well as a terminal file manager for navigating directories and performing staged filesystem operations. The project distinguishes itself by rendering data visualizations, such as scatter plots and histograms, directly in the terminal using Unicode Braille characters. It provides a Python-based data wrangling environme
Groups numeric values into calculated ranges to create histograms and visualize distribution.
This project is a comprehensive library of practical Python code examples and patterns. It provides a collection of scripts and snippets designed to demonstrate a wide range of programming tasks, from basic syntax to advanced implementation patterns. The repository focuses on several core domains, including the implementation of concurrency and multithreading examples, data analysis snippets for cleaning and manipulating tabular data, and various data visualization examples. It also covers automation scripts for file system management and a variety of general programming patterns. Additional
Implements grouping of continuous numeric values into discrete categories based on range boundaries.
Omost is a system of software components designed for iterative image refinement, regional layout control, and the optimization of text-to-image embedding processes. It functions as a diffusion model layout controller and an engine that uses large language models to generate executable code for precise control over image composition. The project features a conversational image editor that allows for the refinement of visual content through natural language instructions and automated code execution. It distinguishes itself through a text embedding optimizer that organizes sub-prompts into tree
Provides a grid-based coordinate system to map global and local descriptions to specific image areas.
Orange3 is a visual data mining platform that provides an interactive canvas for building data analysis workflows without writing code. At its core, it offers a widget-based visual programming environment where users connect configurable components to perform data preprocessing, machine learning model training, statistical evaluation, and interactive visualization. The platform is built on NumPy-backed data tables with domain descriptors that define variable names, types, and roles, and includes a lazy SQL query proxy for working with database tables without loading all data into memory. The
Ships a widget that converts continuous numeric attributes into categorical bins using partitioning strategies.
This project is a deep reinforcement learning curriculum that provides educational materials and implementation exercises for mastering neural-network-based agents. It serves as a framework for building reference versions of value-based and policy-based methods to solve sequential decision problems. The project provides specific implementations for continuous control simulations and multi-agent reinforcement learning, where agents are trained to cooperate or compete in shared environments. It includes a policy gradient framework for optimizing agent behavior through methods such as REINFORCE. Capabilities cover a wide range of optimization algorithms, including deep Q-learning, deterministic policy gradients, and dynamic programming for modeling Markov decision processes. The system supports various training domains, such as robotic navigation, financial trading automation, and physics-based simulations. The materials are delivered as a series of Jupyter Notebooks.
Provides methods for mapping continuous environment coordinates to discrete numerical indices for state-space modeling.
This project is a comprehensive pandas data analysis tutorial and instructional guide designed for learning data manipulation and analysis. It serves as a tabular data processing guide and a time series analysis manual, offering a structured approach to cleaning, merging, and transforming datasets. The repository functions as a feature engineering course for data, providing tutorials on constructing and selecting dataset features to improve machine learning model performance. It also includes a guide to vectorized data operations for performing element-wise mathematical calculations and array manipulations. The material covers a wide range of capabilities, including data cleaning workflows, data integration tasks, and tabular data analysis. It offers guidance on processing textual information, handling categorical data, and optimizing execution speed for large datasets. The project is delivered as a series of Jupyter Notebooks containing hands-on exercises and targeted practice problems.
Teaches how to convert continuous numerical values into discrete bins for improved data interpretability.
Vega-Lite is a high-level declarative language for specifying interactive, multi-view visualizations. It compiles a concise JSON specification into a full Vega visualization, automatically inferring scales, axes, and legends from encoding declarations. The grammar-of-graphics encoding maps data fields to visual channels such as position, color, size, and shape, while a multi-view composition grammar enables layered, faceted, concatenated, and repeated layouts. Reactive parameter binding links named parameters to input widgets, selections, and expressions for dynamic updates. The project suppo
Vega-Lite discretizes numeric values into bins for aggregation and histogram visualizations.
This project is a scientific computing framework for the .NET ecosystem, offering a comprehensive suite of libraries for numerical analysis, statistics, and mathematical optimization. It serves as a foundational toolkit for developing applications in machine learning, digital signal processing, and computer vision. The framework provides specialized toolkits for training and deploying predictive models, including neural networks, support vector machines (SVMs), and decision trees. It is also distinguished by deep integrations for real-time visual analysis, such as object tracking and facial feature detection, alongside a dedicated digital signal processing library for capturing and filtering audio and sensor signals. The capability surface extends to high-level matrix decomposition and linear algebra, probabilistic state modeling, and heuristic search algorithms. It also covers a wide range of data manipulation utilities, from dimensionality reduction and normalization to spatial data organization and scientific visualization components. The system includes hardware integration controllers for camera configuration, GPIO port management, and specialized depth-sensing hardware.
Converts continuous numerical data into discrete bins or categories for improved model interpretability.
Alphalens is a quantitative alpha factor analysis library designed to measure the predictive power of financial factors. It serves as a computational toolset for processing financial time series and calculating performance metrics to evaluate quantitative trading hypotheses. The library distinguishes itself through the use of quantile-based data binning to analyze return distributions across different factor strength levels. It aligns historical alpha signals with forward-looking price changes to isolate predictive effects and transforms these metrics into heatmaps and time-series charts for
Converts continuous numerical financial signals into discrete bins or quartiles to analyze return distributions.
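Quantile-based binning of a signal can be sketched with the Python standard library alone. This is an illustration of the general technique, not Alphalens's own code; `quantile_bins` and the sample `factor` values are made up for the example.

```python
from bisect import bisect_right
from statistics import quantiles

def quantile_bins(signal, n_bins=4):
    """Assign each value a bin index 0..n_bins-1 based on sample quantiles."""
    # quantiles() returns the n_bins - 1 cut points between equal-probability bins.
    cuts = quantiles(signal, n=n_bins)
    # A value's bin is the number of cut points at or below it.
    return [bisect_right(cuts, s) for s in signal]

factor = [0.8, -1.2, 0.1, 2.4, -0.5, 1.3, -2.0, 0.4]
print(quantile_bins(factor))  # [2, 0, 1, 3, 1, 3, 0, 2]
```

Because the cut points come from the data itself, each bin holds roughly the same number of observations, which makes per-bin averages (for example, mean forward return per quartile) comparable.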
filterpy is a toolkit for Bayesian state estimation, Gaussian statistical analysis, and time-series noise reduction. It provides a library of linear and non-linear Kalman filters, as well as routines for non-Gaussian state estimation and signal smoothing. The project implements a variety of estimation methods, including particle filtering using Markov Chain Monte Carlo and resampling, and discrete Bayes filtering. It also includes a suite of algorithms for refining historical state estimates through backward and fixed-lag smoothing. Additional capabilities cover multivariate Gaussian analysi
Provides utilities to discretize linear differential equations to model system behavior between measurements.
xtensor is a C++ multidimensional array library for numerical computing that provides N-dimensional containers with an interface mirroring the NumPy API. It utilizes a lazy evaluation expression engine to defer numerical computations until assignment, which minimizes memory allocations and intermediate copies. The library features a foreign memory array adaptor that allows it to wrap external buffers, such as NumPy arrays, to perform numerical operations in-place without duplicating data. It further optimizes performance through lazy broadcasting and a system that manages the lifetime of temp
xtensor maps array values to the index of the corresponding histogram bin.
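The value-to-bin-index mapping can be illustrated in a few lines of Python. This shows the idea only and is not xtensor's C++ API; `bin_indices` is a hypothetical helper, and treating the last edge as inclusive is an assumption chosen for the sketch.

```python
from bisect import bisect_right

def bin_indices(values, edges):
    """Return the histogram bin index for each value, or -1 if out of range."""
    last = len(edges) - 2  # index of the final bin
    out = []
    for v in values:
        if v < edges[0] or v > edges[-1]:
            out.append(-1)  # outside the histogram range
        else:
            # Bins are [edges[i], edges[i+1]); the final bin also includes
            # its right edge, hence the clamp to `last`.
            out.append(min(bisect_right(edges, v) - 1, last))
    return out

edges = [0.0, 10.0, 20.0, 30.0]
print(bin_indices([3.2, 10.0, 29.9, 30.0, 41.0], edges))  # [0, 1, 2, 2, -1]
```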