105 repositories
Environments for managing heterogeneous two-dimensional arrays.
Distinguishing note: Focuses on the framework level for tabular data rather than specific algorithms.
Explore 105 awesome GitHub repositories matching data & databases · Tabular Data Frameworks.
Pandas is a high-performance data analysis library that provides a comprehensive framework for manipulating, cleaning, and transforming structured datasets. It centers on labeled one-dimensional (Series) and two-dimensional (DataFrame) data structures, allowing users to construct, filter, and reshape tabular information while performing complex arithmetic and logical operations. The library distinguishes itself through a sophisticated indexing engine that automatically aligns data by label during calculations and relational merges. By using a block-based memory layout, it optimizes cache locality for vectorized…
Provides a robust environment for managing heterogeneous tabular data.
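In pandas, arithmetic between two Series matches values by index label rather than by position: `pd.Series([1, 2], index=["a", "b"]) + pd.Series([10], index=["b"])` yields NaN for label `a` and 12.0 for label `b`. The stdlib sketch below imitates that alignment rule with plain dictionaries so the idea can be seen without pandas; `aligned_add` is a hypothetical helper, not part of the pandas API, and it ignores pandas details such as duplicate labels and `fill_value`.

```python
import math


def aligned_add(left: dict, right: dict) -> dict:
    """Add two label->value mappings, aligning on labels (hypothetical helper).

    The result covers the sorted union of labels. A label present on only
    one side yields NaN, mirroring how pandas marks values that could not
    be aligned.
    """
    result = {}
    for label in sorted(set(left) | set(right)):
        if label in left and label in right:
            result[label] = left[label] + right[label]
        else:
            result[label] = math.nan  # no partner value to add
    return result


prices = {"a": 1.0, "b": 2.0, "c": 3.0}
deltas = {"b": 10.0, "c": 20.0, "d": 30.0}
print(aligned_add(prices, deltas))
```

Alignment by label is what lets two differently ordered or partially overlapping tables be combined without manual reindexing; the NaN entries make the mismatch visible instead of silently pairing unrelated rows.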
Apache Spark is a unified engine for large-scale distributed data processing, which it executes as distributed computation graphs. It combines a SQL analytics engine, a distributed machine learning framework, a graph processing system, and a real-time stream processor, running SQL queries, graph analyses, and streaming analytics across clusters of machines. It also provides a scalable environment for implementing machine learning algorithms and developing predictive models on massive datasets. The engine incorporates relational query…
Performs distributed relational transformations on structured data using SQL and programmatic interfaces.
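A distributed group-by cannot simply average each partition and then average the averages; it has to ship mergeable partial results. The sketch below shows that two-phase idea (per-partition sum and count, then a merge) in plain Python. It is a conceptual illustration under the assumption of round-robin partitioning, not Spark's implementation; in PySpark the same result would normally be written as `df.groupBy("key").agg(avg("value"))`. The helper names are hypothetical.

```python
from collections import defaultdict


def partial_aggregate(partition):
    """Map-side step: per-key [sum, count] for one partition."""
    acc = defaultdict(lambda: [0.0, 0])
    for key, value in partition:
        acc[key][0] += value
        acc[key][1] += 1
    return dict(acc)


def merge_partials(partials):
    """Reduce-side step: combine per-partition [sum, count] pairs into means."""
    merged = defaultdict(lambda: [0.0, 0])
    for part in partials:
        for key, (s, c) in part.items():
            merged[key][0] += s
            merged[key][1] += c
    return {key: s / c for key, (s, c) in merged.items()}


def distributed_mean(rows, num_partitions=3):
    # Round-robin slicing stands in for how a cluster would split the data.
    partitions = [rows[i::num_partitions] for i in range(num_partitions)]
    return merge_partials(partial_aggregate(p) for p in partitions)


rows = [("x", 1.0), ("y", 4.0), ("x", 3.0), ("y", 6.0), ("x", 5.0)]
print(distributed_mean(rows))
```

Sum and count are carried separately because they combine associatively across partitions, whereas per-partition means would weight small partitions incorrectly.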
This repository serves as a comprehensive research platform and toolkit for advancing machine learning, quantum computing, and large-scale scientific data analysis. It provides foundational frameworks for developing complex algorithmic systems, offering the necessary infrastructure for distributed training, computational graph execution, and high-performance model development. The project distinguishes itself by integrating specialized research domains with robust, privacy-preserving methodologies. It supports scientific discovery across diverse fields through tools for quantum simulation, physics-informed…
Forecasts riverine and flash floods using hydrologic models and satellite-derived datasets to provide early warnings.
This project is a collection of interactive Python notebooks and educational resources designed for mastering data science, machine learning, and numerical computing. It provides a series of practical guides and tutorials covering deep learning, big data processing, and statistical analysis. The repository features specialized instructional suites for implementing classical machine learning algorithms, building deep learning model architectures, and managing AWS cloud infrastructure. It includes dedicated notebooks for data visualization and numerical computing exercises. The project covers…
Provides instructional material on managing heterogeneous two-dimensional arrays for data manipulation using pandas.
This project is a comprehensive collection of common computer science algorithms and data structures implemented in Swift. It serves as an educational reference and library for studying computational complexity, algorithmic logic, and data structure engineering through practical code examples. The repository provides a wide suite of data structure implementations, including various types of linked lists, heaps, hash tables, and an extensive range of hierarchical trees such as Red-Black, B-Tree, and Splay trees. It also covers diverse sorting and searching techniques, from basic bubble sort to…
Provides systems for predicting data categories based on features and training sets.
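One of the simplest systems that predicts a category from features and a training set is the k-nearest-neighbours classifier. The sketch below is offered only as an illustration of that kind of system; it does not reproduce code from the listed repository, and `knn_predict` plus the toy data are hypothetical.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Predict a label for `query` by majority vote of its k nearest neighbours.

    `train` is a list of (feature_vector, label) pairs; distance is Euclidean
    (math.dist, Python 3.8+). Ties in the vote go to the label seen first
    among the nearest neighbours.
    """
    by_distance = sorted(train, key=lambda item: math.dist(item[0], query))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


train = [
    ((1.0, 1.0), "small"),
    ((1.5, 2.0), "small"),
    ((5.0, 5.0), "large"),
    ((6.0, 5.5), "large"),
    ((5.5, 6.0), "large"),
]
print(knn_predict(train, (1.2, 1.4)))
print(knn_predict(train, (5.2, 5.1)))
```

The "training" step is just storing the labeled rows; all the work happens at prediction time, which is why k-NN is easy to implement but slow on large tables without a spatial index.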
This project is an educational platform and research toolkit designed to teach deep learning through a combination of mathematical theory, visual diagrams, and executable code. It provides a comprehensive environment for building, training, and evaluating neural networks, grounding complex concepts in interactive computational notebooks that allow for hands-on experimentation. The framework distinguishes itself by interleaving theoretical foundations (linear algebra, calculus, and probability) with practical implementations across multiple industry-standard libraries. It supports…
Provides recursive multistep forecasting by feeding model-generated predictions back into the input window.
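Recursive multistep forecasting, as described above, reuses a one-step model: each prediction is appended to the input window and the oldest value is dropped before predicting again. The sketch below shows that loop in plain Python; `recursive_forecast` and the toy `linear_step` model are hypothetical stand-ins for a trained network.

```python
from typing import Callable, List


def recursive_forecast(
    history: List[float],
    model: Callable[[List[float]], float],
    window: int,
    steps: int,
) -> List[float]:
    """Forecast `steps` values ahead, one step at a time.

    Each prediction is fed back into the input window, so later steps are
    conditioned on earlier predictions rather than on observed data.
    """
    buffer = list(history[-window:])
    forecasts = []
    for _ in range(steps):
        next_value = model(buffer)
        forecasts.append(next_value)
        buffer = buffer[1:] + [next_value]  # slide the window forward by one
    return forecasts


def linear_step(window_values: List[float]) -> float:
    # Hypothetical one-step model: extrapolate the last difference linearly.
    return window_values[-1] + (window_values[-1] - window_values[-2])


print(recursive_forecast([1.0, 2.0, 3.0, 4.0], linear_step, window=3, steps=3))
```

Because predictions become inputs, any one-step error is carried into every later step, which is the main trade-off of the recursive strategy compared with training a separate model per horizon.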
XGBoost is a distributed machine learning library implementing scalable gradient-boosted decision trees for regression, classification, and ranking. It functions as a predictive modeling framework and a cross-language toolkit, providing a core implementation with native bindings for Python, R, Java, Scala, and C++. The system is GPU-accelerated, using CUDA and NCCL to speed up the training of decision tree ensembles, and it scales training and prediction across multi-node clusters and GPU environments to process…
Provides a framework for building predictive models on structured tabular data using boosted trees and random forests.
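The core idea behind boosted trees can be shown at toy scale: start from a constant prediction, then repeatedly fit a small tree to the residuals and add a damped copy of it. The sketch below does this for squared loss with one-feature decision stumps. It is a minimal illustration of gradient boosting, not XGBoost's algorithm, which additionally uses second-order gradient information, a regularized objective, and far faster split finding; all names here are hypothetical.

```python
from statistics import mean


def fit_stump(xs, residuals):
    """Least-squares split on a single feature; needs at least two distinct x values."""
    best = None
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        lv, rv = mean(left), mean(right)
        sse = sum((r - lv) ** 2 for r in left) + sum((r - rv) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, threshold, lv, rv)
    _, threshold, lv, rv = best
    return lambda x: lv if x <= threshold else rv


def gradient_boost(xs, ys, n_rounds=50, learning_rate=0.3):
    """Gradient boosting for squared loss.

    The residuals y - F(x) are the negative gradient of 0.5 * (y - F(x))**2
    with respect to F(x), so each stump takes one damped gradient step.
    """
    base = mean(ys)
    stumps = []

    def predict(x):
        return base + learning_rate * sum(s(x) for s in stumps)

    for _ in range(n_rounds):
        residuals = [y - predict(x) for x, y in zip(xs, ys)]
        stumps.append(fit_stump(xs, residuals))
    return predict


model = gradient_boost([1, 2, 3, 4, 5, 6], [1.0, 1.0, 1.0, 5.0, 5.0, 5.0])
print(round(model(2), 3), round(model(5), 3))
```

The learning rate shrinks each tree's contribution so that many weak trees, rather than one large tree, account for the signal; this is the same shrinkage idea exposed as a tuning parameter in production boosting libraries.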
Fastai is a high-level deep learning library built on PyTorch that provides a unified interface for the entire machine learning lifecycle. It functions as a comprehensive training toolkit, abstracting hardware management and automating complex training loops to simplify building and running neural network models. The framework is distinguished by its notebook-centric development environment and a type-dispatching data pipeline that automatically applies transformations based on the type of the input data. It emphasizes transfer learning through discriminative layer-wise optimization…
Processes structured datasets with missing value imputation, categorical encoding, and embedding layers for predictive modeling.
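The two preprocessing steps named above can be sketched in a few lines: fill missing numbers with the column median while recording where values were missing, and map categories to integer codes that can index rows of an embedding table. This is similar in spirit to fastai's `FillMissing` and `Categorify` tabular steps, but it is a hypothetical stdlib illustration, not fastai's code; in particular, reserving code 0 for missing or unseen categories is an assumption of this sketch.

```python
from statistics import median


def fill_missing(column):
    """Replace None with the median of observed values.

    Also returns a parallel boolean column flagging which entries were
    missing, so a model can still learn from missingness itself.
    """
    observed = [v for v in column if v is not None]
    fill = median(observed)
    filled = [fill if v is None else v for v in column]
    flags = [v is None for v in column]
    return filled, flags


def categorify(column):
    """Map each category to an integer code; code 0 is reserved for missing/unseen."""
    vocab = {cat: i + 1 for i, cat in enumerate(sorted({v for v in column if v is not None}))}
    codes = [vocab.get(v, 0) for v in column]
    return codes, vocab


ages = [22.0, None, 35.0, 41.0]
cities = ["Lyon", "Paris", None, "Lyon"]
print(fill_missing(ages))
print(categorify(cities))
```

The integer codes produced by `categorify` would then select rows of an embedding matrix with `len(vocab) + 1` rows, letting the network learn a dense vector per category instead of a sparse one-hot encoding.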
react-virtualized is a library of components for rendering massive lists and tables by drawing only the elements visible in the viewport. It provides specialized layout managers, including a windowed grid component and a dynamic-height list manager. The project includes a masonry layout engine for packing items of varying heights and widths, as well as an infinite-scroll interface for incrementally fetching and appending data. The library covers a broad range of virtualization capabilities, including frozen grid elements, reverse list rendering, and synchronized viewport scrolling. It also…
Provides framework-level support for managing and sorting heterogeneous two-dimensional tabular data.
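The windowing technique behind list virtualization reduces to one calculation: given the scroll offset, the viewport height, and the row height, find which rows intersect the viewport and render only those, plus a few "overscan" rows on each side to hide blank gaps during fast scrolling. The sketch below assumes fixed row heights and uses a hypothetical `visible_range` helper; it is written in Python for consistency with the other examples and is not react-virtualized's code. Variable row heights would need a table of measured row offsets instead of a single multiplication.

```python
import math


def visible_range(scroll_top: float, viewport_height: float, row_height: float,
                  row_count: int, overscan: int = 2):
    """Return (start, stop) row indices to render; `stop` is exclusive.

    Rows intersecting [scroll_top, scroll_top + viewport_height) are visible;
    `overscan` extra rows are added on each side and the range is clamped
    to the list bounds.
    """
    first = int(scroll_top // row_height)
    last = math.ceil((scroll_top + viewport_height) / row_height)
    start = max(0, first - overscan)
    stop = min(row_count, last + overscan)
    return start, stop


print(visible_range(scroll_top=1000, viewport_height=300, row_height=50, row_count=10_000))
```

With 10,000 rows the component renders only about ten of them at a time, so rendering cost depends on the viewport size rather than on the size of the table.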
Kotaemon is an orchestration framework designed for building modular, agentic workflows that integrate document processing, retrieval-augmented generation, and multi-step reasoning. It provides a comprehensive platform for developing document-based question answering systems, allowing users to chain language models, prompt templates, and external tools into complex, automated pipelines. The system distinguishes itself through a highly modular architecture that emphasizes component-based composition and schema-driven data exchange. It supports autonomous agents capable of decomposing complex…
Isolates table structures from raw CSV content for document integration.
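One common way to hand a CSV table to a document pipeline or a language model is to parse it properly (respecting quoted commas) and re-emit it as a Markdown table. The sketch below does that with the standard `csv` module; `csv_to_markdown` is a hypothetical helper, and whether Kotaemon uses Markdown as its table representation is an assumption not confirmed by the listing.

```python
import csv
import io


def _escape(cell: str) -> str:
    # A literal pipe would otherwise be read as a column separator.
    return cell.replace("|", "\\|")


def csv_to_markdown(raw: str) -> str:
    """Parse raw CSV text and render it as a Markdown table (first row = header)."""
    rows = list(csv.reader(io.StringIO(raw)))
    header, body = rows[0], rows[1:]
    lines = [
        "| " + " | ".join(_escape(c) for c in header) + " |",
        "| " + " | ".join("---" for _ in header) + " |",
    ]
    for row in body:
        lines.append("| " + " | ".join(_escape(c) for c in row) + " |")
    return "\n".join(lines)


raw = 'name,city\n"Doe, Jane",Lyon\nSmith,Paris\n'
print(csv_to_markdown(raw))
```

Using a real CSV parser rather than splitting on commas matters here: the quoted field `"Doe, Jane"` stays a single cell.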
This project serves as a comprehensive textbook and educational resource for data analysis using the Python ecosystem. It provides a structured guide to manipulating, cleaning, and processing datasets, focusing on the core tools required for numerical computing and statistical analysis. The repository distinguishes itself by offering a collection of practical code examples and workflows that demonstrate how to perform complex data tasks. It covers the application of vectorized numerical computations, the management of time-indexed data, and the creation of statistical visualizations to communicate…
Provides frameworks for loading and processing structured tabular data to extract insights.
This project is a modular, open-source customer relationship management platform built on the Laravel framework. It serves as a comprehensive business application framework designed for tracking sales pipelines, managing business entities, and automating marketing workflows. By providing a self-hosted solution, it enables organizations to maintain full control over their contact data, sales leads, and communication history. The platform distinguishes itself through a highly extensible architecture that allows developers to modify core behavior without altering the underlying source code. It…
Ships sortable, paginated tabular data grids that utilize AJAX for efficient server-side record management.
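Server-side grids keep the full record set on the server: each AJAX request carries the sort column, sort direction, page number, and page size, and the server returns just one page plus the metadata the grid needs to draw its pager. The sketch below shows that contract in Python with an in-memory list; the function name and response keys (`data`, `total`, `last_page`) are hypothetical, and a real Laravel application would push the same work into the database as `ORDER BY ... LIMIT ... OFFSET`.

```python
from typing import Any, Dict, List


def page_of_records(records: List[Dict[str, Any]], sort_key: str, descending: bool,
                    page: int, per_page: int) -> Dict[str, Any]:
    """Sort the whole record set, then return one 1-based page plus paging metadata."""
    ordered = sorted(records, key=lambda r: r[sort_key], reverse=descending)
    total = len(ordered)
    start = (page - 1) * per_page
    return {
        "data": ordered[start:start + per_page],
        "total": total,
        "last_page": max(1, -(-total // per_page)),  # ceiling division, at least one page
    }


records = [
    {"name": "a", "value": 3},
    {"name": "b", "value": 1},
    {"name": "c", "value": 5},
    {"name": "d", "value": 2},
    {"name": "e", "value": 4},
]
print(page_of_records(records, "value", descending=True, page=2, per_page=2))
```

Sorting must happen before slicing; sorting only the current page would reorder five rows instead of the whole table, which is the usual bug in hand-rolled grids.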
This project is a deep learning framework designed for constructing, training, and deploying neural networks across diverse hardware environments. It functions as a high-performance tensor computation library that provides both imperative and symbolic programming interfaces, allowing developers to balance flexible, step-by-step model building with the efficiency of compiled computation graphs. The framework distinguishes itself through a hybrid execution engine that integrates declarative graph compilation with imperative runtime logic. It supports scalable, distributed training across multiple…
Compiles inference engines into single source files to simplify deployment across platforms.
This repository is a collection of foundational machine learning models and predictive analysis tools designed for the study of statistical learning methods. It serves as an educational resource that demonstrates the mathematical principles of classic algorithms through direct, first-principles implementation. The project distinguishes itself by constructing models from the ground up, relying on fundamental linear algebra and calculus operations rather than high-level abstraction frameworks. Each algorithm is organized into modular, standalone scripts that mirror the sequence of mathematical…
Provides a toolkit of modular scripts for predictive data modeling using fundamental mathematical operations.
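As an example of building a model from fundamental operations, ordinary least squares for one feature follows directly from calculus: setting the partial derivatives of the squared error with respect to the slope and intercept to zero gives the slope as covariance over variance. The standalone function below is a hypothetical illustration in that first-principles style, not a script taken from the repository.

```python
def fit_simple_linear_regression(xs, ys):
    """Ordinary least squares for y = w * x + b.

    Setting d/dw and d/db of sum((y - w*x - b)**2) to zero gives
    w = cov(x, y) / var(x) and b = mean(y) - w * mean(x).
    """
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    cov = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    var = sum((x - x_mean) ** 2 for x in xs)
    w = cov / var  # requires at least two distinct x values
    b = y_mean - w * x_mean
    return w, b


w, b = fit_simple_linear_regression([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
print(w, b)  # 2.0 1.0
```

The data above lie exactly on y = 2x + 1, so the fitted slope and intercept recover 2.0 and 1.0 with no residual error.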
TensorFlow.js is a JavaScript machine learning library used for training and deploying models in web browsers and server-side environments. It functions as a browser-based model trainer, a WebAssembly inference engine, and a WebGPU-accelerated tensor library for low-level linear algebra. The project also includes a model converter that transforms Python-based models into optimized formats for JavaScript execution. The library distinguishes itself through a pluggable backend architecture that allows mathematical operations to be executed via CPU, WebGL, or WebGPU. It supports the conversion of Python…
Imports datasets from disk or web sources in various formats for machine learning use.
This project is an international phone number library used for parsing, formatting, and validating phone numbers based on the E.164 standard. It provides a validation engine and parser to convert raw strings into structured objects and verify whether numbers conform to regional numbering rules. The library includes a metadata provider that maps phone numbers to geographic locations, time zones, and network carriers. It can distinguish between line types, such as fixed-line or mobile, to verify SMS compatibility and identify original network operators. Additional capabilities include extracting phone numbers…
Implements a metadata engine that loads regional phone number rules from CSV files.
This repository serves as a public archive for the raw datasets and analytical code used to support journalistic reporting. It functions as a platform for reproducible research, providing the necessary materials for users to verify published findings and conduct independent statistical analysis. The collection uses a versioned storage model to track historical changes to both data and processing scripts. By organizing information into a structured directory hierarchy, the repository maps specific journalistic projects to their corresponding inputs and outputs, ensuring that the methodology…
Delivers structured information in lightweight, human-readable CSV formats for broad analytical compatibility.
FramePack is a neural video synthesis engine and generation framework designed to produce long, temporally consistent video sequences. It functions as a diffusion model optimizer, providing a suite of techniques to manage the computational demands of high-parameter video models while maintaining visual stability during extended generation tasks. The system distinguishes itself through a hierarchical approach to frame prediction, which plans distant anchor frames before filling in intermediate content to prevent cumulative temporal drift. By using constant-length context compression and…
Implements hierarchical anchor frame prediction to prevent temporal drift and ensure visual stability.
Tolaria is a markdown knowledge base manager and bidirectional note-linking system. It functions as an integrated environment for organizing notes and structured data, using YAML frontmatter and wikilinks to establish relational mappings between documents. The project distinguishes itself by integrating language model capabilities directly into the editor for content generation and analysis. It further combines prose with structured data through a markdown spreadsheet editor that renders CSV-formatted files as interactive grids with formula support and cross-sheet referencing. The platform…
Provides an interface for managing and editing two-dimensional numeric data stored as CSV in markdown files.
This library is a collection of machine learning algorithms and neural network components implemented from scratch using only NumPy. It serves as an educational toolkit for constructing and experimenting with machine learning architectures, emphasizing a modular approach where algorithms are organized into self-contained, object-oriented classes. The project distinguishes itself by relying exclusively on array-oriented programming to perform mathematical operations, ensuring that all computations are vectorized for performance. By using a standardized interface for forward and backward passes…
Implements flexible nonparametric predictive models like kernel regression and Gaussian processes.
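Kernel regression is nonparametric in the sense that it keeps the training data instead of fitting a fixed set of coefficients: a prediction is a weighted average of training targets, with weights that fall off with distance from the query point. The sketch below is the Nadaraya-Watson estimator with a Gaussian kernel on one feature, written in plain Python for consistency with the other examples (the listed library uses NumPy); the function name and default bandwidth are hypothetical choices.

```python
import math


def kernel_regression(train_x, train_y, query, bandwidth=1.0):
    """Nadaraya-Watson estimate at `query` using a Gaussian kernel.

    Each training target is weighted by exp(-0.5 * ((query - x) / bandwidth)**2);
    the prediction is the weighted average of the targets.
    """
    weights = [math.exp(-0.5 * ((query - x) / bandwidth) ** 2) for x in train_x]
    total = sum(weights)
    if total == 0.0:
        # Every weight underflowed: the query is too far from all training points.
        raise ValueError("query too far from training data for this bandwidth")
    return sum(w * y for w, y in zip(weights, train_y)) / total


xs = [0.0, 1.0, 2.0]
ys = [0.0, 1.0, 2.0]
print(kernel_regression(xs, ys, 1.0))
print(kernel_regression(xs, ys, 2.0, bandwidth=0.1))
```

The bandwidth controls the bias-variance trade-off: a small bandwidth makes the estimate follow the nearest training point, while a large one averages over more of the data and produces a smoother curve.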