Utilities for filling empty cells or null values in tabular datasets.
Distinct from Random Value Populators, which generate mock data for test databases; this category is about filling gaps in existing real datasets.
2 GitHub repositories matching data & databases · Missing Value Population.
dplyr is an R data manipulation library that provides a grammar for transforming tabular data frames. It functions as an in-memory data frame processor and a relational data algebra tool, using a consistent set of verbs to filter, select, and summarize data. The project includes a SQL translation engine that converts high-level data manipulation expressions into optimized queries. This allows users to perform transformations directly on remote relational databases and cloud storage without pulling data locally. The library covers a broad range of tabular operations, including column mutation
Provides utilities for filling empty cells or replacing null values in tabular datasets.
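In dplyr, one way to fill NA values is `coalesce()`. Given several vectors, it returns the first value at each position that is not missing. The sketch below reproduces that idea in plain Python to show how it works. `coalesce` here is a hypothetical helper, not dplyr's API, and `None` stands in for R's `NA`. Unlike dplyr, it does not recycle length-1 arguments, so every column must have the same length.

```python
from typing import Optional, Sequence, TypeVar

T = TypeVar("T")


def coalesce(*columns: Sequence[Optional[T]]) -> list[Optional[T]]:
    """Element-wise: keep the first non-None value across the given columns.

    Plain-Python sketch of the idea behind dplyr's coalesce(); not dplyr code.
    All columns must have equal length (no recycling of length-1 inputs).
    """
    if not columns:
        return []
    n = len(columns[0])
    if any(len(col) != n for col in columns):
        raise ValueError("all columns must have the same length")
    result: list[Optional[T]] = []
    for i in range(n):
        # Scan the columns left to right; stay None only if every column is missing here.
        result.append(next((col[i] for col in columns if col[i] is not None), None))
    return result


# Fill gaps in a primary column from a backup column, then from a constant default.
primary = [1.5, None, 3.0, None]
backup = [None, 2.0, None, None]
print(coalesce(primary, backup, [0.0] * len(primary)))  # [1.5, 2.0, 3.0, 0.0]
```

Ordering the arguments from most to least trusted source makes the constant default a last resort rather than a blanket replacement.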
qsv is a high-performance command line toolkit for querying, transforming, and analyzing comma-separated value files. It functions as a data wrangling interface and a tabular data profiler, featuring a query engine capable of executing SQL statements and joins directly on flat files without requiring a database. The project is distinguished by its ability to process massive datasets that exceed available system memory. This is achieved through disk-based external memory processing, including multithreaded merge sorting, on-disk hash tables for deduplication, and lightweight file indexing for
Provides capabilities to populate empty cells within a dataset.
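The listing does not say which fill strategy qsv applies. Forward-fill is one common approach: it carries the last non-empty value down each column. The sketch below implements forward-fill with Python's standard `csv` module. `forward_fill` and `fill_csv` are hypothetical names chosen for illustration, not qsv commands or options. Cells that are empty before any value has appeared in their column get an optional default.

```python
import csv
import io
from typing import Iterable, Iterator


def forward_fill(rows: Iterable[list[str]], default: str = "") -> Iterator[list[str]]:
    """Replace empty or whitespace-only cells with the last non-empty value in that column.

    Before a column has seen any value, its empty cells get `default`.
    Only the last value per column is kept, so state grows with columns, not rows.
    """
    last: dict[int, str] = {}
    for row in rows:
        filled = []
        for idx, cell in enumerate(row):
            if cell.strip() == "":
                # Empty cell: reuse this column's previous value, or fall back to the default.
                filled.append(last.get(idx, default))
            else:
                last[idx] = cell
                filled.append(cell)
        yield filled


def fill_csv(text: str, default: str = "") -> str:
    """Forward-fill a CSV string; the header row is passed through unchanged."""
    reader = csv.reader(io.StringIO(text))
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    header = next(reader, None)
    if header is None:
        return ""
    writer.writerow(header)
    writer.writerows(forward_fill(reader, default))
    return out.getvalue()


print(fill_csv("city,temp\nOslo,4\n,6\nBergen,\n"), end="")
# city,temp
# Oslo,4
# Oslo,6
# Bergen,6
```

Forward-fill suits data where a blank cell means "same as above", such as grouped exports. For values that are truly unknown, a constant default or a separate imputation step is usually safer.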