3 repositories
Converting query results between different in-memory dataframe and tensor formats.
Distinct from Object Result Fetches: candidates there focus on caching or API transformers, whereas this topic is specifically about interoperability between Pandas, Polars, and PyArrow.
Explore 3 GitHub repositories matching data & databases · Dataframe Format Conversion.
Ibis is a portable Python dataframe library and multi-backend query engine that provides a unified interface for executing data transformations across diverse compute engines. It functions as a Python SQL expression compiler and dialect transpiler, allowing users to define data logic once and execute it across cloud warehouses, embedded databases, and distributed clusters without rewriting code. The project distinguishes itself through a database backend abstraction that decouples transformation logic from the underlying execution engine. It enables polyglot data workflows by mixing raw SQL s
Implements a bridge to convert query execution results between SQL backends and formats like Pandas, Polars, and PyArrow.
cuml is a GPU-accelerated machine learning library and framework that uses CUDA to accelerate tabular data preprocessing and model execution. It provides a suite of tools for training and deploying classification, regression, and clustering models on NVIDIA GPUs and GPU clusters. The library is designed for scalability, offering a distributed GPU machine learning environment that can spread computation and data across multiple hardware accelerators and nodes to handle datasets exceeding single-device memory. It mirrors standard estimator interfaces to allow the replacement of CPU-based models
Processes data directly from various in-memory dataframe and tensor formats without requiring manual conversion.
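GluonTS is a probabilistic time series library and deep learning forecasting framework. It provides a toolkit for building, training, and evaluating neural network architectures that quantify uncertainty by predicting future values as probability distributions. The project is distinguished by its support for zero-shot forecasting and its integration of multiple modeling approaches, including deep probabilistic neural networks and wrappers for external statistical libraries such as Prophet and R forecast. It implements specialized architectural primitives such as causal convolutions and invertible residual networks to prevent information leakage and to map latent representations to valid probability distributions. The framework covers comprehensive data engineering capabilities, including time series scaling, bijective transformations, and hierarchical modeling. It uses Apache Arrow and Parquet for high-performance dataset streaming and random-access management. For model evaluation, it includes an evaluation suite that uses metrics such as quantile loss and the continuous ranked probability score (CRPS) to measure forecast accuracy and probabilistic coverage. The library supports model deployment through integration with Amazon SageMaker.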
Transforms tabular Pandas dataframes into structured formats suitable for time series modeling.