5 repositories
Libraries for manipulating and analyzing tabular datasets using GPU acceleration.
Distinct from GPU Acceleration Libraries: entries in that category focus on general-purpose acceleration or plotting rather than on a dataframe API.
cuDF is a GPU-accelerated dataframe library and data processing engine designed for manipulating and analyzing large tabular datasets. It provides a high-level API for filtering, joining, and aggregating data directly on GPU hardware. The project uses the Apache Arrow memory format to enable zero-copy data transfers and includes a just-in-time compiler for running custom user-defined functions on the GPU. It also accelerates existing workflows by redirecting standard pandas DataFrame calls and Polars query plans to a GPU backend. It also p
Provides a GPU-accelerated library for manipulating and analyzing large tabular datasets.
Accelerates pandas, Polars, and Apache Spark DataFrame operations on NVIDIA GPUs with no code changes.
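The filter, join, and aggregate operations described above map onto a pandas-like API. The following is a minimal sketch, not taken from the cuDF documentation: it assumes cuDF is installed on a machine with an NVIDIA GPU and falls back to pandas on the CPU otherwise, so the same code still runs. The table and column names (`orders`, `customers`, `customer_id`, `amount`, `region`) are made up for illustration.

```python
# Minimal sketch: one filter -> join -> aggregate pipeline that runs on cuDF
# (GPU) when it is importable, and on pandas (CPU) otherwise.
try:
    import cudf as xdf  # GPU DataFrame library (needs an NVIDIA GPU)
except ImportError:
    import pandas as xdf  # CPU fallback exposing a compatible API

# Hypothetical example data
orders = xdf.DataFrame({
    "customer_id": [1, 2, 1, 3, 2],
    "amount": [120.0, 35.5, 60.0, 15.0, 80.0],
})
customers = xdf.DataFrame({
    "customer_id": [1, 2, 3],
    "region": ["east", "west", "east"],
})

# Filter: keep orders of at least 20.0
large = orders[orders["amount"] >= 20.0]

# Join: attach each remaining order's region
joined = large.merge(customers, on="customer_id", how="inner")

# Aggregate: total order amount per region, sorted by region name
totals = joined.groupby("region")["amount"].sum().sort_index()
print(totals)
```

For existing pandas scripts, cuDF's pandas accelerator mode can be enabled without editing the code, for example by running `python -m cudf.pandas script.py`. For Polars, a lazy query can be executed on the GPU with `LazyFrame.collect(engine="gpu")` when the GPU engine is installed.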
AliSQL is a fork of MySQL by Alibaba that extends the relational database management system with enhancements for high performance, scalability, and enterprise-grade availability. It retains the core MySQL identity as a SQL-based database for storing, organizing, and retrieving structured data, while adding optimizations for large-scale transactional and analytical workloads. The project differentiates itself through a set of Alibaba-specific improvements, including a columnar engine for accelerating analytical queries directly on MySQL tables, and a distributed, shared-nothing NDB Cluster en
Offloads analytical queries to a columnar engine for faster execution than the standard row-based engine.
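To see why offloading analytical queries to a columnar engine helps, the sketch below contrasts a row-oriented and a column-oriented layout for a single `SUM` over one column. It is a plain-Python illustration of the general idea using made-up table data; it is not AliSQL's storage engine or API.

```python
# Illustrative only: row-oriented vs. column-oriented layout for an
# analytical query (SUM over one column). Not AliSQL's implementation.
from array import array

# Row store: each record keeps all of its fields together.
rows = [
    (1, "east", 120.0),
    (2, "west", 35.5),
    (3, "east", 60.0),
]

# Column store: each field lives in its own contiguous sequence.
columns = {
    "id": array("q", [1, 2, 3]),
    "region": ["east", "west", "east"],
    "amount": array("d", [120.0, 35.5, 60.0]),
}


def sum_amount_rowwise(records):
    # Visits every full record even though only one field is needed.
    return sum(record[2] for record in records)


def sum_amount_columnar(cols):
    # Scans only the contiguous "amount" column.
    return sum(cols["amount"])


row_total = sum_amount_rowwise(rows)
col_total = sum_amount_columnar(columns)
print(row_total, col_total)
```

Both functions return the same answer; the difference is how much data each has to touch, which is what a columnar engine exploits on wide tables.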
Pigsty is a full-stack orchestration suite for deploying, monitoring, and managing high-availability PostgreSQL clusters and their supporting infrastructure. It functions as a cluster management platform and high-availability suite that automates failover, manages virtual IPs, and ensures data consistency through distributed consensus. The project distinguishes itself by providing a comprehensive database infrastructure-as-code framework and a dedicated observability stack. It incorporates a backup and recovery manager supporting point-in-time recovery via S3-compatible object storage, alongs
Accelerates OLAP queries through columnar storage, distributed processing, and GPU acceleration.
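Automated failover guarded by distributed consensus generally follows two rules: promote a replica only while a majority of consensus members are reachable, and prefer the most up-to-date healthy replica. The sketch below encodes those two rules in Python. It is a hypothetical illustration, not Pigsty's code; the `Replica` class, its `replayed_lsn` field, and the function names are assumptions made for the example.

```python
# Hypothetical sketch of a quorum-guarded failover decision.
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Replica:
    name: str
    healthy: bool
    replayed_lsn: int  # hypothetical: WAL position replayed so far


def has_quorum(reachable_members: int, total_members: int) -> bool:
    # A strict majority of consensus members must be reachable.
    return reachable_members > total_members // 2


def pick_failover_candidate(
    replicas: List[Replica], reachable_members: int, total_members: int
) -> Optional[str]:
    # Without quorum, refuse to promote anything (avoids split brain).
    if not has_quorum(reachable_members, total_members):
        return None
    healthy = [r for r in replicas if r.healthy]
    if not healthy:
        return None
    # Prefer the replica that has replayed the most WAL (least data loss).
    return max(healthy, key=lambda r: r.replayed_lsn).name


replicas = [
    Replica("pg-2", healthy=True, replayed_lsn=500),
    Replica("pg-3", healthy=True, replayed_lsn=700),
    Replica("pg-4", healthy=False, replayed_lsn=900),
]
print(pick_failover_candidate(replicas, reachable_members=2, total_members=3))
```

The unhealthy `pg-4` is skipped even though it is furthest ahead, and with only one of three members reachable the function returns `None` instead of promoting.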
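Jetson Containers is a container management system for building and running GPU-accelerated Docker images for machine learning workloads on ARM64 edge hardware. It acts as a CUDA container orchestrator: at runtime it automatically detects the host's CUDA toolkit version and GPU capabilities to ensure container compatibility, and at launch it selects the correct container image by matching the host's JetPack or L4T version. The project provides preconfigured containers for running quantized large language models and retrieval-augmented generation pipelines optimized for edge devices, as well as integrated ROS and AI framework containers for deploying autonomous agents and multimodal processing. Its modular, layered build system assembles Docker images from reusable prebuilt layers, compiles AI/ML frameworks from source to optimize them for specific edge GPU architectures and CUDA versions, and uses a local wheel cache to speed up subsequent builds. The platform offers prebuilt Docker containers with GPU-accelerated PyTorch, TensorFlow, JAX, and ONNX Runtime, supporting LLMs, speech models, vision-language models, and neural machine translation on edge hardware. It also supports building custom containers with GPU-accelerated AI packages, running Triton Inference Server and Transformer Engine containers, and accelerating data science workflows with RAPIDS libraries.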
Use a cuDF-based DataFrame library that runs on NVIDIA GPUs for accelerated data manipulation and analysis.
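Selecting a container image by matching the host's L4T version can be sketched as a version comparison. This example is hypothetical and is not the project's own tooling: it assumes the host release string has the format found in `/etc/nv_tegra_release` on Jetson devices (for example `# R35 (release), REVISION: 4.1, ...`) and that images are tagged `r<major>.<minor>.<patch>`. It picks the newest image from the same L4T major release that is not newer than the host.

```python
# Hypothetical sketch: choose an image tag compatible with the host L4T version.
import re
from typing import List, Optional, Tuple

Version = Tuple[int, int, int]


def parse_l4t_release(text: str) -> Optional[Version]:
    # Assumed format, e.g. "# R35 (release), REVISION: 4.1, GCID: ..."
    match = re.search(r"R(\d+) \(release\), REVISION: (\d+)\.(\d+)", text)
    if match is None:
        return None
    major, minor, patch = (int(part) for part in match.groups())
    return (major, minor, patch)


def pick_image_tag(host: Version, available_tags: List[str]) -> Optional[str]:
    candidates = []
    for tag in available_tags:
        # Hypothetical tag scheme "r<major>.<minor>.<patch>"; skip others.
        m = re.fullmatch(r"r(\d+)\.(\d+)\.(\d+)", tag)
        if m is None:
            continue
        version = tuple(int(p) for p in m.groups())
        # Same L4T major release, and not newer than the host.
        if version[0] == host[0] and version <= host:
            candidates.append((version, tag))
    # Newest compatible version wins.
    return max(candidates)[1] if candidates else None


host = parse_l4t_release("# R35 (release), REVISION: 4.1, GCID: 33958178, BOARD: t186ref")
tags = ["r35.2.1", "r35.3.1", "r35.4.1", "r36.2.0", "latest"]
print(host, pick_image_tag(host, tags))
```

Restricting candidates to the host's major release reflects the assumption that images built for one L4T major release do not run on another; images for a newer minor release than the host are skipped for the same reason.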