3 repositories
Performing large-scale data manipulation and analysis tasks on GPU hardware for increased processing speed.
Distinct from GPU Acceleration: The candidates focus on process analysis, communication, or streaming, not general dataframe-style analysis.
Explore 3 awesome GitHub repositories matching data & databases · GPU-Accelerated Data Analysis.
cuDF is a GPU-accelerated dataframe library and data processing engine designed for manipulating and analyzing large tabular datasets. It provides a high-level API for executing filtering, joining, and aggregating operations directly on GPU hardware. The project integrates the Apache Arrow memory format to enable zero-copy data transfers and includes a just-in-time compiler for executing custom user-defined functions on the GPU. The library features specialized acceleration for existing workflows by redirecting standard Pandas dataframe calls and Polars query plans to a GPU backend. It also p
Provides a high-level API for executing large-scale tabular filtering, joining, and aggregation directly on GPU hardware.
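The filter, join, and aggregate workflow and the pandas redirection described above can be illustrated with a minimal sketch. It assumes pandas is installed. `cudf.pandas.install()` is cuDF's documented way to turn on pandas accelerator mode from a script, and it must run before pandas is first imported. When cuDF is not available, the same code simply runs on CPU pandas. The table contents and column names are hypothetical.

```python
try:
    # Enable cuDF's pandas accelerator mode; must happen before importing pandas.
    import cudf.pandas
    cudf.pandas.install()
except ImportError:
    pass  # no cuDF on this machine: the same code runs on plain CPU pandas

import pandas as pd

# Hypothetical example tables.
orders = pd.DataFrame({
    "customer_id": [1, 2, 1, 3, 2, 1],
    "amount": [10.0, 25.0, 5.0, 40.0, 15.0, 30.0],
})
customers = pd.DataFrame({
    "customer_id": [1, 2, 3],
    "region": ["north", "south", "north"],
})

# Filter: keep orders of at least 10.0.
large = orders[orders["amount"] >= 10.0]

# Join: attach each order's region.
joined = large.merge(customers, on="customer_id", how="inner")

# Aggregate: total amount per region.
totals = (
    joined.groupby("region", as_index=False)["amount"]
    .sum()
    .sort_values("region")
)
print(totals)
```

Because the accelerator intercepts ordinary pandas calls, the script body stays unchanged. cuDF also documents running an unmodified script as `python -m cudf.pandas script.py` to get the same effect without editing it.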
jetson-inference is a set of libraries and tools for executing optimized deep learning models on embedded GPU hardware. Its primary purpose is to enable real-time computer vision and AI inference at the edge with low latency and high throughput. The project distinguishes itself through high-performance streaming analytics and the ability to execute concurrent AI pipelines on automotive-grade silicon. It provides specialized support for multi-sensor stream processing, utilizing zero-copy data transport to load camera frames directly into GPU memory. The codebase covers a broad surface of capabiliti
Accelerates large-scale data science workloads using GPU-to-GPU communication and shuffle operations.
Stumpy is a Python library for scalable time series analysis built around matrix profile algorithms. It provides a framework for computing distance profiles to identify repeated patterns and anomalies in time series data. The project distinguishes itself by scaling heavy computations across GPU hardware and distributed clusters using Dask. It supports multidimensional analysis to discover motifs across concurrent data streams and offers incremental computation for real-time streaming analysis. The library covers a wide range of time series mining techniques, including motif discovery, anomaly detection, and sequence pattern matching. It also provides tools for semantic segmentation to detect regime changes and for extracting time-ordered chains of similar subsequence patterns.
Offloads complex matrix calculations to GPU hardware to significantly reduce processing time for large datasets.
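The matrix profile can be made concrete with a brute-force sketch. For each subsequence of length m, the matrix profile stores the z-normalized Euclidean distance to its nearest non-overlapping neighbor. The lowest values mark repeated patterns (motifs), and the highest mark anomalies (discords). The plain-Python code below illustrates that definition only. It is not Stumpy's implementation, which uses far faster algorithms and whose main entry point is `stumpy.stump(T, m)`. The function names are hypothetical, and the exclusion zone of ceil(m/4) is an assumed convention.

```python
import math

def znormalize(x):
    """Z-normalize a subsequence (zero mean, unit population std)."""
    m = len(x)
    mu = sum(x) / m
    sd = math.sqrt(sum((v - mu) ** 2 for v in x) / m)
    if sd == 0.0:
        # Constant subsequence: treat as all zeros to avoid division by zero.
        return [0.0] * m
    return [(v - mu) / sd for v in x]

def naive_matrix_profile(ts, m):
    """Hypothetical brute-force matrix profile, O(n^2 * m).

    Returns (profile, index): profile[i] is the z-normalized distance from
    subsequence i to its nearest non-trivial match, found at index[i].
    """
    n = len(ts) - m + 1
    subs = [znormalize(ts[i:i + m]) for i in range(n)]
    excl = math.ceil(m / 4)  # assumed exclusion zone against trivial matches
    profile = [math.inf] * n
    index = [-1] * n
    for i in range(n):
        for j in range(n):
            if abs(i - j) <= excl:
                continue  # skip the subsequence itself and heavily overlapping shifts
            d = math.sqrt(sum((a - b) ** 2 for a, b in zip(subs[i], subs[j])))
            if d < profile[i]:
                profile[i], index[i] = d, j
    return profile, index

# Toy series in which the pattern [0, 1, 3, 1, 0] occurs at positions 0 and 9.
ts = [0, 1, 3, 1, 0, 7, 2, 9, 4, 0, 1, 3, 1, 0, 6]
profile, index = naive_matrix_profile(ts, m=5)
motif = min(range(len(profile)), key=profile.__getitem__)    # best-matched subsequence
discord = max(range(len(profile)), key=profile.__getitem__)  # most anomalous subsequence
print("motif at", motif, "matches", index[motif], "| discord at", discord)
```

The all-pairs loop is what makes this computation expensive on long series. That cost is the reason libraries like Stumpy use specialized algorithms and offload the work to GPUs or Dask clusters.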