4 repositories
Methods for reducing the computational cost of convolutions by decomposing high-dimensional kernels.
Distinct from One-Dimensional Convolutions: Existing candidates focus on 1D convolutions or hardware caching, not mathematical kernel decomposition.
Explore 4 GitHub repositories matching Artificial Intelligence & ML · Convolutional Kernel Optimizations.
This project is a comprehensive Chinese translation of a technical deep learning textbook, providing an educational resource on the theory and implementation of neural networks. It functions as a collaborative technical translation project designed to make complex academic AI literature accessible to non-English speakers. The project utilizes a community-driven translation model that integrates external suggestions and pull requests to refine linguistic accuracy and reduce bias. It employs standardized terminology mapping to ensure a uniform vocabulary throughout the translated content. To i
Explains how to represent multi-dimensional kernels as series of 1D convolutions to reduce computation.
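As a minimal sketch of the idea (not code from the listed repository), the example below applies a separable 2D kernel in two ways: directly, and as a 1D pass along rows followed by a 1D pass along columns. The function names, the sample image, and the choice of a Sobel-style kernel are assumptions made for illustration. It only works because the kernel is the outer product of two 1D vectors; kernels that are not rank-1 would need an approximation or a sum of several separable terms.

```python
# Illustrative sketch: a separable (rank-1) 2D kernel applied as two 1D passes.
# All names are hypothetical and not taken from any listed repository.

def conv2d_valid(image, kernel):
    """Direct 2D cross-correlation over the 'valid' region (no padding)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + a][j + b] * kernel[a][b]
                for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]

def conv_rows(image, taps):
    """1D pass along each row (horizontal direction)."""
    k = len(taps)
    return [
        [sum(row[j + b] * taps[b] for b in range(k))
         for j in range(len(row) - k + 1)]
        for row in image
    ]

def conv_cols(image, taps):
    """1D pass along each column (vertical direction)."""
    k = len(taps)
    return [
        [sum(image[i + a][j] * taps[a] for a in range(k))
         for j in range(len(image[0]))]
        for i in range(len(image) - k + 1)
    ]

col_taps = [1, 2, 1]    # vertical smoothing
row_taps = [1, 0, -1]   # horizontal difference
# The outer product of the two vectors gives the full 3x3 kernel.
kernel = [[c * r for r in row_taps] for c in col_taps]

# Small deterministic integer test image (5 rows, 6 columns).
image = [[(3 * i + 5 * j) % 7 for j in range(6)] for i in range(5)]

direct = conv2d_valid(image, kernel)
separable = conv_cols(conv_rows(image, row_taps), col_taps)
```

For a k×k separable kernel, the direct form costs k² multiplications per output value, while the two-pass form costs 2k (9 versus 6 for the 3×3 case here); the gap widens as k grows.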
TileLang is a Python-embedded domain-specific language compiler that JIT-compiles and autotunes GPU kernels. It uses a tile-based DSL, automatic software pipelining, and parallel autotuning to generate optimized GPU kernels at runtime. It supports tensor core operations with Pythonic syntax, automatic memory management, and thread mapping. The compiler searches over tile sizes, thread counts, and scheduling policies, compiling and benchmarking candidates in parallel to find the fastest kernel. It also caches compiled binaries and tuning results to disk for reuse across sessions. TileLang inc
Defines matrix-matrix convolution computations with configurable dimensions, data types, and optional bias.
PlugNPlay-Modules is a collection of reusable PyTorch computer vision modules and deep learning architectural components. It provides a library of standardized building blocks for constructing neural networks, focusing on attention mechanisms, signal processing layers, and feature fusion modules. The project is distinguished by its extensive variety of attention primitives, covering spatial, channel, and temporal weighting, as well as specialized variants like deformable, frequency-enhanced, and linear-complexity attention. It also implements advanced signal processing tools within the neural
Provides Poly Kernel Inception Blocks that process features through parallel depthwise convolutions with varying dilations.
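The sketch below illustrates the general pattern of parallel depthwise convolutions with different dilations, reduced to 1D signals in plain Python so it stays short. It is not the PlugNPlay-Modules implementation: the function names, the zero "same" padding, the shared per-channel kernels, and the choice to merge branches by summation are all assumptions made for illustration.

```python
# Illustrative sketch only: parallel depthwise branches with different dilations.
# Function names and the sum-based merge are assumptions, not a library API.

def depthwise_dilated_1d(channels, kernels, dilation):
    """Filter each channel with its own odd-length kernel (depthwise).

    Zero padding keeps the output the same length as the input.
    """
    out = []
    for signal, taps in zip(channels, kernels):
        k = len(taps)
        pad = dilation * (k - 1) // 2
        padded = [0] * pad + list(signal) + [0] * pad
        out.append([
            sum(padded[i + t * dilation] * taps[t] for t in range(k))
            for i in range(len(signal))
        ])
    return out

def parallel_dilated_block(channels, kernels, dilations=(1, 2)):
    """Run one depthwise branch per dilation and sum the branch outputs."""
    branches = [depthwise_dilated_1d(channels, kernels, d) for d in dilations]
    return [
        [sum(branch[c][i] for branch in branches)
         for i in range(len(channels[c]))]
        for c in range(len(channels))
    ]

# Two channels carrying the same impulse, each with its own 3-tap kernel.
channels = [[0, 0, 0, 1, 0, 0, 0], [0, 0, 0, 1, 0, 0, 0]]
kernels = [[1, 1, 1], [1, 0, -1]]

block_out = parallel_dilated_block(channels, kernels, dilations=(1, 2))
```

With an impulse as input, the dilation-1 branch responds at the immediate neighbours and the dilation-2 branch at positions two steps away, so the summed output shows how each added dilation widens the receptive field without adding taps.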
This project is a comprehensive educational resource and course for building neural networks with PyTorch. It covers the fundamentals of deep learning, including tensor manipulation, automatic differentiation, and the construction of modular neural network components. The repository serves as a technical guide for several specialized domains. It provides implementation details for computer vision tasks such as image classification, object detection, and semantic segmentation, as well as natural language processing (NLP) workflows involving transformers, recurrent networks, and generative models. In addition, it includes a reference for generative AI, focusing specifically on image synthesis through diffusion models and adversarial networks. The material extends to model optimization and deployment pipelines. It covers techniques for reducing model size and increasing inference speed through quantization and exporting models to formats such as ONNX and TensorRT. Other capability areas include data engineering for parallel loading, model evaluation with custom metrics, and deployment of open-source large language models (LLMs). The project is delivered primarily as a series of Jupyter Notebooks.
Renders weight tensors from convolutional layers as images to analyze learned patterns.
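One simple way to do this kind of rendering, shown here as a sketch under stated assumptions rather than the course's own code, is to min-max normalize a single 2D weight slice to the 0..255 range and write it as a plain-text PGM grayscale image. The helper name `kernel_to_pgm` and the sample weights are hypothetical; in a real model the slice would be taken from a trained layer's weight tensor.

```python
# Illustrative sketch: turn one 2D convolution kernel into a grayscale image.
# `kernel_to_pgm` is a hypothetical helper, not part of any listed repository.

def kernel_to_pgm(kernel):
    """Min-max normalize a 2D weight slice to 0..255 and return plain PGM (P2) text."""
    flat = [w for row in kernel for w in row]
    lo, hi = min(flat), max(flat)
    span = (hi - lo) or 1.0  # avoid division by zero when all weights are equal
    rows = [
        " ".join(str(round(255 * (w - lo) / span)) for w in row)
        for row in kernel
    ]
    # PGM header: magic number, width and height, maximum gray value.
    header = f"P2\n{len(kernel[0])} {len(kernel)}\n255\n"
    return header + "\n".join(rows) + "\n"

# Stand-in for a learned 3x3 filter (values chosen for illustration).
weights = [[-1.0, 0.0, 1.0],
           [-2.0, 0.0, 2.0],
           [-1.0, 0.0, 1.0]]

pgm_text = kernel_to_pgm(weights)
```

Writing `pgm_text` to a file with a `.pgm` extension gives an image most viewers can open. The most negative weight maps to black and the most positive to white, so edge-like filters show up as dark-to-light gradients.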