1 repo
Strategies for partitioning model weights across multiple processing units.
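Distinguishing note: Covers splitting a single model's weights across devices so that each device holds only part of the model, as opposed to data parallelism, where every device holds a full copy of the model and processes a different slice of the input batch.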
ChatGLM-6B is a generative AI inference engine designed for local execution of transformer-based language models. It provides a comprehensive runtime environment that allows users to load and run pre-trained neural network weights directly on their own hardware, ensuring data privacy and independence from external cloud services. The project distinguishes itself through a hardware-agnostic execution backend that supports deployment across diverse environments, including standard processors, Apple Silicon, and multi-GPU configurations. It incorporates advanced optimization techniques such as w
Partitions large model weights across multiple graphics processing units to increase throughput during concurrent inference.
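As a rough illustration of the weight partitioning described above, the sketch below splits one linear layer's weight matrix in two ways: by output columns and by input rows. It is not taken from the listed repository. The function names (`column_parallel_linear`, `row_parallel_linear`) are hypothetical, the "devices" are just Python lists, and the shard count is assumed to divide the matrix dimensions evenly. In a real multi-GPU setup the final concatenation and summation would be collective communication steps (an all-gather and an all-reduce, respectively).

```python
def matmul(a, b):
    """Plain-Python matrix product of two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def split_columns(matrix, num_shards):
    """Split a matrix into contiguous column blocks (assumes an even split)."""
    width = len(matrix[0]) // num_shards
    return [[row[i * width:(i + 1) * width] for row in matrix]
            for i in range(num_shards)]


def column_parallel_linear(x, weight, num_shards):
    """Each simulated device holds a slice of the output columns of `weight`."""
    weight_shards = split_columns(weight, num_shards)
    # Every device sees the full input but only its own slice of the weights.
    partials = [matmul(x, shard) for shard in weight_shards]
    # Concatenate the per-device output slices row by row.
    return [sum((p[r] for p in partials), []) for r in range(len(x))]


def row_parallel_linear(x, weight, num_shards):
    """Each simulated device holds a slice of the input rows of `weight`."""
    rows_per_shard = len(weight) // num_shards
    weight_shards = [weight[i * rows_per_shard:(i + 1) * rows_per_shard]
                     for i in range(num_shards)]
    # The input features are split to match the rows each device owns.
    input_shards = split_columns(x, num_shards)
    partials = [matmul(xs, ws) for xs, ws in zip(input_shards, weight_shards)]
    # Each partial result has the full output shape; sum them elementwise.
    return [[sum(vals) for vals in zip(*rows)] for rows in zip(*partials)]


# Small integer example so the comparison with the unsharded result is exact.
x = [[1, 2, 3, 4],
     [5, 6, 7, 8]]
w = [[1, 0, 2, 1],
     [0, 1, 1, 2],
     [3, 1, 0, 0],
     [1, 2, 1, 1]]

reference = matmul(x, w)
col_out = column_parallel_linear(x, w, num_shards=2)
row_out = row_parallel_linear(x, w, num_shards=2)
```

Both sharded versions reproduce the unsharded product, while each simulated device stores only half of the weight matrix. That memory saving is the point of partitioning weights rather than replicating them.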