Explore 6 awesome GitHub repositories matching Artificial Intelligence & ML · Inference Optimization and Tuning.
DeepSeek-V3 is a large language model that provides comprehensive resources for model utilization, including technical specifications, pre-trained weights, and evaluation benchmarks. The project details the core transformer architecture, including parameter counts and multi-token prediction modules, while supporting na
Optimized execution paths leverage specialized hardware accelerators to support efficient half-precision inference.
Browser-use is a framework for building autonomous agents that navigate, interact with, and extract data from web interfaces using natural language instructions. By acting as an orchestration layer between large language models and browser automation protocols, it enables the execution of complex, multi-step workflows
Adjusts operational behavior and inference parameters for Llama models to optimize their performance in web-based reasoning tasks.
Hoppscotch is an open-source API development ecosystem designed for building, testing, and debugging REST, GraphQL, and real-time APIs. It provides a unified platform that functions across web browsers, desktop applications, and command-line interfaces, allowing developers to manage the entire API lifecycle from a sing
Configures AI-driven assistance to generate payloads and automate test script creation.
GPT4All is a cross-platform runtime environment designed to execute large language models directly on local consumer hardware. By leveraging an optimized C++ inference backend, it enables private, offline AI interactions without requiring an internet connection or external cloud services. The project provides a compreh
Provides granular controls for adjusting inference parameters, hardware acceleration settings, and model-specific execution behaviors.
This project is a comprehensive educational curriculum and engineering handbook focused on the lifecycle of large language models. It serves as a structured knowledge base for machine learning practitioners, covering the fundamental mathematical and architectural principles of transformer-based sequence modeling, as we
Implements efficient attention mechanisms and optimization strategies to maximize inference throughput.
This project is a comprehensive educational resource and knowledge base dedicated to the development and application of large language models and autonomous agentic systems. It provides a structured framework for understanding prompt engineering, context management, and the architectural patterns required to build task
Demonstrates essential setup procedures for connecting to and configuring external language model providers.