4 Repos
Python interfaces designed specifically for interacting with low-level large language model inference engines.
Distinct from Python Bindings: None of the generic Python binding candidates capture the AI-specific nature of this interface.
Explore 4 awesome GitHub repositories matching artificial intelligence & ml · LLM Python Bindings. Refine with filters or upvote what's useful.
llama-cpp-python provides a Python interface for the llama.cpp library, enabling the execution of large language models with hardware acceleration. It functions as a GGUF model loader and a structured text generator capable of running inference servers and multimodal runtimes for processing both text and image inputs. The project distinguishes itself through a local inference server that exposes model capabilities via an OpenAI-compatible web API. It supports advanced execution techniques including speculative decoding, weight quantization, and layer-based GPU offloading to manage memory acro
Provides the primary Python interface for the llama.cpp library to run hardware-accelerated models.
Translation Agent is a Python-based system that uses a large language model to translate text through a multi-step agentic workflow. Rather than producing a single output, it generates an initial translation, then prompts the same LLM to critique its own work and produce improvement suggestions, and finally refines the translation based on that self-critique. This reflection-driven iterative refinement loop is the core mechanism for improving translation quality without requiring human feedback or additional training data. The system distinguishes itself through two key capabilities. First, i
Ships a lightweight Python script that sequences stateless LLM calls and manages prompt templates.
ToolBench is an open platform for training, serving, and evaluating large language models that retrieve and call real-world APIs to complete user instructions. It provides an API-aware inference engine that selects relevant tools from a large corpus and generates sequences of tool calls to produce final answers, along with a custom API registration system that lets users add their own REST endpoints for the model to discover and invoke. The platform includes a complete instruction-tuning pipeline for training models on curated tool-use data, a multi-tool execution engine that coordinates sequ
Provides an API-aware inference engine that selects relevant tools from a large corpus and generates tool-calling sequences.
This is a Python SDK for interacting with large language models via API. It serves as a client library to generate text, process messages, and manage conversational states, while providing a specialized interface for connecting to models hosted across different cloud infrastructure providers. The SDK includes a tool-calling framework that maps Python functions to JSON schemas, allowing models to execute external tools. It also features a built-in token counting utility to estimate input size before transmission and a server-sent events client for receiving model tokens in real time. The libr
Serves as a comprehensive Python client library for interacting with large language models via API.