5 repository-uri
SDK integrations that allow loading and running models in-process from Python applications.
Distinct from Backend-as-a-Service Integrations: No candidate covers embedding an LLM inference engine directly into a Python application; closest candidates are general SDK or BaaS integrations.
Explore 5 awesome GitHub repositories matching artificial intelligence & ml · Python SDK Embeddings. Refine with filters or upvote what's useful.
Microsandbox is a runtime for creating and managing lightweight, hardware-isolated virtual machines — called sandboxes — that boot directly from standard OCI container images. Each sandbox runs as its own host process with a separate kernel, filesystem, and network stack, providing process-per-sandbox isolation. The project includes a command-line tool and multi-language SDKs (Rust, TypeScript, Python, Go) for programmatic lifecycle control, and it communicates with sandbox agents over Unix sockets using a CBOR-encoded protocol. What distinguishes Microsandbox is its combination of host-manag
Provides native SDKs wrapping the agent protocol into typed APIs for Rust, TypeScript, Python, and Go.
mistral.rs is an inference engine for large language models that runs locally and exposes models behind OpenAI and Anthropic-compatible APIs. It serves as a multi-model serving platform, capable of loading several models in a single server process with per-request routing and on-demand loading and unloading. The engine supports multimodal inference, processing text alongside images, video, audio, and speech inputs, and includes a quantized model deployment runtime that reduces memory use and speeds up inference on consumer hardware. The project distinguishes itself through an agentic tool exe
Embeds the inference engine directly into Python applications via a Runner class.
Airweave is a unified AI knowledge base platform that syncs data from external APIs into a searchable layer for retrieval-augmented generation. It provides a pre-built data connector library and a framework for building custom connectors, enabling the extraction, transformation, and synchronization of structured and unstructured data from SaaS applications. The platform includes a hybrid vector retrieval system that combines semantic, neural, and keyword search strategies to deliver grounded context for AI agents. The platform distinguishes itself through an agentic search engine that iterati
Searches collections and retrieves grounded context programmatically via client libraries.
PaddleX is a PaddlePaddle-based framework for building, deploying, and fine-tuning AI model pipelines, with pre-built support for computer vision, OCR, document analysis, and time series tasks. It offers a toolkit of ready-to-use pipelines for image classification, object detection, segmentation, and pose estimation, alongside an end-to-end OCR document analysis pipeline that extracts text, tables, formulas, and layout information. The platform also includes a dedicated time series forecasting pipeline for analyzing historical data to detect anomalies, classify patterns, and predict future val
Provides a Python SDK to load and run an image classification pipeline programmatically.
This project is a headless large language model inference engine and server manager designed for local deployments. It provides a developer toolkit and API gateway that allows for the management of model lifecycles and inference tasks without a graphical user interface. The system enables the deployment of model engines across different operating systems, cloud environments, or CI pipelines. It includes a command-line interface for bootstrapping development projects and automating the orchestration of loading and unloading model binaries based on specific workflow needs. The toolset covers i
Provides JavaScript and Python SDKs that wrap the HTTP API for building custom applications against the local inference engine.