Awesome GitHub Repositories: Local Inference Engines

Software frameworks that enable the execution of generative artificial intelligence models directly on local computing hardware.

vllm-project/vllm
70,745 · GitHub
vLLM is a high-throughput inference engine designed for the efficient serving and execution of large language models. It functions as a production-ready distributed model server, providing standard API protocols for online serving while also supporting offline batch processing. The system is built to maximize token gen
Python · amd · blackwell · cuda
meta-llama/llama
59,157 · GitHub
Llama is a computational framework and runtime environment designed for executing transformer-based neural networks locally. It functions as a generative AI inference engine, enabling the processing of input sequences through pre-trained model weights to produce text completions and structured data outputs directly on
Python
zylon-ai/private-gpt
57,116 · GitHub
This project is a privacy-first backend service designed to facilitate retrieval-augmented generation by processing local documents into searchable vector representations. It provides a modular architecture that allows users to ingest diverse file formats, manage document metadata, and perform semantic searches to prov
Python
