9 repos

Awesome GitHub Repositories: Local and On-Device Inference

Explore 9 awesome GitHub repositories matching Artificial Intelligence & ML · Local and On-Device Inference.

ggml-org/llama.cpp
95,400 · GitHub
Llama.cpp is an inference engine designed for the local execution of text-based and multimodal language models on consumer hardware. It provides a core environment for running models that process both text and image inputs, utilizing hardware-accelerated backends to optimize performance across diverse CPU and GPU archi
Terminal-based utilities allow for direct interaction with models, including configuration of inference parameters and chat management.
C++ · ggml
nomic-ai/gpt4all
77,146 · GitHub
GPT4All is a cross-platform runtime environment designed to execute large language models directly on local consumer hardware. By leveraging an optimized C++ inference backend, it enables private, offline AI interactions without requiring an internet connection or external cloud services. The project provides a compreh
Enables private, offline inference by running large language models directly on local hardware resources.
C++ · ai-chat · llm-inference
zed-industries/zed
75,634 · GitHub
Zed is an AI-native, high-performance code editor designed for extreme responsiveness and keyboard-centric workflows. It functions as an extensible text processing workspace that integrates autonomous agents and predictive models directly into the development environment to automate complex engineering tasks, refactori
Runs machine learning models on local hardware to ensure data privacy and reduce latency for AI-assisted coding tasks.
Rust · gpui · rust-lang · text-editor
infiniflow/ragflow
73,425 · GitHub
This project is a comprehensive retrieval-augmented generation platform designed for building, managing, and deploying knowledge-based AI applications. It provides a unified environment for organizing datasets, configuring conversational chat assistants, and developing autonomous agents that execute multi-step reasonin
Configures local inference engines and external model providers through a unified interface for seamless deployment.
Python · agent · agentic · agentic-ai
vllm-project/vllm
70,745 · GitHub
vLLM is a high-throughput inference engine designed for the efficient serving and execution of large language models. It functions as a production-ready distributed model server, providing standard API protocols for online serving while also supporting offline batch processing. The system is built to maximize token gen
Enables execution of advanced generative models directly on local hardware for private and low-latency inference.
Python · amd · blackwell · cuda
hiyouga/LlamaFactory
67,386 · GitHub
LlamaFactory is a unified framework for fine-tuning and adapting large language models. It provides a comprehensive platform that standardizes training workflows across diverse machine learning architectures, allowing users to execute both full-tuning and parameter-efficient methods through a single interface. The pro
Hosts models locally to serve low-latency predictions through standard network APIs.
Python · agent · ai · deepseek
meta-llama/llama
59,157 · GitHub
Llama is a computational framework and runtime environment designed for executing transformer-based neural networks locally. It functions as a generative AI inference engine, enabling the processing of input sequences through pre-trained model weights to produce text completions and structured data outputs directly on
Runs generative models directly on consumer hardware to maintain data privacy and eliminate dependency on cloud services.
Python
zylon-ai/private-gpt
57,116 · GitHub
This project is a privacy-first backend service designed to facilitate retrieval-augmented generation by processing local documents into searchable vector representations. It provides a modular architecture that allows users to ingest diverse file formats, manage document metadata, and perform semantic searches to prov
Runs generative language models directly on local hardware for private, offline processing tasks.
Python
ultralytics/ultralytics
53,426 · GitHub
Ultralytics is a comprehensive computer vision framework designed for training, validating, and deploying deep learning models across a wide range of visual recognition tasks. It provides a unified interface for core operations including object detection, instance segmentation, pose estimation, and image classification
Optimizes model weights and architectures for efficient inference on low-power embedded hardware.
Python · cli · computer-vision · deep-learning

