Smolvlm Realtime Webcam

Open-source alternatives to Smolvlm Realtime Webcam

Similar open-source projects, ranked by how many features they share with Smolvlm Realtime Webcam.

GetStream/Vision-Agents
6,029 stars
Python · agentic-ai · agents · ai
apple/ml-fastvlm
7,375 stars
This project is a vision language model framework and vision-to-text pipeline designed for deploying and optimizing models that process both images and text. It provides an on-device inference engine and a vision language model framework to run quantized models locally on mobile and desktop hardware accelerators. The framework features a model quantization toolkit to reduce weight precision for lower memory footprints and increased execution speed on specialized silicon. It also includes an efficient vision encoder utilizing a hybrid encoding system to compress image tokens, which reduces pro
Python
google/gemma.cpp
6,735 stars
gemma.cpp is a C++ inference engine for Gemma, PaliGemma, and Griffin language models, designed to run directly on-device without Python dependencies. It provides a self-contained runtime that loads quantized model weights and performs text generation on CPU or GPU, along with a model checkpoint converter that transforms PyTorch or Keras checkpoints into a compact binary format for fast loading. The engine supports multiple model architectures, including the Griffin recurrent architecture with gated linear recurrent layers and sliding-window attention for efficient long-sequence handling, as
C++
cbh123/narrator
4,423 stars
Narrator is an artificial intelligence system that converts real-time video feeds into natural language audio descriptions. It functions as a multimodal vision narrator and scene descriptor, using computer vision to transform environmental data from a camera into synthetic speech. The tool operates as a pipeline that captures periodic images from a feed and uses a multimodal large language model to analyze visual events. These analyses are then converted via text-to-speech synthesis into a voiceover that describes real-world activities and surroundings. The system supports automated environm
Python

See all 30 alternatives to Smolvlm Realtime Webcam

ngxson/smolvlm-realtime-webcam
