Sam Hq

Sam Hq - segment and classify images zero-shot | Awesome Repos

Features

Query-Based Mask Generators - Generates precise binary segmentation masks based on spatial points or bounding box queries.
SAM-Based Implementations - Implements a high-quality image segmentation tool based on the Segment Anything Model architecture.
High-Precision Mask Generators - Produces high-quality, precise binary segmentation masks for intricate objects using spatial prompts.
Depth Estimation - Provides a convolutional head for predicting distance or canopy height from visual data.
Multimodal Feature Extractors - Generates unified high-dimensional vector representations from both image and text inputs.
Multi-Modal Embedding Models - Provides a multimodal embedding model that maps image and text data into a shared vector space.
Vision Transformer Encoders - Uses vision transformer encoders to extract high-dimensional visual features from image patches.
Zero-Shot Generalization - Enables the model to segment objects in unseen images without requiring task-specific training.
Zero-Shot Segmentations - Creates fine-grained masks for objects based on user prompts without task-specific training.
Zero-Shot Vision Foundation Models - Provides a pre-trained foundation model for zero-shot segmentation and classification across diverse domains.
Depth Estimation - Predicts spatial distance or canopy height from visual data using a specialized depth estimation head.
Transformer Embedding Extraction - Extracts raw hidden-state embeddings from transformer internals to serve as a foundation for vision tasks.
High-Resolution Feature Extraction - Generates high-dimensional vector representations from images at various resolutions.
Multi-Scale Feature Aggregation - Extracts image representations at multiple scales to capture both global context and local detail.
Domain Adaptation - Supports fine-tuning segmentation models on custom datasets for medical imaging and remote sensing.
Task-Specific Heads - Utilizes task-specific output heads to perform specialized functions like depth estimation and classification.
Zero-Shot Image Classifiers - Categorizes images by comparing visual features against text labels without task-specific training.
Domain-Specific Fine-Tuning - Provides a framework for fine-tuning segmentation models for specialized applications like remote sensing.
Vision Model Fine-Tuning - Supports adapting pre-trained vision models to specialized domains like medical imaging and remote sensing through fine-tuning.
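
The "Zero-Shot Image Classifiers" entry above describes categorizing images by comparing visual features against text labels. A minimal sketch of that comparison step follows, assuming the image and label embeddings have already been produced by some shared-space encoder. The function names and the toy vectors are hypothetical and for illustration only; they are not taken from the Sam Hq codebase.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def zero_shot_classify(image_embedding, label_embeddings):
    """Pick the label whose embedding is most similar to the image embedding.

    Both inputs are assumed to live in the same vector space.
    Returns the best label and the per-label similarity scores.
    """
    scores = {
        label: cosine_similarity(image_embedding, embedding)
        for label, embedding in label_embeddings.items()
    }
    best_label = max(scores, key=scores.get)
    return best_label, scores


# Toy, hand-written embeddings (hypothetical values, not model output).
label_embeddings = {
    "cat": [0.9, 0.1, 0.0],
    "dog": [0.1, 0.9, 0.0],
    "tree": [0.0, 0.1, 0.9],
}
image_embedding = [0.8, 0.2, 0.1]

best_label, scores = zero_shot_classify(image_embedding, label_embeddings)
print(best_label)
```

No task-specific training is involved in this step: adding a new category only requires adding another label embedding to the dictionary.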

Open-source alternatives to Sam Hq

Similar open-source projects, ranked by how many features they share with Sam Hq.

google-research/big_vision
3,363 stars on GitHub
This project is a research framework and toolkit designed for training large-scale vision transformers and multimodal language models. It provides a comprehensive suite for vision-language pretraining, enabling the development of models that map images and text into shared latent spaces. The framework is distinguished by its capabilities in high-fidelity image generation and multimodal research, utilizing normalizing flows and variational autoencoders to produce images from text prompts or class labels. It supports the development of both generative and contrastive models, allowing for a wide …
Jupyter Notebook
facebookresearch/dinov3
9,613 stars on GitHub
This project is a self-supervised vision foundation model based on a vision transformer architecture. It is designed to learn dense visual representations from unlabeled images, serving as a general-purpose backbone for a wide variety of downstream vision tasks. The system is distinguished by its use of self-distillation and masked image modeling to extract semantic and geometric features. It also incorporates an image-text alignment model that maps visual embeddings to textual descriptions, enabling zero-shot image recognition, zero-shot segmentation, and cross-modal retrieval. The project …
Jupyter Notebook
xenova/transformers.js
16,141 stars on GitHub
Transformers.js is a JavaScript library and web machine learning framework designed to run pretrained transformer models directly in the browser. It serves as a client-side inference engine and a wrapper for the ONNX Runtime, enabling the execution of multimodal AI tasks on user devices without the need for a backend server. The library distinguishes itself by providing a unified toolkit for processing text, image, and audio data locally. This architecture supports privacy-preserving model inference and reduces latency by performing all computations on the client's hardware. Its capabilities …
JavaScript
UX-Decoder/Segment-Everything-Everywhere-All-At-Once
4,790 stars on GitHub
This project is a multi-modal image segmentation framework and text-to-mask vision model. It serves as a SAM-based visual segmenter that isolates distinct objects in images and video by converting natural language prompts and other inputs into pixel-level semantic masks. It integrates text, image, and audio signals to generate masks, and includes an interactive video object tracker that isolates and tracks visual entities across video frames using referring images or textual queries. The framework provides …
Python

See all 30 alternatives to Sam Hq

SysCV/sam-hq