sam-hq is a collection of pre-trained vision foundation models and adapters designed for high-quality image segmentation, multimodal feature extraction, and depth estimation. It provides a zero-shot vision model capable of performing segmentation and classification across diverse domains without requiring task-specific training. The project features a high-quality image segmentation tool based on the Segment Anything Model that generates precise masks from spatial prompts. It includes a multimodal feature extractor to generate high-dimensional vector embeddings from both image and text inputs
Gluon-CV is an MXNet-based computer vision library that provides a comprehensive collection of pre-implemented vision architectures and training pipelines. It serves as a deep learning research toolkit and a model zoo containing state-of-the-art pre-trained weights for image and video analysis. The project includes a specialized human pose estimation library and a model compression toolkit. The compression toolkit prunes and quantizes deep learning models to increase inference speed and ease deployment on constrained edge hardware. The library covers a broad range of vision capabili
This project is a computer vision system for object segmentation and tracking across images and videos. It employs models capable of identifying and masking objects using text prompts, bounding boxes, click points, or image exemplars. The system differentiates itself through memory-based video tracking and shared-memory architectures that maintain consistent object identities over time. It supports multi-object processing in single computation passes to increase frame throughput and utilizes iterative refinement to correct segmentation boundaries through sequential prompts. The software also
jetson-inference is a set of libraries and tools for executing optimized deep learning models on embedded GPU hardware. Its primary purpose is to enable real-time computer vision and AI inference at the edge with low latency and high throughput. The project distinguishes itself through high-performance streaming analytics and the ability to execute concurrent AI pipelines on automotive-grade silicon. It provides specialized support for multi-sensor stream processing, using zero-copy data transport to load camera frames directly into GPU memory. The codebase covers a broad surface of capabiliti