# ailab-cvc/yolo-world

**Attribution required: if you use, quote, or summarise this content, you must credit and link back to [awesome-repositories.com](https://awesome-repositories.com/repository/ailab-cvc-yolo-world).**

6,425 stars · 608 forks · Python · GPL-3.0

## Links

- GitHub: https://github.com/AILab-CVC/YOLO-World
- Homepage: https://www.yoloworld.cc
- awesome-repositories: https://awesome-repositories.com/repository/ailab-cvc-yolo-world.md

## Description

YOLO-World is a vision-language framework and open-vocabulary object detection model. It identifies objects in images and video based on free-form text prompts without requiring predefined category labels.

The system identifies arbitrary objects by fusing image features with text embeddings. It also includes a tool for automated image labeling that generates bounding box annotations for custom datasets from text prompts.
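The idea of matching image regions to text prompts can be illustrated with a minimal sketch. This is not the repository's code: the function names (`l2_normalize`, `match_regions_to_prompts`), the threshold, and the hand-written 4-dimensional vectors are all hypothetical stand-ins for real image and text encoder outputs. It only shows the general technique of scoring regions against prompt embeddings by cosine similarity.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to unit length so a dot product equals cosine similarity."""
    norm = np.linalg.norm(x, axis=axis, keepdims=True)
    return x / np.maximum(norm, eps)


def match_regions_to_prompts(region_feats, text_embeds, prompts, score_thr=0.5):
    """Assign each region its best-scoring prompt, dropping weak matches.

    Hypothetical helper: region_feats is (num_regions, dim) and
    text_embeds is (num_prompts, dim).
    """
    sim = l2_normalize(region_feats) @ l2_normalize(text_embeds).T
    best = sim.argmax(axis=1)                    # best prompt index per region
    scores = sim[np.arange(sim.shape[0]), best]  # similarity of that best prompt
    return [
        (i, prompts[j], float(s))
        for i, (j, s) in enumerate(zip(best, scores))
        if s >= score_thr
    ]


# Toy data (assumed): one embedding per free-form prompt.
prompts = ["person", "red backpack", "traffic cone"]
text_embeds = np.array([
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
])

# Toy region features (assumed): the first two resemble a prompt, the third does not.
region_feats = np.array([
    [0.1, 0.9, 0.0, 0.1],
    [0.0, 0.1, 0.8, 0.2],
    [0.1, 0.1, 0.1, 0.9],
])

matches = match_regions_to_prompts(region_feats, text_embeds, prompts)
for region_idx, label, score in matches:
    print(f"region {region_idx}: {label} ({score:.3f})")
```

In this toy setup the first two regions are labeled "red backpack" and "traffic cone", while the third falls below the threshold and is dropped. Changing the prompt list changes the vocabulary without retraining anything, which is the property that makes the approach open-vocabulary.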

The project provides a deployment pipeline that converts models into quantized ONNX and TFLite formats for real-time inference on resource-constrained edge hardware. It also includes a fine-tuning framework that adapts pre-trained models to custom domains through normal, prompt, or reparameterized fine-tuning.
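To make the quantization step concrete, here is a minimal sketch of symmetric per-tensor INT8 weight quantization. It assumes the simplest possible scheme and does not reproduce the project's export tooling or the calibration that ONNX or TFLite converters perform. The helper names `quantize_int8` and `dequantize` and the sample weights are hypothetical.

```python
import numpy as np


def quantize_int8(weights):
    """Map float weights to int8 using one symmetric scale for the whole tensor."""
    max_abs = float(np.max(np.abs(weights)))
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale


def dequantize(q, scale):
    """Recover approximate float weights from the int8 values."""
    return q.astype(np.float32) * scale


# Toy weight matrix (assumed values, not taken from any real checkpoint).
w = np.array([[0.50, -1.27, 0.03],
              [1.00, -0.64, 0.00]], dtype=np.float32)

q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)
max_err = float(np.max(np.abs(w - w_hat)))

print("int8 weights:", q.tolist())
print("scale:", scale)
print("max reconstruction error:", max_err)
```

Each weight is stored in one byte instead of four, and the rounding error per weight is bounded by half the scale. Real converters typically refine this with per-channel scales and activation calibration data.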

## Tags

### Artificial Intelligence & ML

- [Open-Vocabulary Object Detection](https://awesome-repositories.com/f/artificial-intelligence-ml/open-vocabulary-object-detection.md) — Implements an open-vocabulary object detection model that identifies arbitrary objects using free-form text prompts.
- [Open-Vocabulary Detection](https://awesome-repositories.com/f/artificial-intelligence-ml/zero-shot-inference/open-vocabulary-detection.md) — Implements an open-vocabulary detection pipeline that identifies arbitrary objects using text embeddings instead of fixed labels.
- [Vision-Language Cross-Attention Fusions](https://awesome-repositories.com/f/artificial-intelligence-ml/generative-ai-resources/diffusion-visual-models/generative-ai-architectures/cross-attention-mechanisms/vision-language-cross-attention-fusions.md) — Fuses visual features with text embeddings through cross-attention mechanisms to enable open-vocabulary object recognition.
- [Vision-Language Models](https://awesome-repositories.com/f/artificial-intelligence-ml/machine-learning/multimodal-processing-tools/vision-language-models.md) — Utilizes a vision-language model architecture that fuses image features with text embeddings for object recognition.
- [Vision-Language Fine-Tunings](https://awesome-repositories.com/f/artificial-intelligence-ml/model-training-frameworks/vision-model-training/vision-language-training/vision-language-fine-tunings.md) — Adapts pre-trained vision-language models to custom domains using specialized fine-tuning methods.
- [YOLO Object Detectors](https://awesome-repositories.com/f/artificial-intelligence-ml/video-object-tracking/yolo-object-detectors.md) — Employs a real-time object detection system based on the YOLO architecture optimized for low-latency inference.
- [2D Object Labeling](https://awesome-repositories.com/f/artificial-intelligence-ml/annotation-tools/2d-object-labeling.md) — Ships a tool that automatically generates 2D bounding box annotations using text-based prompts. ([source](https://www.yoloworld.cc))
- [Structural Reparameterizations](https://awesome-repositories.com/f/artificial-intelligence-ml/backpropagation/reparameterization-trick/structural-reparameterizations.md) — Utilizes structural reparameterization to adapt pre-trained models to custom domains without sacrificing inference speed.
- [Edge Object Detection](https://awesome-repositories.com/f/artificial-intelligence-ml/computer-vision-systems/computer-vision/object-detection-tracking/edge-object-detection.md) — Provides object detection and tracking optimized for deployment on resource-constrained edge hardware. ([source](https://www.yoloworld.cc))
- [Edge AI Runtimes](https://awesome-repositories.com/f/artificial-intelligence-ml/edge-ai-runtimes.md) — Provides a runtime optimized for executing detection and tracking on personal and edge devices.
- [Image Inference Clients](https://awesome-repositories.com/f/artificial-intelligence-ml/inference-clients/on-device-inference/image-inference-clients.md) — Processes single images, directories, or video files to detect objects described by text prompts. ([source](https://cdn.jsdelivr.net/gh/ailab-cvc/yolo-world@master/README.md))
- [Edge AI Model Deployment](https://awesome-repositories.com/f/artificial-intelligence-ml/machine-learning/infrastructure/model-deployment-and-serving/local-and-on-device-inference/edge-ai-model-deployment.md) — Optimizes and deploys real-time object detection to run efficiently on local hardware and edge devices.
- [ONNX Model Exporters](https://awesome-repositories.com/f/artificial-intelligence-ml/machine-learning/infrastructure/model-deployment-and-serving/serialization-and-export-formats/onnx-model-exporters.md) — Provides utilities to export the detection model into the standardized ONNX format for cross-platform deployment.
- [TFLite Model Exporters](https://awesome-repositories.com/f/artificial-intelligence-ml/machine-learning/infrastructure/model-deployment-and-serving/serialization-and-export-formats/tflite-model-exporters.md) — Converts trained detectors into ONNX and TFLite formats for deployment on servers and edge devices. ([source](https://cdn.jsdelivr.net/gh/ailab-cvc/yolo-world@master/README.md))
- [Automated Image Labeling](https://awesome-repositories.com/f/artificial-intelligence-ml/model-predictions/prediction-engines/image-labeling-engines/automated-image-labeling.md) — Generates bounding box annotations for vision datasets using text descriptions to automate image labeling.
- [Fine-Tuning Frameworks](https://awesome-repositories.com/f/artificial-intelligence-ml/model-training-frameworks/vision-model-training/vision-language-training/vision-language-fine-tunings/fine-tuning-frameworks.md) — Offers a framework supporting normal, prompt, and reparameterized fine-tuning to adapt models to custom domains.
- [Conversion-Time Quantizers](https://awesome-repositories.com/f/artificial-intelligence-ml/quantized-inference-runtimes/weight-quantization/quantized-model-implementations/on-load-quantizers/quantized-model-exporters/conversion-time-quantizers.md) — Implements quantization during the model conversion process to shrink weights to 8-bit integers for edge inference.

### DevOps & Infrastructure

- [ONNX and TFLite Model Exporters](https://awesome-repositories.com/f/devops-infrastructure/deployment-management/model-export-formats/onnx-and-tflite-model-exporters.md) — Provides a deployment pipeline to convert detection models into quantized ONNX and TFLite formats for edge hardware.
- [TFLite Exports](https://awesome-repositories.com/f/devops-infrastructure/deployment-management/model-export-formats/tflite-exports.md) — Converts models to TFLite format using INT8 quantization for efficient mobile deployment. ([source](https://github.com/AILab-CVC/YOLO-World/blob/master/docs/tflite_deploy.md))

### Graphics & Multimedia

- [Real-Time Video Analysis](https://awesome-repositories.com/f/graphics-multimedia/media-processing-analysis/media-manipulation/media-processing-workflows/video-transformation-enhancement/chunked-video-processing/video-processing-apis/video-input-processing/real-time-video-analysis.md) — Processes live video streams with low-latency pipelines for immediate object detection and tracking.
- [Real-Time Model Inference on Frames](https://awesome-repositories.com/f/graphics-multimedia/video-frame-processing/real-time-model-inference-on-frames.md) — Processes images and video frames through a streamlined pipeline optimized for real-time, low-latency performance.

### Part of an Awesome List

- [Computer Vision](https://awesome-repositories.com/f/awesome-lists/ai/computer-vision.md) — Real-time open-vocabulary object detection.
- [Object Detection](https://awesome-repositories.com/f/awesome-lists/more/object-detection.md) — Listed in the “Object Detection” section of The Incredible PyTorch awesome list.