30 open-source projects similar to libvips/libvips, ranked by how many features they have in common. Compare stars, activity, and what each one does to find the best libvips alternative.
Sharp is a high-performance image processing library for Node.js. It serves as a native extension and wrapper for the libvips framework, providing tools for image resizing, format conversion, and programmatic data manipulation. The project enables the transformation of images into web-friendly formats such as WebP and AVIF while preserving color profiles and alpha channels. It also provides capabilities for generating blank image buffers with specified dimensions and background colors. The library covers a broad range of image manipulation utilities, including rotation, extraction, compositi
Pillow is a Python image processing library and digital image manipulation toolkit used for opening, manipulating, and saving various image file formats. It serves as a multi-format image codec wrapper that enables the reading and writing of diverse standards such as JPEG, PNG, TIFF, and BMP. The library provides tools for programmatic image manipulation, including resizing, cropping, rotating, and transforming visual content through direct pixel data modification. It supports pixel data analysis to extract and modify raw information for custom visual processing and data transformations. The
imgaug is a Python library for machine learning data augmentation and computer vision dataset expansion. It provides tools to increase the volume and variety of training sets by applying random geometric, color, and noise transformations to images. The library ensures spatial consistency by synchronizing transformations across images and their associated annotations, such as bounding boxes, keypoints, and segmentation maps. It uses a compositional pipeline pattern to chain multiple augmentations into sequences and employs deterministic seed management to reproduce specific data samples. The
This project is a high-performance image transformation server and media optimization proxy designed to process, resize, and convert assets on the fly. It functions as a secure pipeline that fetches remote source files and applies transformations—such as cropping, watermarking, and visual filtering—directly through parameters defined in the request URL. The service distinguishes itself through a focus on secure, resource-aware delivery. It protects infrastructure by validating incoming requests with cryptographic signatures to prevent unauthorized access and enforces strict limits on file dim
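The signed-request idea in the entry above can be illustrated with a generic HMAC check. This is a minimal sketch only: the listing does not say which signature scheme the project uses, so HMAC-SHA256, the `/<signature>/<path>` layout, and the names `sign_path` and `verify_request` are all assumptions made for illustration.

```python
import base64
import hashlib
import hmac


def sign_path(key: bytes, path: str) -> str:
    """Return a URL-safe HMAC-SHA256 signature for a processing path (assumed scheme)."""
    digest = hmac.new(key, path.encode("utf-8"), hashlib.sha256).digest()
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")


def verify_request(key: bytes, signed_path: str) -> bool:
    """Check a request of the hypothetical form '/<signature>/<processing path>'."""
    parts = signed_path.split("/", 2)
    if len(parts) != 3 or parts[0] != "":
        return False
    signature, path = parts[1], "/" + parts[2]
    expected = sign_path(key, path)
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(signature.encode("utf-8"), expected.encode("ascii"))


# Hypothetical key and path, for illustration only.
key = b"example-secret"
path = "/width=300/https://example.com/cat.jpg"
signed = "/" + sign_path(key, path) + path
```

A server following this pattern would reject any request whose path was altered after signing, because the recomputed signature no longer matches.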
Caire is a command-line image processing engine designed for content-aware resizing and batch manipulation. It utilizes seam carving algorithms to adjust image dimensions by identifying and removing low-energy pixels, allowing for the rescaling of images while preserving primary visual subjects and maintaining aspect ratios. The tool distinguishes itself through its ability to protect specific visual elements, such as human faces, from distortion during the resizing process. Users can apply custom binary masks to define regions for protection or forced removal, and the engine provides real-ti
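The seam carving approach mentioned in the Caire entry can be sketched in a few lines. This is a simplified illustration on a grayscale image stored as a list of rows, not Caire's own implementation. The gradient-based energy function and all function names are assumptions, and face protection and masks are left out.

```python
def energy_map(gray):
    """Energy per pixel: absolute horizontal plus vertical neighbor difference."""
    h, w = len(gray), len(gray[0])
    energy = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            left = gray[y][max(x - 1, 0)]
            right = gray[y][min(x + 1, w - 1)]
            up = gray[max(y - 1, 0)][x]
            down = gray[min(y + 1, h - 1)][x]
            energy[y][x] = abs(right - left) + abs(down - up)
    return energy


def find_vertical_seam(energy):
    """Return one column index per row forming the lowest-energy connected seam."""
    h, w = len(energy), len(energy[0])
    cost = [row[:] for row in energy]
    # Dynamic programming: each cell adds the cheapest of its three upper neighbors.
    for y in range(1, h):
        for x in range(w):
            lo, hi = max(x - 1, 0), min(x + 1, w - 1)
            cost[y][x] += min(cost[y - 1][lo:hi + 1])
    # Backtrack from the cheapest cell in the bottom row.
    x = min(range(w), key=lambda i: cost[h - 1][i])
    seam = [x]
    for y in range(h - 1, 0, -1):
        lo, hi = max(x - 1, 0), min(x + 1, w - 1)
        x = min(range(lo, hi + 1), key=lambda i: cost[y - 1][i])
        seam.append(x)
    seam.reverse()
    return seam


def remove_vertical_seam(gray, seam):
    """Drop one pixel per row, narrowing the image by one column."""
    return [row[:x] + row[x + 1:] for row, x in zip(gray, seam)]


def carve_width(gray, new_width):
    """Repeatedly remove the lowest-energy seam until the target width is reached."""
    while len(gray[0]) > new_width:
        gray = remove_vertical_seam(gray, find_vertical_seam(energy_map(gray)))
    return gray
```

Recomputing the energy map after every seam is slow but keeps the sketch short. A protection mask could be approximated by adding a large constant to the energy of the masked pixels.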
Agent-S is a multimodal AI agent and LLM desktop automation framework designed to control operating systems through graphical user interface interactions. It functions as a computer use interface, utilizing vision-language grounding to translate natural language goals into precise screen coordinates and system actions. The project differentiates itself by combining structured accessibility tree inspection with vision-based element localization. It manages cross-application workflows by mapping conceptual descriptions to physical pixels and simulating low-level keyboard and mouse events to mov
ImageMagick is a comprehensive software suite for the creation, editing, composition, and conversion of digital images. It functions as both a command-line utility for batch processing and automation, and as a programming library that allows developers to integrate advanced image manipulation capabilities into external applications. The project is distinguished by its modular architecture, which supports hundreds of image formats through a pluggable coder system and external delegate libraries. It is designed for high-performance environments, utilizing memory-mapped pixel caching, stream-ori
imutils is a computer vision utility toolkit and image processing library designed to simplify common manipulation tasks using OpenCV. It serves as an image analysis helper and geometry transformation tool for automating visual data processing. The toolkit provides specialized capabilities for maintaining image integrity during transformations, such as resizing images while preserving aspect ratios and rotating images without cropping corners. It also includes tools for four-point perspective warping to create top-down views and the extraction of topological skeletons from binary images. The
scikit-image is a Python image processing library and scientific image analysis toolkit. It provides a framework for digital image processing and computer vision, utilizing numerical arrays for pixel-level manipulations. The library enables the quantification of image properties and the detection of visual features, such as edges and blobs. It includes tools for image segmentation and the extraction of textures and patterns to characterize objects within visual data. Capabilities cover image manipulation through color space conversion, geometric transformations, and digital restoration. It a
SAHI is a sliced inference framework and computer vision pipeline designed to detect small objects in high-resolution images. It provides a system for dividing large images into overlapping patches to prevent the detail loss that typically occurs during standard model downscaling, alongside an image tiling utility and a COCO dataset toolkit. The project distinguishes itself by offering a model-agnostic prediction wrapper that standardizes different machine learning frameworks into a unified interface. This allows it to implement sliced inference and object detection across various model backe
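The overlapping-patch idea in the SAHI entry can be sketched as a function that computes slice coordinates. This is an illustrative sketch rather than SAHI's API: the function name `slice_boxes`, its parameters, and the choice to shift edge slices back inside the image are assumptions.

```python
def slice_boxes(image_w, image_h, slice_w, slice_h, overlap_ratio=0.2):
    """Return (x_min, y_min, x_max, y_max) boxes that tile the image with overlap."""
    if not 0 <= overlap_ratio < 1:
        raise ValueError("overlap_ratio must be in [0, 1)")
    # Step between slice origins; a smaller step means more overlap.
    step_x = max(1, int(slice_w * (1 - overlap_ratio)))
    step_y = max(1, int(slice_h * (1 - overlap_ratio)))
    boxes = []
    y = 0
    while True:
        # Clamp to the image edge, then shift back so edge slices keep full size.
        y_max = min(y + slice_h, image_h)
        y_min = max(0, y_max - slice_h)
        x = 0
        while True:
            x_max = min(x + slice_w, image_w)
            x_min = max(0, x_max - slice_w)
            boxes.append((x_min, y_min, x_max, y_max))
            if x_max >= image_w:
                break
            x += step_x
        if y_max >= image_h:
            break
        y += step_y
    return boxes
```

Each box would be cropped and passed to a detector, and the per-slice predictions would then be shifted by `(x_min, y_min)` and merged back into full-image coordinates.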
This project is a comprehensive computer vision library for the PyTorch ecosystem, providing a standardized collection of neural network architectures, datasets, and high-performance transformation utilities. It serves as a foundational framework for building, training, and deploying deep learning models, offering a centralized model registry that allows developers to instantiate architectures with pre-trained weights for tasks such as image classification, object detection, and semantic segmentation. The library distinguishes itself through its modular approach to data and compute management
Supervision is a computer vision toolset for normalizing model outputs, managing datasets, and visualizing annotations. It provides a framework to convert predictions from various classification and detection models into a standardized data format to ensure interoperability across different computer vision pipelines. The library features a post-processor for filtering, counting, and tracking detected objects across image frames and video streams. It includes capabilities for large image tiling to improve the detection of small objects and tools for assigning persistent identities to objects t
Augmentor is a Python image augmentation library and framework designed to expand machine learning datasets. It functions as a preprocessing tool that generates synthetic image variations to increase data diversity and as a training data streamer that feeds augmented images and labels directly into neural network loops without requiring intermediate disk storage. The framework maintains spatial alignment between images and their corresponding masks, which is required for semantic segmentation training. It supports various geometric and pixel-level transformations, including elastic distortion
MoviePy is a Python video editing library and automated video processor designed for programmatically cutting, concatenating, and manipulating video and audio files. It serves as a non-linear video editor and an interface for FFmpeg to handle the reading, writing, and conversion of diverse media formats and codecs. The library enables automated video composition through the layering of multiple video and audio streams using transparency and coordinate-based positioning. It supports dynamic content generation by inserting text overlays and performing custom video frame processing where raw fra
Computational geometry and spatial indexing on the sphere
EasyOCR is a deep learning-based computer vision library designed to perform optical character recognition on images and video frames. It functions as a comprehensive pipeline that automates the transformation of visual text into machine-readable strings, enabling the digitization of physical documents, forms, and receipts into searchable data. The engine distinguishes itself through a multi-stage processing workflow that combines convolutional neural networks for spatial feature extraction with sequence-based decoding mechanisms. This architecture allows the system to identify and interpret
TensorFlow Implementation for Computing a Semantically Segmented Bird's Eye View (BEV) Image Given the Images of Multiple Vehicle-Mounted Cameras.
Indexes points and lines and generates map tiles to display them
Yolact is a computer vision framework and real-time instance segmentation model. It utilizes a fully convolutional neural network to detect objects and generate pixel-level masks for images and video feeds. The system employs prototypical mask generation to create global mask prototypes that are linearly combined for instance-specific results. It incorporates deformable convolutional layers and deformable region-of-interest pooling to adapt spatial sampling to the irregular shapes of objects. The framework covers the full model development lifecycle, including training on custom datasets, ac
Terrain Analysis Using Digital Elevation Models (TauDEM) software for hydrologic terrain analysis and channel network extraction.
Entwine - point cloud organization for massive datasets