30 open-source projects similar to byteflow-ai/tokenflow, ranked by how many features they have in common. Compare stars, activity and what each one does to find the best TokenFlow alternative.
Edit Away and My Face Will not Stay: Personal Biometric Defense against Malicious Generative Editing
[CVPR 2025 Oral] Reconstruction vs. Generation: Taming Optimization Dilemma in Latent Diffusion Models
[CVPR 2025 Highlight🔥] Identity-Preserving Text-to-Video Generation by Frequency Decomposition
[CVPR 2025 Highlight] X-Dyna: Expressive Dynamic Human Image Animation
A 3B-active-parameter native unified multimodal model for image and video understanding, generation, and editing.
[CVPR 2025] We present SleeperMark, a novel framework designed to embed resilient watermarks into T2I diffusion models
Lumina-DiMOO: An Omni Diffusion Large Language Model for Multi-Modal Generation and Understanding
Janus is a multimodal large language model and unified framework that integrates visual understanding and image generation within a single neural network. It functions as both a visual understanding model for analyzing images and a text-to-image generator. The system uses a unified transformer backbone and a multimodal latent space to bridge the gap between text and visual data. This architecture employs decoupled visual encoding and cross-modal tokenization to separate the paths for discriminative understanding and generative tasks, representing images as grids of discrete codes. The projec
Official implementation of Tuna-2: Pixel Embeddings Beat Vision Encoders for Unified Understanding and Generation
[CVPR 2025] Consistent and Controllable Image Animation with Motion Diffusion Models
InternVL-U is a 4B-parameter unified multimodal model (UMM) that brings multimodal understanding, reasoning, image generation, and image editing into a single framework.
This project is a computer vision benchmark and image classification dataset used to measure and compare the accuracy of machine learning models. It provides a standardized collection of labeled fashion product images and training data formatted to be compatible with the MNIST dataset structure. The dataset consists of fixed-dimension grayscale images and label-based category mappings, stored in a binary format. It includes pre-split training and testing sets and a static distribution to ensure consistent cross-model benchmarking. The repository supports image classification benchmarking and
A daily-updated collection of papers on computer vision, deep learning, and machine learning
Attention-based Deep Multiple Instance Learning
Project page for End-to-end Recovery of Human Shape and Pose
Open source release of the evaluation benchmark suite described in "Realistic Evaluation of Deep Semi-Supervised Learning Algorithms"
This is the implementation of Hierarchical Long-term Video Prediction without Supervision, to be published in ICML 2018.
This project is the official implementation of our CVPR 2018 paper "Deep Back-Projection Networks for Super-Resolution" (Winner of NTIRE2018 and PIRM2018)
Maxim Berman, Amal Rannen Triki, Matthew B. Blaschko
Code for our CVPR 2018 paper: "Synthesizing Images of Humans in Unseen Poses"
Angjoo Kanazawa, Shubham Tulsiani, Alexei A. Efros, Jitendra Malik