30 open-source projects similar to alievk/avatarify-python, ranked by how many features they have in common. Compare stars, activity and what each one does to find the best Avatarify Python alternative.
EasyVtuber is 2D avatar animation software that transforms a single static image into a real-time animated character. It functions as a face tracking animation tool and live streaming avatar driver, mapping facial movements from webcams or iOS devices to drive virtual expressions and head motion. The project distinguishes itself through a neural animation pipeline that includes AI video upscaling and frame interpolation to increase visual smoothness and resolution. It utilizes a transparent video streaming system via Spout2, allowing rendered frames with alpha channels to be sent directly to
This project is a macOS system camera driver and software plugin that exposes software video streams as hardware-recognized camera inputs. It functions as an OBS virtual camera plugin, allowing the live output of OBS to be utilized as a webcam device within other applications. The tool enables the routing of composited video from a production suite into video conferencing applications such as Zoom or Google Meet. This allows for the streaming of processed scenes instead of a raw webcam feed. The system integrates with macOS using a kernel-level device driver and shared-memory buffer transfer
DeepFaceLive is a desktop application designed for real-time facial replacement and animation within live video streams. By utilizing deep learning models, the software performs high-speed identity mapping and facial feature analysis to transform video content as it is captured. The engine relies on GPU-accelerated inference to execute these complex image manipulation tasks at interactive frame rates. The application distinguishes itself through a modular video processing pipeline that chains specialized tasks to maintain high throughput and low latency. It features a virtual camera streaming
v4l2loopback is a Linux kernel video driver that creates virtual video devices to route video streams between applications. It functions as a software-defined video source, simulating physical hardware to provide a standard video input for applications that require a capture device. The project enables video stream routing by piping data from one process to another using the Video4Linux2 standard. It includes mechanisms for device capability masking and conditional reporting to bypass strict hardware detection requirements in external software. The driver provides tools for virtual camera si
This is a TensorFlow implementation of the Deep Convolutional Generative Adversarial Network (DCGAN) architecture, providing a framework for training generative models that produce synthetic images from random noise vectors. The project implements the core DCGAN design, using transposed convolutions for upsampling, batch normalization for training stability, and leaky ReLU activations in the discriminator, all executed as static TensorFlow computation graphs. The implementation supports training on custom image datasets by accepting user-supplied image folders without requiring a predefined f
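Transposed convolutions are what grow the generator's small projected noise map into a full-size image. The sketch below traces that growth using TensorFlow's output-size rules for transposed convolutions. It assumes a 64x64 output, four stride-2 layers, 5x5 kernels and SAME padding. These are typical DCGAN settings and are not confirmed for this repository.

```python
import math


def conv_transpose_out(size: int, stride: int, kernel: int, padding: str = "SAME") -> int:
    """Spatial output size of a 2D transposed convolution under TensorFlow padding rules."""
    if padding == "SAME":
        return size * stride
    if padding == "VALID":
        return size * stride + max(kernel - stride, 0)
    raise ValueError(f"unknown padding: {padding}")


def generator_sizes(target: int, n_layers: int = 4, stride: int = 2, kernel: int = 5) -> list:
    """Feature-map sizes from the projected noise tensor up to the output image."""
    # The noise vector is first projected and reshaped to a small map (64 / 2**4 = 4).
    sizes = [math.ceil(target / stride ** n_layers)]
    for _ in range(n_layers):
        # Each stride-2 transposed convolution with SAME padding doubles height and width.
        sizes.append(conv_transpose_out(sizes[-1], stride, kernel))
    return sizes


print(generator_sizes(64))  # [4, 8, 16, 32, 64]
```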
This is a PyTorch-based computer vision library for detecting 2D and 3D facial landmark coordinates. It functions as a facial landmark detector and reconstruction tool, utilizing deep learning to identify precise geometric points on human faces from image datasets. The library allows for the selection of specific detection backends to balance accuracy and processing speed. It supports the integration of precomputed bounding box files, which enables the system to bypass the initial detection phase and proceed directly to landmark extraction. The toolkit includes capabilities for batch image p
SadTalker is an audio-driven talking head generator that produces synchronized speaking videos from a single source image and an input audio file. The system utilizes a deep learning framework to map speech signals to facial motion data, enabling the creation of lifelike digital avatars and animated characters. The project distinguishes itself by employing a three-dimensional morphable model to translate audio features into precise facial landmarks and head pose parameters. It integrates latent diffusion motion synthesis to generate naturalistic head movements and uses expression-aware textur
PRNet is a Python library for 3D facial reconstruction. It uses a deep learning regression model to predict 3D facial geometry and vertex colors from a single 2D input image, producing a textured mesh. The project provides tools for digital face swapping, replacing a target face with a new image and blending the swapped texture to fit the original 3D pose. Additional capabilities cover facial analysis, including the detection and alignment of facial landmarks and the estimation of head pose a
Pose-animator is a system that maps real-time body and face tracking data to 2D vector illustrations. It functions as a skeletal animation engine and motion controller that translates human keypoint recognition into instantaneous SVG path updates. The project enables real-time motion capture from webcam feeds and pose extraction from static images. It utilizes a skeletal rig to link virtual bones to vector character surfaces, allowing for the animation of custom characters and interactive avatars. The tool incorporates client-side machine learning inference for processing camera frames, coor
This project is a PyTorch-based computer vision library and deep learning image processing framework. It provides a collection of neural network architectures designed for visual analysis tasks, specifically focusing on image classification, object detection, and semantic segmentation. The toolset implements diverse methodologies for visual recognition, including anchor-free object detection, region proposal networks, and heatmap-based keypoint estimation. It utilizes both convolutional neural networks for spatial feature extraction and transformer-based self-attention mechanisms to compute
This is a PyTorch deep learning implementation for training transformer-based language models. It functions as a distributed GPU trainer and framework designed to optimize text prediction models for increased speed and sample efficiency. The project is distinguished by its use of the Newton-Schulz weight optimizer. This method applies an iterative process to maintain semi-orthogonal parameter updates and weight matrices, which improves sample efficiency and reduces memory overhead during the training process. The framework covers broad capabilities in distributed GPU computing, including dat
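The iterative process behind the Newton-Schulz optimizer can be illustrated on a tiny matrix. The sketch below uses the classic cubic Newton-Schulz iteration, written in pure Python. It drives every singular value of an update matrix toward 1, which approximates the semi-orthogonal polar factor U V^T. The project may use tuned higher-order coefficients, a fixed small step count, and GPU tensors instead. The function names here are illustrative.

```python
import math

Matrix = list  # a rectangular list of rows of floats


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def newton_schulz(g: Matrix, steps: int = 20) -> Matrix:
    """Approximate the orthogonal polar factor U V^T of g, where g = U S V^T."""
    # Divide by the Frobenius norm so every singular value lies in (0, 1],
    # inside the region where the cubic iteration converges.
    norm = math.sqrt(sum(v * v for row in g for v in row))
    x = [[v / norm for v in row] for row in g]
    for _ in range(steps):
        # X <- 1.5 X - 0.5 X X^T X maps each singular value s to 1.5 s - 0.5 s^3,
        # which has a superattracting fixed point at s = 1.
        xxtx = matmul(matmul(x, transpose(x)), x)
        x = [[1.5 * a - 0.5 * b for a, b in zip(ra, rb)] for ra, rb in zip(x, xxtx)]
    return x


update = [[2.0, 1.0], [0.0, 1.0]]  # a toy gradient or momentum matrix
ortho = newton_schulz(update)      # rows are now approximately orthonormal
```

Because the output depends only on the singular vectors of the update, every direction in the step receives a similar magnitude. This is the property that the "semi-orthogonal updates" in the description refers to.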
LiveKit is a comprehensive framework for building and orchestrating real-time, multimodal AI agents that interact with users through voice, video, and text. It provides a centralized, event-driven architecture to manage the entire lifecycle of automated participants, from initialization and session state management to graceful shutdown. By utilizing a selective forwarding unit, the platform efficiently routes media streams between participants and agents, ensuring low-latency communication and secure, token-based authentication for all connections. The platform distinguishes itself through it
LivePortrait is a computer vision framework designed for portrait animation and generative video synthesis. It functions as a deep learning system that transfers facial expressions and head movements from a driving video source onto a static image or an existing portrait video, effectively decoupling the subject's identity from the dynamic motion patterns. The framework utilizes keypoint-based motion retargeting and implicit 3D latent representations to map movements across different subjects, including both human and animal portraits. By employing canonical motion normalization and feature-s
This project is a PyTorch implementation of AnimeGANv2, a generative adversarial network and image-to-image translation model designed to transform real-world photographs into stylized anime imagery. The repository includes a model weight converter that enables the translation of checkpoints across different runtime environments. This utility performs weight key remapping and tensor dimension permutation to ensure compatibility between framework implementations. The system supports AI photo stylization through pre-trained weight loading and provides configurable upsampling alignment to maint
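Weight key remapping and tensor dimension permutation are the two halves of moving a checkpoint between frameworks. The sketch below shows both in dependency-free Python. It assumes that the source kernels use TensorFlow's (H, W, in, out) layout and that the target is PyTorch's (out, in, H, W) layout. The key names in the usage are hypothetical and do not reproduce the repository's actual mapping. A real converter would normally use `numpy.transpose` or `torch.Tensor.permute` rather than nested lists.

```python
# TensorFlow conv kernels are (H, W, in, out); PyTorch Conv2d weights are (out, in, H, W).
HWIO_TO_OIHW = (3, 2, 0, 1)


def shape_of(t):
    """Shape of a rectangular nested list."""
    shape = []
    while isinstance(t, list):
        shape.append(len(t))
        if not t:
            break
        t = t[0]
    return tuple(shape)


def permute(t, perm):
    """Reorder axes like numpy.transpose: out.shape[k] == t.shape[perm[k]]."""
    dst_shape = tuple(shape_of(t)[p] for p in perm)

    def build(prefix):
        if len(prefix) == len(dst_shape):
            # Map the destination index back to the source index and read the value.
            src_idx = [0] * len(perm)
            for dst_axis, src_axis in enumerate(perm):
                src_idx[src_axis] = prefix[dst_axis]
            value = t
            for i in src_idx:
                value = value[i]
            return value
        return [build(prefix + (i,)) for i in range(dst_shape[len(prefix)])]

    return build(())


def convert_checkpoint(tf_weights, key_map):
    """Rename keys via key_map and permute 4-D conv kernels; other tensors pass through."""
    converted = {}
    for tf_key, value in tf_weights.items():
        if len(shape_of(value)) == 4:
            value = permute(value, HWIO_TO_OIHW)
        converted[key_map[tf_key]] = value
    return converted


# Hypothetical key names, for illustration only.
kernel = [[[[0, 1, 2]], [[3, 4, 5]]]]  # shape (H=1, W=2, in=1, out=3)
torch_weights = convert_checkpoint(
    {"g/conv/w": kernel, "g/conv/b": [0.1, 0.2, 0.3]},
    {"g/conv/w": "conv.weight", "g/conv/b": "conv.bias"},
)
```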
Pigo is a computer vision library written in Go for locating human faces in images and video streams. It provides tools for face detection, facial landmark identification, and pupil and eye localization. The project is implemented in pure Go to ensure portable execution without external dependencies. It supports compilation to WebAssembly, enabling face detection and image processing to run directly in web browsers without a backend. The library's capabilities include real-time face detection using classifier cascades and gaze tracking localization. It maps anatomical points on the face to a
This project is a PyTorch implementation of a text-to-image transformer. It is a generative AI model designed to map discrete text tokens to image pixels using a transformer network to create visual content from textual descriptions. The system utilizes a discrete VAE image encoder to compress visual data into tokens for transformer processing. It supports classifier-free guidance to adjust the influence of text prompts during inference and includes capabilities for ranking generated images based on their similarity to text prompts. The architecture incorporates sparse attention mechanisms a
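Classifier-free guidance for an autoregressive image-token model is commonly applied to the next-token logits. The model runs once with the text prompt and once with the prompt dropped, and the two sets of logits are then extrapolated. The sketch below assumes this logit-space form and uses toy three-token logits. The project's own parameter names and the exact point where it applies the scale may differ.

```python
import math


def guided_logits(cond_logits, uncond_logits, scale):
    """Classifier-free guidance: push logits away from the unconditional prediction."""
    # scale == 1 recovers the conditional logits; scale > 1 strengthens the prompt.
    return [u + scale * (c - u) for c, u in zip(cond_logits, uncond_logits)]


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Toy next-image-token logits from a pass with the prompt and a pass without it.
cond = [2.0, 0.5, -1.0]
uncond = [1.0, 1.0, 0.0]
probs = softmax(guided_logits(cond, uncond, scale=3.0))
```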
This project is an educational course and collection of training materials focused on generative diffusion models. It provides a curriculum and practical guides for training, fine-tuning, and deploying models capable of synthesizing images, audio, and video. The material covers specific implementation strategies including noise-based synthesis, iterative refinement, and latent space compression. It provides instruction on guiding generative outputs through conditional synthesis and prompt adherence optimization, as well as techniques for image inpainting and text-based editing. The project i
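Noise-based synthesis begins with a forward process that gradually corrupts data. A network is then trained to reverse that corruption. The sketch below shows the closed-form forward step, x_t = sqrt(a_t) x0 + sqrt(1 - a_t) noise, in pure Python. It assumes a linear beta schedule from 1e-4 to 0.02 over 1000 steps, which are common DDPM defaults. The course itself may use other schedules or libraries.

```python
import math
import random


def linear_beta_schedule(steps, beta_start=1e-4, beta_end=0.02):
    """Noise variance added at each step, increasing linearly."""
    return [beta_start + (beta_end - beta_start) * i / (steps - 1) for i in range(steps)]


def cumulative_alphas(betas):
    """alpha_bar_t = prod over s <= t of (1 - beta_s): signal variance left at step t."""
    out, prod = [], 1.0
    for beta in betas:
        prod *= 1.0 - beta
        out.append(prod)
    return out


def q_sample(x0, t, alpha_bars, noise):
    """Jump straight to step t: x_t = sqrt(a_t) * x0 + sqrt(1 - a_t) * noise."""
    a = alpha_bars[t]
    return [math.sqrt(a) * x + math.sqrt(1.0 - a) * n for x, n in zip(x0, noise)]


alpha_bars = cumulative_alphas(linear_beta_schedule(1000))
x0 = [0.5, -0.2, 0.9]  # a toy "image" of three pixels
noise = [random.gauss(0.0, 1.0) for _ in x0]
x_mid = q_sample(x0, 500, alpha_bars, noise)
```

Training then amounts to asking a model to predict `noise` from `x_t` and `t`. Sampling runs that prediction backwards, one step at a time.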
Duix-Mobile is a software development kit for deploying real-time conversational AI characters on mobile devices. It enables the creation of interactive digital humans capable of fluid voice-to-voice interactions, featuring low-latency speech recognition and synchronized lip movements. The project distinguishes itself through the ability to integrate custom external language models and speech providers to define an avatar's intelligence and voice. It supports the generation of real-time multilingual subtitles and provides mechanisms to track the training status of newly created digital charac
This project is a PyTorch implementation of an attention-based neural network designed for sequence-to-sequence deep learning tasks. It serves as a library for constructing deep learning sequence models that utilize encoder and decoder structures to process natural language and sequential data. The implementation centers on a multi-head attention mechanism to capture diverse relationships between tokens without using recurrence. It includes sinusoidal positional encoding to represent sequence order and position-wise feed-forward networks applied to each token position independently. The architectu
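Sinusoidal positional encoding gives each position a fixed vector. Even dimensions use sin(pos / 10000^(i/d)) and odd dimensions use the matching cosine. The sketch below follows the formula from the original Transformer paper in pure Python. It assumes this repository uses the standard formula, although its tensor layout and caching will differ.

```python
import math


def positional_encoding(max_len, d_model):
    """Return a max_len x d_model table of sinusoidal position encodings."""
    pe = [[0.0] * d_model for _ in range(max_len)]
    for pos in range(max_len):
        for i in range(0, d_model, 2):
            # i is the even dimension index 2k, so the exponent 2k / d_model is i / d_model.
            angle = pos / (10000 ** (i / d_model))
            pe[pos][i] = math.sin(angle)
            if i + 1 < d_model:
                pe[pos][i + 1] = math.cos(angle)
    return pe


table = positional_encoding(max_len=4, d_model=6)
```

The table is added to the token embeddings. Each dimension pair oscillates at a different frequency, so relative offsets correspond to fixed rotations, and the model can attend by relative position without recurrence.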
This library provides a deep learning framework for identifying human faces and extracting facial landmarks within digital images. It utilizes a multi-task convolutional neural network architecture to simultaneously perform face classification, bounding box regression, and landmark localization. The system processes images through three sequential stages of neural networks, incorporating image pyramid resizing to detect faces of varying scales. To ensure accuracy, it employs bounding box regression to refine coordinate predictions and non-maximum suppression to filter out redundant overlappin
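Non-maximum suppression is the step that removes redundant overlapping detections after each network stage. The sketch below is a greedy, intersection-over-union based version in pure Python, with boxes given as (x1, y1, x2, y2). It is a generic form of the technique. Some MTCNN implementations add a "min" overlap mode or +1 pixel area conventions, which are not shown here.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, threshold=0.5):
    """Greedily keep the highest-scoring box and drop others that overlap it too much."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) <= threshold]
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
kept = nms(boxes, scores=[0.9, 0.8, 0.7])  # the second box overlaps the first (IoU ~0.68)
```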
clmtrackr is a JavaScript computer vision library designed for facial landmark detection and real-time tracking. It implements Constrained Local Models to identify specific coordinate points on a human face within video feeds or static images. The project functions as a real-time face warping engine and expression analysis tool. It can distort facial images via parametric models to create caricatures or identify and label emotional states such as happiness, sadness, anger, and surprise based on feature coordinates. The library covers a broad range of capabilities including automatic and manu
This project is a deep learning educational resource consisting of PyTorch model implementations and code examples. It provides functional Python scripts and notebooks for building, training, and optimizing neural networks using tensor-based computation. The repository includes implementations for designing custom network layers and loss functions, as well as examples of transfer learning workflows that load pretrained model weights to accelerate development. The codebase covers a broad range of deep learning capabilities, including neural network training, custom model component design, and
This is the official PyTorch implementation of Google's Gemma models.
EchoMimic is an audio-driven portrait animation framework and latent diffusion video generator. It transforms static reference images into dynamic talking head videos by synchronizing facial movements with audio tracks and motion drivers. The system functions as a hybrid motion synthesis engine that combines audio inputs and pose data. It utilizes a facial landmark motion controller to edit positioning markers, enabling precise synchronization and video-to-video pose transfer. The pipeline covers image-to-video animation through latent diffusion and facial landmark conditioning. This allows
This project is a plugin for Photoshop that integrates Stable Diffusion backends, allowing users to generate and edit AI images directly within the graphic design workspace. It acts as a bridge between the image editor and remote GPU workers, so generative tasks can run without powerful local hardware. The plugin specifically provides connection layers for Automatic1111 and ComfyUI backends. This enables the execution of text-to-image generation, inpainting, and outpainting operations on the design canvas by communicating with these external engines via an API. The system
This project is a PyTorch implementation of the YOLOv3 object detection architecture. It functions as a real-time object detector and computer vision framework designed to identify and locate multiple objects within images using bounding boxes and class labels. The system allows for both the use of pretrained weights for immediate image analysis and the training of custom models using datasets with bounding box annotations. It provides a programmatic interface to integrate detection capabilities directly into other software applications. The framework includes tools for model evaluation to m
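YOLOv3 locates objects by turning raw network outputs into bounding boxes. Each output is decoded relative to its grid cell and an anchor box: b_x = sigmoid(t_x) + c_x, b_y = sigmoid(t_y) + c_y, b_w = p_w e^(t_w) and b_h = p_h e^(t_h). The sketch below applies that decoding for a single prediction. It assumes anchors given in input-image pixels and a grid stride of 32, and the 116x90 anchor is used only as an example. The repository's tensor layout and helper names are not reproduced.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def decode_box(t, cell_x, cell_y, anchor_w, anchor_h, stride):
    """Decode raw (tx, ty, tw, th) into pixel (x1, y1, x2, y2) for one grid cell and anchor."""
    tx, ty, tw, th = t
    # The sigmoid keeps the predicted center inside its grid cell.
    cx = (sigmoid(tx) + cell_x) * stride
    cy = (sigmoid(ty) + cell_y) * stride
    # Width and height scale the anchor prior multiplicatively.
    w = anchor_w * math.exp(tw)
    h = anchor_h * math.exp(th)
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)


box = decode_box((0.0, 0.0, 0.0, 0.0), cell_x=3, cell_y=5, anchor_w=116, anchor_h=90, stride=32)
```

With all-zero offsets, the box is centered in the cell at (112, 176) and has exactly the anchor's size. The decoded boxes would then be score-filtered and passed through non-maximum suppression.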
FaceNet is a facial recognition framework designed to transform facial images into high-dimensional numerical embeddings for identity verification and recognition. It provides a deep learning face embedder that maps facial features into a Euclidean space where distance corresponds to facial similarity. The system includes tools for both supervised and unsupervised identity management. It features a face identity classifier for categorizing images into known identity classes and an unsupervised clustering tool to group similar facial embeddings together without predefined labels. The framewor
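Because distance in the embedding space corresponds to facial similarity, verification reduces to thresholding the distance between two embeddings. The sketch below L2-normalizes both vectors and compares their squared Euclidean distance. The threshold of 1.1 and the two-dimensional toy embeddings are assumptions for illustration. In practice, the threshold is tuned on labeled validation pairs, and real embeddings have many more dimensions.

```python
import math


def l2_normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def same_identity(emb_a, emb_b, threshold=1.1):
    """Hypothetical verification rule: same person if the normalized distance is small."""
    # For unit vectors, squared distance = 2 - 2 * cosine similarity, so it lies in [0, 4].
    return squared_distance(l2_normalize(emb_a), l2_normalize(emb_b)) < threshold
```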
This project is a PyTorch library for building and training Kolmogorov-Arnold Networks. It implements a neural network architecture that replaces fixed activation functions with learnable spline-based functions on edges, serving as a tool for interpretable machine learning. The implementation utilizes reformulated matrix operations to reduce memory overhead and increase computation speed. It employs L1 regularization to sparsify network weights, which improves the transparency of the model's internal logic and decisions. The framework covers a range of capabilities including grid-based funct
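A learnable spline-based function on an edge can be written as phi(x) = w_base * silu(x) + w_spline * sum_i c_i B_i(x). The B_i are B-spline basis functions on a grid, and the coefficients c_i are the trainable parameters. The sketch below evaluates one such edge in pure Python using the Cox-de Boor recursion. The SiLU base term, the grid of size 5 on [-1, 1] and the cubic order are assumptions for illustration. The library batches these operations as matrix products instead.

```python
import math


def uniform_knots(lo, hi, grid_size, degree):
    """Uniform grid on [lo, hi], extended by `degree` knots on each side."""
    h = (hi - lo) / grid_size
    return [lo + (i - degree) * h for i in range(grid_size + 2 * degree + 1)]


def bspline_basis(x, knots, degree):
    """Evaluate every B-spline basis function of the given degree at x (Cox-de Boor)."""
    # Degree 0: indicator of each half-open knot interval.
    basis = [1.0 if knots[i] <= x < knots[i + 1] else 0.0 for i in range(len(knots) - 1)]
    for k in range(1, degree + 1):
        nxt = []
        for i in range(len(knots) - k - 1):
            left_den = knots[i + k] - knots[i]
            right_den = knots[i + k + 1] - knots[i + 1]
            left = (x - knots[i]) / left_den * basis[i] if left_den else 0.0
            right = (knots[i + k + 1] - x) / right_den * basis[i + 1] if right_den else 0.0
            nxt.append(left + right)
        basis = nxt
    return basis


def silu(x):
    return x / (1.0 + math.exp(-x))


def kan_edge(x, coeffs, knots, degree, w_base=1.0, w_spline=1.0):
    """One learnable edge activation: residual SiLU plus a weighted spline."""
    spline = sum(c * b for c, b in zip(coeffs, bspline_basis(x, knots, degree)))
    return w_base * silu(x) + w_spline * spline


knots = uniform_knots(-1.0, 1.0, grid_size=5, degree=3)  # yields 5 + 3 = 8 basis functions
value = kan_edge(0.3, [0.1 * i for i in range(8)], knots, degree=3)
```

An L1 penalty on these edge activations pushes unimportant edges toward zero, which is what allows a trained network to be pruned and read off.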
Code release for the ConvNeXt model.
This project is a deep learning research toolkit and generative model library providing implementations of Variational Autoencoders using the PyTorch framework. It serves as a framework for training and evaluating autoencoder architectures to learn latent representations for data reconstruction and the generation of synthetic data samples. The toolkit focuses on unsupervised feature learning and generative model training, featuring a system for mapping external configuration files to model hyperparameters to ensure reproducible experimental runs. It includes mechanisms for tracking training p
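A Variational Autoencoder is trained using two core pieces: the reparameterization trick and the closed-form KL divergence to a standard normal prior. The sketch below shows both in pure Python, together with a loss that combines squared reconstruction error with a weighted KL term. The `kld_weight` name is an assumption, intended as the kind of hyperparameter an external configuration file would set. It is not confirmed as this toolkit's key.

```python
import math
import random


def reparameterize(mu, logvar, rng):
    """Sample z = mu + sigma * eps with eps ~ N(0, 1); gradients flow through mu and logvar."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, logvar)]


def kl_divergence(mu, logvar):
    """KL(N(mu, sigma^2) || N(0, 1)), summed over latent dimensions."""
    return -0.5 * sum(1.0 + lv - m * m - math.exp(lv) for m, lv in zip(mu, logvar))


def vae_loss(x, x_recon, mu, logvar, kld_weight=1.0):
    """Squared reconstruction error plus the weighted KL regularizer."""
    recon = sum((a - b) ** 2 for a, b in zip(x, x_recon))
    return recon + kld_weight * kl_divergence(mu, logvar)


rng = random.Random(0)
z = reparameterize([0.5, -1.0], [-20.0, -20.0], rng)  # tiny variance: z stays close to mu
```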