30 open-source projects similar to microsoft/trellis, ranked by how many features they have in common. Compare stars, activity, and what each one does to find the best TRELLIS alternative.
Threestudio is a 3D generative AI framework designed to create three-dimensional assets from text prompts and images. It provides specialized pipelines for text-to-3D generation and image-to-3D reconstruction, using a neural radiance field trainer to produce geometry and textures. The framework is distinguished by its support for hybrid geometry backends, including signed distance functions, tetrahedral grids, and volume grids. It employs score distillation sampling to guide the generation process and features a modular plugin system for loading custom modules and nodes. The system covers
Point-E is a system for 3D model synthesis that uses diffusion models to generate three-dimensional point clouds from natural language descriptions or two-dimensional images. The project includes tools for refining these outputs, such as a point cloud upsampler that increases the density of low-resolution point clouds. It also provides a mesh converter that uses distance function regression to transform raw point cloud data into structured 3D meshes. The broader capability surface cove
This project is a diffusion-based 3D generator and image-to-3D reconstruction system. It translates natural language descriptions or two-dimensional images into three-dimensional assets using neural radiance fields and diffusion models. The system utilizes score-distillation sampling and diffusion-based guidance to refine 3D shapes without requiring 3D training data. It includes specialized tools for transforming neural representations into exportable meshes with texture and material data, as well as a pipeline for iterative optimization of geometry and textures. The project covers a broad r
This project is a comprehensive educational resource and tutorial handbook for building, training, and deploying machine learning models using TensorFlow 2. It serves as a structured learning guide covering core deep learning concepts, including neural network architectures, automatic differentiation, and tensor operations. The handbook provides technical guidance on optimizing execution efficiency through GPU memory management, distributed training, and model quantization. It also includes detailed manuals for constructing high-performance data pipelines and exporting models for production s
Unity MCP is a plugin that connects the Unity Editor to AI assistants through the Model Context Protocol, enabling natural language control over scene manipulation, object creation, and editor workflows. It allows developers to generate C# scripts, modify GameObjects and components, create UI layouts, and manage assets by issuing commands through an AI interface, effectively turning the editor into a conversational development environment. The plugin distinguishes itself through a comprehensive automation system that can execute multi-step tasks from a design document, record and replay edito
Shap-E is a generative 3D modeling system that creates three-dimensional digital assets, represented as implicit functions, from natural language descriptions or two-dimensional images. The project includes a 3D latent encoder that converts trimeshes and other 3D models into latent representations using point clouds and multiview renders. It uses an image-to-3D generator to produce assets from synthetic view images and a text-to-3D generator to build shapes from text prompts. The system implements a pipeline involving latent
GET3D is a generative 3D mesh model and rendering framework designed to synthesize high-quality textured shapes and tetrahedral meshes. It functions as an image-to-3D reconstructor and text-to-3D generator, utilizing a differentiable 3D renderer to produce realistic visual perspectives and material effects. The system enables the creation of 3D assets from single 2D images, point clouds, or descriptive text prompts. It features a latent space interpolator for creating smooth transitions between different 3D objects and supports the independent control of geometry and texture. The project cov
Modly is a local AI 3D model generator that converts two-dimensional images into three-dimensional meshes. It is a privacy-focused tool that processes data directly on the host graphics card using GPU-accelerated inference. The system serves as an extensible AI model framework, allowing the integration of external model extensions and runtime files from remote repositories. It utilizes a manifest-driven plugin architecture to add new generation methods by loading metadata and files from external version control systems. The toolset includes a command-line interface for triggering generation
Hunyuan3D-2.1 is a generative 3D framework and image-to-3D pipeline that transforms single 2D images into textured 3D geometries. It functions as an asset generator that produces high-quality 3D meshes and textures using a flow-matching system. The project includes a specialized synthesizer for creating photorealistic textures with physically based rendering properties. These tools allow for the simulation of metallic reflections and light interactions on generated models. The system covers 3D asset pipeline automation through a sequence of shape generation and mesh refinement. It also provi
Sana is a framework for high-resolution image and video synthesis based on a linear diffusion transformer. It provides a toolkit for the training, fine-tuning, and execution of text-to-image and text-to-video models, as well as a video generative world model capable of simulating physical environments with precise spatial control. The project is distinguished by its use of linear complexity layers to handle high resolutions and its support for long-form, minute-length video generation in real time. It implements a two-stage inference paradigm that separates structural generation from visual t
TRELLIS.2 is a generative image-to-3D system that creates high-resolution 3D assets with physically based rendering materials from 2D images. It utilizes a sparse voxel representation to handle complex topologies and internal structures without relying on iso-surface fields. The project features a structured latent space representation that maps geometry and texture attributes to maintain visual fidelity. It employs an optimization-free geometry reconstruction process to decode latent representations directly into voxel grids and includes a PBR texture generator for synthesizing base color, r
DeepSpeedExamples is a collection of reference implementations for training and deploying large scale AI models using the DeepSpeed optimization library. It provides Python code examples for training massive models across multiple GPUs through distributed optimization techniques. The repository includes optimized patterns for deploying and running large language model predictions in production environments. It also serves as a guide for model compression to reduce memory footprints and as a source for performance benchmarks to measure execution speed and resource utilization. The project cov
ModelScope is a comprehensive machine learning platform that functions as a model hub, training framework, inference engine, and cloud development environment. It provides a centralized repository for discovering, downloading, and managing pre-trained models and datasets across multiple modalities, including natural language, vision, and speech. The platform features a unified interface for multimodal model inference and a standardized framework for fine-tuning and evaluating large-scale models. It supports distributed training to scale workloads across multiple processors and provides contai
Assimp is a cross-platform 3D asset pipeline and import library that loads numerous industry-standard 3D file formats into a single unified internal data structure. It functions as a framework for converting 3D models between different file formats across multiple operating systems and architectures. The project provides a 3D mesh processing tool for normalizing and optimizing geometry through triangulation, vertex removal, and normal generation. It also includes a 3D asset export utility to write internal scene data back into various external file formats. The system covers broad capability
Open CLIP is an open source framework for training and deploying Contrastive Language-Image Pre-training models. It serves as a vision-language training framework and multimodal embedding engine that maps images and text into a shared vector space for similarity searches and zero-shot classification. The project provides a toolkit for distributed training of contrastive models and includes an image-to-text generative model for producing natural language descriptions. It supports custom text encoder integration and utilizes teacher-student model distillation to transfer knowledge from large pr
FedML is a distributed machine learning training library, federated learning framework, and GPU workload orchestrator. It provides the core system components necessary to execute large-scale model training and fine-tuning across multi-cloud, on-premise, and decentralized GPU clusters, while offering a dedicated engine for scalable model serving and an MLOps pipeline manager for end-to-end lifecycle management. The platform distinguishes itself by enabling privacy-preserving federated learning across decentralized edge devices and organizational silos, keeping raw data on local hardware. It al
Super-Gradients is a PyTorch computer vision framework and training library designed for the full lifecycle of vision models. It functions as a deep learning model optimizer and a deployment toolkit for training and fine-tuning models across image classification, object detection, semantic segmentation, and pose estimation tasks. The project provides specific tools for model optimization, including teacher-student knowledge distillation and numerical precision compression to reduce memory and computational requirements. It also includes an implementation of the YOLO-NAS architecture for high
Megatron-LM is a distributed transformer training library and large language model training framework designed to scale models across thousands of GPUs. It functions as a GPU-optimized deep learning toolkit and a scaling engine for mixture-of-experts architectures, enabling the training of models with hundreds of billions of parameters. The project implements multi-dimensional model parallelism, combining tensor, pipeline, data, expert, and context-based workload distribution. It specifically optimizes mixture-of-experts architectures through integrated memory and communication improvements t
This is a PyTorch implementation of a text-to-image model designed for synthesizing high-fidelity images from natural language descriptions. It utilizes a diffusion image generator to transform latent embeddings into visual data through an iterative denoising process. The system employs a two-stage latent mapping process, using a CLIP-based latent prior to map text embeddings to image embeddings before decoding them into pixels. It features a cascading diffusion decoder that produces high-resolution imagery by passing low-resolution outputs through a sequence of models at increasing scales.
Amazon DSSTNE is a machine learning toolkit and sparse tensor network library designed for deep learning models with sparse inputs and outputs. It provides a model-parallel training framework and a GPU-accelerated sparse engine to support memory-intensive networks. The framework is specifically designed for recommendation system training and large-scale sparse learning. It enables the distribution of large weight matrices and embedding tables across multiple GPU devices to handle models that exceed the memory capacity of a single processor. The project covers a broad range of capabilities in
This repository is a comprehensive collection of functional 2D and 3D demo projects and implementation samples for the Godot Game Engine. It serves as an interactive tutorial and reference library, providing a working codebase to demonstrate how to apply engine features in real-world scenarios. The collection focuses on practical implementation guides, covering a wide array of technical capabilities from basic engine fundamentals to advanced rendering and scripting techniques. It allows users to study the application of node-based composition, asset pipelines, and game logic through direct ex
This project is a comprehensive educational framework designed to teach the design, deployment, and performance optimization of machine learning systems. It provides a structured curriculum that covers the full stack of artificial intelligence engineering, ranging from the construction of core framework components like tensors and automatic differentiation engines to the orchestration of large-scale distributed training clusters. The platform distinguishes itself through its integration of physics-grounded systems modeling and interactive simulation environments. Users can experiment with dis
This project is a comprehensive computer vision library for the PyTorch ecosystem, providing a standardized collection of neural network architectures, datasets, and high-performance transformation utilities. It serves as a foundational framework for building, training, and deploying deep learning models, offering a centralized model registry that allows developers to instantiate architectures with pre-trained weights for tasks such as image classification, object detection, and semantic segmentation. The library distinguishes itself through its modular approach to data and compute management
jetson-inference is a set of libraries and tools for executing optimized deep learning models on embedded GPU hardware. Its primary purpose is to enable real-time computer vision and AI inference at the edge with low latency and high throughput. The project distinguishes itself through high-performance streaming analytics and the ability to execute concurrent AI pipelines on automotive-grade silicon. It provides specialized support for multi-sensor stream processing, using zero-copy data transport to load camera frames directly into GPU memory. The codebase covers a broad surface of capabiliti
ComfyUI-3D-Pack is a suite of custom nodes for ComfyUI that enables 3D asset generation and rendering within a node-based workflow. It provides a set of tools for reconstructing textured three-dimensional meshes and volumetric scenes from single images, multi-view images, or text prompts. The system includes a Gaussian splatting generator for creating high-fidelity volumetric 3D scene representations and a multi-view image generator that produces consistent image sets for reconstruction. It also features a single-image 3D mesh tool that builds geometry from one 2D source. The toolset covers 3
DreamGaussian is a generative system and converter designed to create textured three-dimensional models from text or images using Gaussian Splatting. It functions as a pipeline for transforming two-dimensional inputs into high-fidelity 3D assets. The project provides specific workflows for converting 3D Gaussian point clouds into standard textured mesh formats compatible with external 3D software. It supports the generation of textured meshes from single images via volumetric refinement and UV texture optimization, as well as the creation of 3D models from text prompts through intermediate im
Hunyuan3D-2 is a machine learning framework designed to convert two-dimensional images into fully realized, textured three-dimensional meshes. It utilizes a generative artificial intelligence model to perform both shape construction and surface texture synthesis, enabling the automated creation of digital assets. The system distinguishes itself through a modular generative pipeline that separates geometry reconstruction from texture mapping. It employs multi-view image projection and latent diffusion techniques to ensure geometric consistency, while providing a plugin-based bridge architectur
Stable Diffusion is a generative machine learning pipeline that synthesizes high-resolution visual content by performing iterative denoising within a compressed latent space. By mapping natural language embeddings into pixel outputs through conditioned probabilistic processes, the framework enables the generation of images from text prompts and the transformation of existing visual inputs based on semantic instructions. The architecture utilizes a modular execution environment that decouples model loading, scheduler logic, and inference components to support diverse hardware configurations. I
AnimateAnyone is an appearance-preserving video synthesizer designed for character animation from a single static image. It functions as a diffusion image-to-video generator that transforms a source image into a high-fidelity video sequence while maintaining consistent character identity, clothing, and visual details across all frames. The system enables video-driven character reenactment by transferring motions, facial expressions, and body movements from a reference video onto a static character. It employs pose-guided video generation to control movement via skeleton keypoints and pose sig
StyleGAN is a TensorFlow-based generative adversarial network framework for synthesizing high-resolution images. It uses a style-based generator architecture to create realistic images from latent vectors. The system incorporates style mixing and stochastic noise injection to control visual attributes and fine-grained details. It uses adaptive instance normalization and progressive resolution upsampling to manage image quality and variety across different resolutions. The framework covers the full lifecycl