PaddleX

PaddleX is a PaddlePaddle-based framework for building, deploying, and fine-tuning AI model pipelines, with pre-built support for computer vision, OCR, document analysis, and time series tasks. It offers a toolkit of ready-to-use pipelines for image classification, object detection, segmentation, and pose estimation, alongside an end-to-end OCR and document analysis pipeline that extracts text, tables, formulas, and layout information. The platform also includes dedicated time series pipelines that analyze historical data to forecast future values, detect anomalies, and classify patterns.

The framework is built on a pipeline-based modular architecture that allows complex vision and language tasks to be composed as chains of modules, with a unified interface accessible through Python scripts, command-line commands, and REST API endpoints. It supports multi-backend inference engines including Paddle Inference, TensorRT, OpenVINO, ONNX Runtime, and Ascend OM, with hardware-agnostic device switching that lets users change between GPU, NPU, XPU, and MLU accelerators by modifying a single parameter. Pipelines are configured through declarative YAML files, and individual sub-modules can be retrained on custom data and swapped in without rebuilding the entire pipeline.
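A minimal sketch of what single-parameter device switching can look like from Python. It assumes the `paddlex` package exposes `create_pipeline(pipeline=..., device=...)` and that the returned object has a `predict` method yielding results. The `type:index` device string format, the `parse_device` helper, and the inclusion of `cpu` in the accepted set are illustrative assumptions, not a description of PaddleX internals.

```python
from typing import Optional, Tuple

# Accelerator types named in the text above, plus "cpu" (an assumption).
SUPPORTED_DEVICE_TYPES = {"cpu", "gpu", "npu", "xpu", "mlu"}


def parse_device(device: str) -> Tuple[str, Optional[int]]:
    """Split a device string such as "gpu:0" into (type, index).

    Hypothetical helper: it only validates the string before a pipeline
    is created, so a typo fails early instead of at model load time.
    """
    kind, _, index = device.partition(":")
    kind = kind.strip().lower()
    if kind not in SUPPORTED_DEVICE_TYPES:
        raise ValueError(f"unsupported device type: {kind!r}")
    return kind, (int(index) if index else None)


def run_pipeline(name: str, input_path: str, device: str = "cpu") -> list:
    """Run a named pipeline on one input and return its results as a list."""
    parse_device(device)
    # Imported lazily so the helper above works without PaddleX installed.
    from paddlex import create_pipeline  # assumed entry point

    pipeline = create_pipeline(pipeline=name, device=device)
    return list(pipeline.predict(input_path))
```

Under these assumptions, moving a job from a GPU to an NPU means changing only the `device` argument, for example from `"gpu:0"` to `"npu:0"`.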

The platform covers a broad range of capabilities including image classification, object detection with open-vocabulary and rotated variants, instance and semantic segmentation, human keypoint detection, face detection and feature extraction, pedestrian and vehicle attribute detection, and 3D multi-modal object detection from fused camera and LiDAR data. Document processing features include text region detection and recognition, table structure recognition and content extraction, mathematical formula recognition, seal text extraction, document layout parsing, and document image question answering. Time series analysis supports forecasting, anomaly detection, and classification, while video understanding includes action detection and video classification.

Pipelines can be deployed as production-ready HTTP APIs, containerized services, or edge-device binaries for Android and other platforms, with high-performance inference acceleration and backend-specific parameter tuning available for optimizing runtime performance.

Features

Computer Vision Pipelines - Provides a toolkit of pre-built pipelines for image classification, detection, segmentation, and pose estimation.

Pipeline Frameworks - Provides the core framework for building, deploying, and fine-tuning modular AI model pipelines.

Object Detection - Identifies multiple object categories and their bounding-box locations within images.

Computer Vision Toolkits - Offers pre-built pipelines for image classification, object detection, segmentation, and pose estimation.

Document Information Extraction - Combines OCR, layout analysis, and LLMs to extract structured information from complex scanned documents.

Detection Model Selections - Offers dozens of pre-trained detection models with configurable accuracy-speed trade-offs.

NPU Inference Execution - Switches between GPU, NPU, and other hardware devices with a single parameter change for inference.

Image Classification - Assigns images to predefined categories using trained deep-learning models.

Image Text Translators - Converts printed, handwritten, or symbolic text in images into editable text using a multi-stage OCR pipeline.

Multi-Backend GPU Inference Engines - Ships a multi-backend inference engine supporting Paddle Inference, TensorRT, OpenVINO, ONNX Runtime, and Ascend OM.

Hardware-Agnostic Inference Switching - Changes the computing device for model inference by setting a single parameter, supporting GPUs, NPUs, and other accelerators.

Model Serving & Deployment - Deploys PaddlePaddle model pipelines as REST APIs, edge device binaries, or high-performance inference services.

Open-Vocabulary Object Detection - Identifies and locates objects in images using natural language prompts instead of predefined categories.

Pipeline Executors - Provides a unified interface to run complete pre-trained model pipelines on input data via CLI or Python.

Inference Pipelines - Provides a pipeline-based inference system for running AI models on input data.

Time Series Forecasting - Ships a dedicated pipeline for forecasting, classifying, and detecting anomalies in time series data.

OCR Pipelines - Provides an end-to-end pipeline for extracting text, tables, formulas, and layout from documents.

CLI Executions - Provides a CLI command that runs the full OCR pipeline on images or PDFs.

Model Fine-Tuning - Retrains pre-trained models on domain-specific datasets to improve accuracy for specialized tasks.

Text Detection - Locates text areas in images with bounding boxes for downstream recognition.

Text Recognition - Transcribes printed or handwritten text from image regions into machine-readable strings.

Table Structure Detections - Implements algorithms for identifying tabular grids and merged cells within document layouts.

Image-Based Table Extractors - Ships a pipeline that extracts table structures from images and outputs them as formatted HTML.

Pipeline Configurations - Loads pipeline behavior from declarative YAML configuration files to override default settings for initialization and inference.

Document Layout Extraction - Identifies and classifies document regions like titles, tables, and figures.

Pipeline Execution Interfaces - Executes pre-built processing pipelines by specifying name, input file, and target device in a single terminal command.
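The single terminal command described above can be assembled programmatically. This sketch assumes the `paddlex` executable accepts `--pipeline`, `--input`, and `--device` flags; the file name `invoice.png` is a placeholder.

```python
import shlex
from typing import List


def build_pipeline_command(pipeline: str, input_path: str, device: str) -> List[str]:
    """Return the argument list for one pipeline run (flags are assumed)."""
    return [
        "paddlex",
        "--pipeline", pipeline,
        "--input", input_path,
        "--device", device,
    ]


command = build_pipeline_command("OCR", "invoice.png", "gpu:0")
# Print the shell-quoted form, ready to paste into a terminal.
print(shlex.join(command))
```

Keeping the command as a list rather than a single string means it can be passed to `subprocess.run(command, check=True)` without any shell quoting concerns.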

Pipeline REST API Servers - Starts a REST server from a pipeline name or config file, exposing models for inference over HTTP.

Unified Pipeline Interfaces - Provides a unified Python, CLI, and REST API interface for executing all pre-built AI pipelines.

AI Model Production Deployment - Serves trained pipelines as REST APIs, containerized services, or edge-device binaries for production inference.

Modular Pipeline Architectures - Organizes vision and language tasks as composable pipeline modules that can be chained together.
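To illustrate the idea of composable modules, here is a toy sketch in plain Python. The `ModularPipeline` class and the two stand-in modules are hypothetical and are not part of PaddleX; they only show how named stages can be chained and how one stage can be swapped without touching the rest.

```python
from typing import Any, Callable, Dict, List, Tuple

Module = Callable[[Any], Any]


class ModularPipeline:
    """Runs named modules in order, feeding each output to the next."""

    def __init__(self, modules: List[Tuple[str, Module]]) -> None:
        # A dict keeps insertion order, so it doubles as the execution order.
        self._modules: Dict[str, Module] = dict(modules)

    def replace(self, name: str, module: Module) -> None:
        """Swap one module in place; its position in the chain is kept."""
        if name not in self._modules:
            raise KeyError(f"no module named {name!r}")
        self._modules[name] = module

    def __call__(self, data: Any) -> Any:
        for module in self._modules.values():
            data = module(data)
        return data


# Stand-ins for a detection stage and a recognition stage.
def detect_regions(text: str) -> List[str]:
    return text.split()


def recognize(regions: List[str]) -> List[str]:
    return [region.upper() for region in regions]


pipeline = ModularPipeline([("detect", detect_regions), ("recognize", recognize)])
```

Calling `pipeline.replace("recognize", other_module)` models swapping in a retrained sub-module while the detection stage stays as it was.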

Model Inference Deployment - Deploys trained models as high-performance inference services, containerized endpoints, or on edge devices.

AI Pipeline Service Deployments - Ships a framework for deploying AI pipelines as production-ready HTTP APIs.

Multi-Device Inference Switchers - Enables switching inference between GPU, NPU, XPU, and MLU accelerators by modifying a single parameter across all pipelines.

Image-Based Table Recognizers - Provides a pipeline that extracts structured table data from images and PDFs into multiple output formats.

YAML-Driven Configurations - Controls pipeline behavior, model selection, and hardware acceleration through declarative YAML files.
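One common way to apply declarative overrides is a recursive merge of the user's settings onto the defaults. The sketch below works on dictionaries, which is what a YAML file parses to; every key name in it (`pipeline_name`, `modules`, `text_detection`, `model_name`, `threshold`) and the value `default_det_model` are hypothetical placeholders rather than the actual PaddleX schema.

```python
import copy
from typing import Any, Dict


def merge_config(defaults: Dict[str, Any], overrides: Dict[str, Any]) -> Dict[str, Any]:
    """Return defaults with overrides applied; nested dicts merge key by key."""
    merged = copy.deepcopy(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_config(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


# Hypothetical defaults, as they might look after parsing a YAML file.
defaults = {
    "pipeline_name": "OCR",
    "device": "cpu",
    "modules": {
        "text_detection": {"model_name": "default_det_model", "threshold": 0.3},
    },
}

# A user file only needs to list the settings it changes.
overrides = {
    "device": "gpu:0",
    "modules": {"text_detection": {"threshold": 0.5}},
}

config = merge_config(defaults, overrides)
```

Because the merge copies its inputs, the default configuration stays untouched and can be reused for other runs.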

Human Keypoint Detection - Identifies and locates specific body joints like shoulders and elbows in images to analyze human pose.

Instance Segmentation Engines - Detects each object instance and produces pixel-level masks for them.

CLI Executions - Provides a CLI command and Python SDK for running instance segmentation on images.

Instance Segmentation Service Deployments - Exposes the segmentation pipeline as an HTTP API returning detection boxes, masks, and scores.

Multi-Device Inference Switching - Supports switching between GPU, NPU, XPU, and MLU accelerators with a single parameter.

Rotated Detections - Detects objects with rotated bounding boxes that carry angle information, so each box fits its object more tightly and includes less background.
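The benefit of a rotated box is easiest to see with a little geometry. The sketch below assumes a box is described by its center, width, height, and an angle in degrees; this parameterization is an illustrative assumption, and the output format of an actual rotated detection model may differ.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def rotated_box_corners(
    cx: float, cy: float, width: float, height: float, angle_deg: float
) -> List[Point]:
    """Return the four corners of a box rotated by angle_deg about its center."""
    theta = math.radians(angle_deg)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    half_w, half_h = width / 2.0, height / 2.0
    corners = []
    for dx, dy in ((-half_w, -half_h), (half_w, -half_h), (half_w, half_h), (-half_w, half_h)):
        # Standard 2D rotation of the corner offset, then shift to the center.
        corners.append((cx + dx * cos_t - dy * sin_t, cy + dx * sin_t + dy * cos_t))
    return corners


def enclosing_box_area(corners: List[Point]) -> float:
    """Area of the smallest axis-aligned box that contains all corners."""
    xs = [x for x, _ in corners]
    ys = [y for _, y in corners]
    return (max(xs) - min(xs)) * (max(ys) - min(ys))


corners = rotated_box_corners(0.0, 0.0, 10.0, 2.0, 45.0)
rotated_area = 10.0 * 2.0
axis_aligned_area = enclosing_box_area(corners)
```

For this long, thin object tilted by 45 degrees, the rotated box covers 20 square units while the axis-aligned box that encloses it covers about 72, so most of the axis-aligned box would be background.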

CLI Executions - Ships a CLI command that detects objects with rotated bounding boxes including angle information.

Semantic Segmentation Training - Provides pre-trained models for pixel-level semantic segmentation of images into categories.

Recognition Module Fine-Tuners - Provides a pipeline for retraining recognition modules on custom data to improve accuracy for domain-specific documents.

High-Performance Pipeline Deployments - Optimizes model inference and pre/post-processing for faster end-to-end throughput in production.

Face Detection - Locates human faces in images with bounding boxes using pre-trained models.

Confidence Filtering - Applies global or per-class score thresholds to discard low-confidence bounding boxes from detection results.
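The filtering rule can be sketched in a few lines. The box representation (a dictionary with `label` and `score` keys), the `filter_boxes` name, and the fallback threshold for classes without their own entry are assumptions made for illustration.

```python
from typing import Any, Dict, List, Union

Box = Dict[str, Any]
Threshold = Union[float, Dict[str, float]]


def filter_boxes(boxes: List[Box], threshold: Threshold, default: float = 0.5) -> List[Box]:
    """Keep boxes whose score reaches the global or per-class threshold.

    A float applies to every class; a dict maps class labels to their own
    thresholds, with `default` used for labels the dict does not mention.
    """
    kept = []
    for box in boxes:
        if isinstance(threshold, dict):
            limit = threshold.get(box["label"], default)
        else:
            limit = threshold
        if box["score"] >= limit:
            kept.append(box)
    return kept


detections = [
    {"label": "car", "score": 0.92},
    {"label": "car", "score": 0.41},
    {"label": "person", "score": 0.55},
]

global_result = filter_boxes(detections, 0.5)
per_class_result = filter_boxes(detections, {"car": 0.4, "person": 0.6})
```

With the global threshold the low-scoring car is dropped, while the per-class setting keeps both cars and drops the person instead.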

Pipeline Module Selectors - Ships a mechanism for selecting specific pre-trained models for individual sub-tasks within a pipeline.

Feature Extractors - Computes compact vector representations of faces for recognition or verification tasks.

GPU-Accelerated Inference - Supports GPU-accelerated inference via OpenCL to improve model prediction performance on supported hardware.

CLI Executions - Offers a CLI command that runs image classification inference on images or directories.

OCR Hardware Switching - Changes the compute device for the OCR pipeline by setting a single parameter with no code changes.

Multi-Backend Inference Support - Supports switching between Paddle Inference, TensorRT, OpenVINO, ONNX Runtime, and Ascend OM.

Inference Performance Optimization - Provides a high-performance inference plugin to accelerate model predictions and reduce production latency.

Detection Pipeline Web Services - Exposes the object detection pipeline as an HTTP API that accepts images and returns structured detection results.

Pipeline-Level Retrainers - Provides a pipeline for fine-tuning entire pre-trained pipelines on custom datasets to improve task-specific accuracy.

Modular Fine-Tuning Pipelines - Allows retraining individual pipeline sub-modules on custom data and swapping them in.

Fine-Tuned Pipeline Verifiers - Provides a pipeline for testing fine-tuned models by swapping them into the inference pipeline and verifying performance.

Rotated Object Detection Fine-Tuners - Ships a pipeline for retraining rotated object detection models on custom data to improve accuracy for specific scenarios.

Small Object Detection Fine-Tuners - Provides a pipeline for retraining small-object detection models on custom data to improve domain-specific accuracy.

Vehicle Detection Fine-Tuners - Ships a pipeline for retraining vehicle detection and attribute recognition models on private data to improve accuracy.

On-Device Deployments - Ships a detection pipeline that runs directly on Android and other edge devices without a remote server.

Automatic Backend Selectors - Ships a high-performance inference plugin that automatically selects the optimal backend and configuration for model predictions.

High-Throughput Inference Services - Wraps extraction pipelines as network-accessible services with optimized inference for production environments.

OCR Model Fine-Tuners - Ships a pipeline for retraining OCR text detection and recognition modules on private datasets.

Image Anomaly Detection Pipelines - Identifies defects or abnormal regions in images using pre-trained anomaly detection models.

Pedestrian Attribute Pipeline Deployments - Packages the detection and attribute recognition pipeline as an HTTP API service for remote inference.

Multi-Modal 3D Detections - Fuses camera and LiDAR data to detect and classify objects in 3D space for autonomous driving.

Detection Service Deployments - Exposes rotated object detection as an HTTP API service.

Small Object Detectors - Identifies and classifies small-sized objects in complex scenes for surveillance or autonomous driving.

CLI Executions - Ships a CLI command that runs open-vocabulary segmentation using text or point prompts.

Web APIs - Exposes object detection as an HTTP service that accepts images and prompts, returning detected objects and visualizations.

CLI Executions - Ships a CLI command that runs open-vocabulary object detection using natural language prompts.

Segmentation Service Deployments - Serves segmentation pipelines as HTTP services with prompt-based inputs.

Time Series Anomaly Detection - Ships a pre-trained time series anomaly detection pipeline that identifies abnormal points in temporal data.

Time Series Classification - Provides a time series classification pipeline that assigns category labels based on temporal patterns.

Multi-Modal Detections - Implements a BEVFusion-based pipeline that fuses camera images with LiDAR point clouds for 3D object detection.

CLI Executions - Ships a CLI command that runs 3D multi-modal object detection from fused camera and LiDAR data.

Document Question Answering - Answers natural-language questions by reading and reasoning over the content of document images.

Prompt-Based Segmentations - Implements prompt-based segmentation using text, boxes, or points to isolate objects without predefined categories.

Document Analysis Fine-Tuners - Ships a pipeline for retraining document extraction models on custom data to improve accuracy for specialized layouts.

Document Extraction Fine-Tuners - Provides a pipeline for retraining document extraction models on custom data to improve accuracy for specific document types.

Instance Segmentation Fine-Tuners - Provides a pipeline for retraining instance segmentation models on custom data to improve accuracy for specific scenes.

Semantic Segmentation Fine-Tuners - Ships a pipeline for retraining semantic segmentation models on custom data to improve accuracy for specific scenes.

Private Data Fine-Tuners - Ships a pipeline for training selected pipeline modules on custom private datasets to improve task-specific accuracy.

Seal Recognition Fine-Tuners - Provides a pipeline for retraining seal recognition modules on private data to improve accuracy for specific scenarios.

Table Recognition Fine-Tuners - Provides a pipeline for retraining table recognition modules on private data to improve accuracy for specific document types.

Time Series Forecasting Fine-Tuners - Ships a pipeline for retraining time series forecasting models on custom data to improve prediction accuracy.

Multi-Label Classifiers - Assigns multiple category labels to a single image by analyzing its content.

Inference Backend Tuners - Provides per-backend parameter tuning for inference engines like TensorRT and OpenVINO to optimize runtime performance.

Seal Recognition Pipelines - Provides a CLI command that detects and extracts curved text from seal images.

Model Executions - Downloads a pre-compiled inference library and optimized model, then builds and executes a prediction binary on Android via ADB.

Classification Pipeline Deployers - Packages the image classification pipeline as a web service or edge device binary for production deployment.

Edge-to-Cloud Deployments - Packages pipelines for deployment on Android, Docker, and high-performance servers.

Multi-Format Segmentation Deployments - Packages segmentation models as services, APIs, or edge binaries for production.

Document Analysis Services - Wraps the full document extraction pipeline into a REST API for remote clients to submit images or PDFs and receive structured results.

Classification Service Deployers - Packages the image classification pipeline as an HTTP service that accepts image URLs or base64 data and returns predictions.
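A client for such a service has to send either a URL or base64-encoded image data. The sketch below only builds the JSON request body; the field name `image` and the helper name are assumptions, and the actual endpoint path and response schema should be taken from the service's own documentation.

```python
import base64
import json
from typing import Optional


def build_request_body(
    image_bytes: Optional[bytes] = None, image_url: Optional[str] = None
) -> str:
    """Return a JSON body carrying an image URL or base64-encoded image data."""
    if (image_bytes is None) == (image_url is None):
        raise ValueError("provide exactly one of image_bytes or image_url")
    if image_url is not None:
        image = image_url
    else:
        # JSON cannot carry raw bytes, so binary data is sent as base64 text.
        image = base64.b64encode(image_bytes).decode("ascii")
    return json.dumps({"image": image})  # "image" is an assumed field name


body = build_request_body(image_bytes=b"\x89PNG placeholder bytes")
```

The resulting string can be posted with any HTTP client using a `Content-Type: application/json` header.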

Forecasting Service Deployments - Exposes a time series forecasting pipeline as an HTTP API endpoint for remote inference.

Document Recognition Service Deployments - Provides REST API deployment for formula recognition pipelines.

Containerized Inference - Runs a production-grade serving SDK inside Docker containers with GPU and CPU support.

Web Service Deployments - Exposes the detection pipeline as an HTTP API that accepts images and returns structured results.

Pedestrian Attribute Detections - Locates pedestrians and identifies attributes like gender, age, and clothing in images.

Pedestrian Detections - Locates people in images with bounding boxes optimized for human detection.

Image Feature Extraction - Computes compact vector representations of images for similarity search and retrieval.
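Once images are reduced to feature vectors, retrieval becomes a nearest-neighbor search. This sketch uses cosine similarity, a common choice for comparing embeddings; the source does not say which metric is used, and the tiny two-dimensional vectors stand in for real model outputs.

```python
import math
from typing import Dict, List

Vector = List[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine of the angle between two vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)


def most_similar(query: Vector, gallery: Dict[str, Vector]) -> str:
    """Return the gallery key whose vector is closest to the query."""
    return max(gallery, key=lambda name: cosine_similarity(query, gallery[name]))


# Placeholder gallery: real features would have hundreds of dimensions.
gallery = {"cat": [1.0, 0.0], "dog": [0.0, 1.0]}
match = most_similar([0.9, 0.1], gallery)
```

A linear scan like this is fine for small galleries; larger collections typically use an approximate nearest-neighbor index instead.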

Human Pose Detections - Detects key body joints in images to reconstruct human poses using pre-trained models.

Formula Recognition Engines - Converts mathematical formula images into LaTeX source code.

CLI Executions - Provides a CLI command that converts mathematical formula images into LaTeX code.

Vehicle Detection - Locates vehicles in images with bounding boxes optimized for vehicle detection.

Vehicle Attribute Recognition - Ships a pipeline that locates vehicles and recognizes their type, color, and license plate attributes.

Attribute Recognition Service Deployments - Exposes vehicle attribute recognition as a network service via REST API.

OCR Web Services - Exposes the OCR pipeline behind a REST API endpoint for remote clients to submit images and receive text results.
