What are the best open-source alternatives to Unstract?

30 open-source projects similar to zipstack/unstract, ranked by shared features. Top picks: ucbepic/docetl, kreuzberg-dev/kreuzberg, maiot-io/zenml, datalab-to/surya, axa-group/parsr, unstructured-io/unstructured, mage-ai/mage-ai, negokaz/excel-mcp-server, jxxghp/moviepilot, the-pocket/pocketflow-tutorial-codebase-knowledge.

Is ucbepic/docetl a good alternative to Unstract?

docetl is an AI-powered document ETL tool and map-reduce orchestrator designed to transform large collections of unstructured documents into structured, queryable tables using language models. It provides a declarative pipeline framework for extracting, cleaning, and transforming data from sources…

Is kreuzberg-dev/kreuzberg a good alternative to Unstract?

Kreuzberg is a document extraction engine that converts PDFs, Office files, images, and over 90 other formats into clean, structured text and metadata. It is built around a compiled Rust core that can be used as a native library, a command-line tool, a REST API server, or a WebAssembly module for b…

Is maiot-io/zenml a good alternative to Unstract?

ZenML is an extensible machine learning orchestration framework designed to manage the end-to-end lifecycle of data pipelines and AI agent workflows. It functions as a durable orchestrator that executes machine learning tasks as directed acyclic graphs, ensuring that every step is containerized for…

Is datalab-to/surya a good alternative to Unstract?

Surya is a document processing platform designed to transform unstructured files into structured, machine-readable data. It provides a comprehensive suite of tools for text recognition, layout analysis, and reading order detection, enabling the conversion of PDFs and images into formats such as JSO…

Is axa-group/parsr a good alternative to Unstract?

Parsr is an unstructured data extractor and document parsing pipeline that converts raw files and images into cleaned, machine-readable formats. It functions as a document layout analyzer and a pipeline for extracting structured data and labels using large language models. The system includes a do…

Is unstructured-io/unstructured a good alternative to Unstract?

Unstructured is an enterprise-grade data orchestration engine designed to transform raw, unstructured files into structured, machine-readable formats. It functions as a comprehensive platform for document ingestion, partitioning, and enrichment, specifically engineered to prepare complex data for r…

Is mage-ai/mage-ai a good alternative to Unstract?

Mage AI is a Python-based data pipeline orchestrator and self-hosted data integrated development environment. It is designed for building, scheduling, and monitoring data workflows using a block-based pipeline design and interactive notebook interface. The platform distinguishes itself by integrat…

Is negokaz/excel-mcp-server a good alternative to Unstract?

This project is a Model Context Protocol server that enables artificial intelligence assistants to interact directly with Microsoft Excel files. It functions as a bridge, allowing external systems to read, write, and modify spreadsheet data through a standardized interface. By supporting both direc…

Is jxxghp/moviepilot a good alternative to Unstract?

MoviePilot is a self-hosted media orchestrator and NAS media library automator. It coordinates workflows between downloaders, metadata scrapers, and file systems to automate the discovery, downloading, renaming, and organization of movie and television content. The system functions as an LLM media…

Is the-pocket/pocketflow-tutorial-codebase-knowledge a good alternative to Unstract?

This project is a comprehensive suite of AI tools and frameworks, featuring an LLM multi-agent orchestrator, an autonomous agent runtime, and a stateful application framework. It provides the infrastructure to build and manage specialized AI agents capable of coordinating complex tasks through grap…


Open-source alternatives to Unstract

30 open-source projects similar to zipstack/unstract, ranked by how many features they have in common. Compare stars, activity and what each one does to find the best Unstract alternative.

ucbepic/docetl (3,597 GitHub stars)
docetl is an AI-powered document ETL tool and map-reduce orchestrator designed to transform large collections of unstructured documents into structured, queryable tables using language models. It provides a declarative pipeline framework for extracting, cleaning, and transforming data from sources such as PDFs and text files into predefined schemas. The project distinguishes itself through a semantic data integration suite that enables joining datasets and resolving duplicate entities based on embedding-based similarity. It includes an interactive prompt playground for developing and optimizi…
Python · agents · data · data-pipelines
kreuzberg-dev/kreuzberg (8,527 GitHub stars)
Kreuzberg is a document extraction engine that converts PDFs, Office files, images, and over 90 other formats into clean, structured text and metadata. It is built around a compiled Rust core that can be used as a native library, a command-line tool, a REST API server, or a WebAssembly module for browser-based processing. The system is designed to run entirely on self-hosted infrastructure, with no data leaving the user's environment. What distinguishes Kreuzberg is its breadth of integration surfaces and its pipeline architecture. It exposes extraction capabilities through native bindings fo…
Rust · document-intelligence · elixir · ffi
maiot-io/zenml (5,452 GitHub stars)
ZenML is an extensible machine learning orchestration framework designed to manage the end-to-end lifecycle of data pipelines and AI agent workflows. It functions as a durable orchestrator that executes machine learning tasks as directed acyclic graphs, ensuring that every step is containerized for consistent performance across local, cloud, and hybrid infrastructure. By decoupling pipeline code from underlying compute and storage backends, the platform allows developers to define infrastructure-agnostic stacks that remain portable across diverse environments. The project distinguishes itself…
Python

Open-source alternatives to Unstract

ucbepic/docetl

kreuzberg-dev/kreuzberg

maiot-io/zenml

datalab-to/surya

axa-group/Parsr

Unstructured-IO/unstructured

mage-ai/mage-ai

negokaz/excel-mcp-server

jxxghp/MoviePilot

The-Pocket/PocketFlow-Tutorial-Codebase-Knowledge

modelcontextprotocol/modelcontextprotocol

camel-ai/camel

qodo-ai/qodo-cover

Cinnamon/kotaemon

ever-co/ever-gauzy

langchain-ai/deepagents

pewdiepie-archdaemon/odysseus

Tencent/WeKnora

infobyte/faraday

Arize-ai/phoenix

datahub-project/datahub

aws/aws-cdk

microsoft/markitdown

google/langextract

ahujasid/blender-mcp

lastmile-ai/mcp-agent

browserbase/mcp-server-browserbase

pymupdf/PyMuPDF

VikParuchuri/marker

czlonkowski/n8n-mcp