# run-llama/liteparse

**Attribution required: if you use, quote, or summarise this content, you must credit and link back to [awesome-repositories.com](https://awesome-repositories.com/repository/run-llama-liteparse).**

10,782 stars · 710 forks · Rust · Apache-2.0

## Links

- GitHub: https://github.com/run-llama/liteparse
- Homepage: https://developers.llamaindex.ai/liteparse/
- awesome-repositories: https://awesome-repositories.com/repository/run-llama-liteparse.md

## Topics

`document-ocr` `document-processing` `ocr` `ocr-recognition` `pdf` `pdf-parser` `text-extraction`

## Description

A fast, helpful, and open-source document parser

## Tags

### Part of an Awesome List

- [Document Parsing and Extraction](https://awesome-repositories.com/f/awesome-lists/data/document-parsing-and-extraction.md) — Core document parser that extracts text and bounding boxes from PDFs and other formats into structured output. ([source](https://cdn.jsdelivr.net/gh/run-llama/liteparse@main/README.md))
- [Cost-Optimized Parsers](https://awesome-repositories.com/f/awesome-lists/data/document-parsing/cost-optimized-parsers.md) — Automatically routes each page to the cheapest suitable parsing tier, balancing accuracy and cost without manual configuration.
- [Document Text Extractors](https://awesome-repositories.com/f/awesome-lists/data/document-parsing-and-extraction/document-text-extractors.md) — Parses documents to retrieve text alongside precise positional coordinates for each extracted element. ([source](https://cdn.jsdelivr.net/gh/run-llama/liteparse@main/README.md))
- [Spatial Text Extractors](https://awesome-repositories.com/f/awesome-lists/data/document-parsing-and-extraction/document-text-extractors/spatial-text-extractors.md) — Parses documents to retrieve text alongside precise positional coordinates for each extracted element. ([source](https://cdn.jsdelivr.net/gh/run-llama/liteparse@main/README.md))
- [OCR Document Parsers](https://awesome-repositories.com/f/awesome-lists/data/document-parsing-and-extraction/ocr-document-parsers.md) — Applies optical character recognition using a bundled engine or external HTTP server. ([source](https://cdn.jsdelivr.net/gh/run-llama/liteparse@main/README.md))
- [PDF Text Extractors](https://awesome-repositories.com/f/awesome-lists/data/document-parsing-and-extraction/pdf-text-extractors.md) — Parses PDF files and extracts text with spatial bounding boxes, returning structured Markdown, JSON, or plain text. ([source](https://developers.llamaindex.ai/liteparse/api/))
- [Text Extraction and OCR](https://awesome-repositories.com/f/awesome-lists/more/text-extraction-and-ocr.md) — Applies OCR to scanned or image-based PDFs to extract text with optional language selection. ([source](https://cdn.jsdelivr.net/gh/run-llama/liteparse@main/README.md))
- [Office and Documents](https://awesome-repositories.com/f/awesome-lists/productivity/office-and-documents.md) — Extracts text from DOCX, XLSX, PPTX, PNG, JPG, and other file formats via automatic conversion to PDF. ([source](https://developers.llamaindex.ai/liteparse/))
- [Multi-Runtime Libraries](https://awesome-repositories.com/f/awesome-lists/data/document-parsing/multi-runtime-libraries.md) — Provides library APIs and CLI for Rust, Node.js/TypeScript, Python, and browser WASM environments. ([source](https://developers.llamaindex.ai/liteparse/))
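
Several tags above describe routing each page to the cheapest suitable parsing tier. The Rust sketch below illustrates only the general idea. The tier names (`TextLayer`, `Ocr`, `VisionModel`), the `PageSignals` fields, the relative costs, and the `route` function are all hypothetical. They are not LiteParse's or LlamaParse's actual API or routing rules.

```rust
/// Hypothetical parsing tiers (illustrative, not a real API).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Tier {
    TextLayer,   // read the embedded text layer only
    Ocr,         // rasterize the page and run OCR
    VisionModel, // send the page image to a multimodal model
}

/// Hypothetical per-page signals a router might inspect.
pub struct PageSignals {
    pub has_text_layer: bool,
    pub table_count: usize,
    pub image_area_ratio: f32, // fraction of the page covered by images
}

/// Relative cost units per page (made-up numbers, not real pricing).
pub fn cost(tier: Tier) -> u32 {
    match tier {
        Tier::TextLayer => 1,
        Tier::Ocr => 5,
        Tier::VisionModel => 20,
    }
}

/// Assumed capability rules: can `tier` handle a page with these signals?
fn can_handle(tier: Tier, page: &PageSignals) -> bool {
    match tier {
        Tier::TextLayer => {
            page.has_text_layer && page.table_count == 0 && page.image_area_ratio < 0.2
        }
        Tier::Ocr => page.table_count == 0,
        Tier::VisionModel => true, // fallback tier accepts every page
    }
}

/// Picks the cheapest tier whose capability check passes.
pub fn route(page: &PageSignals) -> Tier {
    let mut tiers = [Tier::VisionModel, Tier::Ocr, Tier::TextLayer];
    // Order by cost so the first match is also the cheapest match.
    tiers.sort_by_key(|t| cost(*t));
    tiers
        .iter()
        .copied()
        .find(|t| can_handle(*t, page))
        .unwrap_or(Tier::VisionModel)
}
```

Sorting by cost and taking the first tier that passes its check keeps the router open to extension. A new tier needs only a cost and a capability rule. The most capable tier serves as an unconditional fallback.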

### Programming Languages & Runtimes

- [Open-Source Document Parsers](https://awesome-repositories.com/f/programming-languages-runtimes/source-code-documentation/documentation-parsers/open-source-document-parsers.md) — An open-source document parser that extracts text, tables, and layout from PDFs and office files into Markdown or JSON.
- [Browser-Based Runtimes](https://awesome-repositories.com/f/programming-languages-runtimes/runtime-execution-environments/webassembly/browser-based-runtimes.md) — Runs the entire parsing engine and OCR inside a web browser using WebAssembly for offline document extraction.

### Artificial Intelligence & ML

- [Content Parsing Prompts](https://awesome-repositories.com/f/artificial-intelligence-ml/instructional-prompting/content-parsing-prompts.md) — Provides prompt-based parsing customization that steers extraction results using natural-language instructions or structured schemas. ([source](https://developers.llamaindex.ai/python/cloud/llamaparse/))
- [Output Schema Instructions](https://awesome-repositories.com/f/artificial-intelligence-ml/instructional-prompting/content-parsing-prompts/output-schema-instructions.md) — Ships prompt-driven output shaping that accepts natural-language instructions or structured schemas to steer extraction results. ([source](https://developers.llamaindex.ai/python/cloud/llamaparse/))
- [Document Page Routing Optimizers](https://awesome-repositories.com/f/artificial-intelligence-ml/large-scale-training/cost-optimization-strategies/inference-cost-optimizers/document-page-routing-optimizers.md) — Provides automatic per-page routing to the cheapest suitable parsing tier for cost-efficient document extraction.
- [Structured Document Extraction](https://awesome-repositories.com/f/artificial-intelligence-ml/natural-language-processing/structured-document-extraction.md) — Converts PDFs and office documents into structured Markdown or JSON with spatial layout for direct use by language models.
- [Document Spatial Coordinate Outputs](https://awesome-repositories.com/f/artificial-intelligence-ml/bounding-box-regression/bounding-box-representations/bounding-box-coordinate-predictors/pixel-coordinate-mappings/spatial-coordinate-synchronization/document-spatial-coordinate-outputs.md) — Extracts text items from a PDF and returns them with spatial coordinates for precise layout analysis. ([source](https://developers.llamaindex.ai/liteparse/guides/library-usage/))
- [Document Bounding Box Extractors](https://awesome-repositories.com/f/artificial-intelligence-ml/bounding-box-regression/bounding-box-representations/bounding-box-visualizers/document-bounding-box-extractors.md) — Returns spatial coordinates for every line of text extracted from documents for visualization or processing. ([source](https://developers.llamaindex.ai/liteparse/))
- [Document Bounding Box Extractors](https://awesome-repositories.com/f/artificial-intelligence-ml/bounding-box-regression/bounding-box-representations/document-bounding-box-extractors.md) — Returns spatial bounding boxes for each text line, enabling visualization or further geometric processing. ([source](https://developers.llamaindex.ai/liteparse/))
- [Document JSON Bounding Box Outputs](https://awesome-repositories.com/f/artificial-intelligence-ml/bounding-box-regression/bounding-box-representations/document-json-bounding-box-outputs.md) — Extracts text with bounding boxes from a PDF and outputs the result as structured JSON. ([source](https://cdn.jsdelivr.net/gh/run-llama/liteparse@main/README.md))
- [Document Output Shapers](https://awesome-repositories.com/f/artificial-intelligence-ml/instructional-prompting/deterministic-output-steering/document-output-shapers.md) — Accepts natural-language instructions or structured schemas to steer document extraction results toward desired formats.
- [Custom OCR Backend Registrations](https://awesome-repositories.com/f/artificial-intelligence-ml/ocr-engines/custom-ocr-backend-registrations.md) — Accepts a user-defined OCR engine with a recognize method for custom text extraction. ([source](https://developers.llamaindex.ai/liteparse/guides/browser-usage/))
- [Markdown RAG Pipeline Outputs](https://awesome-repositories.com/f/artificial-intelligence-ml/rag-data-pipelines/markdown-rag-pipeline-outputs.md) — Reconstructs headings, tables, lists, images, and links from spatial layout for direct use in LLMs and RAG pipelines. ([source](https://cdn.jsdelivr.net/gh/run-llama/liteparse@main/README.md))
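
The custom OCR backend tag describes a user-supplied engine that exposes a `recognize` method. In the browser guide, this engine is a JavaScript object. Below is a minimal Rust analogue of that plug-in pattern. The `OcrEngine` trait, `OcrWord` type, `FixedEngine` toy backend, and `ocr_page` helper are all hypothetical and are not part of LiteParse's documented API.

```rust
/// One recognized word with its box in image pixels (hypothetical type).
#[derive(Debug, Clone, PartialEq)]
pub struct OcrWord {
    pub text: String,
    pub x: f32,
    pub y: f32,
    pub width: f32,
    pub height: f32,
    pub confidence: f32,
}

/// Hypothetical plug-in point: any type with a `recognize` method can serve as the OCR backend.
pub trait OcrEngine {
    fn recognize(&self, image: &[u8], languages: &[&str]) -> Result<Vec<OcrWord>, String>;
}

/// Toy backend for illustration: returns one fixed word for any non-empty image.
pub struct FixedEngine;

impl OcrEngine for FixedEngine {
    fn recognize(&self, image: &[u8], _languages: &[&str]) -> Result<Vec<OcrWord>, String> {
        if image.is_empty() {
            return Err("empty image".to_string());
        }
        Ok(vec![OcrWord {
            text: "Invoice".into(),
            x: 10.0,
            y: 20.0,
            width: 60.0,
            height: 12.0,
            confidence: 0.98,
        }])
    }
}

/// Runs whichever engine the caller supplied, keeping only words at or above `min_conf`.
pub fn ocr_page(
    engine: &dyn OcrEngine,
    image: &[u8],
    languages: &[&str],
    min_conf: f32,
) -> Result<Vec<OcrWord>, String> {
    let words = engine.recognize(image, languages)?;
    Ok(words.into_iter().filter(|w| w.confidence >= min_conf).collect())
}
```

`ocr_page` takes `&dyn OcrEngine`, so callers could swap a bundled engine for a client of an external HTTP OCR server without changing the calling code.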

### Content Management & Publishing

- [Document Generation from Markdown](https://awesome-repositories.com/f/content-management-publishing/content-management-systems/content-management-platforms/enterprise-specialized-systems/document-management-systems/pdf-form-filling/document-generation-from-markdown.md) — Reconstructs headings, tables, lists, images, and links from a PDF's spatial layout into structured Markdown. ([source](https://cdn.jsdelivr.net/gh/run-llama/liteparse@main/README.md))
- [Document Format Converters](https://awesome-repositories.com/f/content-management-publishing/content-processing-transformation/document-processing-conversion/document-processing/format-specific-parsers/office-document-parsers/document-format-converters.md) — Automatically converts over 130 file types, including office documents and images, into PDF before extracting text and layout.
- [Document to Markdown Converters](https://awesome-repositories.com/f/content-management-publishing/document-to-markdown-converters.md) — Reconstructs headings, tables, lists, images, and links from spatial layout for LLM and RAG pipelines. ([source](https://cdn.jsdelivr.net/gh/run-llama/liteparse@main/README.md))
- [LLM-Ready Markdown Converters](https://awesome-repositories.com/f/content-management-publishing/markdown-renderers/llm-ready-markdown-converters.md) — Converts PDFs and office documents into structured Markdown optimized for language model and RAG pipeline consumption. ([source](https://developers.llamaindex.ai/liteparse/guides/markdown/))
- [PDF to Markdown Conversion](https://awesome-repositories.com/f/content-management-publishing/pdf-to-markdown-conversion.md) — Converts PDF documents into structured Markdown preserving headings, tables, lists, images, and links. ([source](https://developers.llamaindex.ai/liteparse/guides/library-usage/))
- [REST and SDK Parsing Interfaces](https://awesome-repositories.com/f/content-management-publishing/content-processing-transformation/document-processing-conversion/document-processing-apis/document-parsing-controls/rest-and-sdk-parsing-interfaces.md) — Provides REST, Python, and TypeScript interfaces to upload documents and retrieve parsed results programmatically. ([source](https://developers.llamaindex.ai/python/cloud/llamaparse/))
- [Command-Line Document Processors](https://awesome-repositories.com/f/content-management-publishing/content-processing-transformation/document-processing-conversion/document-processing-tools/document-automation-interfaces/command-line-document-processors.md) — Processes files from the command line with options for format, page range, and remote URLs. ([source](https://cdn.jsdelivr.net/gh/run-llama/liteparse@main/README.md))
- [Document Parsing Services](https://awesome-repositories.com/f/content-management-publishing/content-processing-transformation/document-processing-conversion/document-processing-tools/document-automation-interfaces/document-parsing-services.md) — Integrates document parsing into applications through a library API accepting file paths or raw byte buffers. ([source](https://cdn.jsdelivr.net/gh/run-llama/liteparse@main/README.md))
- [WASM-Based PDF Parsers](https://awesome-repositories.com/f/content-management-publishing/documentation-knowledge-management/web-based-document-viewers/integrated-pdf-browsers/wasm-based-pdf-parsers.md) — Parses PDF documents entirely in the browser using WebAssembly, requiring no server or cloud calls. ([source](https://developers.llamaindex.ai/liteparse/guides/browser-usage/))
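
Several tags mention reconstructing headings and lists into Markdown from spatial layout. The sketch below uses a deliberately simple font-size heuristic to show the kind of decision involved. The source does not describe LiteParse's actual reconstruction method. The `Line` type, the `to_markdown` function, and the 1.6 and 1.2 size thresholds are illustrative assumptions.

```rust
/// Hypothetical extracted line: its text and dominant font size in points.
pub struct Line {
    pub text: String,
    pub font_size: f32,
}

/// Toy heuristic: lines much larger than the body font become headings, and
/// lines starting with a bullet glyph become Markdown list items.
pub fn to_markdown(lines: &[Line], body_size: f32) -> String {
    let mut out = Vec::new();
    for line in lines {
        let text = line.text.trim();
        if text.is_empty() {
            continue; // drop blank lines
        }
        let ratio = line.font_size / body_size;
        let rendered = if ratio >= 1.6 {
            format!("# {}", text) // much larger than body text
        } else if ratio >= 1.2 {
            format!("## {}", text) // moderately larger than body text
        } else if let Some(rest) = text.strip_prefix('•') {
            format!("- {}", rest.trim_start()) // bullet glyph -> list item
        } else {
            text.to_string() // plain paragraph text
        };
        out.push(rendered);
    }
    out.join("\n")
}
```

A real converter would also weigh font weight, indentation, and the positions of surrounding blocks. Tables, images, and links need separate handling that this sketch omits.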

### Data & Databases

- [Document Table Extractors](https://awesome-repositories.com/f/data-databases/data-engineering-infrastructure/data-extraction-ingestion/table-extraction-utilities/document-table-extractors.md) — Recovers table data from PDFs, scans, and images with cell structure intact for downstream use. ([source](https://developers.llamaindex.ai/python/cloud/llamaparse/))
- [Browser-Based Parsers](https://awesome-repositories.com/f/data-databases/document-parsing-engines/browser-based-parsers.md) — Runs the entire parsing engine and OCR inside a web browser using WebAssembly for offline or serverless document extraction. ([source](https://cdn.jsdelivr.net/gh/run-llama/liteparse@main/README.md))
- [PDF Spatial Layout Parsers](https://awesome-repositories.com/f/data-databases/document-parsing-engines/web-document-parsing/visual-layout-parsing/pdf-spatial-layout-parsers.md) — Extracts text from PDFs while preserving exact position on each page including bounding boxes for every line. ([source](https://developers.llamaindex.ai/liteparse/))
- [Multi-Format Document Ingestion](https://awesome-repositories.com/f/data-databases/multi-format-document-ingestion.md) — Handles PDF, DOCX, PPTX, XLSX, HTML, JPEG, PNG, XML, EPUB, and many other formats for flexible document ingestion. ([source](https://developers.llamaindex.ai/python/cloud/llamaparse/))
- [Layout-Aware Extraction](https://awesome-repositories.com/f/data-databases/text-processing-utilities/text-extraction/layout-aware-extraction.md) — Combines spatial layout analysis with OCR to extract text, tables, and charts preserving document structure.
- [Layout Preservation](https://awesome-repositories.com/f/data-databases/text-processing-utilities/text-extraction/layout-preservation.md) — Extracts text, tables, and images from PDFs and office documents while preserving spatial layout and structure.
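
Layout-preserving extraction usually has to turn positioned text fragments into reading-order lines. One common approach works in three steps:

1. Sort fragments top to bottom.
2. Cluster them into lines by vertical proximity.
3. Order each line left to right.

The sketch below assumes top-left-origin coordinates. It uses a hypothetical `TextItem` type and `group_lines` function, which are not LiteParse's API.

```rust
/// Hypothetical positioned fragment (origin top-left, y grows downward).
#[derive(Debug, Clone)]
pub struct TextItem {
    pub text: String,
    pub x: f32,
    pub y: f32,
    pub width: f32,
    pub height: f32,
}

/// Groups fragments into lines of text in reading order.
pub fn group_lines(mut items: Vec<TextItem>, tolerance: f32) -> Vec<String> {
    // 1. Top-to-bottom order.
    items.sort_by(|a, b| a.y.total_cmp(&b.y));

    // 2. Start a new line when a fragment sits more than `tolerance` below
    //    the first fragment of the current line.
    let mut lines: Vec<Vec<TextItem>> = Vec::new();
    for item in items {
        let starts_new = match lines.last() {
            Some(line) => (item.y - line[0].y).abs() > tolerance,
            None => true,
        };
        if starts_new {
            lines.push(vec![item]);
        } else {
            lines.last_mut().unwrap().push(item);
        }
    }

    // 3. Left-to-right order within each line, then join the fragments.
    lines
        .into_iter()
        .map(|mut line| {
            line.sort_by(|a, b| a.x.total_cmp(&b.x));
            line.iter()
                .map(|i| i.text.as_str())
                .collect::<Vec<_>>()
                .join(" ")
        })
        .collect()
}
```

Anchoring each line on its first fragment stops a slowly drifting baseline from merging two adjacent lines. Multi-column pages would need column detection first, which this sketch omits.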

### Development Tools & Productivity

- [Multi-Format Document Parsing](https://awesome-repositories.com/f/development-tools-productivity/file-indexing-utilities/multi-format-document-parsing.md) — Converts over 130 file types, including office documents and images, into PDF before extracting text and layout.
- [Diagram Structure Parsing](https://awesome-repositories.com/f/development-tools-productivity/ast-transformation-tools/ast-based-formatters/diagram-structure-parsing.md) — Converts visual data from charts, plots, and diagrams into structured formats for numerical reasoning by LLMs. ([source](https://developers.llamaindex.ai/python/cloud/llamaparse/))
- [Document Chart Parsers](https://awesome-repositories.com/f/development-tools-productivity/ast-transformation-tools/ast-based-formatters/diagram-structure-parsing/document-chart-parsers.md) — Extracts charts, plots, and diagrams from documents into structured data for numerical reasoning by LLMs. ([source](https://developers.llamaindex.ai/python/cloud/llamaparse/))
- [Browser-Based OCR Engines](https://awesome-repositories.com/f/development-tools-productivity/python-development-tools/script-execution-engines/python-scripting-environments/standalone-executable-generators/ocr-standalone-executables/browser-based-ocr-engines.md) — Provides a JavaScript-side OCR engine with a recognize method for text extraction in WASM environments. ([source](https://developers.llamaindex.ai/liteparse/guides/library-usage/))
- [OCR REST API Servers](https://awesome-repositories.com/f/development-tools-productivity/rest-api-integrations/ocr-rest-api-servers.md) — Sends OCR requests to remote HTTP services for higher accuracy or performance. ([source](https://cdn.jsdelivr.net/gh/run-llama/liteparse@main/README.md))

### Operating Systems & Systems Programming

- [Document Page Cost Optimizers](https://awesome-repositories.com/f/operating-systems-systems-programming/paged-memory-management/large-page-optimizers/document-page-cost-optimizers.md) — Automatically routes each page to the cheapest suitable parsing tier, reserving premium accuracy for complex layouts. ([source](https://developers.llamaindex.ai/python/cloud/llamaparse/))

### Scientific & Mathematical Computing

- [Document Layout Bounding Box Extractors](https://awesome-repositories.com/f/scientific-mathematical-computing/spatial-bounding-box-management/document-layout-bounding-box-extractors.md) — Returns precise coordinates for every text line and table cell, preserving document layout for downstream analysis.
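
Bounding boxes are only useful downstream if their coordinate system is known. PDF user space measures in points (1/72 inch), with the origin at the bottom-left of the page. Rendered page images instead put the origin at the top-left and measure in pixels. The source does not say which convention LiteParse's output uses, so the following is a generic conversion sketch. The `PdfBox` and `PixelBox` types are hypothetical, and the conversion ignores page rotation and any crop-box offset.

```rust
/// Box in PDF user space: origin bottom-left, units of 1/72 inch.
/// `y` is the box's bottom edge.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct PdfBox {
    pub x: f64,
    pub y: f64,
    pub width: f64,
    pub height: f64,
}

/// The same box in raster pixels: origin top-left, `y` is the box's top edge.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct PixelBox {
    pub x: f64,
    pub y: f64,
    pub width: f64,
    pub height: f64,
}

/// Maps a PDF-space box onto a page image rendered at `dpi`.
pub fn pdf_to_pixels(b: PdfBox, page_height_pt: f64, dpi: f64) -> PixelBox {
    let scale = dpi / 72.0; // points -> pixels
    PixelBox {
        x: b.x * scale,
        // Flip the y axis: measure from the page top down to the box's top edge.
        y: (page_height_pt - (b.y + b.height)) * scale,
        width: b.width * scale,
        height: b.height * scale,
    }
}
```

For example, take a US Letter page (792 pt tall) rendered at 144 DPI. A box at x = 72, y = 700 with size 100 × 20 pt maps to a pixel box at (144, 144) with size 200 × 40.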

### Web Development

- [OCR Document Conversion](https://awesome-repositories.com/f/web-development/document-conversion-apis/ocr-document-conversion.md) — Extracts text, tables, and charts from PDFs while preserving spatial layout and structure. ([source](https://developers.llamaindex.ai/python/cloud/llamaparse/))
- [Document Page Cost Optimizers](https://awesome-repositories.com/f/web-development/rendering-mode-configurators/per-page-rendering-modes/document-page-cost-optimizers.md) — Automatically routes each page to the cheapest suitable parsing tier, reserving premium accuracy for complex layouts. ([source](https://developers.llamaindex.ai/python/cloud/llamaparse/))

### Graphics & Multimedia

- [Document Page Rendering](https://awesome-repositories.com/f/graphics-multimedia/debug-image-exports/document-page-rendering.md) — Converts document pages into raster images for LLM agents to extract visual information. ([source](https://cdn.jsdelivr.net/gh/run-llama/liteparse@main/README.md))
- [PDF Page Image Generators](https://awesome-repositories.com/f/graphics-multimedia/debug-image-exports/document-page-rendering/pdf-page-image-generators.md) — Renders PDF pages as raster images for use in LLM agents or visual workflows. ([source](https://developers.llamaindex.ai/liteparse/getting_started/))
- [Browser Screenshot Capture](https://awesome-repositories.com/f/graphics-multimedia/web-page-screenshot-tools/browser-screenshot-capture.md) — Generates page images as PNG byte buffers for use with LLMs or disk storage. ([source](https://developers.llamaindex.ai/liteparse/guides/library-usage/))

### Security & Cryptography

- [Document Directory Parsers](https://awesome-repositories.com/f/security-cryptography/ldap-services/batch-directory-processing/document-directory-parsers.md) — Parses all documents in a given input folder and writes results to a specified output directory. ([source](https://cdn.jsdelivr.net/gh/run-llama/liteparse@main/README.md))

### Software Engineering & Architecture

- [Batch Document Processing](https://awesome-repositories.com/f/software-engineering-architecture/batch-document-processing.md) — Processes entire directories of documents efficiently with a single command, reusing the parsing engine. ([source](https://cdn.jsdelivr.net/gh/run-llama/liteparse@main/README.md))
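
The batch tags describe parsing a whole input folder into an output directory while reusing one parsing engine. A std-only Rust sketch of that loop follows. The `parse_directory` function and the `parse` closure, which stands in for a real parser, are hypothetical. The source does not list the batch-mode CLI flags, so none are reproduced here.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Parses every regular file directly inside `input_dir` with `parse` and writes
/// the result to `<output_dir>/<file stem>.md`. Returns the number of files written.
/// `parse` is a hypothetical stand-in for a real document parser.
pub fn parse_directory<F>(input_dir: &Path, output_dir: &Path, mut parse: F) -> io::Result<usize>
where
    F: FnMut(&[u8]) -> String,
{
    fs::create_dir_all(output_dir)?;
    let mut written = 0;
    for entry in fs::read_dir(input_dir)? {
        let path = entry?.path();
        // Skip subdirectories and anything else that is not a regular file.
        if !path.is_file() {
            continue;
        }
        let stem = match path.file_stem() {
            Some(s) => s.to_os_string(),
            None => continue,
        };
        let bytes = fs::read(&path)?;
        // Append ".md" instead of calling set_extension, so "a.b.pdf" -> "a.b.md".
        let mut out_name = stem;
        out_name.push(".md");
        fs::write(output_dir.join(out_name), parse(&bytes))?;
        written += 1;
    }
    Ok(written)
}
```

The caller builds the parser once as a closure, and the loop reuses it for every file. Any expensive setup is therefore paid once per batch rather than once per document.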

### Testing & Quality Assurance

- [Document Page Screenshot Capturers](https://awesome-repositories.com/f/testing-quality-assurance/automation-interaction-tools/screenshot-capture/document-page-screenshot-capturers.md) — Renders pages as high-quality PNG images to capture visual information that text alone cannot convey. ([source](https://cdn.jsdelivr.net/gh/run-llama/liteparse@main/README.md))
