# google/gumbo-parser

**Attribution required: if you use, quote, or summarise this content, you must credit and link back to [awesome-repositories.com](https://awesome-repositories.com/repository/google-gumbo-parser).**

5,190 stars · 664 forks · HTML · Apache-2.0 · archived

## Links

- GitHub: https://github.com/google/gumbo-parser
- awesome-repositories: https://awesome-repositories.com/repository/google-gumbo-parser.md

## Description

Gumbo-parser is an HTML5 parsing library written in pure C99. It transforms raw markup into a structured document tree by implementing the state-machine tokenization and error-recovery rules defined in the HTML5 specification.
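To make the idea of state-machine tokenization concrete, here is a minimal sketch in C. It is not Gumbo's implementation and covers only a tiny subset of what the HTML5 specification requires: there are no attributes, comments, doctypes, or character references. All names (`Tokenizer`, `Token`, `next_token`, the `ST_*` states) are hypothetical and chosen for illustration only.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical token kinds; the HTML5 spec defines more (comments, doctype, ...). */
typedef enum { TOK_TEXT, TOK_START_TAG, TOK_END_TAG, TOK_EOF } TokKind;

typedef struct {
  TokKind kind;
  const char *start; /* points into the caller's buffer, not a copy */
  size_t length;     /* length of the text run or of the tag name */
} Token;

typedef struct {
  const char *buf;
  size_t len;
  size_t pos; /* index of the next unread byte */
} Tokenizer;

/* States loosely named after the spec's "data", "tag open" and "tag name" states. */
typedef enum { ST_DATA, ST_TAG_OPEN, ST_TAG_NAME } State;

/* Returns the next token. Simplification: a tag left unterminated at the
 * end of the input is reported as TOK_EOF instead of being recovered. */
Token next_token(Tokenizer *t) {
  assert(t != NULL && t->pos <= t->len);
  State state = ST_DATA;
  Token tok = { TOK_EOF, t->buf + t->pos, 0 };
  while (t->pos < t->len) {
    char c = t->buf[t->pos];
    switch (state) {
    case ST_DATA:
      if (c == '<') {
        if (tok.length > 0) { /* emit pending text before the tag */
          tok.kind = TOK_TEXT;
          return tok;
        }
        state = ST_TAG_OPEN;
      } else {
        tok.length++;
      }
      t->pos++;
      break;
    case ST_TAG_OPEN:
      tok.kind = TOK_START_TAG;
      if (c == '/') { /* "</" introduces an end tag */
        tok.kind = TOK_END_TAG;
        t->pos++;
      }
      tok.start = t->buf + t->pos; /* tag name begins here */
      state = ST_TAG_NAME;
      break;
    case ST_TAG_NAME:
      t->pos++;
      if (c == '>') return tok; /* tag complete */
      tok.length++;
      break;
    }
  }
  if (state == ST_DATA && tok.length > 0) tok.kind = TOK_TEXT;
  else tok.kind = TOK_EOF;
  return tok;
}
```

Calling `next_token` repeatedly on `<p>Hi</p>` yields a start tag, a text run, an end tag, and then `TOK_EOF`. A spec-conforming tokenizer has many more states plus the error-recovery rules that this sketch leaves out.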

The parser also records source positions: each parsed node links back to its original byte offset and to a pointer into the input buffer, so the source location of any element in the parse tree can be recovered precisely.
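As an illustration of what mapping a node back to its source involves, the sketch below derives a byte offset, line, and column from a pointer into the original buffer. The `SourcePos` type and the `locate` function are hypothetical names for this example and are not taken from Gumbo's headers. The sketch counts bytes, not decoded characters, and treats only `\n` as a line break.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical position record: 1-based line and column, 0-based byte offset. */
typedef struct {
  unsigned int line;
  unsigned int column;
  size_t offset;
} SourcePos;

/* Computes the position of node_start, which must point into buf[0..len]. */
SourcePos locate(const char *buf, size_t len, const char *node_start) {
  assert(buf != NULL && node_start >= buf && node_start <= buf + len);
  SourcePos pos = { 1, 1, (size_t)(node_start - buf) };
  for (const char *p = buf; p < node_start; p++) {
    if (*p == '\n') { /* a newline starts the next line at column 1 */
      pos.line++;
      pos.column = 1;
    } else {
      pos.column++;
    }
  }
  return pos;
}
```

Because a node keeps a pointer into the caller's buffer instead of a copy, the offset is a single pointer subtraction. Only the line and column need a scan, and a real parser would more likely track them incrementally while tokenizing.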

Beyond full-document parsing, the library handles isolated HTML fragments and exposes a plain C API that other languages can call through their foreign function interfaces, which makes language bindings straightforward to write.

## Tags

### Data & Databases

- [HTML5 Parsers](https://awesome-repositories.com/f/data-databases/document-parsing-engines/web-document-parsing/html5-parsers.md) — Provides a high-performance parser that strictly adheres to the HTML5 specification for transforming markup into structured trees. ([source](https://github.com/google/gumbo-parser/blob/master/.gitmodules))
- [Token Offset Mapping](https://awesome-repositories.com/f/data-databases/search-indexing-technologies/search-indexing/indexing-architectures/byte-range-indexing/token-offset-mapping.md) — Maps structural nodes to their exact byte offsets within the input buffer to avoid data duplication.

### Development Tools & Productivity

- [Source Text Mapping](https://awesome-repositories.com/f/development-tools-productivity/ast-transformation-tools/ast-node-location/source-text-mapping.md) — Maps elements in the resulting parse tree back to their exact character offsets in the source text. ([source](https://github.com/google/gumbo-parser/blob/master/original-README.md))
- [AST-to-Source Mappings](https://awesome-repositories.com/f/development-tools-productivity/source-map-generators/event-to-source-mapping/ast-to-source-mappings.md) — Links parsed nodes in the resulting tree back to their original source code offsets. ([source](https://github.com/google/gumbo-parser/blob/master/setup.py))
- [DOM-to-Source Mappings](https://awesome-repositories.com/f/development-tools-productivity/source-map-generators/event-to-source-mapping/dom-to-source-mappings.md) — Links structural DOM nodes back to their originating source positions using byte offsets and pointers.

### Operating Systems & Systems Programming

- [C99 HTML Parsing](https://awesome-repositories.com/f/operating-systems-systems-programming/c99-html-parsing.md) — Implements high-performance HTML processing in pure C99 for lightweight and fast integration.

### Software Engineering & Architecture

- [Markup State Machines](https://awesome-repositories.com/f/software-engineering-architecture/markup-state-machines.md) — Employs a deterministic state machine to transform raw character streams into discrete HTML5 tokens.
- [Spec-Driven Recovery](https://awesome-repositories.com/f/software-engineering-architecture/syntax-parsing-engines/fault-tolerant-parsing/structural-syntax-recovery/parsing-error-recovery/spec-driven-recovery.md) — Uses official HTML5 specification rules to recover from malformed markup and ensure a consistent document tree.
- [In-Memory Tree Hierarchies](https://awesome-repositories.com/f/software-engineering-architecture/in-memory-tree-hierarchies.md) — Implements the parsed document as a hierarchy of linked nodes using C99 pointers for efficient traversal.

### Web Development

- [HTML Parsers](https://awesome-repositories.com/f/web-development/html-parsers.md) — Provides a C99-based engine that converts raw HTML strings into structured trees for programmatic analysis.
- [DOM Fragment Parsing](https://awesome-repositories.com/f/web-development/dom-fragment-parsing.md) — Processes isolated snippets of HTML markup into structured fragments without requiring a full document. ([source](https://github.com/google/gumbo-parser/blob/master/original-README.md))

### Content Management & Publishing

- [Source Location Mapping](https://awesome-repositories.com/f/content-management-publishing/web-content-scraping/source-location-mapping.md) — Maps parsed HTML elements back to their original source locations, so scraped content can be traced to where it appears in a file.

### Programming Languages & Runtimes

- [Parsing Pipelines](https://awesome-repositories.com/f/programming-languages-runtimes/two-phase-compilation/parsing-pipelines.md) — Separates the initial tokenization process from the tree construction phase to correctly handle nested elements.

### User Interface & Experience

- [HTML Content Processing](https://awesome-repositories.com/f/user-interface-experience/html-content-processing/html-content-processing.md) — Parses isolated HTML snippets into structured data representations without needing a full document.

### Part of an Awesome List

- [Build Systems](https://awesome-repositories.com/f/awesome-lists/devtools/build-systems.md) — A library for parsing HTML5 documents.
- [Networking Libraries](https://awesome-repositories.com/f/awesome-lists/devtools/networking-libraries.md) — HTML5 parsing library for C99.
- [Windows Environments](https://awesome-repositories.com/f/awesome-lists/more/windows-environments.md) — Listed in the “Windows Environments” section of the Awesome C awesome list.
