# sparklemotion/nokogiri

**Attribution required: if you use, quote, or summarise this content, you must credit and link back to [awesome-repositories.com](https://awesome-repositories.com/repository/sparklemotion-nokogiri).**

6,236 stars · 936 forks · C · MIT

## Links

- GitHub: https://github.com/sparklemotion/nokogiri
- Homepage: https://nokogiri.org/
- awesome-repositories: https://awesome-repositories.com/repository/sparklemotion-nokogiri.md

## Topics

`libxml2` `libxslt` `nokogiri` `ruby` `ruby-gem` `sax` `xerces` `xml` `xslt`

## Description

Nokogiri is an XML and HTML parsing library that builds navigable document trees from strings, files, or URLs, using native parsers for speed and standards compliance. It provides a CSS selector engine that translates CSS3 selectors into XPath expressions for querying nodes, an XPath query interface with namespace support, a toolkit for modifying parsed documents, XML Schema (XSD) validation, and XSLT transformations.

On CRuby, the library wraps the libxml2 and libxslt C libraries with Ruby bindings for high-performance parsing. The JRuby build relies on Java libraries such as Xerces instead. Nokogiri also integrates a fork of Google's Gumbo parser for standards-compliant HTML5 parsing with error reporting. It supports several parsing approaches: SAX event-driven parsing, which processes a document as a stream of events without building a full DOM tree, and a push parser that accepts data in incremental chunks. A builder DSL constructs XML or HTML documents programmatically through nested method calls that mirror the output structure.

Nokogiri offers comprehensive document manipulation, including node creation, removal, replacement, cloning, and wrapping. It also supports attribute manipulation and text content modification with automatic XML escaping. Parsers accept an explicit encoding declaration, and documents can be serialized to HTML, XHTML, or XML. The library also provides namespace management and stream-based processing for large documents.

The library can be installed from precompiled native gems for many platforms. It can also be built from source against system libraries, the bundled copies of libxml2 and libxslt, or libraries at custom paths. By default it treats input as untrusted and blocks external entity loading and network access, which protects against XML external entity (XXE) attacks. The project also provides a process for reporting security vulnerabilities through HackerOne.

## Tags

### Part of an Awesome List

- [HTML and XML Parsing](https://awesome-repositories.com/f/awesome-lists/data/html-and-xml-parsing.md) — Parses XML, HTML4, and HTML5 documents from strings, files, or URLs into navigable trees. ([source](https://cdn.jsdelivr.net/gh/sparklemotion/nokogiri@main/README.md))
- [Ruby C Extension Wrappers](https://awesome-repositories.com/f/awesome-lists/devtools/c-bindings/ruby-c-extension-wrappers.md) — Wraps libxml2 and libxslt C libraries with Ruby bindings for high-performance XML and HTML parsing.
- [Document Tree Builders](https://awesome-repositories.com/f/awesome-lists/devtools/dsl-builders/document-tree-builders.md) — Constructs XML or HTML documents programmatically using nested method calls that mirror the output structure.
- [Parsing and Serialization](https://awesome-repositories.com/f/awesome-lists/devtools/parsing-and-serialization.md) — Adds, removes, or alters nodes and attributes in a parsed document tree and writes the result back as markup. ([source](https://nokogiri.org/rdoc/index.html))
- [XML Escaping Writers](https://awesome-repositories.com/f/awesome-lists/ai/ai-writing-and-content/xml-escaping-writers.md) — Gets and sets node text content with automatic XML escaping for safe markup generation. ([source](https://github.com/sparklemotion/nokogiri/wiki/Cheat-sheet))
- [XML Fragment Parsing](https://awesome-repositories.com/f/awesome-lists/data/html-and-xml-parsing/xml-fragment-parsing.md) — Parses partial markup that has no DOCTYPE or single root element as a document fragment. ([source](https://github.com/sparklemotion/nokogiri/wiki/Cheat-sheet))
- [File Handle Parsers](https://awesome-repositories.com/f/awesome-lists/data/html-and-xml-parsing/xml-parsing/file-handle-parsers.md) — Parses HTML or XML files by passing file handles directly to the parser. ([source](https://nokogiri.org/tutorials/parsing_an_html_xml_document.html))
- [Stream Readers](https://awesome-repositories.com/f/awesome-lists/data/html-and-xml-parsing/xml-parsing/stream-readers.md) — Reads large XML documents node by node without loading the whole document into memory. ([source](https://github.com/sparklemotion/nokogiri/wiki/Cheat-sheet))
- [XML and HTML Builder DSLs](https://awesome-repositories.com/f/awesome-lists/devtools/dsl-builders/xml-and-html-builder-dsls.md) — Create new XML or HTML documents from scratch using a builder DSL, and modify existing documents programmatically. ([source](https://nokogiri.org/rdoc/index.html))
- [Ruby Crawling Frameworks](https://awesome-repositories.com/f/awesome-lists/devtools/ruby-crawling-frameworks.md) — HTML/XML parser with XPath and CSS support.

### Data & Databases

- [Markup Document Parsers](https://awesome-repositories.com/f/data-databases/document-parsing-engines/web-document-parsing/markup-document-parsers.md) — Parses HTML or XML strings or IO streams into traversable document objects. ([source](https://github.com/sparklemotion/nokogiri/wiki/Cheat-sheet))
- [XPath 2.0 Parsing](https://awesome-repositories.com/f/data-databases/content-extraction/xpath-2-0-parsing.md) — Locates nodes in parsed documents using XPath 1.0 expressions for precise structural queries. ([source](https://nokogiri.org/rdoc/index.html))
- [CSS and XPath Query Engines](https://awesome-repositories.com/f/data-databases/content-extraction/xpath-2-0-parsing/css-and-xpath-query-engines.md) — Locates nodes in parsed documents using CSS3 selectors or XPath 1.0 expressions.
- [Explicit Encoding Declarations](https://awesome-repositories.com/f/data-databases/document-parsing-engines/web-document-parsing/explicit-encoding-declarations.md) — Set the expected encoding explicitly on the parser so that text values are correctly decoded to UTF-8. ([source](https://nokogiri.org/rdoc/index.html))
- [HTML5 Parsers](https://awesome-repositories.com/f/data-databases/document-parsing-engines/web-document-parsing/html5-parsers.md) — Integrates Google's Gumbo parser for standards-compliant HTML5 document parsing. ([source](https://nokogiri.org/tutorials/parsing_an_html5_document.html))
- [Serializers](https://awesome-repositories.com/f/data-databases/document-parsing-engines/web-document-parsing/html5-parsers/serializers.md) — Output a parsed HTML5 document or node back to a string using standard serialization methods. ([source](https://nokogiri.org/tutorials/parsing_an_html5_document.html))
- [Streaming Parsers](https://awesome-repositories.com/f/data-databases/streaming-parsers.md) — Feeds XML or HTML4 data incrementally to a parser that emits events as each chunk is received. ([source](https://nokogiri.org/rdoc/index.html))

### Programming Languages & Runtimes

- [XML and HTML Document Parsers](https://awesome-repositories.com/f/programming-languages-runtimes/string-parsing/xml-and-html-document-parsers.md) — Read XML, HTML4, or HTML5 content from a string or file and build an in-memory tree for traversal and manipulation. ([source](https://nokogiri.org/rdoc/index.html))
- [Namespaced Queryers](https://awesome-repositories.com/f/programming-languages-runtimes/code-definition-namespaces/namespace-sharing/namespaced-queryers.md) — Associate a namespace prefix with a URL in a query to disambiguate elements that share the same local name. ([source](https://nokogiri.org/tutorials/searching_a_xml_html_document.html))
- [String Parsing](https://awesome-repositories.com/f/programming-languages-runtimes/string-parsing.md) — Parses HTML or XML documents directly from strings into traversable objects. ([source](https://nokogiri.org/tutorials/parsing_an_html_xml_document.html))
- [Node](https://awesome-repositories.com/f/programming-languages-runtimes/set-operations/node.md) — Union, intersect, or subtract node sets and filter them by selector. ([source](https://github.com/sparklemotion/nokogiri/wiki/Cheat-sheet))

### Content Management & Publishing

- [Node and Attribute Editors](https://awesome-repositories.com/f/content-management-publishing/content-processing-transformation/document-processing-conversion/document-processing-tools/markup-and-structure-parsers/html-document-transformation/node-and-attribute-editors.md) — Edit, add, or remove nodes in parsed XML and HTML documents to transform content programmatically. ([source](https://nokogiri.org/tutorials/toc.html))
- [Stream-Based Document Processors](https://awesome-repositories.com/f/content-management-publishing/content-processing-transformation/document-processing-conversion/document-processing/stream-based-document-processors.md) — Processes large XML and HTML4 documents incrementally with SAX or push parsers.

### Development Tools & Productivity

- [Node Set Filterers](https://awesome-repositories.com/f/development-tools-productivity/ast-transformation-tools/ast-node-interpolation/ast-node-collection-filtering/node-set-filterers.md) — Reduce a set of nodes to only those matching a given XPath or CSS expression. ([source](https://github.com/sparklemotion/nokogiri/wiki/Cheat-sheet))
- [XML Escaping Setters](https://awesome-repositories.com/f/development-tools-productivity/build-tooling/build-orchestration-logic/build-system-extensibility/build-system-extensions/abstract-syntax-tree-transformers/content-node-transformers/xml-escaping-setters.md) — Sets node text content with automatic XML escaping for safe markup generation. ([source](https://github.com/sparklemotion/nokogiri/wiki/Cheat-sheet))
- [Method-Based Node Navigators](https://awesome-repositories.com/f/development-tools-productivity/visual-to-code-sync-engines/code-to-graph-parsers/symbol-to-node-linking/dom-node-navigation/method-based-node-navigators.md) — Accesses child elements by calling their tag names as methods on the parent node when a document is parsed in Slop mode. ([source](https://nokogiri.org/tutorials/searching_a_xml_html_document.html))

### Security & Cryptography

- [XML External Entity Prevention](https://awesome-repositories.com/f/security-cryptography/xml-external-entity-prevention.md) — Block external entity loading and network access by default to prevent XML external entity processing attacks. ([source](https://nokogiri.org/tutorials/parsing_an_html_xml_document.html))

### Software Engineering & Architecture

- [CSS Selector Engines](https://awesome-repositories.com/f/software-engineering-architecture/syntax-query-definitions/css-selector-engines.md) — Translates CSS3 selectors into XPath expressions for querying parsed documents.
- [Parse Error Reporters](https://awesome-repositories.com/f/software-engineering-architecture/error-handling/parse-error-reporters.md) — Captures and reports HTML5 parse errors during tokenization and tree construction with configurable limits. ([source](https://nokogiri.org/tutorials/parsing_an_html5_document.html))
- [SAX Parsers](https://awesome-repositories.com/f/software-engineering-architecture/event-driven-architectures/sax-parsers.md) — Processes XML or HTML4 as a stream of events without building a full tree for large files. ([source](https://nokogiri.org/rdoc/index.html))
- [Push Parsers](https://awesome-repositories.com/f/software-engineering-architecture/incremental-parsers/push-parsers.md) — Accepts data chunks incrementally and emits parse events as each fragment is processed.
- [DOM Child Insertions](https://awesome-repositories.com/f/software-engineering-architecture/linked-lists/node-insertion-techniques/dom-child-insertions.md) — Appends nodes as children or replaces all children with method chaining support. ([source](https://github.com/sparklemotion/nokogiri/wiki/Cheat-sheet))
- [Node Reparenters](https://awesome-repositories.com/f/software-engineering-architecture/linked-lists/node-insertion-techniques/node-reparenters.md) — Reassigns node parents or inserts nodes adjacent to siblings to restructure document trees. ([source](https://nokogiri.org/tutorials/modifying_an_html_xml_document.html))
- [DOM Node Removals](https://awesome-repositories.com/f/software-engineering-architecture/linked-lists/node-removal-techniques/dom-node-removals.md) — Unlinks nodes from their parent document, removing them from the DOM tree. ([source](https://github.com/sparklemotion/nokogiri/wiki/Cheat-sheet))
- [DOM Node Swaps](https://awesome-repositories.com/f/software-engineering-architecture/linked-lists/node-swapping-strategies/dom-node-swaps.md) — Swaps a node with another node, fragment, or string and returns the replacement. ([source](https://github.com/sparklemotion/nokogiri/wiki/Cheat-sheet))
- [UTF-8 Internal Storage](https://awesome-repositories.com/f/software-engineering-architecture/string-validation-and-normalization/string-encodings/utf-8-internal-storage.md) — Stores text internally as UTF-8 and returns markup in the source document's encoding. ([source](https://cdn.jsdelivr.net/gh/sparklemotion/nokogiri@main/README.md))
- [DOM Node Manipulators](https://awesome-repositories.com/f/software-engineering-architecture/trees/tree-node-reorderers/dom-node-manipulators.md) — Copies DOM nodes with optional depth parameter for shallow or deep cloning. ([source](https://github.com/sparklemotion/nokogiri/wiki/Cheat-sheet))
- [DOM Node Reorderers](https://awesome-repositories.com/f/software-engineering-architecture/trees/tree-node-reorderers/dom-node-reorderers.md) — Inserts nodes before or after existing siblings to rearrange document element order. ([source](https://nokogiri.org/tutorials/modifying_an_html_xml_document.html))

### User Interface & Experience

- [Single Node Queries](https://awesome-repositories.com/f/user-interface-experience/component-querying/ast-node-querying/single-node-queries.md) — Return only the first matching node from a query, avoiding the need to access the first element of a result set. ([source](https://github.com/sparklemotion/nokogiri/wiki/Cheat-sheet))
- [XML and HTML Element Constructors](https://awesome-repositories.com/f/user-interface-experience/programmatic-element-construction/xml-and-html-element-constructors.md) — Construct a fresh element node and insert it into the document at a chosen position. ([source](https://nokogiri.org/tutorials/modifying_an_html_xml_document.html))
- [UTF-8 Text Retrievers](https://awesome-repositories.com/f/user-interface-experience/rich-text-editors/rich-text-formatting-extensions/node-content-formatting/utf-8-text-retrievers.md) — Always returns node text content as a UTF-8 encoded string. ([source](https://github.com/sparklemotion/nokogiri/wiki/Cheat-sheet))
- [Element Tag Overrides](https://awesome-repositories.com/f/user-interface-experience/styling-theming-systems/content-styling/component-styling-tools/component-styling/element-tag-overrides/element-tag-overrides.md) — Changes the tag name of any element in a parsed document. ([source](https://nokogiri.org/tutorials/modifying_an_html_xml_document.html))

### Web Development

- [Markup Document Manipulators](https://awesome-repositories.com/f/web-development/document-format-converters/client-side-document-toolkits/markup-document-manipulators.md) — Modifies parsed XML and HTML documents by adding, removing, or altering nodes and attributes.
- [Node Creators](https://awesome-repositories.com/f/web-development/dom-element-manipulators/element-node-wrapping/node-creators.md) — Creates and inserts new element or text nodes into parsed documents at specified positions. ([source](https://nokogiri.org/tutorials/modifying_an_html_xml_document.html))
- [DOM Node Positioning](https://awesome-repositories.com/f/web-development/dom-node-positioning.md) — Constructs new elements or text nodes and places them at specified positions in the document tree. ([source](https://nokogiri.org/tutorials/modifying_an_html_xml_document.html))
- [Element Attributes](https://awesome-repositories.com/f/web-development/element-attributes.md) — Assign or update an attribute value on an element, such as a class or identifier. ([source](https://nokogiri.org/tutorials/modifying_an_html_xml_document.html))
- [HTML and XML Serialization](https://awesome-repositories.com/f/web-development/html-and-xml-serialization.md) — Converts nodes or documents to HTML, XHTML, or XML strings with configurable encoding and indentation. ([source](https://github.com/sparklemotion/nokogiri/wiki/Cheat-sheet))
- [XSLT Transformations](https://awesome-repositories.com/f/web-development/schema-validation/xml-schema-validations/xslt-transformations.md) — Applies XSLT stylesheets to XML documents for structural transformation and format conversion. ([source](https://nokogiri.org/rdoc/index.html))
- [Markup Document Parsers](https://awesome-repositories.com/f/web-development/url-data-parsing/markup-document-parsers.md) — Parses XML, HTML4, and HTML5 documents from strings or URLs into navigable trees. ([source](https://nokogiri.org/rdoc/index.html))
- [Element Node Wrapping](https://awesome-repositories.com/f/web-development/dom-element-manipulators/element-node-wrapping.md) — Wraps each node in a set with a specified outer element to restructure the document. ([source](https://nokogiri.org/tutorials/modifying_an_html_xml_document.html))
- [Hash-Like Interfaces](https://awesome-repositories.com/f/web-development/element-attributes/command-attributes/hash-like-interfaces.md) — Provides a hash-like interface for reading, setting, and deleting node attributes. ([source](https://github.com/sparklemotion/nokogiri/wiki/Cheat-sheet))
- [Tag Renamers](https://awesome-repositories.com/f/web-development/element-attributes/tag-renamers.md) — Renames element tags and sets attribute values directly on parsed document nodes. ([source](https://nokogiri.org/tutorials/modifying_an_html_xml_document.html))
- [XML Schema Validations](https://awesome-repositories.com/f/web-development/schema-validation/xml-schema-validations.md) — Validates XML document structure and content against XML Schema Definitions using libxml2. ([source](https://nokogiri.org/rdoc/index.html))
- [Remote Document Fetchers](https://awesome-repositories.com/f/web-development/url-data-parsing/remote-document-fetchers.md) — Fetches and parses HTML or XML documents from remote URLs in a single call when combined with Ruby's `open-uri` (for example, `Nokogiri::HTML(URI.open(url))`). ([source](https://nokogiri.org/tutorials/parsing_an_html_xml_document.html))

### Artificial Intelligence & ML

- [Text Content Setters](https://awesome-repositories.com/f/artificial-intelligence-ml/tensor-indexing/element-modification/text-content-setters.md) — Sets text content on nodes with automatic escaping, so the output remains valid markup. ([source](https://nokogiri.org/tutorials/modifying_an_html_xml_document.html))
