Nokogiri is a Ruby library for parsing XML and HTML. It builds navigable document trees from strings, files, or the contents of URLs (fetched separately, for example with open-uri), using native C parsers for speed and standards compliance. It provides a CSS selector engine that translates CSS3 selectors into XPath expressions, an XPath query interface with namespace support, a toolkit for modifying parsed documents, XSD schema validation, and XSLT transformation.
The library wraps the libxml2 and libxslt C libraries with Ruby bindings for high-performance parsing, and integrates Google's Gumbo parser for standards-compliant HTML5 parsing with error reporting. Besides building a full DOM tree, it supports SAX event-driven parsing, which processes a document as a stream of events without holding the whole tree in memory, and a push parser that accepts data incrementally in chunks. A builder DSL constructs XML or HTML documents programmatically through nested method calls that mirror the output structure.
Document manipulation in Nokogiri covers node creation, removal, replacement, cloning, and wrapping, along with attribute manipulation and text content modification with automatic XML escaping. Input can be parsed with an explicitly declared encoding, and nodes in the resulting tree can be selected with CSS or XPath queries. The library also handles namespace management, serialization to HTML, XHTML, or XML, and stream-based processing for large documents.
The library can be installed from precompiled native gems for various platforms, or built from source against system libraries, bundled copies, or libraries at custom paths. It guards against XXE attacks by blocking external entity loading and network access by default, and provides a security vulnerability reporting process through HackerOne.