30 open-source projects similar to leethomason/tinyxml2, ranked by how many features they have in common. Compare stars, activity and what each one does to find the best tinyxml2 alternative.
pugixml is a lightweight C++ XML parser and DOM-based library used for parsing, manipulating, and saving XML documents. It provides a portable toolset for reading XML data from files, strings, or memory buffers and converting them into an in-memory document object model. The library includes a dedicated XPath 1.0 engine for extracting specific nodes and data through path expressions. It distinguishes itself through customizable memory management, allowing heap operations to be redirected to user-defined allocation functions, and the ability to perform in-place buffer parsing to reduce memory…
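To make the parse, query, and modify cycle concrete without a C++ build, the sketch below uses Python's standard-library ElementTree instead of pugixml. It is not pugixml's API, and ElementTree understands only a small XPath subset rather than a full XPath 1.0 engine. The sample document and the helper names `english_titles` and `add_book` are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data used only for this sketch.
XML_TEXT = """<library>
  <book lang="en"><title>Dune</title><year>1965</year></book>
  <book lang="fr"><title>Vendredi</title><year>1967</year></book>
  <book lang="en"><title>Neuromancer</title><year>1984</year></book>
</library>"""


def english_titles(xml_text: str) -> list:
    # Parse the whole string into an in-memory element tree (a DOM-like model).
    root = ET.fromstring(xml_text)
    # ElementTree accepts a limited XPath subset: child paths plus [@attr='value'].
    return [title.text for title in root.findall("./book[@lang='en']/title")]


def add_book(xml_text: str, title: str, year: int) -> str:
    # Edit the tree in memory, then serialize it back to an XML string.
    root = ET.fromstring(xml_text)
    book = ET.SubElement(root, "book", lang="en")
    ET.SubElement(book, "title").text = title
    ET.SubElement(book, "year").text = str(year)
    return ET.tostring(root, encoding="unicode")


print(english_titles(XML_TEXT))  # ['Dune', 'Neuromancer']
print(english_titles(add_book(XML_TEXT, "Snow Crash", 1992)))
# ['Dune', 'Neuromancer', 'Snow Crash']
```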
This project is an HTML and XML DOM parser designed for loading and navigating the structure of web documents to extract specific data points. It functions as a web scraping utility that locates elements precisely with a CSS and XPath selector engine. The library includes a URI resolver that converts relative links found in documents into absolute addresses using a base URI. It provides a set of tools for retrieving text, attributes, and media sources from parsed content. The toolset covers document hierarchy traversal, selector-based filtering, and text extraction with…
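A minimal sketch of the URI-resolution step, assuming Python's standard library: `html.parser` collects `href` and `src` attributes, and `urllib.parse.urljoin` resolves each relative reference against a base URI. The class `LinkCollector`, the function `absolute_links`, and the example URLs are hypothetical. This is not the project's own API, and it ignores `<base>` elements for brevity.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collects href/src values and resolves them against a base URI."""

    def __init__(self, base_uri: str):
        super().__init__()
        self.base_uri = base_uri
        self.links: list = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value is None for bare attributes.
        for name, value in attrs:
            if name in ("href", "src") and value:
                # urljoin resolves "/x", "x" and "../x" relative to the base URI.
                self.links.append(urljoin(self.base_uri, value))


def absolute_links(html: str, base_uri: str) -> list:
    collector = LinkCollector(base_uri)
    collector.feed(html)
    collector.close()
    return collector.links


page = '<a href="/docs/intro.html">Intro</a> <img src="img/logo.png"> <a href="../up.html">Up</a>'
for link in absolute_links(page, "https://example.com/guide/index.html"):
    print(link)
# https://example.com/docs/intro.html
# https://example.com/guide/img/logo.png
# https://example.com/up.html
```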
This project is a Node.js library for bidirectional conversion between XML strings and JavaScript objects. It functions as an XML parser that transforms XML content into structured data and an XML serializer that generates formatted strings from JavaScript data objects. The toolkit includes a data transformer that applies custom processing functions to tags and attributes during the conversion process. It manages XML namespaces and supports the definition of custom root elements to maintain document structure during generation. The system handles XML data parsing, string generation, and name…
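The XML-to-object round trip can be sketched with Python's ElementTree. The mapping convention is an assumption chosen for illustration: attributes go under a `$` key, element text under `_`, and child elements are collected into lists keyed by tag name. The function names are hypothetical. The sketch also drops mixed content (text between child elements) and namespace handling, both of which a full converter would need.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping convention (an assumption for this sketch):
# attributes under "$", element text under "_", children as lists keyed by tag.
ATTR_KEY, TEXT_KEY = "$", "_"


def element_to_obj(elem: ET.Element) -> dict:
    obj: dict = {}
    if elem.attrib:
        obj[ATTR_KEY] = dict(elem.attrib)
    text = (elem.text or "").strip()
    if text:
        obj[TEXT_KEY] = text
    for child in elem:
        # Repeated tags accumulate in a list; mixed-content tails are ignored.
        obj.setdefault(child.tag, []).append(element_to_obj(child))
    return obj


def obj_to_element(tag: str, obj: dict) -> ET.Element:
    elem = ET.Element(tag, obj.get(ATTR_KEY, {}))
    if TEXT_KEY in obj:
        elem.text = obj[TEXT_KEY]
    for key, children in obj.items():
        if key not in (ATTR_KEY, TEXT_KEY):
            for child in children:
                elem.append(obj_to_element(key, child))
    return elem


def xml_to_obj(xml_text: str) -> dict:
    root = ET.fromstring(xml_text)
    return {root.tag: element_to_obj(root)}


def obj_to_xml(data: dict) -> str:
    if len(data) != 1:
        raise ValueError("expected exactly one top-level key naming the root element")
    root_tag, root_obj = next(iter(data.items()))
    return ET.tostring(obj_to_element(root_tag, root_obj), encoding="unicode")


src = '<order id="7"><item sku="a1">Pen</item><item sku="b2">Ink</item></order>'
data = xml_to_obj(src)
print(data["order"]["item"][0])  # {'$': {'sku': 'a1'}, '_': 'Pen'}
print(obj_to_xml(data) == src)   # True
```

Because text is stripped during parsing, whitespace-only formatting does not survive the round trip; that trade-off keeps the object shape simple.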
Nokogiri is an XML and HTML parsing library that builds navigable document trees from strings, files, or URLs using native C parsers for speed and standards compliance. It provides a CSS selector engine that translates CSS3 selectors into XPath expressions for querying nodes, an XPath query interface with namespace support, a document manipulation toolkit for modifying parsed documents, XSD schema validation, and XSLT transformation capabilities. The library wraps the libxml2 and libxslt C libraries with Ruby bindings for high-performance parsing, and integrates Google's Gumbo parser for standard…
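The CSS-to-XPath translation can be illustrated with a deliberately tiny, hypothetical translator that handles only type selectors, `#id`, `.class`, and the descendant combinator. It is not Nokogiri's implementation, which covers far more of CSS3. The class predicate uses the common `contains(concat(' ', normalize-space(@class), ' '), ' name ')` idiom, so `.note` matches a whole class token rather than any substring.

```python
import re

# Hypothetical, deliberately tiny grammar: optional type selector (or *),
# followed by any number of #id / .class parts. Nothing else is supported.
COMPOUND = re.compile(r"(?P<tag>[A-Za-z][\w-]*|\*)?(?P<rest>(?:[#.][\w-]+)*)")


def compound_to_xpath(compound: str) -> str:
    match = COMPOUND.fullmatch(compound)
    if not match:
        raise ValueError(f"unsupported selector: {compound!r}")
    predicates = []
    for kind, name in re.findall(r"([#.])([\w-]+)", match.group("rest")):
        if kind == "#":
            predicates.append(f"@id='{name}'")
        else:
            # Match a whole class token, not a substring of another class name.
            predicates.append(
                f"contains(concat(' ', normalize-space(@class), ' '), ' {name} ')"
            )
    return (match.group("tag") or "*") + "".join(f"[{p}]" for p in predicates)


def css_to_xpath(selector: str) -> str:
    # Whitespace is the descendant combinator, which maps to XPath's "//" step.
    return "".join("//" + compound_to_xpath(part) for part in selector.split())


print(css_to_xpath("div.note p"))
# //div[contains(concat(' ', normalize-space(@class), ' '), ' note ')]//p
```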
This repository is a comprehensive collection of reference implementations and sample libraries for the Universal Windows Platform. It provides practical examples of how to use Windows Runtime APIs to build cross-device applications, including detailed guidance on XAML-based declarative user interfaces and DirectX-integrated rendering. The project distinguishes itself by providing a wide array of hardware integration suites, covering low-level communication with USB, Serial, I2C, SPI, and GPIO peripherals. It includes specialized implementations for mixed reality holographic rendering, advanc…
pdfminer.six is a programmatic tool for extracting text, layout information, and metadata from PDF documents into machine-readable formats. It functions as a document parser that converts internal PDF objects and structures into accessible data objects for analysis. The project includes utilities for decrypting RC4- and AES-encrypted files to enable content extraction. It also provides a layout analyzer that identifies fonts, colors, and text locations to determine the organizational structure of pages. The system covers a broad range of extraction capabilities, including the retrieval of embedde…
This repository contains the HTML specification, which defines the core standards for web page structuring, content organization, and document rendering. It establishes the fundamental algorithms for state-machine-based tokenization, tree construction for the document object model, and origin-based security isolation. The specification provides a framework for defining custom elements with independent lifecycles and registries. It also details the requirements for cross-document communication, session history management, and the synchronization of interface properties with content attributes.
Prawn is a Ruby library and document layout tool used for the programmatic generation of PDF files. It functions as a vector graphics engine that allows for the creation of portable documents containing formatted text, custom shapes, and organized page layouts. The library differentiates itself through a coordinate-based vector rendering system that supports multi-stop gradient fills, complex polygons, and layer-based blending. It provides a comprehensive typography system capable of embedding TrueType and OpenType fonts to support UTF-8 characters and right-to-left text for multilingual publ…
This project is a technical educational guide focused on browser architecture and the internal processes used to render web pages. It provides a detailed breakdown of the web request lifecycle, from the initial networking phase to the final visual output on a screen. The guide covers specific technical sequences, including the DNS resolution process across browser, operating system, and ISP caches, and the establishment of secure connections through the TLS handshake. It also details the communication flow between clients and servers using the HTTP protocol and server-side request handling. T…
python-docx is an OOXML document manipulation library used for creating, reading, and updating Microsoft Word files. It functions as a generator for building formatted documents and a parser for extracting text, metadata, and structural elements from existing files. The project provides a comprehensive style management system for defining and applying character, paragraph, and table styles within Open XML documents. It allows for the programmatic control of document appearance through an object-oriented approach to the underlying XML schema. Capabilities cover a wide range of document generat…
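Because python-docx works on the XML inside a Word file, it may help to see that a `.docx` file is a ZIP package of XML parts. The sketch below bypasses python-docx entirely and uses only Python's standard library to write and read back a minimal WordprocessingML package with three parts: content types, the package relationship, and `word/document.xml`. The helper names are hypothetical, and real documents usually carry more parts, such as styles and settings.

```python
import io
import zipfile
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

# Minimal package parts: content types and the root relationship to the main part.
CONTENT_TYPES = (
    '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
    '<Types xmlns="http://schemas.openxmlformats.org/package/2006/content-types">'
    '<Default Extension="rels" ContentType="application/vnd.openxmlformats-package.relationships+xml"/>'
    '<Default Extension="xml" ContentType="application/xml"/>'
    '<Override PartName="/word/document.xml" ContentType="application/vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml"/>'
    "</Types>"
)
ROOT_RELS = (
    '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
    '<Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships">'
    '<Relationship Id="rId1" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/officeDocument" Target="word/document.xml"/>'
    "</Relationships>"
)


def build_docx(paragraphs: list) -> bytes:
    # Each paragraph becomes <w:p><w:r><w:t>text</w:t></w:r></w:p>.
    body = "".join(
        f'<w:p><w:r><w:t xml:space="preserve">{escape(text)}</w:t></w:r></w:p>'
        for text in paragraphs
    )
    document = (
        '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
        f'<w:document xmlns:w="{W_NS}"><w:body>{body}</w:body></w:document>'
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as package:
        package.writestr("[Content_Types].xml", CONTENT_TYPES)
        package.writestr("_rels/.rels", ROOT_RELS)
        package.writestr("word/document.xml", document)
    return buffer.getvalue()


def read_paragraphs(docx_bytes: bytes) -> list:
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as package:
        root = ET.fromstring(package.read("word/document.xml"))
    # Concatenate all text runs (<w:t>) inside each paragraph (<w:p>).
    return [
        "".join(t.text or "" for t in p.iter(f"{{{W_NS}}}t"))
        for p in root.iter(f"{{{W_NS}}}p")
    ]


print(read_paragraphs(build_docx(["Hello", "Fish & Chips"])))  # ['Hello', 'Fish & Chips']
```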
This project is a formal Markdown specification that provides a detailed syntax definition and a definitive set of rules for parsing plain text into consistent HTML output. It establishes a standardized grammar for structural blocks and inline elements to ensure uniform rendering across different software implementations. The specification is accompanied by a parser conformance suite, used to verify that implementations adhere to the standard, and by reference implementations in C and JavaScript. It includes a system for implementation verification that compares transformed input strings…
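The conformance idea, pairs of Markdown input and expected HTML output compared string for string, can be sketched as a small test harness. The `render` function here is a hypothetical toy that understands only ATX headings and plain paragraphs, and it ignores the specification's indentation rules. The test pairs are written in the style of the specification's examples rather than copied from its test suite.

```python
def _escape(text: str) -> str:
    # Escape &, <, > and " in text content, as the expected HTML does.
    return (text.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace('"', "&quot;"))


def render(markdown: str) -> str:
    """Toy renderer (hypothetical): ATX headings and plain paragraphs only."""
    out: list = []
    para: list = []

    def flush() -> None:
        if para:
            out.append("<p>" + _escape("\n".join(para)) + "</p>\n")
            para.clear()

    for line in markdown.splitlines():
        stripped = line.strip()
        level = len(stripped) - len(stripped.lstrip("#"))
        # An ATX heading is 1-6 '#' characters followed by a space or end of line.
        if 1 <= level <= 6 and (level == len(stripped) or stripped[level] == " "):
            flush()
            out.append(f"<h{level}>{_escape(stripped[level:].strip())}</h{level}>\n")
        elif not stripped:
            flush()  # a blank line ends the current paragraph
        else:
            para.append(stripped)
    flush()
    return "".join(out)


# (input, expected HTML) pairs in the style of the spec's examples.
CASES = [
    ("# foo\n", "<h1>foo</h1>\n"),
    ("#5 bolt\n\n#hashtag\n", "<p>#5 bolt</p>\n<p>#hashtag</p>\n"),
    ("aaa\nbbb\n", "<p>aaa\nbbb</p>\n"),
    ("####### foo\n", "<p>####### foo</p>\n"),
]


def run_conformance(cases):
    failures = [(src, want, render(src)) for src, want in cases if render(src) != want]
    return len(cases) - len(failures), failures


passed, failures = run_conformance(CASES)
print(f"{passed}/{len(CASES)} passed")  # 4/4 passed
```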
This project is an Open XML document library and generator used to programmatically create and modify Microsoft Office files. It enables the production of Word, Excel, and PowerPoint documents by manipulating the underlying Open XML structure. The library provides capabilities for Open XML document processing, including the automated modification of existing files and the generation of formatted office reports and spreadsheets for server-side production.
PHPWord is a PHP library for programmatically reading and writing word processing documents. It functions as an OOXML document generator, a word file parser, and a document template engine. The library enables the generation of new documents by applying structured data to existing templates or by creating files from scratch. It provides capabilities for extracting and parsing content, metadata, and structure from existing word processing files. The project covers a broad range of document generation features, including layout formatting, metadata management, and the inse…
PyPDF2 is a pure Python library for reading, writing, and manipulating PDF files. It functions as a document manipulator, text extractor, and encryption tool, allowing users to process PDF files without relying on external C libraries or native binaries. The library provides specialized tools for modifying document structures, such as merging multiple files into one, splitting documents into separate files, and transforming page layouts through cropping. It also includes capabilities for securing documents via passwords and encryption. Additional capabilities include the extraction of writte…
Kraken is a cross-platform UI framework and web standards runtime designed to build native applications using standard web markup and styling. It utilizes a Flutter-based rendering engine to process HTML and CSS, producing visually consistent user interfaces across mobile and desktop platforms. The system distinguishes itself by compiling the runtime to machine code and employing a synchronous rasterization pipeline to ensure animations and scrolling match the fluidity of native applications. It further integrates high-performance native components directly into a web-standard document object…
parse5 is a WHATWG HTML parser and serializer for Node.js. It parses HTML strings into document object model trees and serializes those trees back into valid HTML strings, following the logic defined by the HTML Living Standard. The project functions as a streaming HTML processor, using incremental parsing to handle large documents in chunks. It includes an HTML5-compliant tokenizer that uses a state-machine approach to break input into tokens according to official web specifications. The toolset covers HTML document parsing, serialization, and real-time rewriting via streams. These capabiliti…
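The state-machine approach to tokenization can be sketched with a handful of states. This is a heavily simplified, hypothetical tokenizer in Python, not parse5's code and not the full WHATWG algorithm. It recognizes start tags, end tags, and text, skips attributes entirely, and handles only a few error cases. For example, a stray `<` is kept as text and the following character is reconsumed in the data state, which mirrors how the standard treats that case.

```python
from enum import Enum, auto


class State(Enum):
    DATA = auto()
    TAG_OPEN = auto()
    END_TAG_OPEN = auto()
    TAG_NAME = auto()
    IN_TAG = auto()  # simplification: attributes are skipped, not tokenized


def tokenize(source: str) -> list:
    tokens: list = []
    state, text, name, is_end = State.DATA, "", "", False
    i = 0
    while i < len(source):
        ch = source[i]
        i += 1
        if state is State.DATA:
            if ch == "<":
                state = State.TAG_OPEN
            else:
                text += ch
        elif state is State.TAG_OPEN:
            if ch == "/":
                state = State.END_TAG_OPEN
            elif ch.isascii() and ch.isalpha():
                name, is_end, state = ch.lower(), False, State.TAG_NAME
            else:
                # Not a tag after all: keep "<" as text and reconsume ch as data.
                text += "<"
                state, i = State.DATA, i - 1
        elif state is State.END_TAG_OPEN:
            if ch.isascii() and ch.isalpha():
                name, is_end, state = ch.lower(), True, State.TAG_NAME
            else:
                # Simplification: the standard handles this case differently.
                text += "</"
                state, i = State.DATA, i - 1
        elif ch == ">":  # TAG_NAME or IN_TAG: the tag is complete
            if text:
                tokens.append(("text", text))
                text = ""
            tokens.append(("end" if is_end else "start", name))
            state = State.DATA
        elif state is State.TAG_NAME:
            if ch.isspace() or ch == "/":
                state = State.IN_TAG
            else:
                name += ch.lower()
    # End of input: an unfinished "<" or "</" stays text; an unfinished tag is dropped.
    if state is State.TAG_OPEN:
        text += "<"
    elif state is State.END_TAG_OPEN:
        text += "</"
    if text:
        tokens.append(("text", text))
    return tokens


print(tokenize('<p class="x">Hi <b>there</b></p>'))
# [('start', 'p'), ('text', 'Hi '), ('start', 'b'), ('text', 'there'), ('end', 'b'), ('end', 'p')]
```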
Jsoup is a Java library designed for parsing, extracting, and manipulating HTML and XML content. It provides a document object model that represents web content as a hierarchical tree, allowing for programmatic navigation and modification of elements, attributes, and text. The library functions as a toolkit for web scraping, enabling the retrieval of remote content via standard web protocols and the management of HTTP sessions for automated form interaction. The library distinguishes itself through its fault-tolerant tokenization, which reconstructs valid document structures from malformed or…
Quill is a modular, web-based rich text editor designed for structured content authoring. It provides a comprehensive toolkit for building tailored editing experiences, allowing developers to manage document state, handle user input, and synchronize content through a predictable, serializable data model. The editor distinguishes itself through a custom document abstraction that maps the browser DOM to a structured tree of nodes, ensuring consistent behavior across different environments. It utilizes an operational change tracking system that represents all document modifications as a sequence…
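Representing every modification as a sequence of operations can be sketched with a simplified, Delta-like format in Python. `retain` keeps characters, `insert` adds text, and `delete` removes characters, all relative to a moving cursor. The format is loosely modeled on Quill's Delta but omits formatting attributes, embeds, and the composition or transformation of changes. `apply_delta` is a hypothetical helper, not Quill's API.

```python
import json


def apply_delta(text: str, ops: list) -> str:
    """Apply a simplified Delta-style change (retain/insert/delete) to plain text.

    Simplifications: no formatting attributes or embeds, and any text left
    after the last op is kept unchanged (an implicit trailing retain).
    """
    out: list = []
    pos = 0
    for op in ops:
        if "insert" in op:
            out.append(op["insert"])            # new text at the cursor
        elif "retain" in op:
            n = op["retain"]
            if pos + n > len(text):
                raise ValueError("retain runs past the end of the document")
            out.append(text[pos:pos + n])       # keep n characters unchanged
            pos += n
        elif "delete" in op:
            n = op["delete"]
            if pos + n > len(text):
                raise ValueError("delete runs past the end of the document")
            pos += n                            # skip n characters
        else:
            raise ValueError(f"unknown operation: {op!r}")
    out.append(text[pos:])
    return "".join(out)


# Changes are plain data, so they serialize cleanly (for storage or sync).
change = [{"retain": 6}, {"delete": 5}, {"insert": "Quill"}]
wire = json.dumps(change)
print(apply_delta("Hello world", json.loads(wire)))  # Hello Quill
```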
htmlparser2 is a collection of tools for high-performance markup parsing, DOM manipulation, and incremental stream processing. It functions as an HTML and XML parser that converts markup strings into structured object trees, alongside a streaming markup parser designed for memory-efficient processing of large documents. The project includes a DOM manipulation library for querying, modifying, and serializing document object model trees. It also provides a web feed parser to extract structured metadata and entries from RSS, RDF, and Atom feeds. The library covers broad capabilities in data par…
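Incremental parsing of a web feed can be sketched with Python's `xml.etree.ElementTree.XMLPullParser`, which accepts input in arbitrary chunks and reports elements as they close. This is not htmlparser2's API. The function name, the assumed RSS 2.0 layout (`item` elements with `title` and `link` children), and the 16-character chunking are illustrative only; Atom feeds would additionally need namespace-qualified tag names.

```python
import xml.etree.ElementTree as ET
from typing import Iterable, Iterator


def stream_rss_items(chunks: Iterable[str]) -> Iterator[dict]:
    """Parse RSS 2.0 incrementally, yielding each <item> as soon as it is complete."""
    parser = ET.XMLPullParser(events=("end",))

    def drain():
        for _event, elem in parser.read_events():
            if elem.tag == "item":
                yield {
                    "title": elem.findtext("title", default=""),
                    "link": elem.findtext("link", default=""),
                }
                elem.clear()  # free the finished item's children and text

    for chunk in chunks:
        parser.feed(chunk)  # a chunk may end anywhere, even mid-tag
        yield from drain()
    parser.close()
    yield from drain()


FEED = (
    '<?xml version="1.0"?><rss version="2.0"><channel><title>Demo</title>'
    "<item><title>First</title><link>https://example.com/1</link></item>"
    "<item><title>Second</title><link>https://example.com/2</link></item>"
    "</channel></rss>"
)
chunks = [FEED[i:i + 16] for i in range(0, len(FEED), 16)]  # simulate network chunks
for item in stream_rss_items(chunks):
    print(item["title"], item["link"])
# First https://example.com/1
# Second https://example.com/2
```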
Jodd is a suite of lightweight Java extensions and standard library utilities designed for application configuration, database mapping, dependency injection, and HTML parsing. It provides a consolidated set of core tools for Java development, built on a zero-dependency core that keeps the footprint small and compatibility broad across environments. The project features a pragmatic dependency injection container for managing object lifecycles and a database mapper that uses SQL templates to map result sets directly to Java objects. It includes a specialized configuration manager supporting profile…
SwiftSoup is a cross-platform HTML processing library for Swift that converts raw HTML or XML strings and files into a structured document object model. It provides the core infrastructure to parse web content into a traversable tree, enabling programmatic access to page elements across iOS, macOS, and Linux. The library features a CSS selector engine for data extraction and a whitelist-based sanitization system to remove unsafe tags and attributes from user-submitted content. It optimizes repetitive document queries through memoized query caching. The project covers DOM manipulation for upd…
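A whitelist-based sanitizer can be sketched with Python's `html.parser`: any tag or attribute not on the allow-list is dropped, the contents of `script` and `style` are discarded, and text is re-escaped on output. The policy (`ALLOWED`, `DROP_CONTENT`) and the `javascript:` check are assumptions for illustration, not SwiftSoup's API. The sketch also does not balance unclosed tags, and production sanitizers handle many more edge cases.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical allow-list policy: tag -> attributes that tag may keep.
ALLOWED = {"p": set(), "b": set(), "i": set(), "a": {"href"}}
# Tags removed together with everything inside them.
DROP_CONTENT = {"script", "style"}


class Sanitizer(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out: list = []
        self.skip_depth = 0  # > 0 while inside a DROP_CONTENT element

    def handle_starttag(self, tag, attrs):
        if tag in DROP_CONTENT:
            self.skip_depth += 1
            return
        if self.skip_depth or tag not in ALLOWED:
            return  # unknown tags vanish, but their text content is kept
        kept = []
        for name, value in attrs:
            if name not in ALLOWED[tag] or value is None:
                continue
            if name == "href" and value.strip().lower().startswith("javascript:"):
                continue  # reject script URLs (a simplified check)
            kept.append(f' {name}="{escape(value, quote=True)}"')
        self.out.append(f"<{tag}{''.join(kept)}>")

    def handle_endtag(self, tag):
        if tag in DROP_CONTENT:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif not self.skip_depth and tag in ALLOWED:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.skip_depth:
            self.out.append(escape(data, quote=False))


def sanitize(html_text: str) -> str:
    sanitizer = Sanitizer()
    sanitizer.feed(html_text)
    sanitizer.close()
    return "".join(sanitizer.out)


dirty = '<p onclick="x()">Hi <script>alert(1)</script><a href="https://example.com" target="_blank">link</a></p>'
print(sanitize(dirty))  # <p>Hi <a href="https://example.com">link</a></p>
```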
KeeWeb is a web-based password manager and vault that allows users to open and edit encrypted databases through a browser interface. It functions as a cross-platform tool for managing password vaults using the KeePass database format. The application provides a self-hosted password vault that can be deployed as a single HTML file or via Docker. It integrates with remote storage providers using OAuth to synchronize encrypted database files across multiple devices. The system includes capabilities for secure credential generation, two-factor authentication management through time-based one-tim…
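Time-based one-time passwords follow RFC 6238, which runs the HMAC-based algorithm of RFC 4226 over a counter derived from the current Unix time. The minimal sketch below uses only Python's standard library and assumes SHA-1, 30-second steps, and a raw byte secret. Authenticator secrets are commonly exchanged Base32-encoded, and `base64.b32decode` would turn such a string into bytes. This is not KeeWeb's code, and the function names are hypothetical.

```python
import hashlib
import hmac
import struct
import time
from typing import Optional


def hotp(key: bytes, counter: int, digits: int = 6) -> str:
    """HMAC-SHA1 one-time password (RFC 4226) with dynamic truncation."""
    mac = hmac.new(key, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = mac[-1] & 0x0F                       # low nibble of the last byte
    code = int.from_bytes(mac[offset:offset + 4], "big") & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)


def totp(key: bytes, at: Optional[float] = None, step: int = 30, digits: int = 6) -> str:
    """Time-based variant (RFC 6238): HOTP over floor(unix_time / step)."""
    now = time.time() if at is None else at
    return hotp(key, int(now // step), digits)


secret = b"12345678901234567890"  # the RFC test secret
print(totp(secret, at=59, digits=8))  # 94287082 (RFC 6238 SHA-1 test vector)
```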
Open-XML-SDK is a library for programmatically creating, modifying, and validating Office documents based on the Open XML standard. It functions as an office file generator and XML document parser, enabling the manipulation of word processing, spreadsheet, and presentation files. The library allows for the generation and updating of document content and structure without requiring the Office applications to be installed. It utilizes strongly typed classes and a schema-validated approach to ensure that created files remain compatible and correctly structured. The project provides capab…
goquery is a Go HTML parsing library and CSS selector engine used to isolate and retrieve specific text or attributes from HTML documents. It functions as an HTML DOM manipulator that converts raw HTML strings into a structured tree for programmatic navigation and search. The library provides a fluent interface for chaining selection and filtering operations and utilizes a wrapper-based abstraction to simplify data extraction and manipulation of nodes. It employs an iterator-based processing mechanism to apply operations to every node within a matched selection. Its primary capabilities cove…
This project is a Node.js web scraping framework designed to automate data extraction through a programmatic workflow of requests, parsing, and document interaction. It functions as a headless web crawler, an HTTP request manager, and a DOM parser and extractor. The framework distinguishes itself by combining a JavaScript execution engine to interact with dynamic content and a hybrid selection system that utilizes both CSS and XPath selectors. It includes specialized middleware for proxy rotation and cookie-jar session management to maintain authenticated states and manage automated traffic.
jsdom is a Node.js implementation of web standards that functions as a headless browser emulator. It provides a JavaScript execution environment and an HTML and XML parser to simulate a browser environment on the server side, implementing various web APIs and W3C standards. The project distinguishes itself by providing a sandboxed runtime for executing scripts embedded in HTML or external files. It includes specialized polyfills for the Canvas API and manages session state through HTTP cookie management. Its broader capabilities cover network interaction via request interception and resource…
PyMuPDF is a comprehensive PDF manipulation library and document analysis tool. It serves as a text extraction tool, OCR engine, and image converter, providing a programmatic interface to edit, merge, split, and optimize PDF and Office documents. The project distinguishes itself through high-performance capabilities, including the use of C bindings for low-level manipulation and parallelized page processing to accelerate workloads. It provides specialized conversion paths, such as transforming PDF content into Markdown for retrieval-augmented generation and large language model pipelines. It…
PyPDF2 is a pure Python library for transforming, securing, and extracting data from PDF documents. It provides a comprehensive suite of tools to modify page layouts, manage document security, and retrieve embedded metadata without relying on external C libraries. The toolkit enables document assembly through the merging of multiple files and the splitting of documents into smaller parts. It also supports page-level transformations, including the ability to rotate pages and adjust visible crop areas. The library includes capabilities for security management via password-based encryption and…
FreeCAD is an open-source engineering design suite for parametric 3D modeling, architectural planning, and mechanical assembly. It functions as a professional-grade platform that utilizes history-based operations to allow for non-destructive design updates, enabling users to construct complex geometry through a sequence of constrained sketches and solid operations. The platform distinguishes itself through a highly modular, workbench-based architecture that allows users to tailor the interface and toolsets to specific engineering domains. It features deep Python integration, which se…