13 repositories
Utilities for parsing, extracting, and manipulating document formats.
Explore 13 awesome GitHub repositories matching part of an awesome list · Document Processing.
SheetJS is a comprehensive library for parsing, manipulating, and generating complex spreadsheet file formats. It functions as a universal data processor that maps diverse binary, XML, and text-based file structures into a unified internal object model, allowing developers to create, read, and transform workbook data programmatically. The library distinguishes itself through a portable logic layer that provides a consistent execution environment across web browsers, server-side runtimes, and native desktop or mobile applications. By utilizing stream-based processing, it handles large files in
Spreadsheet data processing and manipulation toolkit.
jsPDF is a JavaScript PDF generation library and client-side engine that produces documents directly on the user's device. It provides a scriptable interface for creating PDF files within web browsers and other JavaScript runtime environments without requiring a backend server. The library includes a tool for defining document dimensions, orientation, and measurement units to control page layout. It also functions as a Unicode font integrator, allowing for the embedding of custom font files to support diverse languages and special characters. Capability areas cover dynamic document automatio
Client-side generation of PDF files.
ExcelJS is a Node.js spreadsheet engine and manipulation library used for reading, writing, and modifying XLSX and CSV files. It functions as a formatting tool and asynchronous streaming parser for generating complex workbooks containing formulas, rich text, and custom styles. The library is distinguished by its ability to process large datasets using asynchronous data streaming and incremental processing, which minimizes memory usage during data extraction and file generation. Its capability surface covers comprehensive data management, including structured tables, named ranges, and cell da
Comprehensive management and manipulation of Excel worksheets.
pdfkit is a JavaScript PDF generation library used to programmatically create binary PDF documents in Node.js and browser environments. It functions as a vector graphics engine for rendering paths, shapes, gradients, and tiling patterns, and as a tool for producing rich text and tagged documents that follow international accessibility standards for screen reader compatibility. The library includes a security and encryption utility for applying document encryption and restricting user permissions regarding printing, copying, or editing. It also serves as a form and annotation tool, enabling th
Cross-environment PDF document generation library.
nodeppt is a markdown presentation generator and static site generator that transforms markdown source files into interactive web-based slide decks. It consists of a command-line build tool and a specialized frontend runtime used to deliver presentations in a web browser. The project features a dual-screen presentation runtime that synchronizes the audience view with a private speaker notes monitor. It employs a plugin-based markdown pipeline and a post-processing DOM transformation system to convert custom syntax into structured HTML content. The framework supports technical content generat
Web-based presentation and slideshow tool.
pypdf is a Python library for parsing, manipulating, and generating PDF documents. It provides high-level operations for document processing, such as merging multiple files into one or splitting a single document into smaller files. The project includes specialized tools for managing interactive elements, including the creation and modification of annotations, hyperlinks, and form fields. It also supports advanced metadata management, allowing for the extraction and modification of standard document properties and XML-based XMP metadata. Beyond basic structural changes, the library covers pa
Divides single PDF files into smaller documents by extracting specific page ranges.
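The page-range splitting described for pypdf can be sketched roughly as follows. This is a minimal illustration, not code from the pypdf project: `page_ranges`, `split_pdf`, `pages_per_part`, and `out_prefix` are hypothetical names chosen for this example, and only the `PdfReader`/`PdfWriter` calls come from pypdf itself.

```python
from typing import List, Tuple


def page_ranges(total_pages: int, pages_per_part: int) -> List[Tuple[int, int]]:
    """Return half-open (start, stop) page index ranges covering the document."""
    if pages_per_part < 1:
        raise ValueError("pages_per_part must be at least 1")
    return [
        (start, min(start + pages_per_part, total_pages))
        for start in range(0, total_pages, pages_per_part)
    ]


def split_pdf(src_path: str, pages_per_part: int, out_prefix: str = "part") -> List[str]:
    """Write each page range of src_path to its own PDF and return the output paths."""
    # Imported here so page_ranges() stays usable without pypdf installed.
    from pypdf import PdfReader, PdfWriter

    reader = PdfReader(src_path)
    out_paths = []
    ranges = page_ranges(len(reader.pages), pages_per_part)
    for index, (start, stop) in enumerate(ranges, start=1):
        writer = PdfWriter()
        for page_number in range(start, stop):
            writer.add_page(reader.pages[page_number])
        # Hypothetical naming scheme: part-1.pdf, part-2.pdf, ...
        out_path = f"{out_prefix}-{index}.pdf"
        writer.write(out_path)
        out_paths.append(out_path)
    return out_paths
```

For a ten-page file, `page_ranges(10, 4)` gives `[(0, 4), (4, 8), (8, 10)]`, so `split_pdf` would write three output files under these assumptions.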
pdf-lib is a JavaScript PDF manipulation library used to create, modify, and edit PDF documents programmatically. It works as a multi-runtime tool compatible with Node, the browser, Deno, and mobile JavaScript environments. The library provides a programmatic interface for document editing and form generation. It supports creating interactive PDF forms, filling existing fields with custom data, and flattening forms into static content. Its broader capabilities include generating new documents from scratch, reordering or copying pages between files, and managing document metadata. It can also draw visual content such as text, images, and vector graphics, embed custom fonts, and attach external files.
Creation and modification of PDF documents.
Conversion of Word documents into clean HTML.
docx is a JavaScript and TypeScript library for programmatically generating and manipulating Word documents. It serves as an OOXML document generator, letting developers create formatted office files through code instead of manual editing. The library enables document automation across Node.js and web browser environments. It supports client-side document export, so users can generate and download files directly in the browser without a backend server. Capabilities include defining page layouts, margins, and orientation. Users can programmatically insert document elements such as text, lists, tables, and images to build custom document structures and automated reports.
API-driven generation of Word documents.
Percollate is a command-line tool for converting web pages and RSS feeds into structured files. It works as a web content converter, static document generator, and page bundler that turns online content into PDF, EPUB, HTML, or Markdown. The tool creates self-contained documents by embedding external images as encoded data URLs and applying custom HTML templates and CSS stylesheets. It can combine multiple web URLs or feed entries into a single ebook with a generated table of contents and a hyperlinked index. Additional capabilities include breaking Atom and RSS feeds into individual articles and scheduling requests sequentially to pace traffic when fetching content from servers.
CLI tool for converting web pages into PDF or EPUB.
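Embedding external images as encoded data URLs, as the Percollate entry describes, can be illustrated with a short sketch. Percollate itself is a JavaScript tool, so this Python version is not its implementation; `to_data_url`, `inline_images`, and the `fetch` callback are hypothetical names for this example, and the regular expression is a simplification that a real HTML parser would handle more robustly.

```python
import base64
import re
from typing import Callable, Tuple

# Simplified pattern: matches <img ... src="..."> with a double-quoted src.
IMG_SRC = re.compile(r'(<img\b[^>]*?\bsrc=")([^"]+)(")')


def to_data_url(data: bytes, mime_type: str) -> str:
    """Encode raw bytes as a base64 data URL."""
    encoded = base64.b64encode(data).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


def inline_images(html: str, fetch: Callable[[str], Tuple[bytes, str]]) -> str:
    """Replace each image URL with a data URL.

    fetch(url) is supplied by the caller and returns (bytes, mime_type),
    which keeps this sketch free of network code.
    """
    def replace(match: "re.Match[str]") -> str:
        prefix, url, suffix = match.groups()
        if url.startswith("data:"):
            return match.group(0)  # already inlined, leave untouched
        data, mime_type = fetch(url)
        return prefix + to_data_url(data, mime_type) + suffix

    return IMG_SRC.sub(replace, html)


print(to_data_url(b"hi", "text/plain"))  # → data:text/plain;base64,aGk=
```

Passing the fetcher in as a parameter means the same function works with any HTTP client, or with a stub during testing.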
Creating Office Open XML files (Word, Excel and PowerPoint) for Microsoft Office 2007 and later without external tools, just pure JavaScript. officegen should work on any environment that supports Node.js, including Linux, OSX and Windows. officegen also supports PowerPoint native charts…
Stream-based generation of Word, PowerPoint, and Excel documents.
pdf2json is a Node.js module that converts binary PDF to JSON and text. Built with pdf.js, it extracts text content and interactive form elements for server-side processing and command-line use.
Parsing PDF binary files into structured JSON.
Excel XLSX parser/generator written in JavaScript with Node.js and browser support, jQuery/d3-style method chaining, encryption, and a focus on keeping existing workbook features and styles intact.
Generation and parsing of Excel XLSX files.