30 open-source projects similar to doctrine/lexer, ranked by how many features they have in common. Compare stars, activity and what each one does to find the best Lexer alternative.
PHP-Parser is a tool that converts PHP source code into an abstract syntax tree for static analysis and programmatic manipulation. It functions as a parser, a code generator, and a static analysis framework. The project supports programmatic construction of abstract syntax tree nodes through a fluent interface and can pretty-print these trees back into formatted source code. It includes a serializer that exports abstract syntax trees to JSON and reconstructs them from JSON strings. The toolset covers several capability areas, including namespace resolution, constant exp
Acorn is a JavaScript parser that converts source text into a structured abstract syntax tree. It follows the ESTree specification, producing a standardized, JSON-compatible tree format so that code structure can be analyzed consistently across ECMAScript versions. The project features a plugin-based grammar extension system that allows the base parser to be extended with custom rules for experimental or non-standard language features. It also includes syntax error recovery, which inserts placeholder nodes into the tree when it encounters invalid code so that parsing can continue. The toolset covers static analys
This project is a C interpreter and a practical example of implementing a programming language. It parses and executes C source code directly, without a separate compilation step. The interpreter is designed to be self-hosting: it can interpret its own source code, demonstrating recursive language processing and execution. The system covers the primary stages of language processing, including lexical analysis, recursive descent parsing, and tree-walk interpretation using an abstract syntax tree. It manages memory and scope through a dynamic symbol t
Esprima is a JavaScript parser that converts source code into a structured abstract syntax tree. It implements a specification-driven grammar to ensure compliance with ECMAScript standards, enabling programmatic analysis and transformation of JavaScript programs. The project provides lexical tokenization, which breaks source code into individual tokens, and static syntax validation, which verifies that scripts are well-formed without executing them. It covers JavaScript static analysis, lexical analysis, and abstract syntax tree generation.
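To make "lexical tokenization" concrete (it is also the job doctrine/lexer itself performs), here is a minimal regex-based tokenizer in Python. It is a sketch, not Esprima's API: the token categories, the `Token` tuple and the `tokenize` function are hypothetical and cover only a tiny JavaScript-like subset.

```python
import re
from typing import Iterator, NamedTuple


class Token(NamedTuple):
    kind: str   # token category, e.g. "NUMBER" or "IDENT"
    value: str  # the matched source text
    pos: int    # offset of the token in the source string


# Hypothetical token categories for a tiny JavaScript-like subset.
# Alternatives are tried in order, so longer operators come first.
TOKEN_SPEC = [
    ("NUMBER", r"\d+(?:\.\d+)?"),
    ("STRING", r"'[^']*'|\"[^\"]*\""),
    ("IDENT", r"[A-Za-z_$][A-Za-z0-9_$]*"),
    ("PUNCT", r"===|==|=>|[-+*/=;(){}.,<>]"),
    ("SKIP", r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(source: str) -> Iterator[Token]:
    """Yield tokens left to right, discarding whitespace."""
    pos = 0
    while pos < len(source):
        match = MASTER.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at offset {pos}")
        if match.lastgroup != "SKIP":
            yield Token(match.lastgroup, match.group(), pos)
        pos = match.end()


if __name__ == "__main__":
    print([(t.kind, t.value) for t in tokenize("let x = 42;")])
    # prints: [('IDENT', 'let'), ('IDENT', 'x'), ('PUNCT', '='), ('NUMBER', '42'), ('PUNCT', ';')]
```

Because Python's regex alternation is ordered, `===` has to be listed before `==` and `=`; production lexers typically get the same effect with an explicit longest-match rule.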
syn is a Rust syntax tree parser and token stream converter. It serves as a toolkit for procedural macro development, providing a framework to parse Rust source code into structured syntax trees for analysis and transformation. The project enables the manipulation of Rust abstract syntax trees through specialized visitor and folder patterns for traversing and mutating nodes. It provides a bidirectional mapping that allows developers to convert token streams into structured trees and print those trees back into tokens for code generation. The library covers a broad range of syntax analysis ca
php-token-stream is a lexical analysis tool and tokenizer wrapper for PHP. It functions as a source code streamer that reads tokens one by one, avoiding the need to load entire source files into memory. The project provides memory-efficient parsing by wrapping the native PHP tokenizer extension, which allows source code tokens to be processed sequentially to analyze structural components and syntax. The tool is designed for static code analysis and the development of compiler tooling. It supports linear token processing and sequential traversal to examine language constructs and project
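The one-token-at-a-time access pattern described above can be sketched as a small cursor with lookahead. This Python sketch assumes tokens arrive from any iterator (for example a generator reading a file); the `TokenStream` class and its `peek`/`advance` methods are hypothetical and are not the actual API of php-token-stream or doctrine/lexer.

```python
from collections import deque
from typing import Iterable, Optional


class TokenStream:
    """Hypothetical cursor over a lazily produced token sequence.

    Tokens are pulled from the underlying iterator only when they are
    needed, so the full token list is never built up front.
    """

    def __init__(self, tokens: Iterable[str]) -> None:
        self._source = iter(tokens)
        self._buffer: deque = deque()  # tokens read ahead but not yet consumed

    def _fill(self, count: int) -> None:
        # Pull from the source until `count` tokens are buffered or it runs dry.
        while len(self._buffer) < count:
            try:
                self._buffer.append(next(self._source))
            except StopIteration:
                break

    def peek(self, offset: int = 0) -> Optional[str]:
        """Return the token `offset` positions ahead without consuming it."""
        self._fill(offset + 1)
        return self._buffer[offset] if offset < len(self._buffer) else None

    def advance(self) -> Optional[str]:
        """Consume and return the current token, or None at end of input."""
        self._fill(1)
        return self._buffer.popleft() if self._buffer else None


def words(text: str):
    """Stand-in token source: a generator yielding one word at a time."""
    for word in text.split():
        yield word


stream = TokenStream(words("function foo ( ) { }"))
while (tok := stream.advance()) is not None:
    print(tok, "-> next:", stream.peek())
```

Buffering only what `peek` asks for keeps memory proportional to the lookahead distance rather than to the file size.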
Mistune is a pure Python implementation of a Markdown to HTML parser. It functions as a library that converts Markdown formatted text into HTML markup for rendering in web browsers. The project is designed as an extensible Markdown renderer, utilizing a modular system that allows for the customization of how Markdown elements are transformed into HTML via a pluggable renderer. Its capabilities cover a range of conversion tasks, including static site generation, dynamic content rendering, and the creation of custom documentation workflows.
Parsedown is a PHP library that converts Markdown text, including common extensions, into structured HTML for web browsers. It processes both block-level and inline elements to generate valid web content. The library includes an HTML sanitizer designed to escape HTML and scripting vectors; this security layer sanitizes input to prevent attacks such as cross-site scripting when processing untrusted user-generated content.
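Parsedown itself is PHP; as a language-neutral sketch of the core idea, escaping untrusted text before it is wrapped in markup, here is the equivalent step in Python using the standard library's `html.escape`. The `render_paragraph` helper is hypothetical and handles a single paragraph, not full Markdown.

```python
import html


def render_paragraph(untrusted: str) -> str:
    """Wrap user-supplied text in <p> after escaping HTML-significant characters.

    html.escape converts &, <, > and (with quote=True) both quote characters
    into entities, so embedded tags such as <script> are shown as text
    instead of being interpreted by the browser.
    """
    return f"<p>{html.escape(untrusted, quote=True)}</p>"


print(render_paragraph('<script>alert("x")</script> & more'))
# prints: <p>&lt;script&gt;alert(&quot;x&quot;)&lt;/script&gt; &amp; more</p>
```

Escaping text content is only part of the job: a Markdown sanitizer also has to vet link and image URLs (for example `javascript:` schemes), which this sketch does not attempt.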
This project is an educational compiler implementation and a minimalist compiler construction tutorial. It serves as a practical example of how to build a functional compiler through a simplified end-to-end development process, transforming source code into executable instructions. The implementation is designed to teach the fundamentals of language implementation and compiler design. It focuses on the essential mechanics of transforming source code by demonstrating the core architecture of a translation process. The system covers the primary stages of compilation, including lexical analysis
ReflectionDocBlock is a PHP docblock parser and doc-comment metadata extractor. It functions as a reflection wrapper that extends the standard PHP Reflection API to convert raw documentation blocks into structured objects. The library provides tools for PHP documentation parsing and reflection tooling. It enables the extraction of structured metadata and annotations from reflection objects or raw doc-comment strings to support automated API documentation and static code analysis.
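ReflectionDocBlock is PHP; the Python sketch below only illustrates the general idea of turning a raw doc comment into a summary plus structured tag data. `parse_docblock` and its regular expression are hypothetical and deliberately ignore multi-line tag descriptions and inline `{@...}` tags.

```python
import re
from collections import defaultdict

# A tag line such as " * @param int $id The record id" inside a /** ... */ block.
TAG_LINE = re.compile(r"^\s*\*?\s*@(?P<name>[A-Za-z][\w-]*)\s*(?P<body>.*?)\s*$")


def parse_docblock(doc: str) -> tuple[str, dict[str, list[str]]]:
    """Split a PHP-style doc comment into a summary and a map of @tags.

    Hypothetical helper: multi-line tag descriptions and inline {@...}
    tags are not handled.
    """
    summary: list[str] = []
    tags: dict[str, list[str]] = defaultdict(list)
    inner = doc.strip().removeprefix("/**").removesuffix("*/")
    for line in inner.splitlines():
        match = TAG_LINE.match(line)
        if match:
            tags[match["name"]].append(match["body"])
        elif not tags:
            # Free text before the first tag is treated as the summary.
            text = line.strip().lstrip("*").strip()
            if text:
                summary.append(text)
    return " ".join(summary), dict(tags)


doc = """/**
 * Load a user record.
 *
 * @param int $id The record id
 * @return User|null
 */"""
print(parse_docblock(doc))
# prints: ('Load a user record.', {'param': ['int $id The record id'], 'return': ['User|null']})
```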
Chibicc is a C11 compiler designed as a reference implementation for studying compiler construction. It translates C source code into machine-specific assembly through a pipeline that includes lexical analysis, recursive descent parsing, and single-pass code generation. The project serves as an educational tool for understanding the internal architecture of compilers, from initial tokenization to the final emission of assembly code. The compiler distinguishes itself through its self-hosting capability, which allows the software to compile its own source code into a functional
This library is a PHP source code tokenizer and static analysis tool that converts raw PHP code into discrete tokens and structured XML representations. It functions as a serializer that transforms token streams into a machine-readable format for programmatic analysis and source tree manipulation. The project uses stream-based XML serialization and fragment-based buffer writing to maintain low memory overhead when processing large files. It allows for custom XML namespace configuration to ensure schema compatibility and avoid naming collisions during the transformation process. The toolkit c
Code-prettify is a browser-based tool and HTML syntax highlighter that adds visual formatting and line numbers to raw code blocks on web pages. It functions as a client-side code formatter and a customizable lexer library for defining language-specific highlighting rules. The system allows for the creation of custom lexers to provide syntax highlighting for proprietary or uncommon programming languages. Visual presentation is managed through custom code styling and the integration of external CSS stylesheets to define colors and fonts. The project provides automatic syntax highlighting and s
PHP_CodeSniffer is a static analysis tool, coding standard linter, and command-line validator for PHP. It scans files and directories to detect and report formatting errors and language-specific coding violations without executing the code. The project functions as an automated code formatter capable of correcting detected style and formatting violations to bring source code into compliance with defined standards. It uses token-based lexical analysis to match code patterns against rule sets, ensuring consistency across a codebase. The tool provides comprehensive capabilities for recursive fi
The Rust RFCs repository is the formal home for the Rust language evolution process, housing the structured design documents and community review mechanisms that govern changes to the Rust programming language, its compiler, and its standard library. It defines the complete lifecycle for proposing, discussing, and implementing substantial changes through RFC documents, from initial submission and community feedback through final comment periods and sub-team sign-offs. The repository codifies the governance and collaboration processes that shape Rust's development, including mechanisms for com
Cppcheck is a static analysis tool and linter for C and C++ source code designed to detect programming errors, memory leaks, and security violations without executing the program. It functions as a bug detection engine and quality assurance tool that identifies concurrency issues and type cast errors and checks compliance with secure coding standards. The project provides a graphical user interface for selecting files and reviewing errors, alongside a linter for enforcing naming conventions and coding standards. It supports the creation of custom analysis rules using regular expressions to identify specif
Fuse is a JavaScript fuzzy search library and client-side search engine designed to index and query JSON data. It provides utilities for approximate string matching and ranking results by relevance, allowing applications to perform fast filtering and searching of datasets without a dedicated backend. The library distinguishes itself through a token-based search implementation that supports word-order independence and relevance weighting. It utilizes edit-distance scoring to handle typos and insertions, and employs a system of field weighting to prioritize matches in high-value data keys. The
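Fuse's own matcher is built on a Bitap-style approximate search rather than this exact routine, so treat the following Python sketch as a generic illustration of edit-distance scoring: the classic two-row Levenshtein dynamic program, plus a hypothetical `rank` helper that normalizes distances into a 0 to 1 relevance score.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, or
    substitutions needed to turn `a` into `b` (two-row dynamic programming)."""
    if len(a) < len(b):
        a, b = b, a  # keep the rows as short as possible
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        previous = current
    return previous[-1]


def rank(query: str, candidates: list[str]) -> list[tuple[str, float]]:
    """Score candidates in [0, 1] (1 = exact match) and sort best first.

    Hypothetical scoring: 1 - distance / max(len(query), len(candidate)),
    compared case-insensitively.
    """
    scored = []
    for cand in candidates:
        longest = max(len(query), len(cand)) or 1
        score = 1 - levenshtein(query.lower(), cand.lower()) / longest
        scored.append((cand, score))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


print(rank("lexr", ["parser", "lexer", "lexar"]))
```

Normalizing by the longer string keeps a single typo in a long word from being penalized as heavily as the same typo in a short one.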
TabNine is an AI programming assistant and large language model completion tool that predicts and completes source code in real time. It functions as a language-aware code predictor, providing automated line completions and code snippets based on the context of the current file and project. The system utilizes custom language mapping and programming language tokenization to ensure suggestions remain syntax-accurate across various file extensions. By defining how source code is broken into symbols and identifiers, the tool maintains consistent suggestions across a project's different file type
Twig is a PHP template engine and compiled rendering library designed to separate business logic from presentation. It functions as a secure template language that generates HTML output by combining dynamic data with reusable layouts. The system guards against cross-site scripting attacks through automatic output escaping and content sanitization. For efficiency, it compiles templates into optimized PHP code and caches the result so that parsing is skipped on subsequent requests. The engine provides comprehensive tools for template composition
CoffeeScript is a source-to-source transpiler that transforms a concise high-level syntax into standard JavaScript. It enables the development of logic for web applications and server-side environments by converting source code into a format compatible with browsers and server runtimes. The project provides a workflow for rapid prototyping and script execution automation, allowing users to run source files through a compiler and execute the resulting code immediately without a manual build step. The tooling leverages lexical analysis and abstract syntax tree transformations to manipulate cod
c4 is a minimalist C compiler and programming tool that translates C source code into instructions for its own small virtual machine, which it then executes, using only four functions. It functions as a stripped-down compilation utility focused on a tiny codebase. The project serves as an educational tool for studying the internal mechanics of the compilation process, demonstrating how source files are transformed into low-level instructions. The compiler uses a single-pass compilation model with recursive descent parsing and direct code emission. It manages the translation
VBA-JSON is a library designed for parsing and serializing JSON data within Visual Basic for Applications environments. It functions as an office automation data library, enabling legacy Microsoft Office applications to process structured data and interact with modern web services. The tool converts raw JSON text into native objects and collections, allowing developers to access and manipulate data using standard indexing and iteration methods. It also performs the reverse operation, transforming native language structures into JSON-compliant strings for exchange with external systems. By ha
sqlglot is a SQL parser and transpiler that represents queries as abstract syntax trees to enable structural analysis, modification, and semantic transformation. It functions as a dialect translator and query optimizer, converting SQL code between different database engines and simplifying syntax trees through rule-based normalization. The project provides a framework for defining custom SQL dialects by overriding tokenizers, parsers, and generators. It includes a lineage analyzer to track data flow from source tables through complex queries to identify the origin of specific columns. Additi
This project is an educational compiler implementation and architecture demo. It serves as a small-scale C-style language compiler designed to demonstrate the fundamental stages of transforming source code into executable machine instructions. The codebase functions as a tool for compiler architecture education and design prototyping. It illustrates the process of building an educational language implementation to help users understand the mechanics of parsing and code generation. The implementation covers the primary stages of a compiler pipeline, including regular expression tokenization,
PegJS is a parsing expression grammar tool and JavaScript parser generator. It functions as a grammar compiler that transforms formal grammar specifications into executable JavaScript code for analyzing structured text and processing complex input strings. The system generates deterministic parsers that avoid the ambiguity of context-free grammars. It utilizes a packrat parsing model with memoization to ensure linear time complexity and employs recursive descent parsing to process input in a top-down hierarchical manner. The toolset supports the implementation of domain-specific languages an
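PegJS generates JavaScript from a grammar file; the hand-written Python sketch below shows only the packrat idea itself. Each rule is an ordinary recursive-descent function memoized on its start position with `functools.lru_cache`. The three-rule grammar and `make_parser` are hypothetical.

```python
from functools import lru_cache
from typing import Callable, Optional, Tuple

Result = Optional[Tuple[int, int]]  # (semantic value, next position), or None on failure
DIGITS = "0123456789"


def make_parser(text: str) -> Callable[[], int]:
    """Build a packrat parser for a tiny PEG over `text`:

        Expr   <- Term ('+' Term)*
        Term   <- Number / '(' Expr ')'
        Number <- [0-9]+

    Every rule is memoized on its start position, so no rule runs twice at
    the same offset; that bound is what gives packrat parsing linear time.
    """

    @lru_cache(maxsize=None)
    def number(pos: int) -> Result:
        end = pos
        while end < len(text) and text[end] in DIGITS:
            end += 1
        return (int(text[pos:end]), end) if end > pos else None

    @lru_cache(maxsize=None)
    def term(pos: int) -> Result:
        # Ordered choice: try Number first, then a parenthesised Expr.
        result = number(pos)
        if result is not None:
            return result
        if pos < len(text) and text[pos] == "(":
            inner = expr(pos + 1)
            if inner is not None and inner[1] < len(text) and text[inner[1]] == ")":
                return inner[0], inner[1] + 1
        return None

    @lru_cache(maxsize=None)
    def expr(pos: int) -> Result:
        first = term(pos)
        if first is None:
            return None
        total, pos = first
        # ('+' Term)*: greedy repetition; a '+' without a Term is not consumed.
        while pos < len(text) and text[pos] == "+":
            nxt = term(pos + 1)
            if nxt is None:
                break
            total, pos = total + nxt[0], nxt[1]
        return total, pos

    def parse() -> int:
        result = expr(0)
        if result is None or result[1] != len(text):
            raise SyntaxError(f"could not parse {text!r}")
        return result[0]

    return parse


print(make_parser("(1+2)+3")())  # prints: 6
```

Because PEG choice is ordered, the grammar has exactly one parse for any input, which is the "no ambiguity" property mentioned above.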
yaml-cpp is a C++ library for parsing and emitting YAML 1.2 documents. It provides a complete YAML processing pipeline, from reading YAML content into a traversable node tree to writing in-memory data structures back as YAML text. The library represents parsed YAML as a mutable tree of typed nodes, supporting scalars, sequences, maps, and aliases. It uses a recursive-descent parser to build this node tree, and a stream-based emitter to generate YAML output incrementally. Template-based type conversion enables compile-time serialization between YAML nodes and C++ types, including support for c
Chumsky is a parser combinator library used to build high-performance parsers by composing small parsing functions into complex grammars. It provides multiple parsing engines, including recursive descent and precedence-climbing implementations for resolving the order of operations in mathematical and logical expressions. The library is distinguished by its zero-copy text parsing, which minimizes memory allocations to increase throughput, and its ability to run without a standard library for use in embedded or resource-constrained environments. It also features an error-recovering parser that
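Chumsky expresses operator precedence through Rust combinators; the Python sketch below shows the underlying precedence-climbing loop on a pre-split token list instead. The operator table, the tuple-based AST and `parse_expression` are hypothetical.

```python
from typing import List, Tuple, Union

# Hypothetical operator table: precedence (higher binds tighter) and associativity.
BINARY_OPS = {
    "+": (1, "left"),
    "-": (1, "left"),
    "*": (2, "left"),
    "/": (2, "left"),
    "^": (3, "right"),
}

Node = Union[int, Tuple[str, "Node", "Node"]]


def parse_expression(tokens: List[str]) -> Node:
    """Build a nested-tuple AST, e.g. ["1", "+", "2"] -> ("+", 1, 2)."""
    pos = 0

    def primary() -> Node:
        nonlocal pos
        if pos >= len(tokens):
            raise SyntaxError("unexpected end of input")
        tok = tokens[pos]
        pos += 1
        if tok == "(":
            node = climb(1)
            if pos >= len(tokens) or tokens[pos] != ")":
                raise SyntaxError("expected ')'")
            pos += 1
            return node
        return int(tok)

    def climb(min_prec: int) -> Node:
        nonlocal pos
        left = primary()
        # Absorb operators whose precedence is at least min_prec.
        while pos < len(tokens) and tokens[pos] in BINARY_OPS:
            op = tokens[pos]
            prec, assoc = BINARY_OPS[op]
            if prec < min_prec:
                break
            pos += 1
            # A left-associative operator needs strictly tighter binding on its
            # right side; a right-associative one accepts the same precedence.
            right = climb(prec + 1 if assoc == "left" else prec)
            left = (op, left, right)
        return left

    tree = climb(1)
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {tokens[pos]!r}")
    return tree


print(parse_expression("1 + 2 * 3 ^ 2 ^ 2".split()))
# prints: ('+', 1, ('*', 2, ('^', 3, ('^', 2, 2))))
```

The single `min_prec` parameter replaces the one-function-per-precedence-level structure of a plain recursive descent expression grammar, which is why this technique is popular for operator-heavy languages.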
JSHint is a JavaScript static analysis tool and linter designed to detect errors and enforce coding standards. It functions as a syntax validator that scans source code to identify potential logic problems and programming mistakes before the code is executed. The tool provides a command line interface for analyzing files and directories. It can export analysis results in standardized formats such as Checkstyle for integration with external build tools. Analysis is configured through linting rules and environment-specific global variable settings. This includes the ability
JSON5 is a parser and serializer for a human-readable configuration format that extends JSON. It serves as a JavaScript-based data parser that accepts a more flexible version of the JSON syntax to simplify manual editing of data files. The format supports comments, trailing commas, and multi-line strings. It includes utilities to convert this extended syntax into standard JSON for compatibility with tools that require strict JSON. The library covers data serialization, string parsing, and structural syntax validation. It also provides integration for