30 open-source projects similar to theseer/tokenizer, ranked by how many features they have in common. Compare stars, activity and what each one does to find the best Tokenizer alternative.
PHP-Parser is a tool that converts PHP source code into an abstract syntax tree for static analysis and programmatic manipulation. It functions as a parser, a code generator, and a static analysis framework. The project enables the programmatic construction of abstract syntax tree nodes through a fluent interface and provides the ability to transform these trees back into formatted source code. It includes a serializer that exports abstract syntax trees to JSON format and reconstructs them from strings. The toolset covers several capability areas, including namespace resolution, constant exp
xmltodict is a Python library that provides bidirectional serialization between XML documents and dictionaries. It functions as a parser that converts marked-up input into key-value pairs and a serialization utility that transforms dictionaries back into structured XML documents. The project includes an incremental stream processor that uses depth-based callbacks to handle large XML files while maintaining constant memory usage. It features a namespace manager for mapping prefixes and declarations, as well as a security sanitizer that blocks external entity expansion and validates element nam
This project is a Node.js library for bidirectional conversion between XML strings and JavaScript objects. It functions as an XML parser that transforms XML content into structured data and an XML serializer that generates formatted strings from JavaScript data objects. The toolkit includes a data transformer that applies custom processing functions to tags and attributes during the conversion process. It manages XML namespaces and supports the definition of custom root elements to maintain document structure during generation. The system handles XML data parsing, string generation, and name
ReflectionCommon is a PHP reflection interface library and code analysis abstraction. It serves as a foundation for static analysis by providing a shared specification for representing classes, methods, and properties during programmatic code inspection. The project standardizes the reflection API to decouple analysis tools from specific PHP reflection implementations. This ensures that different analysis implementations can work interchangeably through a consistent layer of interfaces. The library covers the domain of PHP code analysis and static analysis tooling, establishing a common way
php-code-coverage is a PHP library and analysis tool designed to track runtime execution and determine which parts of a codebase are exercised by automated tests. It monitors executed lines and branches during a test run to identify gaps in test coverage and evaluate the effectiveness of a test suite. The tool functions as an execution tracker and report generator that transforms raw PHP execution data into human-readable formats. It serializes collected metrics for storage and utilizes a processing system to calculate the total percentage of code covered. Its capability surface covers the e
phpDocumentor is a PHP API documentation generator and source code analyzer that transforms PHP files and DocBlocks into structured HTML API references. It functions as a static site generator and an automatic documentation tool designed to synchronize technical documentation with code changes. The project distinguishes itself by acting as a UML diagram generator, producing class and architectural graphs via PlantUML based on source analysis. It also supports technical manual authoring, rendering hand-written guides in Markdown and ReStructuredText alongside the automatically generated API re
php-token-stream is a lexical analysis tool and tokenizer wrapper for PHP. It functions as a source code streamer that reads tokens one by one, avoiding the need to load entire source files into memory. The project provides memory-efficient parsing by wrapping the native PHP tokenizer extension. This allows for the sequential processing of source code tokens to analyze structural components and syntax. The tool is designed for static code analysis and the development of compiler tooling. It supports linear token processing and sequential traversal to examine language constructs and project
PhpInsights is a static analysis tool and code quality analyzer for PHP. It evaluates source code to identify bugs, style violations, and technical debt without executing the application. The tool functions as a complexity metric utility, calculating architectural and cyclomatic complexity to locate overly complicated logic. It measures overall software health and maintainability by comparing code against industry standards. The system manages technical debt through rule-based validation and metric-driven scoring. It uses a static analysis engine to parse source code, delivering the results
Larastan is a static analysis tool for PHP and a specialized extension for PHPStan. It serves as a code analyzer designed to detect bugs and architectural issues within Laravel applications by analyzing source code without executing it. The project provides framework-specific rule sets and specialized type-inference to handle the unique patterns and logic used in the Laravel ecosystem. This allows for more accurate error detection and type checking than generic analysis tools. The tool includes systems for managing legacy code debt through error baseline tracking and regex-based error suppre
Larastan is a static analysis extension and type inference engine for PHP designed to detect bugs and type errors in Laravel applications. It extends PHPStan to resolve framework-specific patterns and magic methods, providing a rule-based scanning engine to audit code quality without executing the application. The tool specializes in Eloquent analysis, verifying that model properties, casts, and relationships align with database schemas and migrations. It tracks types across Eloquent collections, custom builders, and model factories to ensure type safety during database operations and iterati
PHP-CS-Fixer is a static analysis tool and code style linter designed to validate PHP code against predefined standards. It functions as a coding standard fixer that automatically detects and corrects style violations to ensure consistent formatting across a codebase. The project serves as a syntax modernizer, providing automated tools to update legacy PHP syntax to align with newer language versions. It also allows for the creation of custom style rules when built-in standards do not meet specific requirements. The tool covers broad capability areas including automated linting workflows and
Acorn is a JavaScript parser that converts source text into a structured abstract syntax tree. It follows the ESTree specification to produce a standardized JSON tree format, enabling consistent analysis of code structure and language versions. The project features a plugin-based grammar extension system that allows the base parser to be extended with custom rules for experimental or non-standard language features. It also includes syntax error recovery, which inserts placeholder nodes into the tree when encountering invalid code to allow parsing to continue. The toolset covers static analys
bpmn-js is a browser-based BPMN 2.0 web modeler and rendering engine used for creating, editing, and visualizing business process models. It functions as an XML process modeler that parses BPMN 2.0 XML data into interactive visual diagrams within a web application. The project distinguishes itself as a business process visualizer with capabilities for process flow simulation, which tracks token movement to mimic real-time execution. It also supports diagram version comparison to identify changes between model iterations and provides a layered overlay interface for binding metadata and custom
go-ast-book is a collection of educational and technical resources focused on abstract syntax tree analysis, compiler development, and static code verification. It provides guides and manuals for parsing, traversing, and analyzing Go source code to extract semantic meaning. The project serves as a reference for building compiler frontends, covering the translation of high-level code into intermediate representations and static single assignment (SSA) form. It also provides instructions for using these techniques to develop language tooling and perform static code analysis. The resources cover a b
Sourcetrail is an interactive source code explorer and visualizer designed for indexing and navigating relationships between symbols and structures across large, multi-language codebases. It functions as a static analysis indexer and code dependency visualizer that maps calls and dependencies between source files to help reveal project architecture. The tool enables multi-language project analysis by using a language-agnostic indexing system to track symbols across different programming languages within a single interface. It allows for the discovery of software architecture and the explorati
Codegraph is a local codebase indexer and static analysis graph database that serves as a context provider for AI agents. It parses multiple programming languages into a searchable knowledge graph of symbols and dependencies, exposing these relationships to AI tools through the Model Context Protocol. The project distinguishes itself by aggregating relevant code snippets and symbol flows to reduce token usage for large language models. It automates the configuration of server settings and steering instructions across various AI agent platforms and command line editors to enable automatic code
Cppcheck is a static analysis tool and linter for C and C++ source code designed to detect programming errors, memory leaks, and security violations without executing the program. It functions as a bug detection engine and quality assurance tool to identify concurrency issues, type cast errors, and compliance with secure coding standards. The project provides a graphical user interface for selecting files and reviewing errors, alongside a linter for enforcing naming conventions and coding standards. It supports the creation of custom analysis rules using regular expressions to identify specif
This project is a regular expression lexer library and lexical analysis engine used to break input strings into typed token streams. It serves as a foundational component for constructing compilers or interpreters by identifying and categorizing substrings into discrete tokens. The library provides a token stream navigator featuring a cursor-based interface. This allows for sequential traversal of tokenized input and non-destructive lookahead, enabling the inspection of future tokens without advancing the internal position pointer. It includes specific support for recursive descent parsing t
Bhai-lang is a TypeScript-based toy programming language and custom syntax interpreter. It functions as an educational language implementation designed to demonstrate core concepts of variable management, conditional logic, and execution flow. The project provides a custom command line interface and an interactive code playground for writing and testing scripts. It serves as a framework for programming language prototyping, allowing for the definition of custom syntax and execution logic. The system covers the full interpreter pipeline, including lexical analysis, recursive descent parsing,
Esprima is a JavaScript parser that converts source code into a structured abstract syntax tree. It implements a specification-driven grammar to ensure compliance with ECMAScript standards, enabling the programmatic analysis and transformation of JavaScript programs. The project provides capabilities for lexical tokenization to break source code into individual symbols and static syntax validation to verify that scripts are well-formed without executing the code. Its functional surface covers JavaScript static analysis, lexical analysis, and the generation of abstract syntax trees.
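To make "lexical tokenization" concrete, here is a minimal sketch in Python of breaking a tiny JavaScript-like snippet into typed tokens. It is not Esprima's implementation or API. The token classes, the `KEYWORDS` set, and the `tokenize` function are hypothetical names chosen for illustration, and the grammar covers only a small assumed subset of the language.

```python
import re
from typing import List, NamedTuple


class Token(NamedTuple):
    kind: str
    text: str


# Hypothetical token classes for a tiny JavaScript-like subset (assumption).
TOKEN_SPEC = [
    ("Numeric", r"\d+(?:\.\d+)?"),
    ("Identifier", r"[A-Za-z_$][A-Za-z0-9_$]*"),
    ("Punctuator", r"===|==|=|[+\-*/();{}]"),
    ("Whitespace", r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{kind}>{pattern})" for kind, pattern in TOKEN_SPEC))
KEYWORDS = {"var", "let", "const", "function", "return"}


def tokenize(source: str) -> List[Token]:
    """Split source into tokens, raising SyntaxError on unknown characters."""
    tokens: List[Token] = []
    pos = 0
    while pos < len(source):
        match = MASTER.match(source, pos)
        if match is None:
            raise SyntaxError(f"Unexpected character {source[pos]!r} at offset {pos}")
        pos = match.end()
        kind = match.lastgroup
        if kind == "Whitespace":
            continue  # whitespace separates tokens but is not emitted
        if kind == "Identifier" and match.group() in KEYWORDS:
            kind = "Keyword"  # reserved words are reclassified after matching
        tokens.append(Token(kind, match.group()))
    return tokens


if __name__ == "__main__":
    for token in tokenize("var answer = 42;"):
        print(token.kind, token.text)
```

Raising an error on the first unrecognized character also gives a crude form of syntax validation without executing anything, although a real parser checks far more than character classes.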
This project is a C language interpreter and a practical implementation of a programming language. It parses and executes C source code directly, removing the requirement for a separate compilation step. The interpreter is designed for self-hosting, meaning it is capable of interpreting its own source code to demonstrate recursive language processing and execution. The system covers the primary stages of language processing, including lexical analysis, recursive descent parsing, and tree-walk interpretation using an abstract syntax tree. It manages memory and scope through a dynamic symbol t
ReflectionDocBlock is a PHP docblock parser and doc-comment metadata extractor. It functions as a reflection wrapper that extends the standard PHP Reflection API to convert raw documentation blocks into structured objects. The library provides tools for PHP documentation parsing and reflection tooling. It enables the extraction of structured metadata and annotations from reflection objects or raw doc-comment strings to support automated API documentation and static code analysis.
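As a rough illustration of turning a raw doc-comment string into structured objects, the Python sketch below strips the comment delimiters, collects a summary, and gathers `@tag` lines. It does not reproduce ReflectionDocBlock's PHP API. `DocBlock`, `Tag`, and `parse_docblock` are hypothetical names, and the handling of summaries and multi-line tags is a simplifying assumption.

```python
import re
from dataclasses import dataclass, field
from typing import List


@dataclass
class Tag:
    name: str
    body: str


@dataclass
class DocBlock:
    summary: str
    tags: List[Tag] = field(default_factory=list)


def parse_docblock(comment: str) -> DocBlock:
    """Parse a /** ... */ comment into a summary and a list of tags."""
    lines = []
    for raw in comment.strip().splitlines():
        line = raw.strip()
        line = re.sub(r"^/\*\*\s?", "", line)  # drop the opening delimiter
        line = re.sub(r"\s*\*/$", "", line)    # drop the closing delimiter
        line = re.sub(r"^\*\s?", "", line)     # drop the leading asterisk
        lines.append(line.strip())

    summary_parts: List[str] = []
    tags: List[Tag] = []
    in_summary = True
    for line in lines:
        match = re.match(r"@([A-Za-z][\w-]*)\s*(.*)", line)
        if match:
            tags.append(Tag(match.group(1), match.group(2)))
            in_summary = False
        elif tags and line:
            # Continuation line: append it to the most recent tag.
            tags[-1].body = (tags[-1].body + " " + line).strip()
        elif in_summary and line:
            summary_parts.append(line)
        elif summary_parts:
            in_summary = False  # a blank line ends the summary
    return DocBlock(" ".join(summary_parts), tags)


if __name__ == "__main__":
    example = r"""
    /**
     * Resolves a user by name.
     *
     * @param string $name The login name
     *     of the user.
     * @return User|null
     */
    """
    block = parse_docblock(example)
    print(block.summary)
    for tag in block.tags:
        print(tag.name, "->", tag.body)
```

A documentation generator or static analyzer would then work with the resulting objects rather than re-scanning the comment text.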
TypeResolver is a PHP namespace resolver and type parser designed to convert partial class and element names into fully qualified names. It functions as a utility for static code analysis, transforming complex type expressions and primitives into structured value objects. The project implements PSR-5 standards to ensure consistent type referencing. It manages the resolution of structural elements by tracking current namespaces and alias contexts to expand partial identifiers into their full definitions. The tool covers the parsing of compound type strings and the management of PHP imports an
vim-lsp is a Vim plugin that implements the Language Server Protocol to provide an asynchronous code intelligence tool for the editor. It serves as a bridge between Vim and external language servers, providing semantic code analysis and IDE-like navigation and diagnostics. The project provides a refactoring interface for renaming symbols across a workspace and applying quick-fixes. It also enables semantic highlighting, which color-codes elements based on their meaning as determined by the language server. The plugin covers a broad surface of capabilities, including symbol navigation and dis
The Rust RFCs repository is the formal home for the Rust language evolution process, housing the structured design documents and community review mechanisms that govern changes to the Rust programming language, its compiler, and its standard library. It defines the complete lifecycle for proposing, discussing, and implementing substantial changes through RFC documents, from initial submission and community feedback through final comment periods and sub-team sign-offs. The repository codifies the governance and collaboration processes that shape Rust's development, including mechanisms for com
PHP_CodeSniffer is a static analysis tool, coding standard linter, and command-line validator for PHP. It scans files and directories to detect and report formatting errors and language-specific coding violations without executing the code. The project functions as an automated code formatter capable of correcting detected style and formatting violations to bring source code into compliance with defined standards. It uses token-based lexical analysis to match code patterns against rule sets, ensuring consistency across a codebase. The tool provides comprehensive capabilities for recursive fi
SublimeCodeIntel is a code intelligence plugin for the Sublime Text editor that provides symbol-based navigation, autocomplete, and function tooltips. It functions as an IDE feature extension and static code analysis engine, using a cross-language symbol indexer to track definitions across multiple files. The system implements static analysis tooling to map definitions and references without executing program code. This enables users to jump to symbol definitions across an entire project and receive real-time suggestions for modules and symbols while typing. The toolset covers broad capabili
This is a Go library for reading and writing XLSX files, providing a toolkit for spreadsheet generation and data extraction. It functions as an Office Open XML parser and generator, enabling the creation of workbooks with support for styles, formulas, and metadata. The project features a data mapper that uses Go struct tags and reflection to automatically align spreadsheet rows with structured data. It also includes a validation engine for defining input constraints, such as dropdown lists and error alerts, to control user data entry. The library covers a broad range of capabilities, includi
pugixml is a lightweight C++ XML parser and DOM-based library used for parsing, manipulating, and saving XML documents. It provides a portable toolset for reading XML data from files, strings, or memory buffers and converting them into an in-memory document object model. The library includes a dedicated XPath 1.0 engine for extracting specific nodes and data through path expressions. It distinguishes itself through customizable memory management, allowing heap operations to be redirected to user-defined allocation functions, and the ability to perform in-place buffer parsing to reduce memory
This project is a PHP docblock annotation parser and reflection metadata tool designed to extract structured metadata from doc-comments and convert them into class instances. It functions as a system for retrieving and managing custom metadata attached to classes, methods, and properties. The library includes a metadata caching system to store parsed results, which reduces the performance overhead associated with repeated reflection calls and string parsing. It also serves as a static analysis utility for validating source code structure and enforcing coding standards through automated docblo