# theseer/tokenizer

**Attribution required: if you use, quote, or summarise this content, you must credit and link back to [awesome-repositories.com](https://awesome-repositories.com/repository/theseer-tokenizer).**

5,194 stars · 23 forks · PHP · License: NOASSERTION (not determined)

## Links

- GitHub: https://github.com/theseer/tokenizer
- awesome-repositories: https://awesome-repositories.com/repository/theseer-tokenizer.md

## Topics

`php` `tokenizer` `xml`

## Description

This library tokenizes PHP source code and serializes the resulting token stream as XML. The XML output gives downstream tools a machine-readable form of the source for programmatic analysis and source tree manipulation.

To keep memory overhead low on large files, the serializer can write XML incrementally, appending fragments to a writer instead of building the whole document first. The XML namespace is configurable, which helps with schema compatibility and avoids naming collisions with other vocabularies.
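The incremental writing and namespace configuration described above can be illustrated with PHP's built-in `token_get_all()` and `XMLWriter`. This is a minimal sketch and not the library's own code: the function name `writeTokensAsXml`, the element names `source` and `token`, the `CHAR` label for single-character tokens, and the example namespace URI are all assumptions made for illustration.

```php
<?php
declare(strict_types=1);

/**
 * Write each token to an XMLWriter as soon as it is read.
 * Function name, element names, and the 'CHAR' label are illustrative assumptions.
 */
function writeTokensAsXml(XMLWriter $writer, string $source, string $namespaceUri): void
{
    // A null prefix makes $namespaceUri the default namespace of the root element.
    $writer->startElementNs(null, 'source', $namespaceUri);

    foreach (token_get_all($source) as $token) {
        // token_get_all() yields [id, text, line] arrays or single-character strings.
        [$name, $text] = is_array($token)
            ? [token_name($token[0]), $token[1]]
            : ['CHAR', $token];

        $writer->startElement('token');
        $writer->writeAttribute('name', $name);
        $writer->text($text); // text() escapes <, > and & for us
        $writer->endElement();
    }

    $writer->endElement(); // closes <source>
}

$writer = new XMLWriter();
$writer->openMemory();
writeTokensAsXml($writer, '<?php echo 1;', 'https://example.com/ns/tokens');
$xml = $writer->outputMemory();

echo $xml, PHP_EOL;
```

The sketch buffers in memory so the result can be inspected. Replacing `openMemory()` with `openUri()` and a file path would send the fragments to disk as they are written, which is where the memory benefit on large files would come from.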

The library covers lexical analysis and the conversion of source code into structured XML in support of static code analysis workflows. It processes the token stream in a single pass, organizing the flat token list into a hierarchical XML structure.
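One way to picture the single-pass step is grouping the flat token list by source line. The sketch below uses only PHP's built-in `token_get_all()`. Grouping by line, the function name `groupTokensByLine`, and the returned array shape are assumptions chosen for illustration, not a description of the library's internals. It also assumes `\n` line endings and leaves out lines that contain no token text.

```php
<?php
declare(strict_types=1);

/**
 * Group a flat token list into lines in one pass.
 * Returns [lineNumber => [[tokenName, text], ...]] (hypothetical shape).
 */
function groupTokensByLine(string $source): array
{
    $lines = [];
    $lineNo = 1;

    foreach (token_get_all($source) as $token) {
        [$name, $text] = is_array($token)
            ? [token_name($token[0]), $token[1]]
            : ['CHAR', $token];

        // A token may span several lines (whitespace, comments, strings),
        // so split it and advance the line counter at each newline.
        foreach (explode("\n", $text) as $i => $part) {
            if ($i > 0) {
                $lineNo++;
            }
            if ($part !== '') {
                $lines[$lineNo][] = [$name, $part];
            }
        }
    }

    return $lines;
}

$lines = groupTokensByLine("<?php\necho 1;\n");

foreach ($lines as $no => $tokens) {
    echo $no, ': ', implode(' ', array_column($tokens, 0)), PHP_EOL;
}
```

Each token is visited once and placed under the line it belongs to, so the nested result can be handed straight to a serializer without a second pass over the input.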

## Tags

### Programming Languages & Runtimes

- [PHP Code Analysis](https://awesome-repositories.com/f/programming-languages-runtimes/php-code-analysis.md) — Converts PHP source code into structured XML to enable programmatic analysis and manipulation of the code tree. ([source](https://github.com/theseer/tokenizer#readme))
- [Lexical Tokenizers](https://awesome-repositories.com/f/programming-languages-runtimes/lexical-tokenizers.md) — Utilizes the internal PHP tokenizer to break raw source code into a stream of discrete lexical tokens.
- [Source-to-XML Transformation](https://awesome-repositories.com/f/programming-languages-runtimes/php-code-analysis/source-to-xml-transformation.md) — Transforms tokenized PHP source code into a structured XML representation for automated analysis and downstream processing. ([source](https://github.com/theseer/tokenizer/blob/main/CHANGELOG.md))
- [Token Serialization](https://awesome-repositories.com/f/programming-languages-runtimes/php-code-analysis/token-serialization.md) — Transforms raw PHP tokens into a standardized XML format to preserve code structure for external processing tools.
- [PHP Parsers](https://awesome-repositories.com/f/programming-languages-runtimes/php-parsers.md) — Parses raw PHP source code into a sequence of tokens for programmatic analysis and transformation. ([source](https://github.com/theseer/tokenizer/blob/main/README.md))
- [PHP To XML Serializers](https://awesome-repositories.com/f/programming-languages-runtimes/php-to-xml-serializers.md) — Transforms PHP token streams into structured XML representations for easier source tree manipulation.
- [Source Code Tokenizers](https://awesome-repositories.com/f/programming-languages-runtimes/source-code-compilers/source-code-templates/source-code-transformation-engines/javascript-source-parsers/source-code-tokenizers.md) — Converts PHP source code into discrete tokens to enable programmatic analysis and processing.
- [PHP AST Parsers](https://awesome-repositories.com/f/programming-languages-runtimes/php-ast-parsers.md) — Prepares tokenized PHP code for further transformation by organizing it into an XML representation.

### Data & Databases

- [XML Serialization](https://awesome-repositories.com/f/data-databases/data-serialization-formats/xml-serialization-formats/xml-serialization.md) — Serializes a list of tokens into an XML representation that preserves the original source structure and content. ([source](https://github.com/theseer/tokenizer/blob/main/README.md))
- [Fragment Serialization](https://awesome-repositories.com/f/data-databases/data-serialization-formats/xml-serialization-formats/xml-serialization/fragment-serialization.md) — Appends serialized XML segments to a writer to improve performance and memory efficiency when processing large codebases. ([source](https://github.com/theseer/tokenizer/blob/main/CHANGELOG.md))
- [Stream-Based Serialization](https://awesome-repositories.com/f/data-databases/data-serialization-formats/xml-serialization-formats/xml-serialization/stream-based-serialization.md) — Writes XML fragments incrementally to a writer to maintain low memory overhead during large file processing.

### Development Tools & Productivity

- [Static Code Analysis](https://awesome-repositories.com/f/development-tools-productivity/code-quality-analysis/static-analysis-engines/static-code-analysis.md) — Turns PHP source files into a machine-readable XML format to simplify the identification of patterns within a codebase.
- [Source-to-Source Analysis](https://awesome-repositories.com/f/development-tools-productivity/code-quality-analysis/source-to-source-analysis.md) — Converts PHP source into XML to simplify the analysis and manipulation of the original source tree. ([source](https://github.com/theseer/tokenizer/blob/main/.gitignore))

### Software Engineering & Architecture

- [Single-Pass Tokenizers](https://awesome-repositories.com/f/software-engineering-architecture/stateless-architectures/stateless-token-validation/stateless-tokenizers/single-pass-tokenizers.md) — Uses a single-pass processing flow to convert flat token lists into a hierarchical XML structure.
