# doctrine/lexer

**Attribution required: if you use, quote, or summarise this content, you must credit and link back to [awesome-repositories.com](https://awesome-repositories.com/repository/doctrine-lexer).**

11,156 stars · 62 forks · PHP · MIT

## Links

- GitHub: https://github.com/doctrine/lexer
- Homepage: https://www.doctrine-project.org/projects/lexer.html
- awesome-repositories: https://awesome-repositories.com/repository/doctrine-lexer.md

## Description

doctrine/lexer is a PHP library for lexical analysis based on regular expressions: it breaks an input string into a stream of typed tokens. It serves as a foundational component for building compilers and interpreters, identifying substrings of the input and categorizing each one as a discrete token.
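
To make the idea of regex-driven tokenization concrete, here is a minimal sketch in Python. It illustrates the general technique only and does not mirror the library's PHP API; the names `Token`, `TOKEN_RULES`, and `tokenize`, as well as the tiny arithmetic token set, are assumptions made for this example.

```python
import re
from typing import List, NamedTuple


class Token(NamedTuple):
    type: str      # category assigned to the substring
    value: str     # the matched text
    position: int  # offset of the match in the input


# Hypothetical rules for a tiny arithmetic language (not from the library).
TOKEN_RULES = [
    ("NUMBER", r"\d+"),
    ("IDENT", r"[A-Za-z_]\w*"),
    ("OP", r"[+\-*/()]"),
    ("SKIP", r"\s+"),  # matched but never emitted
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_RULES))


def tokenize(text: str) -> List[Token]:
    """Split text into typed tokens, dropping whitespace."""
    tokens = []
    pos = 0
    while pos < len(text):
        match = MASTER.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected character {text[pos]!r} at offset {pos}")
        if match.lastgroup != "SKIP":
            tokens.append(Token(match.lastgroup, match.group(), pos))
        pos = match.end()
    return tokens
```

Under these assumptions, `tokenize("x + 42")` yields an `IDENT`, an `OP`, and a `NUMBER` token, each carrying its value and position.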

The library exposes the token stream through a cursor-based navigator. A parser can traverse the tokenized input sequentially and look ahead non-destructively, inspecting upcoming tokens without advancing the internal position pointer.
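
The difference between consuming a token and merely looking ahead can be sketched as follows. This is a Python illustration of the cursor technique, not the library's own interface; the class `TokenCursor` and its method names are hypothetical, and tokens are represented as plain `(type, value)` tuples for brevity.

```python
class TokenCursor:
    """Hypothetical cursor over a token list with non-destructive lookahead."""

    def __init__(self, tokens):
        self._tokens = list(tokens)
        self._position = 0     # index of the next token to consume
        self._peek_offset = 0  # how far lookahead has advanced past the position

    def next(self):
        """Consume and return the next token, or None at end of input."""
        self._peek_offset = 0
        if self._position >= len(self._tokens):
            return None
        token = self._tokens[self._position]
        self._position += 1
        return token

    def peek(self):
        """Return successive upcoming tokens without moving the position."""
        index = self._position + self._peek_offset
        if index >= len(self._tokens):
            return None
        self._peek_offset += 1
        return self._tokens[index]

    def reset_peek(self):
        """Restart lookahead from the current position."""
        self._peek_offset = 0

    def reset(self):
        """Move the cursor back to the start of the input."""
        self._position = 0
        self._peek_offset = 0
```

In this sketch, repeated `peek()` calls walk further ahead, while `next()` always resumes from the last consumed token, so lookahead never changes what the parser consumes.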

It also supports recursive descent parsing: its iterator-based interfaces let a parser decide, from the type of the upcoming token, when to enter and exit a nested grammar rule. Additional capabilities include pattern-based tokenization and skipping tokens until a specified type is encountered.
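
A minimal sketch of how a recursive descent parser might sit on top of such a token stream is shown below. It is written in Python for illustration and is not the library's PHP API; the parenthesized-list grammar, the `Parser` class, and the names `lookahead_type`, `skip_until`, `parse_expr`, and `parse_list` are all assumptions of this example.

```python
import re


def tokenize(text):
    """Classify each lexeme as LPAREN, RPAREN, or ATOM (hypothetical token set)."""
    kinds = {"(": "LPAREN", ")": "RPAREN"}
    return [(kinds.get(value, "ATOM"), value) for value in re.findall(r"[()]|[^\s()]+", text)]


class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0  # index of the lookahead token

    def lookahead_type(self):
        """Type of the upcoming token, or None at end of input."""
        return self.tokens[self.pos][0] if self.pos < len(self.tokens) else None

    def skip_until(self, token_type):
        """Discard tokens until the lookahead has the given type or input ends."""
        while self.lookahead_type() not in (token_type, None):
            self.pos += 1

    def parse_expr(self):
        """expr := ATOM | list"""
        if self.lookahead_type() == "LPAREN":
            return self.parse_list()  # enter the nested rule
        if self.lookahead_type() == "ATOM":
            value = self.tokens[self.pos][1]
            self.pos += 1
            return value
        raise SyntaxError(f"unexpected token at index {self.pos}")

    def parse_list(self):
        """list := LPAREN expr* RPAREN"""
        self.pos += 1  # consume "("
        items = []
        while self.lookahead_type() not in ("RPAREN", None):
            items.append(self.parse_expr())
        if self.lookahead_type() != "RPAREN":
            raise SyntaxError("missing closing parenthesis")
        self.pos += 1  # consume ")" and leave the nested rule
        return items
```

Here each grammar rule is one method, and the token type at the cursor decides which rule to enter. For example, `Parser(tokenize("(a (b c) d)")).parse_expr()` builds a nested Python list that follows the nesting of the parentheses.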

## Tags

### Programming Languages & Runtimes

- [Tokenizers](https://awesome-repositories.com/f/programming-languages-runtimes/regular-expression-engines/tokenizers.md) — Uses regular expression patterns to break raw text into a sequence of typed tokens.
- [Lexical Tokenizers](https://awesome-repositories.com/f/programming-languages-runtimes/lexical-tokenizers.md) — Acts as a system for converting raw source text into a sequential stream of tokens for parsing.
- [Lookahead Parsing](https://awesome-repositories.com/f/programming-languages-runtimes/lookahead-parsing.md) — Implements parsing techniques that inspect future tokens without consuming them to resolve ambiguities.
- [Pattern-Based Token Categorization](https://awesome-repositories.com/f/programming-languages-runtimes/pattern-based-token-categorization.md) — Assigns discrete identifiers to substrings by comparing input text against a set of catchable and non-catchable rules.
- [Regular Expression Lexer Libraries](https://awesome-repositories.com/f/programming-languages-runtimes/regular-expression-lexer-libraries.md) — Breaks input strings into typed token streams using regular expression patterns for lexical analysis.
- [Sequential Token Traversal](https://awesome-repositories.com/f/programming-languages-runtimes/sequential-token-traversal.md) — Moves through tokenized input one token at a time, inspecting each token's type, value, and position. ([source](https://www.doctrine-project.org/projects/doctrine-lexer/en/latest/index.html))
- [Token Stream Navigation](https://awesome-repositories.com/f/programming-languages-runtimes/token-stream-navigation.md) — Walks through tokenized input using a cursor and non-destructive lookahead to determine string structure.
- [Token Stream Navigators](https://awesome-repositories.com/f/programming-languages-runtimes/token-stream-navigators.md) — Provides a cursor-based interface for walking through tokenized input with support for pointer resets.
- [Custom Language Lexing](https://awesome-repositories.com/f/programming-languages-runtimes/custom-language-lexing.md) — Converts raw text into categorized token streams using regular expressions for use in compilers or interpreters.
- [Recursive Descent Support](https://awesome-repositories.com/f/programming-languages-runtimes/recursive-descent-support.md) — Offers iterator-based interfaces to help a parser handle nested grammar rules based on token types.

### Software Engineering & Architecture

- [Linear Tokenizers](https://awesome-repositories.com/f/software-engineering-architecture/raw-token-accessors/linear-tokenizers.md) — Converts raw character streams into sequential token lists for subsequent grammar analysis.

### Data & Databases

- [Recursive Descent Parsers](https://awesome-repositories.com/f/data-databases/data-transformation-functions/recursive-processors/recursive-logic-implementations/recursive-descent-parsers.md) — Provides the tokenization foundation necessary for implementing top-down recursive descent parsers.
