High-Performance Text Processing

Systems optimized for processing massive volumes of text with predictable memory and time complexity.

Distinct from Text Processing: related categories focus on audio, collections, or AI inference, whereas this one covers general-purpose high-performance text processing, for example with regular expressions.

Explore 10 awesome GitHub repositories matching data & databases · High-Performance Text Processing.

cppformat/cppformat
23,626 stars
cppformat is a type-safe C++ formatting library that serves as a high-performance alternative to standard C++ input and output streams for converting data into formatted strings. It integrates a compile-time format validator to ensure format specifiers match argument types, preventing runtime crashes. The library includes a positional argument engine that enables the reordering of text arguments for internationalization and localization. It also features a Unicode text formatter to ensure consistent and portable character representation across different operating systems. The project provide
Provides a high-performance alternative to standard C++ I/O streams for converting data into strings.
C++
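The positional arguments mentioned in the cppformat description let a translated template reorder values without touching the call site. The following is a minimal sketch of that idea using Python's built-in `str.format` as a stand-in, not the C++ library itself; the `render` helper and both templates are hypothetical.

```python
def render(template: str, name: str, count: int) -> str:
    """Fill a template whose fields refer to arguments by position."""
    # {0} always refers to `name` and {1} to `count`, wherever they appear.
    return template.format(name, count)


# Hypothetical templates: the German one puts the count first.
ENGLISH = "{0} has {1} new messages"
GERMAN = "{1} neue Nachrichten für {0}"

if __name__ == "__main__":
    print(render(ENGLISH, "Ana", 3))
    print(render(GERMAN, "Ana", 3))
```

The call site passes the arguments in one fixed order; only the template decides where each value lands.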
google/xi-editor
19,816 stars
Xi Editor is a high-performance text editor core written in Rust. It employs a decoupled architecture that separates core logic from the presentation layer using a JSON-based client-server protocol. The project features a language-agnostic plugin system that communicates with external extensions via JSON messages over pipes. It manages text buffers using a persistent rope data structure to enable efficient editing of very large files. The system supports asynchronous editor workflows by running expensive operations in background threads using data snapshots. This prevents background processi
Ensures high-performance editing of very large files with low latency using a rope data structure.
Rust
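The persistent rope mentioned in the Xi Editor description can be illustrated with a small sketch. This is not Xi's implementation: the `Leaf`, `Node`, `split`, and `insert` names are hypothetical, and the tree is never rebalanced, which a production rope would need to do.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Leaf:
    text: str

    @property
    def length(self) -> int:
        return len(self.text)


@dataclass(frozen=True)
class Node:
    left: object   # Leaf or Node
    right: object  # Leaf or Node
    length: int    # cached total length of both subtrees


def concat(a, b):
    """Join two ropes by creating one new parent node."""
    return Node(a, b, a.length + b.length)


def split(rope, i):
    """Return (first i characters, remainder) as two ropes."""
    if isinstance(rope, Leaf):
        return Leaf(rope.text[:i]), Leaf(rope.text[i:])
    if i <= rope.left.length:
        head, tail = split(rope.left, i)
        return head, concat(tail, rope.right)
    head, tail = split(rope.right, i - rope.left.length)
    return concat(rope.left, head), tail


def insert(rope, i, text):
    """Return a new rope with text inserted at index i; rope is unchanged."""
    head, tail = split(rope, i)
    return concat(concat(head, Leaf(text)), tail)


def to_str(rope):
    """Flatten the rope back into a single string."""
    if isinstance(rope, Leaf):
        return rope.text
    return to_str(rope.left) + to_str(rope.right)
```

Because every node is immutable, the rope that existed before an edit remains valid afterwards, which is what makes it possible to hand a snapshot to a background thread.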
google/re2
9,699 stars
re2 is a C++ regular expression library designed for high-performance text processing. It is a non-backtracking regex engine that provides linear-time pattern matching, ensuring that execution time remains proportional to the size of the input string regardless of the pattern used. The library supports UTF-8 and Latin-1 text encodings for searching and extracting substrings. It includes capabilities for multi-pattern optimization, allowing multiple regular expressions to be combined into a single representation to scan text for several patterns in one pass. The project covers core regex oper
Ensures predictable execution time and memory usage when processing large volumes of text with regular expressions.
C++
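The non-backtracking approach described for re2 can be sketched by tracking a set of active pattern positions instead of trying alternatives one at a time. The toy matcher below is an illustration only, not RE2's implementation: it supports just literal characters, `.`, and `*` applied to a single character, and the function names are hypothetical.

```python
def compile_pattern(pattern):
    """Turn a pattern into a list of (symbol, starred) items."""
    items = []
    for ch in pattern:
        if ch == "*":
            if not items or items[-1][1]:
                raise ValueError("'*' must follow a single character")
            items[-1] = (items[-1][0], True)
        else:
            items.append((ch, False))
    return items


def closure(items, states):
    """Add every state reachable by skipping starred items."""
    result = set()
    for s in states:
        while True:
            result.add(s)
            if s < len(items) and items[s][1]:
                s += 1
            else:
                break
    return result


def full_match(pattern, text):
    """Return True if the whole text matches the pattern."""
    items = compile_pattern(pattern)
    current = closure(items, {0})
    for ch in text:
        following = set()
        for s in current:
            if s < len(items):
                symbol, starred = items[s]
                if symbol == "." or symbol == ch:
                    # A starred item may repeat; a plain item advances.
                    following.add(s if starred else s + 1)
        current = closure(items, following)
        if not current:
            return False
    return len(items) in current
```

Each input character is examined once, and the work done for it depends only on the pattern size, so a pattern such as `a*a*a*a*b` cannot trigger the exponential blow-up that a backtracking matcher may hit on a long run of `a`.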
BloopAI/bloop
9,510 stars
Bloop is an AI code analysis tool and semantic search engine designed for understanding and querying large-scale codebases. It utilizes a high-performance indexing system written in Rust to enable fast symbol and text retrieval across multiple programming languages. The project differentiates itself by using on-device embeddings for semantic code search, allowing users to locate logic based on meaning and intent rather than exact keywords. It combines a language model with a retrieval-augmented generation approach to provide a natural language interface for conversational querying and the gen
Employs high-performance regular expression processing to rapidly filter and isolate specific text segments across large volumes of source code.
Rust
onivim/oni2
7,854 stars
Oni2 is a high-performance, extensible text editor and project-based file manager. It functions as a modal code editor, utilizing a keyboard grammar of verbs and motions to navigate and modify source code without a mouse. It also serves as an LSP client, integrating Language Server Protocol servers to provide code completion, symbol navigation, and refactoring. The editor distinguishes itself by acting as a VSCode extension host, allowing it to load and execute language servers and debuggers from the VSCode ecosystem. It provides a programmable environment where custom functionality is implem
Provides a fast, keyboard-driven modal editor for writing and modifying text files.
Reason
jvns/pandas-cookbook
7,086 stars
This project is a pandas data analysis cookbook and a Python data science guide. It offers a collection of programmatic recipes and examples for cleaning, manipulating, and analyzing structured data. The project focuses on providing a containerized analysis environment to ensure a consistent workspace and reproducible dependencies when running data processing scripts. It covers a broad range of data science skills, including data ingestion from external sources, raw data cleaning, and exploratory data analysis. The recipes demonstrate how to carry out structured data analysis through techniques such as filtering, aggregating grouped data, and processing text data.
Demonstrates pandas string operations for transforming text data for analysis.
Jupyter Notebook
lark-parser/lark
5,914 stars
Lark is a Python parsing toolkit used to define grammars and convert raw text into annotated parse trees. It serves as an abstract syntax tree generator and as a grammar definition language for specifying language rules through terminals and regular expressions. The library offers two primary parsing implementations: an Earley parser that can handle all context-free languages (including those with ambiguity and left recursion), and a fast LALR parser for deterministic languages with a low memory footprint. Beyond parsing itself, the toolkit includes features for modular grammar composition, rule-based tree transformation, and coordinate tracking for source positions. It also supports serializing LALR grammars into standalone parser modules.
Uses LALR algorithms to process large volumes of text with high efficiency and low memory usage.
Python · cyk · earley · grammar
xtaci/algorithms
5,454 stars
This is a collection of classical algorithms and data structures implemented as a header-only C++ library. It provides a suite of tools for general algorithm implementation, including data structure management, graph theory analysis, and string processing. The library is distinguished by its specialized toolkits for cryptographic hashing and encoding, featuring implementations of MD5, SHA-1, and Base64. It also includes advanced capabilities for high-performance string processing via suffix trees and arrays, as well as computational number theory for primality testing and arbitrary-precision
Uses suffix trees and arrays for high-performance pattern matching and text analysis.
C++
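Pattern matching with a suffix array, as mentioned in the xtaci/algorithms summary, can be sketched in a few lines. The code below is an illustration in Python rather than the library's C++ code: the function names are hypothetical, and the construction simply sorts all suffixes, which is far slower than the specialized algorithms a real library would use.

```python
def build_suffix_array(text):
    """Return the start indices of all suffixes in lexicographic order."""
    # Naive construction: sort every suffix directly.
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_occurrences(text, suffix_array, pattern):
    """Return the sorted start positions where pattern occurs in text."""
    m = len(pattern)

    # Lower bound: first suffix whose m-character prefix is >= pattern.
    lo, hi = 0, len(suffix_array)
    while lo < hi:
        mid = (lo + hi) // 2
        start = suffix_array[mid]
        if text[start:start + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    first = lo

    # Upper bound: first suffix whose m-character prefix is > pattern.
    hi = len(suffix_array)
    while lo < hi:
        mid = (lo + hi) // 2
        start = suffix_array[mid]
        if text[start:start + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid

    return sorted(suffix_array[first:lo])
```

Once the array is built, each query needs only two binary searches over it, so many different patterns can be located in the same text without rescanning it.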
zesterer/chumsky
4,545 stars
Chumsky is a parser combinator library used to build high-performance parsers by composing small parsing functions into complex grammars. It offers several parsing engines, including recursive descent and precedence climbing implementations for resolving the order of operations in mathematical and logical expressions. The library stands out for its zero-copy text parsing, which minimizes memory allocations to increase throughput, and for its ability to run without the standard library for use in embedded or resource-constrained environments. It also has an error-recovering parser that identifies malformed input and continues processing in order to report multiple syntax errors in a single pass. The framework covers a broad range of features, including context-sensitive state management, support for recursive grammars, and integration of regular expression patterns. It includes tools for parser structure analysis, node inspection, and result caching to support backtracking and left recursion. The library supports the development of custom languages, the parsing of data formats, and programming language tooling.
Enables high-performance text processing optimized for low memory and high throughput in resource-constrained environments.
Rust · context-free-grammar · errors · lexing
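Precedence climbing, one of the techniques named in the Chumsky description, resolves operator order by only accepting operators that bind at least as tightly as the current minimum. The sketch below shows the general technique in Python, not Chumsky's Rust API; the token set, the `BINDING` table, and the tuple-based tree are assumptions made for illustration.

```python
import re

TOKEN = re.compile(r"\s*(\d+|[-+*/()])")
# Higher number binds tighter; all four operators are left-associative.
BINDING = {"+": 1, "-": 1, "*": 2, "/": 2}


def tokenize(source):
    """Split an arithmetic expression into number and symbol tokens."""
    source = source.rstrip()
    tokens, pos = [], 0
    while pos < len(source):
        match = TOKEN.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character at position {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def parse_atom(tokens):
    """Parse a number or a parenthesized expression."""
    if not tokens:
        raise SyntaxError("unexpected end of input")
    token = tokens.pop(0)
    if token == "(":
        inner = parse_expr(tokens)
        if not tokens or tokens.pop(0) != ")":
            raise SyntaxError("expected ')'")
        return inner
    if token.isdigit():
        return int(token)
    raise SyntaxError(f"unexpected token {token!r}")


def parse_expr(tokens, min_binding=1):
    """Parse operators whose binding strength is at least min_binding."""
    left = parse_atom(tokens)
    while tokens and tokens[0] in BINDING and BINDING[tokens[0]] >= min_binding:
        op = tokens.pop(0)
        # Requiring a strictly tighter operator on the right-hand side
        # makes operators of equal strength group to the left.
        right = parse_expr(tokens, BINDING[op] + 1)
        left = (op, left, right)
    return left
```

Parsing `1 + 2 * 3` yields a tree with `*` nested under `+`, while `8 - 3 - 2` groups to the left, so evaluating the tree respects the usual order of operations.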
rust-lang/regex
3,978 stars
This is a Rust regular expression library that provides a finite automata engine for searching and matching text patterns. It functions as a Unicode-compliant text scanner designed to guarantee linear time execution on all inputs to prevent catastrophic backtracking. The engine supports both single and multi-pattern search capabilities, allowing it to scan a piece of text for multiple regular expressions simultaneously. It operates on both strings and raw byte slices to identify matching text segments. The library covers text parsing, string validation, and pattern searching. It includes cap
Provides high-performance text extraction with guaranteed linear time complexity, which prevents catastrophic backtracking.
Rust · automata · automaton · dfa