30 open-source projects similar to ahupp/python-magic, ranked by how many features they have in common. Compare stars, activity and what each one does to find the best Python Magic alternative.
file-type is a binary file type detector that identifies file extensions and MIME types by analyzing magic numbers and signature bytes in binary data. It functions as a magic number parser and MIME type resolver, mapping binary signatures to standardized media type strings. The project is an extensible file format identifier that allows for the addition of custom detector plugins to recognize uncommon or non-binary file formats. The engine supports binary format identification across various data sources, including buffers and data streams. It utilizes a supported format registry and provide
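To illustrate the magic-number approach that file-type and several other entries on this list rely on, here is a minimal Python sketch. It is not file-type's API or python-magic's. The `SIGNATURES` table and the `sniff_mime` helper are hypothetical names introduced for this example, and the table holds only a few well-known signatures.

```python
from typing import Optional

# Hypothetical signature table: (byte offset, magic bytes, MIME type).
# Real detectors ship far larger registries; this is only a sample.
SIGNATURES = [
    (0, b"\x89PNG\r\n\x1a\n", "image/png"),
    (0, b"\xff\xd8\xff", "image/jpeg"),
    (0, b"%PDF-", "application/pdf"),
    (0, b"PK\x03\x04", "application/zip"),
    (0, b"GIF87a", "image/gif"),
    (0, b"GIF89a", "image/gif"),
    (257, b"ustar", "application/x-tar"),  # tar stores its magic at offset 257
]


def sniff_mime(data: bytes) -> Optional[str]:
    """Return the MIME type of the first matching signature, or None."""
    for offset, magic, mime in SIGNATURES:
        # Compare the slice at the expected offset with the known magic bytes.
        if data[offset:offset + len(magic)] == magic:
            return mime
    return None
```

Calling `sniff_mime` on the first few hundred bytes of a buffer or stream is enough for these signatures. Formats without a fixed signature, such as plain text, need other heuristics.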
Tika is a content analysis toolkit and Java library designed for detecting and extracting metadata and text from thousands of different file types. It functions as a universal document text extractor and metadata extraction engine, converting complex files into plain text or XHTML. The system employs a specialized MIME type detector that identifies document formats using magic bytes and metadata to determine the correct parser. It serves as an OCR integration gateway, connecting to external text recognition tools to extract content from image files. The project covers a broad range of extrac
UniExtract2 is a suite of tools designed for universal archive extraction, batch decompression, and file format analysis. It retrieves files from various compressed formats, software installers, disk images, and game archives into local directories. The project includes a file format analyzer that identifies file types by scanning internal contents and headers without requiring full extraction. It also features an archive password decrypter that attempts to recover access to protected archives using a predefined list of common passwords. The tool supports bulk decompression workflows through
Kreuzberg is a document extraction engine that converts PDFs, Office files, images, and over 90 other formats into clean, structured text and metadata. It is built around a compiled Rust core that can be used as a native library, a command-line tool, a REST API server, or a WebAssembly module for browser-based processing. The system is designed to run entirely on self-hosted infrastructure, with no data leaving the user's environment. What distinguishes Kreuzberg is its breadth of integration surfaces and its pipeline architecture. It exposes extraction capabilities through native bindings fo
Masuit.Tools is a comprehensive static utility library for .NET and ASP.NET Core development. It provides a broad collection of reusable helper methods and infrastructure components that cover common programming tasks without requiring dependency injection or instance management. The library is organized as flat utility classes, making its functionality directly accessible from anywhere in a project. The toolkit distinguishes itself through a wide range of integrated capabilities that go beyond typical utility libraries. It includes a multithreaded range-request file downloader with pause and
MadelineProto is an asynchronous PHP library that provides a programmatic interface for interacting with the Telegram API using the MTProto protocol, the same protocol used by official Telegram clients. It functions as both a Telegram bot SDK and a userbot automation library, enabling PHP applications to connect to Telegram as either a bot account or a regular user account, sending and receiving messages, media, and other data directly without relying on the Bot API intermediary. The library is built on an event-driven architecture with Amp v3 fiber-based concurrency, allowing for non-blockin
fq is a command-line binary data processor used for decoding, transforming, and analyzing raw byte streams and bit-level data into structured formats. It functions as a functional binary query engine that allows for filtering and mapping binary structures, as well as a converter that translates complex binary blobs and proprietary file formats into standard JSON, YAML, or XML. The tool distinguishes itself as a low-level bit manipulator capable of performing bit-level slicing, bitwise operations, and cryptographic hashing on raw files. It also serves as a network protocol analyzer with the ab
ripgrep-all is a command-line utility that extends ripgrep to perform regular expression searches across binary files, compressed archives, and media formats. It functions as a universal text extractor that converts non-plain-text formats, such as PDFs, E-books, and Office documents, into searchable text. The tool uses a system of adapters to transform binary data into plain text and utilizes a local database to cache these extracted versions, accelerating repeated search operations. It identifies file types by analyzing header magic bytes rather than relying on file extensions. The project
ImHex is a professional-grade hex editor and binary data analysis platform designed for inspecting, modifying, and reverse engineering raw file contents. It functions as a schema-driven engine that interprets complex binary structures by applying custom definitions to map and visualize byte-level data. The platform distinguishes itself through a dedicated domain-specific language that allows users to define structural schemas for automated file parsing. This capability is supported by a dynamic plugin architecture and an event-driven registry, which enable the integration of external modules
Hexyl is a colored hex dump utility and binary data viewer for the terminal. It allows for the inspection of binary files by rendering contents as a colored hex dump to distinguish between different byte categories, such as printable text, whitespace, and null bytes. The tool includes a C-style hex exporter that transforms binary data into C include files for direct integration into source code. It supports visual layout customization through configurable panels and borders, as well as the ability to define colors for byte categories and offsets using terminal colors or RGB hex codes via envi
Detect-It-Easy is a binary file identifier and analysis toolkit designed to determine file formats, compilers, and packers. It uses signature matching and heuristic analysis to identify executable and archive formats. The project includes a custom file signature engine and a scriptable rule system for defining and applying detection logic to identify specific binary patterns. It features specialized detectors for Android packages, such as APK and DEX files, and a malware packer detector to identify protections, obfuscators, and virus families. T
This project is a declarative framework and domain-specific language for managing Emacs Lisp packages. It functions as a startup performance optimizer by grouping package installation, variable settings, and keybindings into single blocks to reduce initial boot time. The system distinguishes itself through a deferred loading framework that delays package execution until specific keys, hooks, or modes are triggered. It uses a macro-based declaration syntax to organize configuration and automate the generation of autoloads, ensuring packages are only loaded when they are actually required. The
cloc is a codebase metrics tool and multi-language code analyzer designed to count blank lines, comment lines, and physical lines of code. It serves as a source code line counter and report generator that identifies file types to calculate source volume across a wide variety of programming languages. The tool distinguishes itself by providing codebase version comparison to measure relative changes in source and comment lines between two versions of a directory or archive. It also supports the definition of custom languages and the extension of language recognition by loading custom comment fi
Okio is a Java I/O library providing a set of tools for efficient byte-stream processing and file system operations. It functions as a buffered byte stream handler and streaming data transformer, utilizing a cross-platform file system API to manage data movement. The project is distinguished by its use of pooled mutable byte buffers that treat sequences as queues to reduce memory copying and garbage collection churn. It further decouples file operations from the host operating system through an abstraction-based file system, allowing for consistent path manipulation and atomic operations acro
This project is a collection of Go libraries designed to reduce the size of multiple web formats while preserving functional integrity. It serves as a high-performance text processor and multi-format asset compressor for shrinking HTML, CSS, JavaScript, JSON, SVG, and XML files by removing redundant characters. The tool is designed for both static batch processing and real-time use. It includes middleware capabilities to intercept and minify web responses on the fly based on MIME types or file extensions, allowing for content compression during active data streams. The processing suite cover
Typhoeus is a Ruby wrapper for libcurl that functions as a session-based HTTP client. It provides an interface for making both synchronous and asynchronous network requests. The project acts as a parallel request manager, using a managed queue to execute multiple network requests concurrently. It further distinguishes itself as a mocking tool for stubbing requests with predefined responses and as a caching layer that stores responses to avoid redundant network calls. The library covers a broad range of capabilities including session cookie management, response body streaming for large files,
Vulkan-Hpp is a header-only C++ binding library for the Vulkan graphics and compute API. It provides a type-safe wrapper around the Vulkan C API, allowing developers to interface with GPU hardware through a C++ interface that introduces no runtime CPU overhead. The library utilizes Resource Acquisition Is Initialization patterns to manage the lifecycle of Vulkan handles and objects, automating the release of GPU resources. It replaces C-style enumerations and bit-fields with strong typing and static type checking to catch invalid API parameter assignments during compilation. The project cove
dnSpy is a specialized suite of tools for the reverse engineering of .NET assemblies, functioning as a decompiler, assembly editor, and debugger. It translates compiled intermediate language back into high-level source code and provides an execution environment for stepping through compiled binaries to inspect runtime state without the original source files. The project includes a BAML decompiler that converts binary application markup language into a disassembled format and translates it into extensible markup language for user interface analysis. It also functions as a binary analysis tool
Remacs is a rewrite of the Emacs text editor implemented in Rust. It is a programmable and extensible text editor designed for improved memory safety and execution performance. The project includes a native interface that maps C library functions and structures into Rust to execute native logic. It uses native system APIs for cross-platform graphical interface rendering. The editor provides real-time text editing and supports the development of custom input methods and language dictionaries. The development process utilizes a containerized environment to ensure consistent build dependencies
Ranger is a keyboard-centric console file manager that provides a multi-column, text-based interface for navigating and organizing local file systems. It functions as a productivity tool designed to streamline command-line workflows by allowing users to perform standard file operations, such as copying, moving, and deleting, directly within a terminal environment. The project distinguishes itself through its extensible architecture and deep integration with the host shell. It supports custom plugin development and maintains context between sessions by syncing the working directory upon exit.
This project provides Rust bindings for the Dear ImGui library, enabling the creation of high-performance graphical user interfaces using an immediate-mode paradigm. By defining interface elements directly within the application render loop, it eliminates the need for persistent object hierarchies, allowing for rapid prototyping and dynamic visual updates. The library distinguishes itself by wrapping complex native function signatures in type-safe builders, which improves developer ergonomics while maintaining the performance characteristics of the underlying C++ implementation. It utilizes a
Zen-C is a multi-target systems language and source-to-source compiler that translates high-level logic into human-readable GNU C or C11 code. It functions as a JIT-enabled programming language with an in-process compiler for real-time interactive code evaluation and testing. The project serves as a CUDA GPU kernel generator, mapping specialized syntax to CUDA C++ using device attributes to target graphics hardware. It acts as an interoperability layer capable of emitting compatible code for C++, Objective-C, and Lisp to bridge native system frameworks and libraries. The language includes an
Wireshark is a network protocol analyzer and traffic inspector used for capturing and inspecting network traffic. It functions as a packet capture tool that intercepts live data from network interfaces and a TCP/IP dissector that decodes network protocol layers to translate raw binary packets into human-readable fields. The system provides capabilities for protocol stream reconstruction, grouping related packets into cohesive conversations between endpoints. It also operates as a packet file converter, allowing for the reading, modification, and conversion of network capture files across vari
This project is a collection of Python scripts and tools designed for web scraping, browser automation, and large-scale data extraction. It provides a set of implementations for retrieving information from websites and private APIs, including tools for multimedia downloading and social media data archiving. The toolset includes specialized mechanisms for bypassing anti-scraping measures through IP proxy pool rotation and multi-threaded crawlers. It also features capabilities for simulating browser sessions to handle authentication, intercepting session cookies, and decrypting network payloads
Textract is a multi-format text extraction tool and parser. It provides a unified interface to extract plain text from a variety of sources, including documents, images, and audio files. The system functions as a document content parser for PDFs and spreadsheets, an image text extractor using optical character recognition, and a speech-to-text transcriber for audio recordings.
An Elixir framework for implementing concurrent versions of common Unix utilities such as grep and find.
Watchdog is a Python library and set of shell utilities for monitoring filesystem events. It provides a framework for tracking real-time changes to files and directories, mapping those events to configurable automation handlers, and executing system actions based on file creation, modification, or deletion. The project includes an event-driven shell utility for triggering custom scripts and commands automatically. It utilizes a configurable handler framework that allows users to associate specific filesystem events with specialized plugin logic defined in configuration files. The system moni
Datasets is a library designed for the management, processing, and sharing of large-scale data collections for machine learning workflows. It functions as both a data processing framework and a versioning platform, providing tools to organize, filter, and transform massive datasets while ensuring reproducibility across research and development teams. The library distinguishes itself by enabling the handling of datasets that exceed available system memory. It utilizes memory-mapped file access, disk-based caching, and lazy iterative streaming to maintain performance when working with large-sca