# google/magika

**Attribution required: if you use, quote, or summarise this content, you must credit and link back to [awesome-repositories.com](https://awesome-repositories.com/repository/google-magika).**

17,139 stars · 1,051 forks · Python · Apache-2.0

## Links

- GitHub: https://github.com/google/magika
- Homepage: https://securityresearch.google/magika/
- awesome-repositories: https://awesome-repositories.com/repository/google-magika.md

## Topics

`ai` `deep-learning` `filetype` `keras-classification-models` `keras-models` `mime-types` `onnx`

## Description

Magika is an AI-powered content type classifier that uses deep learning to identify file formats from their raw bytes. It feeds byte sequences from a file through a neural network and returns the predicted content type, including its MIME type, together with a confidence score.

The system exposes a foreign function interface so the core detection logic can be integrated from different programming languages. Detection sensitivity is configurable through prediction modes and per-type thresholds, which trade precision against recall.
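
The thresholding idea can be illustrated with a minimal Python sketch. It is not Magika's implementation: the mode names, the threshold values, the fallback label `unknown`, and the helper `pick_label` are all assumptions made for illustration.

```python
from typing import Dict

# Hypothetical per-type minimum scores; real values would ship with the model.
THRESHOLDS: Dict[str, float] = {"python": 0.80, "javascript": 0.85}
MEDIUM_THRESHOLD = 0.50    # assumed single cut-off used in the relaxed mode
GENERIC_LABEL = "unknown"  # assumed label returned when confidence is too low


def pick_label(scores: Dict[str, float], mode: str = "high-confidence") -> str:
    """Return the top-scoring label, or a generic one if it misses its threshold."""
    best = max(scores, key=lambda label: scores[label])
    if mode == "best-guess":
        return best  # ignore thresholds entirely
    if mode == "medium-confidence":
        threshold = MEDIUM_THRESHOLD  # one relaxed cut-off for every type
    else:
        # Strict mode: per-type threshold, falling back to the relaxed one.
        threshold = THRESHOLDS.get(best, MEDIUM_THRESHOLD)
    return best if scores[best] >= threshold else GENERIC_LABEL


scores = {"python": 0.70, "javascript": 0.20, "markdown": 0.10}
print(pick_label(scores))                       # strict: 0.70 < 0.80
print(pick_label(scores, "medium-confidence"))  # relaxed: 0.70 >= 0.50
```

Raising a type's threshold makes false positives for that type less likely at the cost of more generic answers, which is the precision/recall trade-off described above.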

The project supports bulk file analysis through recursive directory scanning and is suited to security-oriented content inspection. It can load model assets from local paths or remote URLs and includes a utility that lists all supported content type labels.
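
As a rough sketch of what recursive scanning involves, the Python snippet below walks a directory tree and labels every file. The classifier `classify_by_prefix` is a deliberately simple stand-in based on magic bytes, not Magika's model, and `scan_tree` is a hypothetical helper name.

```python
from pathlib import Path
from typing import Callable, Dict


def classify_by_prefix(head: bytes) -> str:
    """Stand-in classifier (NOT Magika's model): checks a few magic bytes."""
    if head.startswith(b"%PDF-"):
        return "pdf"
    if head.startswith(b"\x89PNG\r\n\x1a\n"):
        return "png"
    return "unknown"


def scan_tree(
    root: Path,
    classify: Callable[[bytes], str] = classify_by_prefix,
    head_size: int = 2048,
) -> Dict[str, str]:
    """Walk root recursively and map each file's relative path to a label."""
    results: Dict[str, str] = {}
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue  # skip directories and other non-regular entries
        with path.open("rb") as handle:
            head = handle.read(head_size)  # read only a prefix, not the whole file
        results[path.relative_to(root).as_posix()] = classify(head)
    return results
```

Because the classifier is passed in as a callable, a wrapper around a real model-backed detector could replace the stand-in without changing the traversal logic.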

## Tags

### Data & Databases

- [Content Type Detection](https://awesome-repositories.com/f/data-databases/content-type-detection.md) — Identifies the content type of files using a deep learning model to return accurate MIME types and confidence scores. ([source](https://securityresearch.google/magika/))
- [Sensitivity Tuning](https://awesome-repositories.com/f/data-databases/content-type-detection/sensitivity-tuning.md) — Allows adjusting detection sensitivity and thresholds to optimize the identification of specific file formats.

### Artificial Intelligence & ML

- [Deep Learning Classifiers](https://awesome-repositories.com/f/artificial-intelligence-ml/deep-learning-classifiers.md) — Uses a neural network to classify file types and predict MIME labels with associated confidence scores.
- [Prediction Thresholds](https://awesome-repositories.com/f/artificial-intelligence-ml/face-detection/confidence-filtering/prediction-thresholds.md) — Balances precision and recall by filtering model confidence scores against per-type minimum requirements.
- [Byte-Sequence Tensors](https://awesome-repositories.com/f/artificial-intelligence-ml/feature-extraction-models/byte-sequence-tensors.md) — Implements a mechanism to convert raw file headers and binary content into numerical tensors for model processing.
- [Quantized Inference Runtimes](https://awesome-repositories.com/f/artificial-intelligence-ml/quantized-inference-runtimes.md) — Runs quantized machine learning models via the ONNX Runtime inference engine for efficient cross-platform deployment.
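
The byte-sequence entry above can be made concrete with a small Python sketch. It assumes a fixed-length input built from the start and end of a file, padded with a token outside the 0-255 byte range; the block size, the padding value, and the name `extract_features` are illustrative assumptions rather than Magika's actual parameters.

```python
from typing import List

PADDING_TOKEN = 256  # assumed: a value outside the valid byte range 0-255
BLOCK_SIZE = 8       # assumed and kept tiny for readability


def extract_features(data: bytes, block_size: int = BLOCK_SIZE) -> List[int]:
    """Encode the first and last block_size bytes as one fixed-length vector."""
    beg = list(data[:block_size])
    end = list(data[-block_size:])
    # Pad the head on the right and the tail on the left so that short
    # files still produce exactly 2 * block_size integers.
    beg = beg + [PADDING_TOKEN] * (block_size - len(beg))
    end = [PADDING_TOKEN] * (block_size - len(end)) + end
    return beg + end


print(extract_features(b"abc", block_size=4))
# [97, 98, 99, 256, 256, 97, 98, 99]
```

A fixed-length integer vector like this can be handed to an embedding or one-hot layer, which is why the padding token sits outside the byte range.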

### Security & Cryptography

- [File Type Validators](https://awesome-repositories.com/f/security-cryptography/file-upload-security/file-type-validators.md) — Provides accurate MIME types and confidence scores by identifying the content type of files via deep learning.
- [Format Validation](https://awesome-repositories.com/f/security-cryptography/format-validation.md) — Analyzes unknown files to detect their true format for security scanning and data validation workflows.
- [Sensitivity Configurations](https://awesome-repositories.com/f/security-cryptography/security-detection-logic/sensitivity-configurations.md) — Provides mechanisms to control error tolerance through prediction modes and per-content-type thresholds. ([source](https://securityresearch.google/magika/introduction/overview/))

### Web Development

- [MIME Type Predictors](https://awesome-repositories.com/f/web-development/custom-content-negotiators/mime-type-mappings/dynamic-mime-type-resolvers/mime-type-predictors.md) — Predicts MIME types from file bytes using a deep learning model.

### Operating Systems & Systems Programming

- [Automated File Analysis](https://awesome-repositories.com/f/operating-systems-systems-programming/system-administration-maintenance/file-system-management/file-system-operations/automated-file-analysis.md) — Enables scanning of entire directory trees to determine the content types of many files simultaneously.

### Programming Languages & Runtimes

- [Foreign Function Interfaces](https://awesome-repositories.com/f/programming-languages-runtimes/language-interoperability/foreign-function-interfaces.md) — Provides a low-level C-API that enables the detection logic to be integrated into multiple high-level programming languages.
- [Language-Agnostic Detectors](https://awesome-repositories.com/f/programming-languages-runtimes/programming-language-varieties/programming-languages/cross-language-parsing-integrations/language-agnostic-detectors.md) — Offers a content identification tool with a foreign interface for integration across diverse programming environments.
