# google-deepmind/gemma

**Attribution required: if you use, quote, or summarise this content, you must credit and link back to [awesome-repositories.com](https://awesome-repositories.com/repository/google-deepmind-gemma).**

5,475 stars · 973 forks · Python · Apache-2.0

## Links

- GitHub: https://github.com/google-deepmind/gemma
- Homepage: https://gemma-llm.readthedocs.io
- awesome-repositories: https://awesome-repositories.com/repository/google-deepmind-gemma.md

## Description

Gemma is a family of open-weights large language models based on a decoder-only transformer architecture. The models are designed for text generation and multi-modal conversations: they accept both textual and visual input sequences and generate responses from them.

The models can be fine-tuned for particular tasks, either by adjusting their weights directly or through low-rank adaptation (LoRA). The project also supports quantized weights, which reduce memory usage and increase inference speed on limited hardware.
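
To make the low-rank adaptation idea concrete, here is a minimal sketch in plain Python. It does not use the Gemma library's API; the names `lora_forward`, `trainable_params`, `lora_a`, `lora_b`, and `alpha` are hypothetical and chosen only for illustration. The frozen weight `W` is left untouched, and a small pair of matrices `A` and `B` supplies the trainable update.

```python
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def lora_forward(x, w_frozen, lora_a, lora_b, alpha):
    """Compute x @ W + (alpha / rank) * (x @ A) @ B.

    W stays frozen; only A (d_in x rank) and B (rank x d_out) would be trained.
    """
    rank = len(lora_b)
    scale = alpha / rank
    base = matmul(x, w_frozen)
    update = matmul(matmul(x, lora_a), lora_b)
    return [[p + scale * q for p, q in zip(b_row, u_row)]
            for b_row, u_row in zip(base, update)]


def trainable_params(d_in, d_out, rank):
    """Return (full fine-tuning, LoRA) trainable parameter counts for one layer."""
    return d_in * d_out, rank * (d_in + d_out)


rng = random.Random(0)
d_in, d_out, rank = 4, 3, 2
w = [[rng.uniform(-1, 1) for _ in range(d_out)] for _ in range(d_in)]
lora_a = [[rng.gauss(0, 0.01) for _ in range(rank)] for _ in range(d_in)]
lora_b = [[0.0] * d_out for _ in range(rank)]  # zero init: adapter starts as a no-op

x = [[1.0, 2.0, 3.0, 4.0]]
y = lora_forward(x, w, lora_a, lora_b, alpha=4.0)
print(y == matmul(x, w))                # True
print(trainable_params(2048, 2048, 8))  # (4194304, 32768)
```

For a 2048 x 2048 layer with rank 8, the adapter trains 32,768 parameters instead of 4,194,304, which is where the reduced hardware requirement comes from.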

Other capabilities include multi-modal AI integration, memory optimization through parameter sharding, and integration with external tools and APIs to retrieve real-time data. The project also supports generating images from text and sampling structured text outputs.
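
The parameter sharding mentioned above can be pictured with a small, generic sketch. This is not the repository's sharding mechanism, which is not described here; `shard_columns` and `sharded_matmul` are hypothetical helpers, and the "devices" are simply list entries. Each device holds only a slice of the weight matrix, computes its share of the output, and the slices are concatenated at the end.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def shard_columns(weight, num_devices):
    """Split a weight matrix column-wise into equally sized contiguous shards."""
    n_cols = len(weight[0])
    if n_cols % num_devices != 0:
        raise ValueError("column count must be divisible by num_devices")
    width = n_cols // num_devices
    return [[row[i * width:(i + 1) * width] for row in weight]
            for i in range(num_devices)]


def sharded_matmul(x, shards):
    """Each shard produces a slice of the output columns; concatenate the slices."""
    partials = [matmul(x, shard) for shard in shards]
    return [[value for part in partials for value in part[r]]
            for r in range(len(x))]


weight = [[1, 2, 3, 4],
          [5, 6, 7, 8]]
x = [[1, 1],
     [2, 0]]

shards = shard_columns(weight, num_devices=2)
print(len(shards[0][0]))                               # 2 columns per device
print(sharded_matmul(x, shards) == matmul(x, weight))  # True
```

With two devices, each one stores half of the columns, so the per-device memory for this layer is halved while the combined result matches the unsharded product.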

## Tags

### Artificial Intelligence & ML

- [Open Multimodal Model Deployers](https://awesome-repositories.com/f/artificial-intelligence-ml/multimodal-models/multimodal-model-runners/open-multimodal-model-deployers.md) — Deploys a family of open and efficient multimodal models capable of text and image generation. ([source](https://github.com/google-deepmind/gemma#readme))
- [Open-Weights Models](https://awesome-repositories.com/f/artificial-intelligence-ml/open-weights-models.md) — Provides a family of large language models with publicly available weights for local or cloud execution.
- [Decoder Architectures](https://awesome-repositories.com/f/artificial-intelligence-ml/decoder-architectures.md) — Implements a transformer-based architecture utilizing causal attention mechanisms for autoregressive sequence generation.
- [LLM Fine-Tuning](https://awesome-repositories.com/f/artificial-intelligence-ml/full-parameter-fine-tuning/custom-data-fine-tunings/llm-fine-tuning.md) — Fine-tunes large language models on custom datasets using parameter-efficient methods like LoRA.
- [GPU Memory Optimizers](https://awesome-repositories.com/f/artificial-intelligence-ml/gpu-memory-optimizers.md) — Optimizes VRAM usage for large models through quantization and parameter sharding to fit on limited GPUs.
- [Multi-Modal Inference Engines](https://awesome-repositories.com/f/artificial-intelligence-ml/machine-learning/multimodal-processing-tools/multi-modal-input-processors/multi-modal-inference-engines.md) — Provides an inference engine capable of processing and analyzing combined text and vision inputs within a single model. ([source](https://github.com/google-deepmind/gemma/blob/main/README.md))
- [Multi-modal Language Models](https://awesome-repositories.com/f/artificial-intelligence-ml/multi-modal-language-models.md) — Provides a model capable of processing and generating responses from both textual and visual input sequences.
- [Cross-Attention Mechanisms](https://awesome-repositories.com/f/artificial-intelligence-ml/multi-modal-tokenizers/multi-modal-embedding-models/cross-attention-mechanisms.md) — Integrates visual and textual data by mapping different input modalities into a shared latent space for joint processing.
- [Model Deployments](https://awesome-repositories.com/f/artificial-intelligence-ml/open-weights-models/model-deployments.md) — Provides open weights that can be downloaded and deployed on local or cloud hardware.
- [External Tool Integrations](https://awesome-repositories.com/f/artificial-intelligence-ml/external-service-integrations/external-knowledge-integrators/external-tool-integrations.md) — Connects AI assistants to external utilities and APIs to perform actions and retrieve real-time data. ([source](https://github.com/google-deepmind/gemma/tree/main/colabs))
- [Text-to-Image Generators](https://awesome-repositories.com/f/artificial-intelligence-ml/generative-ai-resources/diffusion-visual-models/generative-ai-pipelines/text-to-image-generators.md) — Implements pipelines that generate high-resolution images from natural language text prompts. ([source](https://github.com/google-deepmind/gemma/blob/main/CHANGELOG.md))
- [Language Model Fine-Tuning](https://awesome-repositories.com/f/artificial-intelligence-ml/language-model-fine-tuning.md) — Provides frameworks and utilities for adjusting pre-trained language models using memory-efficient training methods. ([source](https://github.com/google-deepmind/gemma/blob/main/README.md))
- [Model Fine-Tuning](https://awesome-repositories.com/f/artificial-intelligence-ml/machine-learning/infrastructure/model-training-and-tuning/fine-tuning-and-customization/model-fine-tuning.md) — Provides procedures for adapting pre-trained models to specific datasets or tasks. ([source](https://github.com/google-deepmind/gemma#readme))
- [Low-Rank Adaptation](https://awesome-repositories.com/f/artificial-intelligence-ml/machine-learning/infrastructure/model-training-and-tuning/fine-tuning-and-customization/model-fine-tuning/low-rank-adaptation.md) — Supports training a small number of additional weight matrices while keeping base weights frozen to reduce hardware requirements.
- [Quantized Models](https://awesome-repositories.com/f/artificial-intelligence-ml/quantized-inference-runtimes/llm-quantization-frameworks/quantized-models.md) — Includes support for quantized weights to reduce memory usage and increase inference speed on limited hardware.
- [Weight Quantization](https://awesome-repositories.com/f/artificial-intelligence-ml/quantized-inference-runtimes/weight-quantization.md) — Compresses model weights into lower-precision formats to reduce memory footprint and accelerate inference.
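
The weight quantization described in the last two tags can be sketched as follows. This is a generic symmetric int8 scheme, shown only as an illustration and not as the format the repository uses; `quantize_int8` and `dequantize` are hypothetical names.

```python
def quantize_int8(weights):
    """Map floats to integers in [-127, 127] using one shared scale factor."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate float weights from the integer codes."""
    return [q * scale for q in quantized]


weights = [0.5, -1.27, 0.0, 1.0]
codes, scale = quantize_int8(weights)
restored = dequantize(codes, scale)

# The rounding error per weight is at most half a quantization step.
max_error = max(abs(w - r) for w, r in zip(weights, restored))
print(codes)
print(max_error <= scale / 2 + 1e-12)  # True
```

Each code fits in one byte, compared with four bytes for a float32 weight, at the cost of a rounding error bounded by half the scale.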

### Part of an Awesome List

- [Decoder-Only Architectures](https://awesome-repositories.com/f/awesome-lists/ai/llm-development/decoder-only-architectures.md) — Utilizes a decoder-only transformer architecture for autoregressive sequence generation.
- [Fine-tunable Models](https://awesome-repositories.com/f/awesome-lists/ai/model-training-and-fine-tuning/fine-tunable-models.md) — Supports weight adjustment and low-rank adaptation to specialize performance for particular tasks.
- [Multi-Modal AI](https://awesome-repositories.com/f/awesome-lists/ai/multi-modal-ai.md) — Processes and generates responses based on both textual and visual input sequences within a single workflow.
- [Tool Use And Integration](https://awesome-repositories.com/f/awesome-lists/ai/tool-use-and-integration.md) — Enables the model to call external functions and APIs to retrieve real-time data and perform actions.
- [Deep Learning Frameworks](https://awesome-repositories.com/f/awesome-lists/ai/deep-learning-frameworks.md) — Open-weights large language models based on advanced research.

### Networking & Communication

- [Distributed Parameter Sharding](https://awesome-repositories.com/f/networking-communication/distributed-systems-p2p/distributed-computing/model-parallelism-techniques/distributed-parameter-sharding.md) — Partitions large-scale model tensors across multiple compute nodes to facilitate parallel processing and overcome memory limits.
