Gemma 3N
Google's Next-Generation Open Model

The most advanced lightweight multimodal AI model designed for mobile and edge devices

What is Gemma 3N?

Gemma 3N is Google's latest open multimodal AI model, optimized specifically for on-device applications. It natively accepts image, audio, video, and text input, produces text output, and is designed for real-time processing on smartphones, tablets, and edge devices.

E2B: 5B parameters → 2GB memory
E4B: 8B parameters → 3GB memory
140+ languages supported
LMArena Score: 1300+ (E4B)

🚀 Latest Release: Part of the Gemma family, with 160+ million downloads worldwide

Why Choose Gemma 3N?

Revolutionary technical advantages that make it the most advanced mobile AI model

🎯 Core Advantages

Native Multimodal

Natively accepts image, audio, video, and text input; outputs text

Two Model Sizes

E2B (5B→2GB) and E4B (8B→3GB) memory efficiency

Mobile-First

Runs on Android, TPUs, and edge devices (e.g., NVIDIA Jetson)

Mix-n-Match

Custom model slicing for specific hardware

Multimodal Support

Native support for image, audio, video, and text input, with text output. Process diverse content types seamlessly.

Efficient & Lightweight

Run 5B/8B parameter models with just 2GB/3GB memory. Optimized for mobile and edge devices.
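
As a rough illustration of the memory figures above, the resident footprint of the weights scales with effective parameter count and precision. This is a back-of-the-envelope sketch, not an official formula; the quoted 2GB/3GB figures also reflect PLE offloading and quantization choices.

```python
# Rough memory-footprint estimator (illustrative only; real usage
# depends on quantization, PLE caching, and runtime overhead).

def weight_memory_gb(effective_params_billions: float, bits_per_weight: int) -> float:
    """Approximate GB needed to hold the model weights alone."""
    bytes_total = effective_params_billions * 1e9 * bits_per_weight / 8
    return bytes_total / 1e9

# E2B behaves like a ~2B-effective-parameter model; at 8-bit weights
# that lines up roughly with the 2GB figure quoted for the variant.
print(weight_memory_gb(2.0, 8))  # 2.0
print(weight_memory_gb(4.0, 4))  # 2.0 (hypothetical 4-bit case)
```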

Mobile-First Architecture

Supports Android, TPUs, and edge devices (e.g., NVIDIA Jetson), with performance optimized for mobile deployment.

Custom Model Slicing

Mix-n-Match architecture allows for flexible model configurations tailored to your specific needs.

High Accuracy

The E4B model scores over 1300 on LMArena, the first model under 10B parameters to do so. Superior performance in real-world tasks.

Fast Response

KV cache sharing supports streaming input with 2x faster prefill. Optimized for real-time applications.

Performance Benchmarks

Industry-leading performance metrics

  • 1303: LMArena Elo score (E4B model)
  • 50.1%: WMT24++ score (multilingual ChrF)
  • 1.5x: faster response on mobile devices
  • 60fps: video processing on Google Pixel

Core Technical Architecture

Revolutionary innovations that make Gemma 3N the most advanced mobile AI model

🧬 MatFormer Architecture

Matryoshka Transformer - like "Russian nesting dolls", one model contains multiple sub-models with nested inference capabilities.

Pre-extracted Models: E2B and E4B are available as standalone, ready-to-use models
Mix-n-Match: Customize model size by adjusting hidden dimensions and skipping layers
Elastic Inference: Dynamic switching between sub-model paths during deployment
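
The Mix-n-Match idea above can be sketched as a simple selector: given a device memory budget, pick the largest nested sub-model that fits. The configuration names and footprint numbers here are assumptions for illustration, not an official API.

```python
# Illustrative Mix-n-Match selector (names and numbers are assumptions):
# pick the largest nested sub-model that fits a device memory budget.

CONFIGS = [
    # (name, effective_params_billions, approx_memory_gb)
    ("E2B", 2.0, 2.0),
    ("custom-3B", 3.0, 2.5),   # hypothetical Mix-n-Match slice
    ("E4B", 4.0, 3.0),
]

def pick_config(budget_gb: float):
    """Return the largest configuration whose footprint fits the budget."""
    fitting = [c for c in CONFIGS if c[2] <= budget_gb]
    return max(fitting, key=lambda c: c[1]) if fitting else None

print(pick_config(2.6))  # ('custom-3B', 3.0, 2.5)
print(pick_config(3.5))  # ('E4B', 4.0, 3.0)
```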

📦 PLE Technology

Per-Layer Embeddings - independent loading for each layer embedding, dramatically reducing memory pressure.

Core Transformer: Only core weights loaded into VRAM
CPU Computing: Other computations performed on CPU
Memory Efficiency: Significant VRAM usage reduction

🧠 KV Cache Sharing

Context cache sharing designed for audio/video long sequence input scenarios, significantly accelerating first response time.

Prefill Stage: 2x performance improvement
Long Sequences: Optimized for audio/video processing
Streaming: Faster initial response times
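
The effect of the 2x prefill improvement on time-to-first-token can be estimated with back-of-the-envelope arithmetic. The prompt length and throughput numbers below are hypothetical; only the 2x speedup factor comes from the text above.

```python
# Illustrative time-to-first-token estimate: a 2x prefill speedup
# halves the prompt-processing portion of the latency.

def time_to_first_token(prompt_tokens: int, prefill_tok_per_s: float,
                        prefill_speedup: float = 2.0) -> float:
    """Seconds to finish prefill with the given speedup applied."""
    return prompt_tokens / (prefill_tok_per_s * prefill_speedup)

# Hypothetical: 4096-token audio/video prompt at 512 tok/s baseline.
print(time_to_first_token(4096, 512, prefill_speedup=1.0))  # 8.0
print(time_to_first_token(4096, 512))                       # 4.0
```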

🎵 Audio Intelligence

Built-in speech understanding capabilities based on USM (Universal Speech Model) with fine-grained context awareness.

ASR: Automatic Speech Recognition
AST: Automatic Speech Translation
Languages: 140+ languages with enhanced Japanese, German, Korean, Spanish, French
Token Rate: One token per 160ms for fine-grained processing
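
At one token per 160ms, audio sequence length grows linearly with clip duration. A quick sketch of that arithmetic:

```python
import math

# One audio token is emitted per 160 ms of input audio.
TOKEN_MS = 160

def audio_tokens(duration_s: float) -> int:
    """Number of audio tokens produced for a clip of the given length."""
    return math.ceil(duration_s * 1000 / TOKEN_MS)

print(audio_tokens(10))  # 63 tokens for a 10-second clip
print(audio_tokens(30))  # 188 tokens for 30 seconds
```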

👁️ Vision Processing: MobileNet-V5

Revolutionary MobileNet-V5-300M encoder designed for mobile-first vision understanding.

Native Resolutions

Low Resolution 256×256
Medium Resolution 512×512
High Resolution 768×768

Performance Improvements

Parameters: 46% reduction vs. SoViT
Memory: 4x less VRAM usage
Speed: 13x faster (with quantization)
Real-time: Up to 60fps on Google Pixel
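
Given the three native resolutions above, a preprocessor might choose the smallest one that covers the source image. This picker is an illustration of that idea, not the official Gemma 3N preprocessing logic.

```python
# MobileNet-V5 supports three native input resolutions (256, 512, 768).
# Illustrative picker: smallest native resolution covering the image.

NATIVE = (256, 512, 768)

def pick_resolution(width: int, height: int) -> int:
    """Smallest native resolution >= the image's longer side, else 768."""
    longest = max(width, height)
    for r in NATIVE:
        if longest <= r:
            return r
    return NATIVE[-1]  # anything larger gets downscaled to the top size

print(pick_resolution(300, 200))    # 512
print(pick_resolution(1920, 1080))  # 768
```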

Download & Deploy

Choose the Gemma 3N model version that fits your project needs

Google AI Studio

Try Online

No setup required
Instant access
Web interface
Try Now

HuggingFace

Model Hub

Complete model files
Multiple formats
Pre-trained weights
Download

Ollama

Local Deploy

One-click install
Local execution
CLI interface
ollama pull gemma3n
Install Guide

Kaggle

Data Science

Research platform
Notebook ready
Community driven
Explore

Model Specifications

Gemma 3N E2B

5B
Parameters
2GB
Memory Usage
30
Layers (6 blocks)
4x
FFN Multiplier

Gemma 3N E4B

8B
Parameters
3GB
Memory Usage
35
Layers (7 blocks)
8x
FFN Multiplier

Platform Support

Hugging Face, llama.cpp, Ollama, MLX, Unsloth, Android, TPU, Jetson

🧬 Core Technology Architecture

A deeper look at the core technologies behind Gemma 3N

🏆 Performance Leadership

Figure: LMArena Elo score comparison (Gemma 3N E4B: 1303)

Gemma 3N E4B scores 1303 on LMArena, the first model under 10B parameters to cross the 1300 mark, competing with much larger proprietary models like Gemini 1.5 Pro and GPT-4.1-nano.

🔧 MatFormer: Matryoshka Transformer

Figure: MatFormer architecture diagram showing flexible depth and FFN structure

🪆 Russian Doll Architecture

  • Nested Model Structure: One model contains multiple sub-models with different complexities
  • Flexible Depth: E2B (6 blocks) and E4B (7 blocks) with adaptive FFN multipliers
  • Dynamic Switching: Future support for elastic inference with runtime model selection

🎛️ Mix-n-Match Performance Optimization

📊 Scalable Performance

  • Custom Configuration: Adjust layers and parameters for specific hardware
  • Performance Range: MMLU scores from 50% to 62% across different configurations
  • Efficiency Sweet Spot: Optimal performance/parameter ratio at 2-4B effective parameters

Figure: MMLU performance vs. model size across Mix-n-Match configurations

💾 PLE: Per-Layer Embeddings

Figure: PLE loading requirements comparison showing memory optimization

🧠 Memory Revolution

  • Dramatic Memory Reduction: From 5.44B to 1.91B parameters loaded (65% reduction)
  • Smart Caching: Only core Transformer weights in GPU memory, embeddings cached to fast storage
  • Optional Loading: Audio and vision parameters loaded only when needed

🚀 Innovation Summary

1303
LMArena Score
First sub-10B model at 1300+
65%
Memory Reduction
With PLE caching
6+7
Flexible Blocks
MatFormer architecture

Partner Ecosystem

Built in collaboration with industry leaders for comprehensive platform support

🏭 Hardware Partners

Qualcomm Technologies
Mobile chipset optimization
MediaTek
System-on-chip integration
Samsung System LSI
Mobile processor optimization

⚙️ Platform Partners

AMD
Axolotl
Docker
Hugging Face
llama.cpp
LMStudio
MLX
NVIDIA
Ollama
RedHat
SGLang
Unsloth
vLLM

Day One Launch

Gemma 3N launched with support across 13+ platforms, making it the most comprehensive model release to date.

Community & Resources

Join the growing community and explore additional resources

HuggingFace Hub

Access pre-trained models, code examples, and community contributions.

Explore Models →

Reddit Community

Join discussions, share experiences, and get help from the LocalLLaMA community.

Join Discussion →

OpenRouter API

Free API access to Gemma 3N E4B model for developers and researchers.

Get Free API →

Tech Analysis

In-depth technical analysis and insights from Simon Willison.

Read Analysis →

Getting Started

New to Gemma 3N? Start with our comprehensive getting started guide.

Get Started →

Frequently Asked Questions

Everything you need to know about Gemma 3N

What is Gemma 3N?
Gemma 3N is Google's latest lightweight multimodal AI model designed for mobile and edge devices. It accepts text, image, audio, and video input and produces text output with exceptional efficiency and performance.

How do I get started with Gemma 3N?
You can use Gemma 3N through multiple platforms including Ollama, HuggingFace, or Google AI Studio. The easiest method is Ollama: install Ollama, then run ollama pull gemma3n to download the model.

Is Gemma 3N free to use?
Yes, Gemma 3N is completely free to use. You can download the model for free, run it locally, or use Google AI Studio's free tier for online access.

What are the system requirements?
Gemma 3N supports Windows, macOS, and Linux. The E2B model requires only 2GB of RAM, while E4B needs 3GB. Both CPU and GPU inference are supported, with mobile-first optimization for Android and edge devices.