Llama 4: Leading intelligence.

Unrivaled speed and efficiency.

The most intelligent, scalable, and convenient generation of Llama is here: natively multimodal mixture-of-experts models with advanced reasoning and industry-leading context windows. Build your greatest ideas and seamlessly deploy them in minutes with Llama API and Llama Stack.
Download models
Join the Llama API Waitlist

Relive LlamaCon: Watch the event and explore the latest developments in AI innovation.

Watch now

Build with Llama 4

We've optimized models for easy deployment, cost efficiency, and performance that scales to billions of users. We can’t wait to see what you build.
Llama API
Go from ideation to app deployment in minutes. Experience a seamless and efficient way to build AI apps using Llama models. Learn more.
Join waitlist
Llama 4 Scout
Class-leading natively multimodal model that offers superior text and visual intelligence, single-H100-GPU efficiency, and a 10M-token context window for seamless long-document analysis.
Download
Llama 4 Maverick
Industry-leading natively multimodal model for image and text understanding with groundbreaking intelligence and fast responses at a low cost.
Download
Llama 4 Behemoth Preview
An early preview (it’s still training!) of the Llama 4 teacher model used to distill Llama 4 Scout and Llama 4 Maverick. Learn more about it.
View blog

Top performance at lowest cost

Llama 4 Maverick outperforms comparable models in its class, offering developers top performance at unbeatable value.

Llama 4 Capabilities

Llama 4 Behemoth, Llama 4 Scout and Llama 4 Maverick offer class-leading capabilities.
Start building with Llama 4
Natively Multimodal
All Llama 4 models are designed with native multimodality, leveraging early fusion that allows us to pre-train the model with large amounts of unlabeled text and vision tokens - a step change in intelligence from separate, frozen multimodal weights.

Unparalleled Long Context
Llama 4 Scout supports up to 10M tokens of context - the longest context length available in the industry - unlocking new use cases around memory, personalization, and multimodal applications.

Expert Image Grounding
Llama 4 is also best-in-class on image grounding, able to align user prompts with relevant visual concepts and anchor model responses to regions in the image.

Multilingual Writing
Llama 4 was also pre-trained and fine-tuned for unrivaled text understanding across 12 languages, supporting global development and deployment.
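To put a 10M-token window in perspective, here is a rough back-of-envelope sketch. It assumes the common heuristic of ~1.3 tokens per English word; actual counts depend on the tokenizer and the text, so treat the numbers as estimates, not guarantees:

```python
# Rough capacity estimate for a 10M-token context window.
# Assumption: ~1.3 tokens per English word (a common heuristic;
# real counts vary with the tokenizer and the text).
TOKENS_PER_WORD = 1.3
CONTEXT_WINDOW = 10_000_000  # Llama 4 Scout's advertised window

def words_that_fit(context_tokens: int, tokens_per_word: float = TOKENS_PER_WORD) -> int:
    """Estimate how many words of prose fit in a context window."""
    return int(context_tokens / tokens_per_word)

def fits_in_window(num_words: int, context_tokens: int = CONTEXT_WINDOW) -> bool:
    """Check whether a document of num_words fits, by this estimate."""
    return num_words * TOKENS_PER_WORD <= context_tokens

# A long novel runs roughly 100k words; by this estimate the
# window holds several million words of text.
print(words_that_fit(CONTEXT_WINDOW))
print(fits_in_window(100_000))
```

By this estimate, entire books or large document collections fit in a single prompt, which is what enables the long-document analysis use cases described above.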

Benchmarks

We evaluated model performance on a suite of common benchmarks across a wide range of languages, testing for coding, reasoning, knowledge, vision understanding, multilinguality, and long context.
| Category | Benchmark | Llama 4 Maverick | Gemini 2.0 Flash | DeepSeek v3.1 | GPT-4o |
|---|---|---|---|---|---|
| Inference Cost | Cost per 1M input & output tokens (3:1 blended) | $0.19-$0.49⁵ | $0.17 | $0.48 | $4.38 |
| Image Reasoning | MMMU | 73.4 | 71.7 | No multimodal support | 69.1 |
| Image Reasoning | MathVista | 73.7 | 73.1 | No multimodal support | 63.8 |
| Image Understanding | ChartQA | 90.0 | 88.3 | No multimodal support | 85.7 |
| Image Understanding | DocVQA (test) | 94.4 | - | No multimodal support | 92.8 |
| Coding | LiveCodeBench (10/01/2024-02/01/2025) | 43.4 | 34.5 | 45.8/49.2³ | 32.3³ |
| Reasoning & Knowledge | MMLU Pro | 80.5 | 77.6 | 81.2 | - |
| Reasoning & Knowledge | GPQA Diamond | 69.8 | 60.1 | 68.4 | 53.6 |
| Multilingual | Multilingual MMLU | 84.6 | - | - | 81.5 |
| Long context | MTOB (half book) eng->kgv/kgv->eng | 54.0/46.4 | 48.4/39.8⁴ | Context window is 128K | Context window is 128K |
| Long context | MTOB (full book) eng->kgv/kgv->eng | 50.8/46.7 | 45.5/39.6⁴ | Context window is 128K | Context window is 128K |

  1. For Llama model results, we report 0-shot evaluation with temperature = 0 and no majority voting or parallel test-time compute. For high-variance benchmarks (GPQA Diamond, LiveCodeBench), we average over multiple generations to reduce uncertainty.

  2. For non-Llama models, we source the highest available self-reported eval results, unless otherwise specified. We only include evals from models with reproducible evals (via API or open weights), and we only include non-thinking models. Cost estimates for non-Llama models are sourced from Artificial Analysis.

  3. DeepSeek v3.1's self-reported LiveCodeBench result (49.2) uses an unknown date range, so we also provide our internal result (45.8) on the defined date range. Results for GPT-4o are sourced from the LCB leaderboard.

  4. Specialized long-context evals are not traditionally reported for generalist models, so we share internal runs to showcase Llama's frontier performance.

  5. $0.19/Mtok (3:1 blended) is our cost estimate for Llama 4 Maverick assuming distributed inference. On a single host, we project the model can be served at $0.30-$0.49/Mtok (3:1 blended).
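The "3:1 blended" figure in the cost row mixes input- and output-token prices at a 3:1 ratio, i.e. it assumes three input tokens for every output token. A minimal sketch of the arithmetic (the prices below are illustrative placeholders, not quotes from any provider):

```python
def blended_cost_per_mtok(input_price: float, output_price: float,
                          input_ratio: int = 3, output_ratio: int = 1) -> float:
    """Blend per-Mtok input/output prices at the given token ratio.

    A 3:1 blend assumes 3 input tokens per output token:
    blended = (3 * input_price + 1 * output_price) / 4
    """
    total = input_ratio + output_ratio
    return (input_ratio * input_price + output_ratio * output_price) / total

# Illustrative prices: $0.10/Mtok input, $0.40/Mtok output
# -> (3 * 0.10 + 0.40) / 4 = $0.175/Mtok blended.
print(blended_cost_per_mtok(0.10, 0.40))
```

Because input tokens dominate the blend, a model's blended cost tracks its input price much more closely than its output price.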


Resources
Explore the latest tools, documentation, and best practices as you build with Llama.

Docs

The guides and resources you need to build with Llama 4.
Docs

Cookbooks

Check out our collection of Llama recipes to help you get started faster.
Cookbooks

Case studies

See how other innovators are building with Llama.
Learn more

Our partner ecosystem

Latest Llama updates

Everything we announced at our first-ever LlamaCon

Learn more

The Llama 4 Herd: The Beginning of a New Era of Natively Multimodal AI Innovation

Learn more

Making talent scouting faster and easier with Llama

Learn more

Stay up-to-date

Our latest updates delivered to your inbox

Subscribe to our newsletter to keep up with the latest Llama updates, releases, and more.

Sign up