
Llama 4: Leading intelligence.

Unrivaled speed and efficiency.

The most intelligent, scalable, and convenient generation of Llama is here: natively multimodal mixture-of-experts models with advanced reasoning and industry-leading context windows. Build your greatest ideas and seamlessly deploy them in minutes with Llama API and Llama Stack.

Catch up on Meta Connect 2025

Build with Llama 4

We've optimized our models for easy deployment, cost efficiency, and performance that scales to billions of users. We can't wait to see what you build.
Llama API
Go from ideation to app deployment in minutes. Experience a seamless and efficient way to build AI apps using Llama models. Learn more.
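As a quick orientation, here is a minimal sketch of what an app built on the Llama API could look like, assuming an OpenAI-compatible chat-completions endpoint; the base URL, environment variable, and model identifier below are illustrative placeholders, so check the Llama API docs for the real values.

```python
# Minimal sketch of calling a Llama 4 model through an OpenAI-compatible
# chat-completions endpoint. The base URL, env var, and model name are
# illustrative placeholders, not confirmed values.
import os

from openai import OpenAI  # pip install openai

client = OpenAI(
    base_url="https://api.llama.example/v1",  # placeholder endpoint
    api_key=os.environ["LLAMA_API_KEY"],      # hypothetical env var
)

response = client.chat.completions.create(
    model="llama-4-maverick",  # placeholder model identifier
    messages=[
        {"role": "system", "content": "You are a concise assistant."},
        {"role": "user", "content": "Summarize mixture-of-experts in two sentences."},
    ],
)
print(response.choices[0].message.content)
```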
Llama 4 Scout
Class-leading natively multimodal model that offers superior text and visual intelligence, single H100 GPU efficiency, and a 10M context window for seamless long document analysis.
Llama 4 Maverick
Industry-leading natively multimodal model for image and text understanding with groundbreaking intelligence and fast responses at a low cost.
Llama 4 Behemoth Preview
An early preview (it’s still training!) of the Llama 4 teacher model used to distill Llama 4 Scout and Llama 4 Maverick. Learn more about it.

Llama 4 Capabilities

Llama 4 Behemoth, Llama 4 Scout, and Llama 4 Maverick offer class-leading capabilities.
Natively Multimodal
All Llama 4 models are designed with native multimodality, leveraging early fusion, which allows us to pre-train the model with large amounts of unlabeled text and vision tokens: a step change in intelligence compared with separate, frozen multimodal weights.
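Because the models are natively multimodal, text and images can travel in the same prompt rather than through a separate vision pipeline. Below is a minimal sketch of such a request, assuming the OpenAI-style content-part message format; whether the Llama API accepts exactly this shape, and the model name used, are assumptions.

```python
# Sketch of a multimodal request: one user message interleaving text and an
# image. Uses the OpenAI-style content-part format; the endpoint and model
# name are placeholders, and the exact accepted shape is an assumption.
import os

from openai import OpenAI

client = OpenAI(
    base_url="https://api.llama.example/v1",  # placeholder endpoint
    api_key=os.environ["LLAMA_API_KEY"],
)

response = client.chat.completions.create(
    model="llama-4-scout",  # placeholder model identifier
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "What is happening in this chart?"},
            {"type": "image_url",
             "image_url": {"url": "https://example.com/chart.png"}},
        ],
    }],
)
print(response.choices[0].message.content)
```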
Unparalleled Long Context
Llama 4 Scout supports up to 10M tokens of context, the longest context length available in the industry, unlocking new use cases around memory, personalization, and multimodal applications.
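To get a feel for that budget, here is a rough back-of-the-envelope check of whether a document corpus fits in a 10M-token window; the four-characters-per-token ratio is a crude heuristic for English text, not the model's actual tokenizer.

```python
# Rough feasibility check: does a set of documents fit in a 10M-token window?
# The 4-characters-per-token ratio is a crude English-text heuristic; use the
# model's real tokenizer for exact counts.
from pathlib import Path

CONTEXT_WINDOW = 10_000_000   # Llama 4 Scout's advertised context length
CHARS_PER_TOKEN = 4           # rough heuristic, an assumption

def estimate_tokens(text: str) -> int:
    return len(text) // CHARS_PER_TOKEN

docs = [p.read_text(encoding="utf-8") for p in Path("corpus").glob("*.txt")]
total = sum(estimate_tokens(d) for d in docs)
print(f"~{total:,} tokens of {CONTEXT_WINDOW:,} "
      f"({100 * total / CONTEXT_WINDOW:.1f}% of the window)")
```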
Expert Image Grounding
Llama 4 is also best-in-class on image grounding, able to align user prompts with relevant visual concepts and anchor model responses to regions in the image.
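One way to exercise this is to ask the model to tie each claim to an image region. The sketch below follows the same request shape as the multimodal example above; the normalized bounding-box convention requested in the prompt is our own illustration, not a documented response format.

```python
# Grounded Q&A sketch: ask the model to anchor each claim to an image region.
# The normalized [x0, y0, x1, y1] box convention is our own illustration, not
# a documented output format. Endpoint and model name remain placeholders.
import os

from openai import OpenAI

client = OpenAI(base_url="https://api.llama.example/v1",  # placeholder
                api_key=os.environ["LLAMA_API_KEY"])

prompt = ("List each object you can see. After every claim, cite the image "
          "region it comes from as a normalized [x0, y0, x1, y1] box.")

response = client.chat.completions.create(
    model="llama-4-maverick",  # placeholder model identifier
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {"type": "image_url",
             "image_url": {"url": "https://example.com/street-scene.jpg"}},
        ],
    }],
)
print(response.choices[0].message.content)
```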

Benchmarks

We evaluated model performance on a suite of common benchmarks across a wide range of languages, testing for coding, reasoning, knowledge, vision understanding, multilinguality, and long context.
Category              | Benchmark                                        | Llama 4 Maverick | Gemini 2.0 Flash | DeepSeek v3.1          | GPT-4o
----------------------|--------------------------------------------------|------------------|------------------|------------------------|-----------------------
Inference Cost        | Cost per 1M input & output tokens (3:1 blended)  | $0.19-$0.49⁵     | $0.17            | $0.48                  | $4.38
Image Reasoning       | MMMU                                             | 73.4             | 71.7             | No multimodal support  | 69.1
                      | MathVista                                        | 73.7             | 73.1             | No multimodal support  | 63.8
Image Understanding   | ChartQA                                          | 90.0             | 88.3             | No multimodal support  | 85.7
                      | DocVQA (test)                                    | 94.4             | -                | No multimodal support  | 92.8
Coding                | LiveCodeBench (10/01/2024-02/01/2025)            | 43.4             | 34.5             | 45.8/49.2³             | 32.3³
Reasoning & Knowledge | MMLU Pro                                         | 80.5             | 77.6             | 81.2                   | -
                      | GPQA Diamond                                     | 69.8             | 60.1             | 68.4                   | 53.6
Multilingual          | Multilingual MMLU                                | 84.6             | -                | -                      | 81.5
Long context          | MTOB (half book) eng->kgv/kgv->eng               | 54.0/46.4        | 48.4/39.80⁴      | Context window is 128K | Context window is 128K
                      | MTOB (full book) eng->kgv/kgv->eng               | 50.8/46.7        | 45.5/39.6⁴       | Context window is 128K | Context window is 128K

  1. For Llama model results, we report 0-shot evaluation with temperature = 0 and no majority voting or parallel test-time compute. For high-variance benchmarks (GPQA Diamond, LiveCodeBench), we average over multiple generations to reduce uncertainty.

  2. For non-Llama models, we source the highest available self-reported eval results unless otherwise specified. We only include results from models whose evals are reproducible (via API or open weights), and we only include non-thinking models. Cost estimates for non-Llama models are sourced from Artificial Analysis.

  3. The date range for DeepSeek v3.1's self-reported LiveCodeBench score (49.2) is unknown, so we also provide our internal result (45.8) on the defined date range. Results for GPT-4o are sourced from the LCB leaderboard.

  4. Specialized long-context evals are not traditionally reported for generalist models, so we share internal runs to showcase Llama's frontier performance.

  5. $0.19/Mtok (3:1 blended) is our cost estimate for Llama 4 Maverick assuming distributed inference. On a single host, we project the model can be served at $0.30-$0.49/Mtok (3:1 blended).
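
For readers reconstructing the cost column: a "3:1 blended" price is conventionally a weighted average over a workload with three input tokens per output token. Here is a small sketch under that assumption, with made-up per-direction prices chosen to land on the $0.19 headline figure.

```python
# The "3:1 blended" figure is a weighted average over a workload with three
# input tokens per output token. Assuming the conventional definition:
#   blended = (3 * input_price + 1 * output_price) / 4
# The example prices below are illustrative, not quoted rates.
def blended_price(input_per_mtok: float, output_per_mtok: float,
                  input_ratio: int = 3, output_ratio: int = 1) -> float:
    total = input_ratio + output_ratio
    return (input_ratio * input_per_mtok
            + output_ratio * output_per_mtok) / total

# e.g. hypothetical $0.11/Mtok in, $0.43/Mtok out -> $0.19/Mtok blended
print(f"${blended_price(0.11, 0.43):.2f}/Mtok (3:1 blended)")
```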

Resources
Explore the latest tools, documentation, and best practices as you build with Llama.

Docs

The guides and resources you need to build with Llama 4.

Cookbooks

Check out our collection of Llama recipes to help you get started faster.

Case studies

See how other innovators are building with Llama.

Our partner ecosystem

[Image: Llama partners collage]
Latest Llama updates
Open Source
Joining forces with AWS on a new program to help startups build with Llama
We're joining forces with Amazon Web Services to announce a new program that will provide resources and support to 30 promising startups in the U.S. that are building with Llama.
Learn more
Large Language Model
How Llama helps drive engineering efficiency at a major Australian bank
ANZ, one of Australia's Big Four banks, is driving engineering efficiency with Llama.
Learn more
Large Language Model
Announcing the inaugural Llama Startup Program cohort
At Meta, we believe in the potential of early-stage startups to drive innovation in the generative AI market, and through the Llama Startup Program, we aim to lower the barrier to entry for getting started with Llama models.
Learn more
Open Source
Introducing the Llama Startup Program
We’re excited to announce the Llama Startup Program, a new initiative to empower early stage startups to innovate and build generative AI applications with Llama.
Learn more

Stay up-to-date

Our latest updates delivered to your inbox

Subscribe to our newsletter to keep up with the latest Llama updates, releases, and more.