Build on your own terms
MODELS
Latest Llama models
The latest models feature native multimodality, advanced reasoning, and industry-leading context windows.
Model overview
Llama 4
Native multimodality leveraging early fusion to pre-train the model on unlabeled text and vision data, enabling a step change in intelligence over separate, frozen multimodal weights.
Llama 4 Maverick
Natively multimodal for image and text understanding.
- 10M-token context for long-form work
- Multimodal text + image understanding
- For use cases around memory, personalization, and multimodal applications
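The "early fusion" described above can be sketched minimally: image patch features are projected into the same embedding space as text tokens and concatenated into one sequence, so the model attends over both modalities jointly from the first layer. All shapes, names, and values below are illustrative placeholders, not Llama's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

D_MODEL = 64            # shared embedding width (illustrative)
N_TEXT, N_PATCH = 8, 4  # token and image-patch counts (illustrative)

# Text tokens arrive already embedded in the model dimension.
text_emb = rng.normal(size=(N_TEXT, D_MODEL))

# Image patches start in their own feature space and are projected
# into the shared embedding space by a learned linear map.
patch_feats = rng.normal(size=(N_PATCH, 32))
proj = rng.normal(size=(32, D_MODEL))
image_emb = patch_feats @ proj

# Early fusion: one combined sequence, so every transformer layer
# sees text and vision together rather than fusing late via
# separate, frozen modality-specific towers.
fused = np.concatenate([image_emb, text_emb], axis=0)
print(fused.shape)  # (12, 64)
```

The contrast is with late fusion, where each modality is encoded by its own frozen network and the outputs are merged only at the end.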
Llama 4 Scout
Natively multimodal, offering text and visual intelligence.
- Fits on a single H100 GPU for efficient serving
- 10M-token context window
- For use cases around long document analysis
Llama 3
The open-source AI models you can fine-tune, distill, and deploy anywhere. Choose from our collection of models: Llama 3.1, Llama 3.2, and Llama 3.3.
Llama 3.3
Multilingual open-source large language model.
- Available in 70B
- Experience 405B performance and quality at a fraction of the cost
- Built for text-based use cases such as synthetic data generation
Llama 3.2
Flexible, cost-effective, and built for edge use cases.
- 1B & 3B are lightweight and cost-efficient, allowing you to run them anywhere
- 11B & 90B are flexible multimodal models that can reason on high resolution images and output text
Llama 3.1
Open foundation model built for flexibility and control.
- Available in 8B, 70B, and 405B sizes
- Capabilities in general knowledge, steerability, math, tool use, and multilingual translation
- Text summarization, multilingual agents, and coding use cases
Model optimization
Llama 4 capabilities
Llama 4 benchmark
| Task | Metric | Llama 4 Maverick | Llama 4 Scout |
| --- | --- | --- | --- |
| Reasoning | MMLU Pro | 80.5 | 74.3 |
| Reasoning | GPQA Diamond | 69.8 | 57.2 |
| Coding | LiveCodeBench | 43.4 | 32.8 |
| Multimodal (Image) | MMMU | 73.4 | 69.4 |
| Multimodal (Image) | MathVista | 73.7 | 70.7 |
| Multimodal (Image) | ChartQA | 90.0 | 88.8 |
| Multimodal (Image) | DocVQA | 94.4 | 94.4 |
| Multilingual | MMLU Multi | 84.6 | 74.3 |
| Long Context | MTOB Half Book | 54.0 / 46.4 | 42.2 / 36.6 |
| Long Context | MTOB Full Book | 50.8 / 46.7 | 39.7 / 36.3 |
| Efficiency | Cost per 1M tokens | $0.19–$0.49 | $0.19–$0.49 |
Methodology & Notes
1. For Llama model results, we report 0-shot evaluation with temperature = 0 and no majority voting or parallel test-time compute. For high-variance benchmarks (GPQA Diamond, LiveCodeBench), we average over multiple generations to reduce uncertainty.
2. Specialized long-context evals are not traditionally reported for generalist models, so we share internal runs to showcase Llama's frontier performance.
3. $0.19/Mtok (3:1 blended) is our cost estimate for Llama 4 Maverick assuming distributed inference. On a single host, we project the model can be served at $0.30–$0.49/Mtok (3:1 blended).
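The "3:1 blended" figure averages input and output token prices weighted by a 3:1 input-to-output ratio. A minimal sketch of that arithmetic, with made-up placeholder prices rather than any published rates:

```python
def blended_cost_per_mtok(input_price, output_price, ratio=3.0):
    """Blend input/output $/Mtok prices at a given input:output token ratio."""
    return (ratio * input_price + output_price) / (ratio + 1)

# Hypothetical per-Mtok prices, chosen only to illustrate the formula:
# 3 parts input at $0.12 plus 1 part output at $0.40.
print(round(blended_cost_per_mtok(0.12, 0.40), 2))  # 0.19
```

Under these placeholder prices the blend works out to $0.19/Mtok; the actual per-token rates behind Meta's estimate are not stated here.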
Start building
Featured case studies
CONSUMER
How Shopify is using Llama
Shopify uses Llama to generate product pages, localize content, and automate support, helping developers scale workflows and save time.
- +76% higher token throughput than the previous model
- 97.7% Macro-F1 score on intent detection
- 33% compute cost savings with JSON output
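Macro-F1, the intent-detection metric quoted above, computes an F1 score per class and averages them with equal weight, so rare intents count as much as common ones. A self-contained sketch of the metric (the intent labels are invented for illustration):

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores."""
    classes = set(y_true) | set(y_pred)
    f1s = []
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1s.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return sum(f1s) / len(f1s)

# Invented intent labels, just to exercise the metric.
truth = ["refund", "refund", "shipping", "order", "order"]
pred  = ["refund", "shipping", "shipping", "order", "order"]
print(round(macro_f1(truth, pred), 3))  # 0.778
```

Because each class contributes equally, a model that ignores a minority intent is penalized more than it would be under a frequency-weighted average.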
SAFETY