Deploying Llama 3.1 405B: Partner Guides

Llama 3.1 405B is Meta's most advanced and capable model to date. To help you unlock its full potential, please refer to the partner guides below.

Our partner guides offer tailored support and expertise to ensure a seamless deployment process, enabling you to harness the features and capabilities of Llama 3.1 405B. Browse the following partner guides to explore their specific offerings and take the first step towards successful deployment.

AWS

Amazon Web Services (AWS) gives you access to Meta's industry-leading Llama models—letting you build and scale sophisticated generative AI applications with ease.

Amazon Bedrock unlocks the full potential of Llama models through a secure, turnkey approach—freeing developers from infrastructure management and scalability concerns while providing enterprise-grade security via a unified API. Build and scale sophisticated generative AI applications instantly, without the traditional technical barriers. Read the documentation and get started in the Amazon Bedrock console.
Amazon SageMaker JumpStart empowers you to leverage pretrained Llama models and MLOps controls with Amazon SageMaker AI features, and deploy models under your VPC controls, all accessible via SageMaker Studio or through the SageMaker Python SDK.
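As a concrete illustration of the Bedrock path, the sketch below builds the JSON request body Bedrock uses for Llama-family models and invokes the model through boto3. The model ID shown is illustrative; confirm the exact identifier and request schema in the Amazon Bedrock console for your region, and note that the call itself requires AWS credentials with Bedrock access.

```python
import json

def build_llama_body(prompt: str, max_gen_len: int = 512, temperature: float = 0.6) -> str:
    """Build the JSON request body Bedrock expects for Llama-family models."""
    return json.dumps({
        "prompt": prompt,
        "max_gen_len": max_gen_len,
        "temperature": temperature,
    })

def invoke_llama(prompt: str,
                 model_id: str = "meta.llama3-1-405b-instruct-v1:0",
                 region: str = "us-west-2") -> str:
    """Call Bedrock's InvokeModel API; requires configured AWS credentials."""
    import boto3  # deferred import so the payload helper stays dependency-free
    client = boto3.client("bedrock-runtime", region_name=region)
    response = client.invoke_model(modelId=model_id, body=build_llama_body(prompt))
    return json.loads(response["body"].read())["generation"]
```

The payload helper is kept separate from the network call so the request shape can be inspected and tested without AWS access.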

Azure

Microsoft is bringing the Llama 3.1 suite of models to the Azure AI Model Catalog in AI Studio. This addition enhances the catalog with advanced synthetic data generation and knowledge distillation capabilities. The Azure AI Model Catalog is a platform that allows you to discover, evaluate, fine-tune, and deploy a wide range of AI models for prototyping, optimizing, and operationalizing your gen AI applications.
Developers using Llama 3.1 models can also work seamlessly with other tools in Azure AI Studio, such as Azure AI Content Safety, Azure AI Search, and prompt flow, to enhance ethical and effective AI practices. Learn more in the documentation.

Databricks

Databricks Mosaic AI has partnered with Meta to support the Llama 3.1 model architecture across the full suite of products:

  • The full suite of Llama 3.1 models (8B, 70B, and 405B) is available using Foundation Model APIs. See Get started querying LLMs on Databricks. Production usage at scale is supported via provisioned throughput. See Provisioned throughput Foundation Model APIs on how to bring Llama to production.
  • Full customization support is available through Mosaic AI Model Training. See the getting started guide to start customizing with data from Unity Catalog.
Databricks supports the full GenAI app development cycle, including through AI functions for large-scale application and AI Agents Framework and Evaluation for building production-ready agentic and RAG apps.

Dell

Developers can access the Llama 3.1 405B models by downloading an optimized container from the Dell Enterprise Hub on Hugging Face, specifically designed for on-premises deployment on Dell PowerEdge XE9680 infrastructure.

Fireworks AI

Fireworks AI offers a fast, efficient generative AI inference engine, enabling developers to build and deploy applications using the Llama 3.1 8B, 70B, and 405B models with industry-leading latency and total cost of ownership.
Powered by its custom runtime, FireAttention, it delivers 4x lower latency and 15x greater throughput than self-hosted alternatives, substantially lowering operational costs with up to 15x reductions in accelerator expenses. Fireworks AI is a preferred partner for enterprises seeking operational stability, scalability, and high-quality AI application development.
The full suite of Llama models is available for inference and fine-tuning. To begin using the Llama 3.1 8B, 70B, and 405B models on Fireworks AI, simply sign up for an account and start making API calls with the provided API key.
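Fireworks AI exposes an OpenAI-compatible REST API, so a first call can be a plain HTTP POST. The sketch below builds such a request with only the standard library; the model slug in the commented usage is an assumption to be checked against the Fireworks model catalog.

```python
import json
import urllib.request

FIREWORKS_URL = "https://api.fireworks.ai/inference/v1/chat/completions"

def build_request(api_key: str, model: str, prompt: str) -> urllib.request.Request:
    """Assemble an OpenAI-style chat completion request for Fireworks AI."""
    payload = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        FIREWORKS_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Authorization": f"Bearer {api_key}",
                 "Content-Type": "application/json"},
    )

# Example (performs a network call, so it is not run here; model slug assumed):
# req = build_request(api_key="<FIREWORKS_API_KEY>",
#                     model="accounts/fireworks/models/llama-v3p1-405b-instruct",
#                     prompt="Hello!")
# with urllib.request.urlopen(req) as resp:
#     print(json.loads(resp.read())["choices"][0]["message"]["content"])
```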

Google Cloud Platform

Google Cloud Platform (GCP) is a suite of cloud computing services offered by Google. Building on GCP services, Model Garden on Vertex AI offers infrastructure to jumpstart your ML project, providing a single place to discover, customize, and deploy a wide range of models. You can also use GCP to run Meta Llama models on your own managed infrastructure.

We have collaborated with Vertex AI from Google Cloud to offer Meta Llama models through easy-to-use interfaces. You can choose to use fully managed Llama APIs, or fine-tune and self-deploy Llama models.

Groq

Experience Groq, the fastest AI inference for Llama 3.1, including the 405B, 70B, and 8B Instruct models. GroqCloud™, powered by LPU™ AI Inference Technology, provides record-setting inference speed to unlock a new class of AI applications and use cases. Enterprises benefit not only from fast AI inference, affordability, and energy efficiency, but also from the knowledge that Groq will continue to outperform with its unique software-first approach.
For developers, it's an easy three-step process to get started with Groq. Simply replace your existing industry-standard API key with a free Groq API key, set the base URL, and start building with Llama 3.1 on the Groq Developer Console.
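The base-URL swap described above can be sketched as follows: the same OpenAI-style request code works against either provider, and only the base URL, key, and model name change. The Groq model name in the comment is an assumption to be verified in the Groq Developer Console.

```python
import json
import urllib.request

# Point an existing OpenAI-style integration at Groq by swapping the base URL.
OPENAI_BASE_URL = "https://api.openai.com/v1"
GROQ_BASE_URL = "https://api.groq.com/openai/v1"

def chat_request(base_url: str, api_key: str, model: str,
                 messages: list) -> urllib.request.Request:
    """Build a chat completion request against any OpenAI-compatible base URL."""
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps({"model": model, "messages": messages}).encode("utf-8"),
        headers={"Authorization": f"Bearer {api_key}",
                 "Content-Type": "application/json"},
    )

# Same code path, different provider (model name assumed):
# req = chat_request(GROQ_BASE_URL, "<GROQ_API_KEY>", "llama-3.1-8b-instant",
#                    [{"role": "user", "content": "Hello"}])
```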

NVIDIA

Experience the NVIDIA-optimized Llama 3.1 405B, Llama 3.1 70B, and Llama 3.1 8B NIM API endpoints with free NVIDIA cloud credits from ai.nvidia.com. Instructions to download and run the NIMs in your local and cloud environments are provided on each model page.
Access Llama models through the NVIDIA NGC catalog, where you can search, select, and download various versions, including base, instruct, and older models.
  • Search and download the models either from your browser, via WGET, or through the NGC CLI.
  • The models are licensed under the Llama 3.1 Community License Agreement.
Build custom Llama NIMs for enterprise applications with NVIDIA AI Foundry using NVIDIA NeMo, an end-to-end platform offering data curation, customization, model evaluation, retrieval, and guardrails, on NVIDIA DGX Cloud.
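Hosted NIM endpoints follow the OpenAI-compatible chat schema, so a request can be sketched as a plain JSON body and headers, sent with any HTTP client. The endpoint URL, the model string, and the self-hosted port below are assumptions to be verified on the model pages at ai.nvidia.com.

```python
import json

NIM_ENDPOINT = "https://integrate.api.nvidia.com/v1/chat/completions"

def nim_payload(model: str, prompt: str, max_tokens: int = 256) -> dict:
    """OpenAI-style chat body accepted by hosted NIM endpoints (assumed schema)."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }

def nim_headers(api_key: str) -> dict:
    """Bearer-token headers for the hosted NIM API."""
    return {"Authorization": f"Bearer {api_key}", "Accept": "application/json"}

# POST json.dumps(nim_payload("meta/llama-3.1-405b-instruct", "Hi")) to NIM_ENDPOINT
# with nim_headers(<your NVIDIA API key>) using any HTTP client; a self-hosted NIM
# container typically exposes the same route locally, e.g. on port 8000.
```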

IBM watsonx

Developers can access Llama 3.1 405B models on IBM watsonx, an advanced platform designed for AI builders, integrating generative AI capabilities, foundation models, and traditional machine learning. It provides a comprehensive suite of tools that span the AI lifecycle, enabling users to tune models with their enterprise data. For detailed instructions, refer to the getting started guide and quick start tutorials.

Scale AI

Scale offers Llama 3.1 on Scale GenAI Platform to help enterprises build, evaluate, and deploy custom, production-ready Generative AI solutions that drive real business outcomes. With Scale GenAI Platform enterprises can customize and evaluate Llama 3.1 for enterprise use cases including financial services, legal, edTech, and more. Read more about how to evaluate Llama 3.1 in our blog.
Llama 3.1 is also now available to customers in Scale GenAI Platform to build custom GenAI apps, fine-tune, implement in RAG workflows, and evaluate against other models. To learn more, schedule a demo with Scale today.

Snowflake

Snowflake provides the option to instantly access and customize Meta’s collection of LLMs with serverless inference and fine-tuning using Snowflake Cortex AI as well as the option to run custom deployments using Snowpark Container Services. With LLMs running inside Snowflake, teams can use the same role-based access controls to secure and govern both models and data, ensuring cohesive security and governance.
To get started in minutes building enterprise-grade generative AI applications using state-of-the-art models in Snowflake, check out the get started guide for Llama in Cortex AI.

Together AI

Together AI offers the fastest, most comprehensive developer platform for Llama models, with easy-to-use OpenAI-compatible APIs for Llama 3.1 and 3.2 models, as well as support for Llama Stack. The text-only models, which include 3B, 8B, 70B, and 405B, are optimized for natural language processing, offering solutions for various applications. The 3.2 vision+text models, such as the free 11B and 90B models, are specifically designed for multimodal tasks that combine text and image understanding. Together AI has made the Llama 3.2 11B model available for free development, experimentation, and personal applications.

Together Turbo and Together Lite endpoints for Llama models enable performance, quality, and price flexibility so enterprises do not have to compromise. Together AI applies cutting-edge research and innovations such as FlashAttention-3, enabling Together Inference Engine 2.0 to deliver up to 15x throughput improvement, while lowering operational costs.

Enterprises benefit from flexible deployment options, including Serverless or Dedicated Endpoints on Together Cloud or their own VPC environments using AWS or other cloud providers. Organizations retain full ownership of their fine-tuned Llama models and customer data, ensuring security and control at every stage of AI development.
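For the vision+text models mentioned above, a multimodal turn in the OpenAI-compatible shape mixes text and image parts in one user message. The sketch below builds such a message; the endpoint URL and the model name in the commented usage are assumptions to be checked against Together AI's model listing.

```python
import json

TOGETHER_URL = "https://api.together.xyz/v1/chat/completions"

def vision_message(text: str, image_url: str) -> dict:
    """One user turn combining text and an image, in the OpenAI-compatible
    content-parts shape used for chat requests to vision models."""
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": text},
            {"type": "image_url", "image_url": {"url": image_url}},
        ],
    }

# body = json.dumps({"model": "meta-llama/Llama-3.2-11B-Vision-Instruct-Turbo",
#                    "messages": [vision_message("Describe this image.",
#                                                "https://example.com/cat.jpg")]})
# POST `body` to TOGETHER_URL with an "Authorization: Bearer <API key>" header.
```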