Meta Llama in the Cloud

This tutorial is part of our Build with Meta Llama series, where we demonstrate the capabilities and practical applications of Llama so that you can incorporate them into your own projects. It accompanies the video Many other ways to run Llama and resources | Build with Meta Llama, which covers several other ways to host or run Meta Llama models and points you to resources that can help you get started.

If you're interested in learning by watching or listening, check out our video on Many other ways to run Llama and resources.

Apart from running the models locally, one of the most common ways to run Meta Llama models is in the cloud. We saw an example of this using Hugging Face in our running Llama on Windows video. Let's take a look at some of the other services we can use to host and run Llama models, such as AWS, Azure, Google Cloud, Kaggle, and Vertex AI, among others.

Amazon Web Services

Amazon Web Services (AWS) provides multiple ways to host your Llama models, such as SageMaker JumpStart and Bedrock.

Bedrock is a fully managed service that lets you quickly and easily build generative AI-powered experiences. To use Meta Llama with Bedrock, check out their website that goes over how to integrate and use Meta Llama models in your applications.
You can also use AWS through SageMaker JumpStart, which enables you to build, train, and deploy ML models from a broad selection of publicly available foundation models, and deploy them on SageMaker instances for model training and inference. Learn more about how to use Meta Llama on SageMaker on their website.
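As a quick illustration of the Bedrock route, here is a minimal sketch that calls a Llama model through the boto3 bedrock-runtime client; the region, model ID, and request fields are assumptions to verify against the Bedrock documentation and your account's model access.

    import json
    import boto3

    # Create a Bedrock runtime client; credentials come from your AWS config.
    client = boto3.client("bedrock-runtime", region_name="us-east-1")

    # Request body format for Llama models on Bedrock (verify in the docs).
    body = json.dumps({
        "prompt": "Explain retrieval augmented generation in one paragraph.",
        "max_gen_len": 256,
        "temperature": 0.5,
    })

    # The model ID is illustrative; check the Bedrock console for available IDs.
    response = client.invoke_model(modelId="meta.llama3-8b-instruct-v1:0", body=body)
    print(json.loads(response["body"].read())["generation"])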

Microsoft Azure

Another way to run Meta Llama models is on Microsoft Azure. You can access Meta Llama models on Azure in two ways:

  • Models as a Service (MaaS) provides access to Meta Llama hosted APIs through Azure AI Studio.
  • Model as a Platform (MaaP) provides access to the Meta Llama family of models, with out-of-the-box support for fine-tuning and evaluation, through Azure Machine Learning Studio.

Please refer to our How-To Guide for more details.
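For the MaaS route, here is a minimal sketch assuming the azure-ai-inference Python package and a serverless Llama endpoint you have already deployed; the endpoint URL and key are placeholders.

    from azure.ai.inference import ChatCompletionsClient
    from azure.ai.inference.models import SystemMessage, UserMessage
    from azure.core.credentials import AzureKeyCredential

    # Endpoint URL and key are placeholders for a serverless (MaaS) deployment.
    client = ChatCompletionsClient(
        endpoint="https://<your-deployment>.<region>.models.ai.azure.com",
        credential=AzureKeyCredential("<your-api-key>"),
    )

    response = client.complete(
        messages=[
            SystemMessage(content="You are a helpful assistant."),
            UserMessage(content="What does Models as a Service mean on Azure?"),
        ],
    )
    print(response.choices[0].message.content)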

Google Cloud Platform

You can also use GCP, or Google Cloud Platform, to run Meta Llama models. GCP is a suite of cloud computing services that provides computing resources as well as virtual machines. Building on top of GCP services, Model Garden on Vertex AI offers infrastructure to jumpstart your ML project with a single place to discover, customize, and deploy a wide range of models.
We have collaborated with Vertex AI from Google Cloud to fully integrate Meta Llama, offering pre-trained and instruction-tuned models, as well as Meta Code Llama, in various sizes. Check out how to fine-tune and deploy Meta Llama models on Vertex AI by visiting the website. Please note that you may need to request additional GPU compute quota as a prerequisite.
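As a hedged sketch of querying a Llama model deployed from Model Garden, the snippet below uses the google-cloud-aiplatform SDK; the project, endpoint ID, and instance schema are assumptions that depend on your deployment.

    from google.cloud import aiplatform

    # Project, region, and endpoint ID are placeholders; the endpoint must
    # already exist (deployed from Model Garden).
    aiplatform.init(project="<your-project-id>", location="us-central1")
    endpoint = aiplatform.Endpoint("<your-endpoint-id>")

    # The instance schema depends on the serving container; verify it against
    # your deployment's documentation in Model Garden.
    response = endpoint.predict(
        instances=[{"prompt": "What is Vertex AI Model Garden?", "max_tokens": 256}]
    )
    print(response.predictions[0])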

NVIDIA NIM

The NVIDIA NIM inference microservice streamlines the deployment of Meta Llama models anywhere: in the cloud, in the data center, or on workstations. Instructions to download and run the NVIDIA-optimized models in your local and cloud environments are provided under the Docker tab on each model page in the NVIDIA API catalog, which includes Llama 3 70B Instruct and Llama 3 8B Instruct.
Additionally, you can deploy the Meta Llama models directly from Hugging Face on top of cloud platforms with just a few clicks.
You can also try the performance-optimized NVIDIA NIM, which uses industry standard APIs, for Llama 3 models from ai.nvidia.com.
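Since NIM uses industry-standard APIs, a hosted Llama 3 NIM can be called with the standard OpenAI Python client, as in this sketch; the base URL and model name are assumptions to verify in the API catalog.

    from openai import OpenAI

    # The hosted NIM endpoint speaks the OpenAI chat completions protocol.
    client = OpenAI(
        base_url="https://integrate.api.nvidia.com/v1",
        api_key="<your-nvidia-api-key>",
    )

    completion = client.chat.completions.create(
        model="meta/llama3-8b-instruct",
        messages=[{"role": "user", "content": "What is an inference microservice?"}],
        max_tokens=256,
    )
    print(completion.choices[0].message.content)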

Databricks Mosaic AI

Databricks Mosaic AI has partnered with Meta to support Llama models across the full suite of AI products.

  • Llama models are available in the Databricks Foundation Model API, which enables easy experimentation and a straightforward path to production with enterprise-grade security and scalability; a minimal query sketch follows this list. See Get started querying LLMs on Databricks.
  • Customization support is available for models through Mosaic AI Model Training. See the getting started guide to start customizing with data from Unity Catalog.
  • Databricks supports the full GenAI app development cycle, including AI functions for large-scale batch processing and the AI Agents Framework and Evaluation for building production-ready agentic and RAG apps.
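As a minimal sketch of querying the Foundation Model API, the snippet below uses the OpenAI-compatible route that Databricks serving endpoints expose; the workspace URL, token, and endpoint name are placeholders to verify in your workspace.

    from openai import OpenAI

    # Workspace URL, token, and endpoint name are placeholders; pay-per-token
    # endpoint names vary by workspace and region.
    client = OpenAI(
        base_url="https://<your-workspace>.cloud.databricks.com/serving-endpoints",
        api_key="<your-databricks-token>",
    )

    response = client.chat.completions.create(
        model="databricks-meta-llama-3-70b-instruct",
        messages=[{"role": "user", "content": "Summarize the Foundation Model API."}],
    )
    print(response.choices[0].message.content)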

Snowflake Cortex AI

Snowflake Cortex AI is a suite of integrated features and services that provides fully managed LLM inference, fine-tuning, and RAG for both structured and unstructured data analysis. The platform enables quick integration of industry-leading models, both open source and proprietary, through LLM functions or REST APIs, while maintaining enterprise-grade security and governance, all within Snowflake’s secure perimeter.
For AI engineers, Cortex AI offers instant access to Meta’s collection of LLMs with serverless inference and fine-tuning capabilities. Choose from various model sizes and language support, or run custom deployments via Snowflake Container Services. Snowflake is also innovating with Meta’s Llama models through initiatives from the Snowflake AI Research team, such as SwiftKV, which reduces inference costs by up to 75% while maintaining model accuracy through model rewiring and fine-tuning, enabling customers to build more cost-effective, high-performing AI solutions with Snowflake Cortex AI.
Data engineers can run LLMs directly inside Snowflake without data movement, using existing role-based access controls to secure both models and data. This native integration enables seamless analysis of unstructured data alongside structured data, making it simple to build comprehensive AI applications or easily apply custom or out-of-the-box task functions powered by Llama while maintaining consistent governance standards.
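As a hedged sketch, the snippet below calls the Cortex COMPLETE LLM function from Python through the Snowflake connector; the connection parameters are placeholders, and the model name should be checked against the models available in your region.

    import snowflake.connector

    # Connection parameters are placeholders; a running warehouse is required.
    conn = snowflake.connector.connect(
        account="<your-account>",
        user="<your-user>",
        password="<your-password>",
        warehouse="<your-warehouse>",
    )

    cur = conn.cursor()
    # COMPLETE takes a model name and a prompt; model names vary by region.
    cur.execute(
        "SELECT SNOWFLAKE.CORTEX.COMPLETE('llama3-8b', %s)",
        ("Summarize the benefits of serverless inference.",),
    )
    print(cur.fetchone()[0])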

IBM watsonx

You can also use IBM's watsonx to run Meta Llama models. IBM watsonx is an advanced platform designed for AI builders, integrating generative AI capabilities, foundation models, and traditional machine learning. It provides a comprehensive suite of tools that span the AI lifecycle, enabling users to tune models with their enterprise data. The platform supports multi-model flexibility, client protection, AI governance, and hybrid, multi-cloud deployments. It offers features for extracting insights, discovering trends, generating synthetic tabular data, running Jupyter notebooks, and creating new content and code. Watsonx.ai equips data scientists with the necessary tools, pipelines, and runtimes for building and deploying ML models, thereby automating the entire AI model lifecycle.
We've worked with IBM to make Llama and Code Llama models available on their platform. To test the platform and evaluate Llama on watsonx, you can create a free account and try the available models through the Prompt Lab. For detailed instructions, refer to the getting started guide and the quick start tutorials.
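For a rough idea of programmatic access, here is a sketch assuming the ibm-watsonx-ai Python SDK; the model ID, service URL, and credentials are placeholders to verify against IBM's documentation.

    from ibm_watsonx_ai import Credentials
    from ibm_watsonx_ai.foundation_models import ModelInference

    # URL, API key, project ID, and model ID are placeholders to verify.
    model = ModelInference(
        model_id="meta-llama/llama-3-8b-instruct",
        credentials=Credentials(
            url="https://us-south.ml.cloud.ibm.com",
            api_key="<your-api-key>",
        ),
        project_id="<your-project-id>",
    )

    print(model.generate_text(prompt="What is watsonx.ai?"))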

Other hosting providers

You can also run Llama models using hosting providers such as Together AI, Anyscale, Replicate, Groq, Fireworks AI, Cloudflare, and others. Our team has put together step-by-step examples showing how to run Llama on externally hosted providers. These can be found in our llama-cookbook GitHub repo, which walks through setting up and running inference for Llama models on several of these providers.
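Many of these providers expose OpenAI-compatible endpoints, so the same client code often works across them by swapping the base URL and model name, as in this sketch; the URLs and model names shown are illustrative and may be out of date.

    from openai import OpenAI

    # Base URLs and model names are illustrative and may change; check each
    # provider's documentation for current values.
    providers = {
        "together": ("https://api.together.xyz/v1", "meta-llama/Llama-3-8b-chat-hf"),
        "groq": ("https://api.groq.com/openai/v1", "llama3-8b-8192"),
    }

    base_url, model = providers["groq"]
    client = OpenAI(base_url=base_url, api_key="<provider-api-key>")
    reply = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": "Hello, Llama!"}],
    )
    print(reply.choices[0].message.content)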

Running Llama on-premises

Many enterprise customers prefer to deploy Llama models on-premises, on their own servers. One way to deploy and run Llama models in this manner is by using TorchServe. TorchServe is an easy-to-use tool for deploying PyTorch models at scale. It is cloud and environment agnostic and supports features such as multi-model serving, logging, metrics, and the creation of RESTful endpoints for application integration. To learn more about how TorchServe works, with setup, quickstart, and examples, check out the GitHub repo.
Another way to deploy Llama models on-premises is by using Virtual Large Language Model (vLLM) or Text Generation Inference (TGI), two leading open-source tools for deploying and serving LLMs. A detailed step-by-step tutorial on our llama-cookbook GitHub repo showcases how to use Llama models with vLLM and Hugging Face TGI, and how to create vLLM- and TGI-hosted Llama instances with LangChain, a language model integration framework for building applications with large language models.
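As a minimal sketch of vLLM's offline inference API, assuming a machine with a suitable GPU and access to the model weights on Hugging Face:

    from vllm import LLM, SamplingParams

    # Downloads the weights from Hugging Face on first use; requires a GPU
    # with enough memory for the chosen model.
    llm = LLM(model="meta-llama/Meta-Llama-3-8B-Instruct")
    params = SamplingParams(temperature=0.7, max_tokens=256)

    outputs = llm.generate(["Explain what TorchServe does in one sentence."], params)
    print(outputs[0].outputs[0].text)

vLLM can also serve an OpenAI-compatible HTTP endpoint (for example, via python -m vllm.entrypoints.openai.api_server --model <model>), which is the mode most commonly paired with the LangChain integration described above.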

Resources

You can find various demos and examples that can provide you with guidance, and that you can use as references to get started with Llama models, on our llama-cookbook GitHub repo, where you’ll find several examples for inference and fine-tuning, as well as for running on various API providers.
Learn more about Llama 3 and how to get started by checking out our Getting to know Llama notebook in our llama-cookbook GitHub repo. There you will find a guided tour of Llama 3, including a comparison to Llama 2, descriptions of the different Llama 3 models, how and where to access them, generative AI and chatbot architectures, prompt engineering, RAG (Retrieval Augmented Generation), fine-tuning, and more, all implemented with starter code that you can adapt for your own Meta Llama 3 projects.
To learn more about our Llama 3 models, check out our announcement blog where you can find details about how the models work, data on performance and benchmarks, information about trust and safety, and various other resources to get you started.
Get the model source from our Llama 3 GitHub repo, where you can learn how the models work along with a minimalist example of how to load Llama 3 models and run inference. Here, you will also find steps to download and set up the models, and examples for running the text completion and chat models.
Meta Llama 3 GitHub repo
Dive deeper and learn more about the model in the model card, which goes over the model architecture, intended use, hardware and software requirements, training data, results, and licenses.
Check out our new Meta AI, built with Llama 3 technology, which is now one of the world’s leading AI assistants that can boost your intelligence and lighten your load, helping you learn, get things done, create content, and connect to make the most out of every moment.
Meta AI

You can use Meta AI on Facebook, Instagram, WhatsApp, Messenger, and the web to get things done, learn, create, and connect with the things that matter to you.

To learn more about the latest updates and releases of Llama models, check out our website, where you can learn more about the latest models as well as find resources to learn more about how these models work and how you can use them in your own applications.
Check out our Getting Started guide that provides information and resources to help you set up Llama including how to access the models, prompt formats, hosting, how-to and integration guides, as well as resources that you can reference to get started with your projects.
Take a look at some of our latest blogs that discuss new announcements, the latest on the Llama ecosystem, and our responsible approach to Meta AI and Meta Llama 3.
Check out the community resources on our website to help you get started with Meta Llama models, learn about performance & latency, fine tuning, and more.
Dive deeper into prompt engineering, learning best practices for prompting Meta Llama models and interacting with Meta Llama Chat, Code Llama, and Llama Guard models in our short course on Prompt Engineering with Llama 2 on DeepLearning.ai, recently updated to showcase both Llama 2 and Llama 3 models.
Check out our Community Stories that go over interesting use cases of Llama models in various fields such as in Business, Healthcare, Gaming, Pharmaceutical, and more!
Learn more about the Llama ecosystem, building product experiences with Llama, and examples that showcase how industry pioneers have adopted Llama to build and grow innovative products for users across their platforms at Connect 2023.
Also check out our Responsible Use Guide that provides developers with recommended best practices and considerations for safely building products powered by LLMs.

We hope you found the Build with Meta Llama videos and tutorials helpful in providing the insights and resources you need to get started with Llama models.

We at Meta strongly believe in an open approach to AI development, democratizing access through an open platform and providing you with AI models, tools, and resources to give you the power to shape the next wave of innovation. We want to kickstart that next wave of innovation across the stack—from applications to developer tools to evals to inference optimizations and more. We can’t wait to see what you build and look forward to your feedback.
