Running Meta Llama on Mac

This tutorial is part of our Build with Meta Llama series, where we demonstrate the capabilities and practical applications of Llama so that developers like you can take advantage of what Llama has to offer and incorporate it into your own applications. It accompanies the video Running Llama on Mac | Build with Meta Llama, a step-by-step walkthrough of running Llama on macOS using Ollama.

If you're interested in learning by watching or listening, check out our video on Running Llama on Mac.

Setup

For this demo, we are using a MacBook Pro running macOS Sonoma 14.4.1 with 64 GB of memory. Since we will be using Ollama, which also runs on Linux and Windows, this setup can be reproduced on those operating systems with steps similar to the ones shown here.

Ollama lets you set up and run large language models, such as the Llama models, locally on your machine.
Ollama website
Downloading Ollama
The first step is to install Ollama. To do that, visit the Ollama website, choose your platform, and click "Download". For our demo, we will choose macOS and select "Download for macOS".
Next, we will make sure we can run Meta Llama 3 models with Ollama. Note that Ollama provides Meta Llama models in a 4-bit quantized format. To test the model, open your terminal and run ollama pull llama3 to download the 4-bit quantized Meta Llama 3 8B chat model, which is about 4.7 GB in size.
Downloading 4-bit quantized Meta Llama models

If you’d like to download the Llama 3 70B chat model, also in 4-bit, you can instead type

ollama pull llama3:70b

which, in quantized format, has a size of about 39 GB.

Running the model

Running using ollama run

To run the model, type the following in your terminal:

ollama run llama3

We are all set to ask questions and chat with our Meta Llama 3 model. Let’s ask some questions:

"Who wrote the book godfather?"
Meta Llama model generating a response

We can see that it gives the right answer, along with more information about the book as well as the movie that was based on it. What if we just wanted the name of the author, without the extra information? Let's adapt our prompt accordingly, specifying the kind of response we expect:

"Who wrote the book godfather? Answer with only the name."

Meta Llama model generating a response in the specified format

We can see that it generates the answer in the format we requested.

You can also try running the 70B model:

ollama run llama3:70b

but the inference speed will likely be slower.

Running with curl

You can even run and test the Llama 3 8B model directly by using the curl command and specifying your prompt right in the command:
curl http://localhost:11434/api/chat -d '{
  "model": "llama3",
  "messages": [
    {
      "role": "user",
      "content": "who wrote the book godfather?"
    }
  ],
  "stream": false
}'
Here, we are sending a POST request to the Ollama API running on localhost. The /api/chat endpoint interacts with the model hosted on the server. The JSON payload specifies the name of the model to use ("llama3"); a messages array, in which each message has a role (here, "user") and the content of the user's prompt ("who wrote the book godfather?"); and a boolean stream flag indicating whether the response should be streamed. In our case it is set to false, meaning the entire response will be returned at once.
Ollama running Llama model with curl command

As we can see, the model generated the response with the answer to our question.
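With "stream" set to false, the reply comes back as a single JSON object. Trimmed down to the fields this tutorial relies on (the exact set of fields may vary with your Ollama version), the response looks roughly like this:

{
  "model": "llama3",
  "message": {
    "role": "assistant",
    "content": "The Godfather was written by Mario Puzo. ..."
  },
  "done": true
}

The message.content field is the part we will extract in the Python example below.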

Running as a Python script

This example can also be run as a Python script. If you don't already have Python installed, visit the Python website, where you can choose your OS and download a recent version of Python.
To run the example as a Python script, open the editor of your choice and create a new file. We will use the requests library to make the HTTP call, which you can install with pip install requests. First, let's add the imports we need for this demo and define a variable called url, set to the same URL we used in the curl demo:
import requests
import json

url = "http://localhost:11434/api/chat"
We will now add a new function called llama3, which takes a prompt as an argument:
def llama3(prompt):
    # JSON payload: the model to use, the chat messages, and streaming disabled
    data = {
        "model": "llama3",
        "messages": [
            {
                "role": "user",
                "content": prompt
            }
        ],
        "stream": False,
    }

    headers = {
        "Content-Type": "application/json"
    }

    # Send the request and return only the text of the model's reply
    response = requests.post(url, headers=headers, json=data)
    return response.json()["message"]["content"]
This function constructs a JSON payload containing the specified prompt and the model name ("llama3"). It then sends a POST request to the API endpoint with the JSON payload as the message body, using the requests library. Once the response is received, the function extracts the content of the response message from the JSON object returned by the API and returns it.
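The version above assumes the request always succeeds; if Ollama isn't running, response.json() will fail with a fairly unhelpful error. As an optional refinement (a sketch, not part of the original example), you can add a timeout and surface HTTP errors explicitly. Note that requests sets the Content-Type header for you when you pass json=, so the explicit headers dictionary can be dropped. Either version works with the call below:

def llama3(prompt):
    data = {
        "model": "llama3",
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
    }
    # A timeout avoids hanging indefinitely; raise_for_status surfaces HTTP 4xx/5xx errors
    response = requests.post(url, json=data, timeout=120)
    response.raise_for_status()
    return response.json()["message"]["content"]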

Finally, we will provide the prompt and print the generated response:

response = llama3("who wrote the book godfather")
print(response)
To run the script, type python <name of script>.py in your terminal and press Enter.
Running Meta Llama model using Ollama and Python script
As we can see, it generated the response based on the prompt we provided in our script. To learn more about the complete Ollama APIs, check out their documentation.
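Both of our examples set stream to false, so the whole reply is returned at once. If you would rather print the reply as it is generated, set stream to true; Ollama then returns the reply incrementally as a series of JSON objects, one per line. The sketch below, which reuses the url variable and the requests and json imports from the script above, shows one way to consume that stream (the llama3_stream name is just for illustration, and the exact chunk fields may vary with your Ollama version):

def llama3_stream(prompt):
    data = {
        "model": "llama3",
        "messages": [{"role": "user", "content": prompt}],
        "stream": True,
    }
    # stream=True tells requests not to read the whole response body up front
    with requests.post(url, json=data, stream=True) as response:
        for line in response.iter_lines():
            if not line:
                continue
            chunk = json.loads(line)
            # Each chunk carries a piece of the reply; the final chunk has "done": true
            if not chunk.get("done"):
                print(chunk["message"]["content"], end="", flush=True)
    print()

llama3_stream("who wrote the book godfather")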
To check out the full example and run it on your own machine, our team has developed a detailed sample notebook, which you can find in the llama-cookbook GitHub repo. There you will find an example of how to run Llama 3 models on a Mac as well as on other platforms, including the examples we discussed here and other ways to use Llama 3 locally with Ollama via LangChain.
We've also created various other demos and examples to provide guidance and references to help you get started with Llama models and make it easier to integrate Llama into your own use cases. These demos and examples are also located in our llama-cookbook GitHub repo and on PyPI, where you'll find complete walkthroughs for getting started with Llama models, including several examples for inference, fine-tuning, and training on custom datasets, as well as demos that showcase Llama deployments, basic interactions, and specialized use cases.