Llama Stack

A streamlined developer experience enabling seamless AI application development.
Build Once, Deploy Anywhere
Cookbooks
Docs

What is Llama Stack?

Llama Stack is a comprehensive system that provides a uniform set of tools for building, scaling, and deploying generative AI applications. It enables developers to create, integrate, and orchestrate multiple AI services and capabilities in one adaptable setup.
Llama Stack codifies best practices across the Llama ecosystem. With the release of the Llama 4 herd of models, you can use Llama Stack to build innovative, personalized applications with the most accessible and scalable generation of Llama.
Learn more

Overview

Llama Stack Provides

Flexible Deployment Options
Consistent Application Behavior
Robust Distribution Partner Network

Standardized APIs

Provides consistent interfaces for building and deploying AI applications.
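As a sketch of what a standardized interface buys you, the snippet below assembles a chat request once and notes how it would be sent with the llama-stack-client Python SDK. The model ID, port, and the commented client call are illustrative assumptions; check your distribution's docs for the exact API surface.

```python
# Minimal sketch of the standardized inference API, assuming the
# llama-stack-client Python SDK and a Llama Stack server running locally.
# The model ID and port below are placeholders, not guaranteed defaults.

def build_chat_request(model_id: str, user_text: str) -> dict:
    """Assemble the payload for a chat-completion call: one user message."""
    return {
        "model_id": model_id,
        "messages": [{"role": "user", "content": user_text}],
    }

request = build_chat_request("meta-llama/Llama-4-Scout-17B-16E-Instruct", "Hello!")

# With a running server, the same request works against any distribution:
# from llama_stack_client import LlamaStackClient
# client = LlamaStackClient(base_url="http://localhost:8321")
# response = client.inference.chat_completion(**request)
print(request["messages"][0]["content"])
```

Because the request shape is the same everywhere, swapping providers means changing the base URL, not the application code.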

Flexible Deployment Options

Supports local development, cloud, on-premises, and mobile environments.

Pre-built Tools

Offers tools for inference, safety monitoring, and memory management.

Scalable Infrastructure

Facilitates easy scaling of AI applications.

Strong Partner Network

Collaborates with various providers to offer specialized services.

Telemetry and Monitoring

Includes built-in support for tracing requests and evaluating model outputs.

Capabilities

Llama Stack SDKs offer differentiating features and capabilities that make it easy to build AI applications, from RAG to innovative applications tailored for mobile use cases.
RAG
Multi-image inference
Custom tool calling

RAG
Easily add RAG to your apps with our mobile framework, tailoring models with user-specific context. Available in local and remote inference modes. Check out more on GitHub.
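To make the RAG flow concrete, here is a self-contained sketch of the pattern Llama Stack automates: index user-specific context, retrieve the most relevant chunks per query, and prepend them to the prompt. In Llama Stack the indexing and retrieval are handled by the stack's memory APIs; the naive word-overlap scorer below is only a stand-in for real vector search.

```python
# Stand-in RAG flow: retrieve relevant context, then build an augmented prompt.

def retrieve(query: str, docs: list[str], k: int = 1) -> list[str]:
    """Rank documents by naive word overlap with the query (embedding stand-in)."""
    q = set(query.lower().split())
    scored = sorted(docs, key=lambda d: len(q & set(d.lower().split())), reverse=True)
    return scored[:k]

def build_prompt(query: str, docs: list[str]) -> str:
    """Prepend retrieved context to the user's question."""
    context = "\n".join(retrieve(query, docs))
    return f"Context:\n{context}\n\nQuestion: {query}"

docs = [
    "The user's favorite color is green.",
    "Llama Stack supports local and remote inference modes.",
]
print(build_prompt("Which local and remote inference modes exist?", docs))
```

The augmented prompt is then sent through the same inference API as any other request, so the model answers with the user-specific context in view.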

Multi-image inference
Utilize multi-image inference to build experiences that process and analyze multiple images at once. Available in remote inference mode only. Learn more on GitHub.
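A hypothetical sketch of what a multi-image request can look like: one user turn carrying several image parts plus a text question, following the common mixed-content message pattern. The exact field names vary across Llama Stack releases, so treat this as illustrative rather than authoritative.

```python
# Build one user message containing multiple images and a text question.
# Field names ("type", "url", "text") are illustrative assumptions.

def multi_image_message(question: str, image_urls: list[str]) -> dict:
    """One user turn with several image parts followed by a text part."""
    parts = [{"type": "image", "url": u} for u in image_urls]
    parts.append({"type": "text", "text": question})
    return {"role": "user", "content": parts}

msg = multi_image_message(
    "What changed between these two photos?",
    ["https://example.com/before.jpg", "https://example.com/after.jpg"],
)
# Sent via remote inference only, e.g.:
# client.inference.chat_completion(model_id=..., messages=[msg])
print(len(msg["content"]))
```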

Custom tool calling
The mobile framework supports custom tool calls, such as creating a calendar event, all within your device's app. Available in local and remote inference modes. Learn more on GitHub.
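The calendar-event example can be sketched as a tool declaration plus an on-device handler. The JSON-schema-style declaration and the dispatcher below are illustrative assumptions; the mobile SDKs expose their own registration APIs, and a real handler would call the platform calendar API.

```python
# Hypothetical custom tool: a declaration the model can call, plus a local handler.

CREATE_EVENT_TOOL = {
    "name": "create_calendar_event",
    "description": "Create an event in the device calendar.",
    "parameters": {
        "type": "object",
        "properties": {
            "title": {"type": "string"},
            "start": {"type": "string", "description": "ISO 8601 start time"},
        },
        "required": ["title", "start"],
    },
}

def handle_tool_call(name: str, arguments: dict) -> dict:
    """Dispatch a model-issued tool call to the matching local handler."""
    if name == "create_calendar_event":
        # A real handler would create the event via the platform calendar API.
        return {"status": "created", "title": arguments["title"]}
    return {"status": "error", "reason": f"unknown tool {name}"}

result = handle_tool_call(
    "create_calendar_event", {"title": "Standup", "start": "2025-01-06T09:00"}
)
print(result["status"])
```

The model sees only the declaration; execution stays on-device, which is what keeps the calendar data inside your app.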

Our approach


Docs

Need help getting started building on Llama Stack? Check out our documentation.
Learn more

Cookbooks

The tools and resources you need to build with Llama Stack. Access prepackaged, verified provider distributions and mobile SDKs with iOS and Android support.
Learn more

Llama Stack: a streamlined developer experience

Build faster, deploy anywhere, and get the most out of the latest Llama models on day one.
Learn more

For developers

Best practices included

Optimized support for agentic tool calling, safety guardrails, inference and much more, significantly lowering development costs.

Develop in your preferred language

Choose from Python, Node, Kotlin, and Swift to quickly build your applications.

Develop & deploy anywhere

With a common API, choose any distribution and deploy on-prem, locally hosted, or even on-device at the edge.
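One way to picture "build once, deploy anywhere": the application code stays identical, and only the endpoint changes per deployment target. The URLs below are placeholders, not real distribution endpoints.

```python
# Same app code, different deployment target: only the base URL changes.
# All URLs are illustrative placeholders.

DEPLOY_TARGETS = {
    "local": "http://localhost:8321",
    "on_prem": "http://llama-stack.internal:8321",
    "cloud": "https://api.example-provider.com/llama-stack",
}

def client_config(target: str) -> dict:
    """Resolve connection settings for a target; application logic is unchanged."""
    return {"base_url": DEPLOY_TARGETS[target]}

# e.g. LlamaStackClient(**client_config("local")) during development,
# then LlamaStackClient(**client_config("cloud")) in production.
print(client_config("local")["base_url"])
```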

For partners & distributors

A standard API

Requires fewer model-level changes across versions, accelerating time to market for new models and lowering engineering investment.

Interoperability with the ecosystem

Leverage the fast-moving Llama ecosystem by building on a common API, and incorporate new components faster.

Support for agentic components

Llama Stack releases natively support tool calling, safety guardrails, retrieval-augmented generation, an inference loop, and other agentic functionality.
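The inference loop behind that agentic functionality can be sketched in a few lines: the model either answers or requests a tool call, and the loop executes tools and feeds results back until a final answer arrives. The stub model below stands in for a real chat-completion call, purely for illustration.

```python
# Minimal agentic inference loop with a stubbed model and one tool.

def stub_model(messages: list[dict]) -> dict:
    """Stand-in for chat completion: requests the time once, then answers."""
    if not any(m["role"] == "tool" for m in messages):
        return {"tool_call": {"name": "get_time", "arguments": {}}}
    return {"content": "It is 12:00."}

TOOLS = {"get_time": lambda **kw: "12:00"}

def run_agent(user_text: str) -> str:
    """Loop: call model, execute any requested tool, feed the result back."""
    messages = [{"role": "user", "content": user_text}]
    while True:
        reply = stub_model(messages)
        call = reply.get("tool_call")
        if call is None:
            return reply["content"]
        result = TOOLS[call["name"]](**call["arguments"])
        messages.append({"role": "tool", "content": result})

print(run_agent("What time is it?"))
```

In Llama Stack this loop, plus guardrails and retrieval, is provided by the stack rather than hand-rolled per application.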


Our partner ecosystem

Partner logo collage

Llama Protections

Making safety tools accessible to everyone.
Enabling developers, advancing safety, and building an open ecosystem.
Learn more

Llama Stack resources

Introduction to Llama Stack
Learn about Llama Stack from Dalton Flanagan, a software engineer at Meta working on AI.
Learn more
Building Agents with Llama Stack
This technical tutorial will show you how to build a RAG agent using Llama Stack.
Learn more
Llama Stack GitHub repo
Check out the latest Llama Stack notebooks.
Learn more

Stay up-to-date

Our latest updates delivered to your inbox

Subscribe to our newsletter to keep up with the latest Llama updates, releases and more.

Sign up