Meta

Meta
FacebookXYouTubeLinkedIn
Documentation
OverviewModels Getting the Models Running Llama How-To Guides Integration Guides Community Support

Community
Community StoriesOpen Innovation AI Research CommunityLlama Impact Grants

Resources
CookbookCase studiesVideosAI at Meta BlogMeta NewsroomFAQPrivacy PolicyTermsCookie Policy

Llama Protections
OverviewLlama Defenders ProgramDeveloper Use Guide

Documentation
Overview
Models
Getting the Models
Running Llama
How-To Guides
Integration Guides
Community Support
Community
Community Stories
Open Innovation AI Research Community
Llama Impact Grants
Resources
Cookbook
Case studies
Videos
AI at Meta Blog
Meta Newsroom
FAQ
Privacy Policy
Terms
Cookie Policy
Llama Protections
Overview
Llama Defenders Program
Developer Use Guide
Documentation
Overview
Models
Getting the Models
Running Llama
How-To Guides
Integration Guides
Community Support
Community
Community Stories
Open Innovation AI Research Community
Llama Impact Grants
Resources
Cookbook
Case studies
Videos
AI at Meta Blog
Meta Newsroom
FAQ
Privacy Policy
Terms
Cookie Policy
Llama Protections
Overview
Llama Defenders Program
Developer Use Guide
Documentation
Overview
Models
Getting the Models
Running Llama
How-To Guides
Integration Guides
Community Support
Community
Community Stories
Open Innovation AI Research Community
Llama Impact Grants
Resources
Cookbook
Case studies
Videos
AI at Meta Blog
Meta Newsroom
FAQ
Privacy Policy
Terms
Cookie Policy
Llama Protections
Overview
Llama Defenders Program
Developer Use Guide
Meta
Models & Products
Docs
Community
Resources
Llama API
Download models

Table Of Contents

Overview
Models
Llama 4
Llama Guard 4
Llama 3.3
Llama 3.2
Llama 3.1
Llama Guard 3
Llama Prompt Guard 2
Other models
Getting the Models
Meta
Hugging Face
Kaggle
1B/3B Partners
405B Partners
Running Llama
Linux
Windows
Mac
Cloud
Deployment (New)
Private cloud deployment
Production deployment pipelines
Infrastructure migration
Versioning
Accelerator management
Autoscaling
Regulated industry self-hosting
Security in production
Cost projection and optimization
Comparing costs
A/B testing
How-To Guides
Prompt Engineering (Updated)
Fine-tuning (Updated)
Quantization (Updated)
Distillation (New)
Evaluations (New)
Validation
Vision Capabilities
Responsible Use
Integration Guides
LangChain
Llamalndex
Community Support
Resources

Overview
Models
Llama 4
Llama Guard 4
Llama 3.3
Llama 3.2
Llama 3.1
Llama Guard 3
Llama Prompt Guard 2
Other models
Getting the Models
Meta
Hugging Face
Kaggle
1B/3B Partners
405B Partners
Running Llama
Linux
Windows
Mac
Cloud
Deployment (New)
Private cloud deployment
Production deployment pipelines
Infrastructure migration
Versioning
Accelerator management
Autoscaling
Regulated industry self-hosting
Security in production
Cost projection and optimization
Comparing costs
A/B testing
How-To Guides
Prompt Engineering (Updated)
Fine-tuning (Updated)
Quantization (Updated)
Distillation (New)
Evaluations (New)
Validation
Vision Capabilities
Responsible Use
Integration Guides
LangChain
Llamalndex
Community Support
Resources
Deployment guides

Security in production

Introduction

Securing a production Llama deployment introduces unique challenges that go beyond traditional application security. While standard practices provide a foundation, the dynamic nature of LLMs—their ability to generate content and interpret complex inputs—creates new attack surfaces, such as prompt injection, that developers need to mitigate. A breach can lead to sensitive data exposure, unauthorized model manipulation, and significant reputational damage.

This guide provides a blueprint for a resilient, multi-layered security posture for your Llama applications. We will walk through a defense-in-depth strategy, covering four critical layers:

  • Infrastructure Security: Hardening the network and compute foundation.
  • Data Security: Protecting data throughout its entire lifecycle.
  • Application Security: Defending against prompt injection and other LLM-specific threats.
  • Operational Security: Maintaining vigilance with continuous monitoring and response.

By following this guide, you will learn to implement a complete security framework, mitigate risks from the OWASP Top 10 for LLMs, and make informed decisions to protect your AI systems in regulated and enterprise environments.

Scope and assumptions

Focus on production deployments

This guide focuses on the security practices required for deploying, operating, and maintaining Llama models in production environments. While many principles apply to development and staging, the emphasis is on protecting live systems and sensitive data.

Model-agnostic principles

While this guide is written for developers building with Llama, the underlying principles are model-agnostic. The architectural patterns here represent a universal blueprint for securing any enterprise-grade LLM, whether self-hosted or in a private cloud.

Focus on private network deployments

This guide details a private-network-first security architecture. This approach, which isolates all components from the public internet by default, represents a security best practice for enterprise and regulated environments. For guidance on securely exposing an application to the public internet, see Considerations for Public-Facing Applications.

Important concepts

This section breaks down the core principles that form our defense strategy.

Zero Trust

This model operates on a simple premise: never trust, always verify. It discards the outdated idea of a "trusted" internal network and assumes threats can originate from anywhere. This includes internal actors, compromised pipelines, and misconfigured services. In practice, this means every user and service must be strictly authenticated and authorized for every single action—a critical defense against attacks that leverage compromised internal credentials.

Least privilege

This principle mandates that any component—user, service, or process—should only have the absolute minimum permissions required to do its job. For example, a service that reads data should not have permission to delete it. By strictly limiting capabilities, you dramatically reduce the potential damage from a compromised component, containing a breach before it can escalate.

Defense-in-depth

This strategy layers multiple, independent security controls to protect your assets. The core idea is that if one layer fails (e.g., a firewall is misconfigured), other layers are already in place to stop the attack. This guide is built on this concept, creating a resilient system where a single point of failure cannot lead to a full compromise.

LLM Data and IP Leakage Vectors

Understanding the primary data leakage vectors is the first step in building a secure system. An insecure LLM application deployment can expose not only user data like Personally Identifiable Information (PII) but also business-critical Intellectual Property (IP) from fine-tuning datasets or Retrieval-Augmented Generation (RAG) knowledge bases.

  • Training Data Memorization: If a model inadvertently memorizes sensitive information from its training data, an attacker can use carefully crafted prompts to extract that information, such as proprietary algorithms or personal data.
  • Prompt Injection: If an application does not properly separate trusted instructions from untrusted user input, an attacker can override the model's original instructions with malicious ones, tricking it into revealing sensitive data from its context window, including information from other users or the underlying system prompt.
  • RAG System Exploitation: If a RAG system lacks strict access controls, an attacker can use the LLM as a tool to query and exfiltrate sensitive information from the entire document database it has access to.
  • Insecure Logging: If prompts and responses are stored in logs without proper redaction or security, they create a rich target. An attacker who gains access to the logging infrastructure can then exfiltrate any sensitive data contained within those logs.
  • Model Theft: If model weights are not properly secured, an attacker can exfiltrate them, resulting in a direct loss of valuable intellectual property, especially for a fine-tuned model.

Infrastructure security: Securing the perimeter

A secure application cannot be built on a vulnerable foundation. While not unique to LLM applications, this first layer of defense establishes a hardened perimeter that prevents unauthorized network access and isolates your Llama deployment from both external and internal threats.

Network isolation and segmentation

The primary goal is to ensure your Llama models and the data they process are never directly exposed, either to the public internet or to unauthorized internal services. This is a foundational control to prevent unauthorized access and mitigate the business risk of a costly data breach. You achieve this by deploying all components within a private, isolated network.

  • Private Networks: Deploy all resources within a Virtual Private Cloud (VPC) on cloud platforms or a dedicated physical network for on-premises deployments. Core components like inference servers and databases should never have public IP addresses. All access must be brokered through a secure, controlled ingress point.
  • Microsegmentation: Divide your network into smaller, isolated segments. For example, place your inference servers in a dedicated "inference subnet" and your application front-end in a separate "application subnet." Use network access control lists (NACLs) and security groups to strictly control traffic between these subnets.
# Conceptual network segmentation policy
network_policies:
  - name: allow-app-to-inference
    from:
      - subnet: app_subnet # 10.0.1.0/24
    to:
      - subnet: inference_subnet # 10.0.2.0/24
    ports:
      - protocol: TCP
        port: 8000
  - name: deny-inference-egress
    from:
      - subnet: inference_subnet
    to:
      - cidr: 0.0.0.0/0
    # Default-deny is a security best practice. It prevents data exfiltration
    # and limits an attacker's ability to download tools or communicate out.
    action: deny

Secure ingress and egress control

You must control every packet that enters and leaves your network segments.

  • Ingress Control: All incoming requests should pass through a single, controlled entry point, such as an internal Application Load Balancer (ALB) or an API Gateway, accessible only from trusted internal networks.
  • Egress Control: By default, deny all outbound internet traffic from your inference subnet. If components require external access (e.g., for security patches), route this traffic through a NAT (Network Address Translation) gateway in a separate management subnet and apply highly restrictive firewall rules to isolate egress traffic .

Hardening compute environments

Network security alone is insufficient; the hosts running your containers and applications must also be secured.

  • Select Minimal Base Images: Choose a base image that contains only the essential system libraries and binaries required to run your application. A minimal image should be sourced from a trusted publisher and have a strong record of timely security updates. This practice reduces the attack surface by minimizing exposure to unpatched vulnerabilities, which lowers the risk of a breach and reduces the operational overhead of security patching.
  • Vulnerability Scanning: Integrate automated security scanning into your CI/CD pipeline to check container images and their dependencies for known vulnerabilities (Common Vulnerabilities and Exposures, or CVEs) before deployment.
  • Run as Non-Root: Configure your containers to run as a non-root user. This practice significantly limits an attacker's ability to escalate privileges, containing a potential breach to a single container and preventing a wider system compromise that could take the application offline.

Infrastructure as Code (IaC) security

Manage your infrastructure declaratively using tools like Terraform to ensure consistent and auditable deployments.

  • Policy as Code: Integrate static analysis security testing (SAST) tools into your CI/CD pipeline. These tools can automatically scan your IaC templates for misconfigurations, such as publicly exposed storage buckets, and block insecure changes.
  • Secure State Management: The state file generated by IaC tools contains sensitive details about your infrastructure. Store it in a secure, encrypted remote backend and strictly control access to it.

With the infrastructure perimeter secured, the next layer of defense focuses on the data itself as it moves through your system.

Data security: Protecting the lifecycle

Data is the most valuable asset in your Llama deployment. This layer of defense focuses on protecting data throughout its entire lifecycle—at rest, in transit, and during processing.

Encryption at rest

All data stored on disk must be encrypted to prevent unauthorized access.

  • Use Organization-Managed Encryption Keys: Instead of relying on provider-managed defaults, use a dedicated Key Management Service (KMS) to create and manage your organization's own encryption keys. This model, often called Customer-Managed Keys (CMKs), is available in cloud services (e.g., AWS KMS, Azure Key Vault) and for on-premises deployments (e.g., HashiCorp Vault). It gives your organization direct control over the key lifecycle, which is critical for compliance and provides an essential "kill switch" to make data inaccessible in an emergency.
  • Encrypt All Artifacts: Apply encryption using your organization-managed keys to all stored data, including model weights, vector databases, logs, and backups.

Encryption in transit

Data moving between components is a primary target for interception.

  • Enforce TLS 1.3+: Configure all endpoints to accept only TLS 1.3 or higher. This ensures the use of strong ciphers and perfect forward secrecy.
  • Implement Mutual TLS (mTLS): For service-to-service communication, use mTLS. This provides strong, two-way authentication, preventing man-in-the-middle attacks and ensuring that only authorized services can communicate. This is a critical defense against internal threats and lateral movement by an attacker.

Key management lifecycle

Effective security depends on properly managing your cryptographic keys.

  • Automated Key Rotation: Configure your KMS to automatically rotate your CMKs on a regular schedule (e.g., every 90 or 365 days). This limits the potential impact of a single compromised key.
  • Least-Privilege Key Policies: Apply granular, least-privilege access policies to each key. A service role for an inference server should only have kms:Decrypt permissions, not permissions to manage the key itself. This control ensures that a single compromised service cannot be used to decrypt data beyond its intended scope, effectively containing a breach.
  • Audit Key Usage: Enable and centralize detailed KMS audit logs. Integrating these logs with your Security Information and Event Management (SIEM) system is not just for alerting on suspicious activity; it provides an immutable record of key usage, which is critical for forensic investigations and proving compliance to auditors.

Sensitive data detection and masking

Preventing sensitive data from being inadvertently logged, cached, or exposed by the model is a critical control.

  • Implement a Data Loss Prevention (DLP) Layer: Before processing any user input, pass it through a DLP service that identifies and redacts sensitive information like Personally Identifiable Information (PII), Protected Health Information (PHI), or financial data.
  • Use Reversible Masking: When redaction is necessary, use a reversible masking or tokenization technique. This replaces sensitive data with a placeholder token, ensuring that raw sensitive data never appears in logs, error messages, or model caches.

Application security: Hardening the stack

Once data is protected at rest and in transit, the focus shifts to controlling who can access the application and how to defend against model-specific attacks.

To enforce these controls consistently, you should architect a centralized LLM Security Gateway. This is an API gateway or reverse proxy that sits in front of your Llama inference endpoint and acts as a single choke point for all requests and responses. All subsequent application security controls discussed in this section should be implemented as components of this gateway.

The gateway is responsible for:

  • Authentication and Authorization
  • Input Validation (DLP, prompt injection filtering)
  • Rate Limiting and Quotas
  • Output Filtering (DLP)
  • Audit Logging

Role-Based Access Control (RBAC) patterns

The first responsibility of the gateway is to verify a user or service's identity and permissions. Implement granular access control to enforce the principle of least privilege.

  • User and Service Roles: Define specific roles based on function, such as:
    • InferenceUser: Can invoke the model for predictions but cannot modify it.
    • ModelDeployer: Can deploy and configure models but cannot access production data.
    • SecurityAuditor: Has read-only access to audit logs and security configurations.
  • Apply RBAC to RAG Data Sources: For Retrieval-Augmented Generation systems, ensure that the access controls are propagated to the underlying knowledge base. The LLM should only be able to retrieve and process documents that the user making the request is authorized to access.
  • Just-in-Time (JIT) Access: For highly privileged operations, do not grant standing permissions. Instead, use a Privileged Identity Management (PIM) system that requires operators to request temporary, approved access.

Mitigating LLM-specific threats (OWASP Top 10)

After validating the user's identity, the gateway inspects the request payload for malicious content. LLMs introduce new vulnerabilities that require specific defenses. The OWASP Top 10 for LLM Applications provides a critical framework for these risks.

  • Preventing Prompt Injection (LLM01): This is the most critical LLM vulnerability, where an attacker embeds instructions within a user prompt to hijack the model's output. A successful attack can result in data theft, bypass of safety controls, and significant brand damage.
    • Mitigation: Implement a layered defense. Use Llama Prompt Guard 2 to detect attacks like prompt injection and jailbreaking, and Llama Guard 3 to classify the prompt's content against a safety policy. This should be combined with robust prompt engineering that clearly separates trusted instructions from untrusted user input.
  • Preventing Insecure Output Handling (LLM03): Treat all content generated by Llama as untrusted input. A model can be manipulated to generate malicious code (JavaScript, SQL), turning the LLM into a pivot point for attacking downstream systems.
    • Mitigation: Treat all model-generated content as untrusted. Use a safety model like Llama Guard 3 to filter outputs for unsafe content, and always sanitize and validate model outputs before use. For example, validate generated JSON against a strict schema and never pass raw output directly to a system shell or database.
  • Preventing Sensitive Information Disclosure (LLM06): The model may inadvertently reveal sensitive data from its training set or context window.
    • Mitigation: Implement an output-level DLP filter to scan the model's response before it is sent to the user, using either a dedicated DLP service or a safety model like Llama Guard 3 to detect and redact sensitive information.

With strong application-level controls in place, the final layer of defense is to ensure you can detect and respond to threats through continuous monitoring.

Operational security: Maintaining vigilance

Security is a continuous process, not a one-time setup. This final layer of defense transforms your static controls into a living, responsive security program. It ensures you have the visibility and procedures needed to detect, react to, and recover from real-world threats.

Comprehensive audit logging

In regulated environments, the ability to prove what happened is non-negotiable. Your audit logs are the primary source of truth for compliance and forensics.

  • Audit Log Schema: Define a standardized, structured logging format (e.g., JSON). Never log raw prompt or response payloads, which could contain sensitive data. Instead, log metadata and hashes.
  • Integration with SIEM: Stream all audit logs to a centralized Security Information and Event Management (SIEM) system for correlation, real-time analysis, and long-term retention.
// Conceptual Audit Log Schema
{
  "eventId": "a1b2c3d4-e5f6-7890-1234-567890abcdef",
  "timestamp": "2025-08-29T18:02:00.123Z",
  "eventSource": "LlamaSecurityGateway",
  "eventName": "InvokeModel",
  "actor": {
    "type": "ServiceRole",
    "principalId": "arn:aws:iam::123456789012:role/WebAppServiceRole",
    "sourceIp": "10.0.1.54"
  },
  "resource": {
    "type": "LlamaModel",
    "arn": "arn:aws:sagemaker:us-west-2:123456789012:endpoint/llama-production-endpoint"
  },
  "request": {
    // Log a deterministic, cryptographically secure hash of prompts 
    // to ensure auditability without data leakage.
    "promptHash": "sha256:c3ab8ff13720e8ad9047dd39466b3c8974e592c2fa383d4a3960714caef0c4f2"
  },
  "securityAnalysis": {
    "promptInjectionDetected": false,
    "inputPIIMasked": true
  }
}

Real-time security monitoring and alerting

Use your centralized logs to actively monitor for threats. Create dashboards and alerts for key security events, such as:

  • Authentication Failures: A spike in failed logins could indicate a brute-force attack.
  • Anomalous Usage: Alert on requests from unexpected IP ranges or unusually high-volume activity.
  • Prompt Injection Attempts: Trigger high-priority alerts whenever your input filters detect a potential attack.

Supply chain security

The security of your application depends on the integrity of its components.

  • Verifying Llama Model Artifacts: Always download Llama model weights from official sources. Verify the integrity of the files using the provided checksums to ensure they have not been tampered with.
  • Securing Dependencies: Use Software Composition Analysis (SCA) tools to generate a Software Bill of Materials (SBOM) and continuously scan your dependencies for known vulnerabilities.

Incident response planning

When a security event occurs, a pre-defined plan is critical for a swift response.

  • Develop an LLM-Specific Playbook: Augment your standard incident response plan with playbooks for threats like model jailbreaks or confirmed data leakage events.
  • Isolation and Containment: The incident response plan must include procedures to immediately isolate affected components, such as blocking traffic to a compromised endpoint.
  • Regular Drills:A plan that isn't tested is unlikely to work. Conduct regular tabletop exercises to ensure your teams can execute the plan effectively under pressure, minimizing financial and reputational damage during a real incident.

To help you implement this multi-layered strategy, the following checklist summarizes the key controls discussed in this guide.

Security checklist

Use this checklist to validate the security posture of your Llama deployments against the key controls discussed in this guide.

Infrastructure security

  • [ ] Deploy all components in a private, isolated network (VPC).
  • [ ] Use network microsegmentation to isolate workloads.
  • [ ] Enforce strict ingress/egress firewall rules, denying all outbound internet by default.
  • [ ] Build containers from minimal, hardened base images and run them as non-root users.
  • [ ] Integrate automated vulnerability and IaC scanning into the CI/CD pipeline.
  • [ ] Store the IaC state file in a secure, encrypted remote backend.

Data security

  • [ ] Encrypt all data at rest using Customer-Managed Keys (CMKs).
  • [ ] Enforce TLS 1.3+ and use mTLS for all internal service-to-service communication.
  • [ ] Automate the rotation of all encryption keys.
  • [ ] Apply least-privilege access policies to all encryption keys.
  • [ ] Implement a DLP layer to detect and mask sensitive data in inputs and outputs.

Application security

  • [ ] Architect the system around a central LLM Security Gateway.
  • [ ] Define and enforce granular RBAC roles for users, services, and pipelines.
  • [ ] Use Just-in-Time (JIT) access for all privileged administrative operations.
  • [ ] Implement defenses against prompt injection.
  • [ ] Sanitize and validate all model-generated outputs before use.

Operational security

  • [ ] Implement structured, centralized audit logging for all security-relevant events.
  • [ ] Ensure raw sensitive data is never included in logs.
  • [ ] Integrate logs with a SIEM for real-time monitoring and alerting.
  • [ ] Verify the checksums of all downloaded model artifacts.
  • [ ] Use SCA to continuously scan all software dependencies for vulnerabilities.
  • [ ] Develop and regularly test an incident response plan for LLM-specific threats.

Considerations for Public-Facing Applications

While this guide focuses on securing the backend in a private environment, you may need to expose your application to the public internet. If you do, build upon the secure foundation described here with these additional controls:

  • Use a Hardened Entry Point: All public traffic should enter through a single point, such as a managed Application Load Balancer (ALB) or API Gateway. This entry point should be protected by a Web Application Firewall (WAF) and DDoS mitigation services.
  • Implement Strong Authentication: Protect all endpoints with a robust authentication mechanism like OpenID Connect (OIDC), which builds on the OAuth 2.0 authorization framework and should be handled at the network edge.
  • Apply Rate Limiting: Configure strict rate limiting at your API Gateway or load balancer to prevent abuse and protect your backend services from being overwhelmed.
  • Maintain Network Segmentation: Your public-facing components should reside in a separate, isolated network segment (a DMZ), with strict firewall rules controlling communication with the secure backend.

Additional resources

  • Private Cloud Deployment Guide: For detailed implementation on cloud platforms, see our Private Cloud Deployment Guide.
  • On-Prem deployment for healthcare: For industry-specific compliance controls, review the On-Prem deployment for healthcare.
Was this page helpful?
Yes
No
On this page
Security in production
Introduction
Scope and assumptions
Important concepts
Zero Trust
Least privilege
Defense-in-depth
LLM Data and IP Leakage Vectors
Infrastructure security: Securing the perimeter
Network isolation and segmentation
Secure ingress and egress control
Hardening compute environments
Infrastructure as Code (IaC) security
Data security: Protecting the lifecycle
Encryption at rest
Encryption in transit
Key management lifecycle
Sensitive data detection and masking
Application security: Hardening the stack
Role-Based Access Control (RBAC) patterns
Mitigating LLM-specific threats (OWASP Top 10)
Operational security: Maintaining vigilance
Comprehensive audit logging
Real-time security monitoring and alerting
Supply chain security
Incident response planning
Security checklist
Infrastructure security
Data security
Application security
Operational security
Considerations for Public-Facing Applications
Additional resources