Developer use guide resources

Introduction

We are committed to supporting our community in building Llama applications responsibly. In addition to our Llama Protections effort, we provide this Developer Use Guide that outlines best practices in the context of Responsible GenAI. In this section, we provide resources to facilitate the implementation of these best practices.

Determine use case

Based on the intended use and audience for your product, a content policy defines what content is allowable and may specify safety limitations on producing illegal, violent, or harmful content. These limits should be evaluated in light of the product domain, as specific sectors and regions may have different laws or standards.

If you are new to value considerations in the development and deployment of AI, refer to the principles and guidance on risk management released by academic institutions and governing organizations, such as:

  • OECD’s AI Principles
  • NIST’s Trustworthy and Responsible AI Resource Center
  • AI Act | Shaping Europe’s digital future
You can also use existing standards to build a baseline policy to cover commonly agreed upon harms. MLCommons published a taxonomy of these categories—along with definitions—as well as a survey on existing taxonomies in their whitepaper. The MLCommons harm taxonomy comprises the following harm areas:
  1. Violent crimes
  2. Non-violent crimes
  3. Sex-related crimes
  4. Child sexual exploitation
  5. Indiscriminate weapons: Chemical, Biological, Radiological, Nuclear, and high yield Explosives (CBRNE)
  6. Suicide & self-harm
  7. Hate
  8. Specialized Advice
  9. Privacy
  10. Intellectual Property
  11. Elections
  12. Defamation
  13. Sexual Content
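
As a concrete starting point, the sketch below shows one hypothetical way to encode this taxonomy as a baseline content policy configuration for a product. The class, category strings, and action names are illustrative only and are not part of any Llama or MLCommons tooling.

```python
# A minimal sketch of a baseline content policy built on the MLCommons hazard
# taxonomy listed above. The structure and names here (ContentPolicy, Action,
# the per-category overrides) are hypothetical, not an official API.
from dataclasses import dataclass, field
from enum import Enum


class Action(Enum):
    BLOCK = "block"            # refuse and do not return model output
    FLAG_FOR_REVIEW = "flag"   # return output but queue it for human review
    ALLOW = "allow"            # permitted for this product and audience


MLCOMMONS_CATEGORIES = [
    "violent_crimes", "non_violent_crimes", "sex_related_crimes",
    "child_sexual_exploitation", "indiscriminate_weapons_cbrne",
    "suicide_and_self_harm", "hate", "specialized_advice", "privacy",
    "intellectual_property", "elections", "defamation", "sexual_content",
]


@dataclass
class ContentPolicy:
    """Maps each harm category to the action the product takes by default."""
    default_action: Action = Action.BLOCK
    overrides: dict[str, Action] = field(default_factory=dict)

    def action_for(self, category: str) -> Action:
        return self.overrides.get(category, self.default_action)


# Example: a general-audience assistant that blocks everything by default but
# routes borderline specialized advice (e.g. medical, legal) to human review.
policy = ContentPolicy(overrides={"specialized_advice": Action.FLAG_FOR_REVIEW})
assert policy.action_for("hate") is Action.BLOCK
```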

Model-level alignment

There are 3 general steps needed to responsibly fine-tune an LLM for alignment, guided at a high level by Meta’s Responsible AI framework.

1. Prepare data

Preparing data includes annotating a dataset. Below are helpful topics and recommendations to cover in a guideline document that instructs annotators on how to complete the task:

Scope of your annotation requirements

  • Provide explicit instructions for the prompt writing, response writing, or response labeling—and nothing unrelated to the particular task at hand.
  • Provide plenty of examples specific to the task.
  • Provide the exact modality of the input (prompt) and exact modality of the output (response).
  • Specify which policy categories you are annotating against.
  • Define which languages are needed for your annotations, and whether there are localization requirements.

Annotation volume

Consider what volume of data is required for your task.

  • Detail how much volume is needed for each type of prompt, response, or label.
  • Consider multiple reviews (e.g. two or three) to increase dataset quality.

Annotator interactions

Specify an appropriate interface for annotators.

  • Will the annotators have to interact with a model or will they just receive a dataset?
  • Will the annotators have to source the data on their own?
    • If so, where must or should they source it from?
    • Where must they NOT source it from?
    • What format must the data file be in, depending on the modality?

Privacy

Engage with your privacy or legal partner to ensure data processing is in accordance with relevant privacy regulations.

Considerations might include whether first-party data needs to be passed to the annotators and whether additional privacy mitigations, such as image blurring, might be required.

Feedback and quality assessment

  • Specify quality expectations up front, such as expectations about prompt length and complexity, or about response quality.
  • Think about the annotators' workflow to anticipate their questions.
  • Organize quality assessment and feedback sessions.

Diversity and Bias mitigations

Pay attention to how human feedback and annotation of data may further polarize a fine-tuned model with respect to subjective opinions. Take steps to prevent injecting bias into annotation guidelines and to mitigate the effect of annotators’ bias. Resources on this topic include:

  • Don’t Blame the Annotator: Bias Already Starts in the Annotation Instructions
  • Annotators with Attitudes: How Annotator Beliefs And Identities Bias Toxic Language Detection
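
Tying several of the recommendations above together, here is a hypothetical shape for a single annotation record that captures multiple reviews per item. The field names, label values, and quality check are illustrative only, not a prescribed schema.

```python
# A hypothetical annotation record plus a simple quality check based on the
# multiple-review recommendation above. All names and values are illustrative.
from dataclasses import dataclass


@dataclass
class AnnotationRecord:
    prompt: str            # input shown to (or written by) the annotator
    response: str          # model- or human-written response being labeled
    policy_category: str   # e.g. "non_violent_crimes", from your content policy
    labels: list[str]      # one label per reviewer, e.g. ["safe", "unsafe", "unsafe"]
    language: str = "en"   # localization requirement, if any

    def majority_label(self) -> str:
        return max(set(self.labels), key=self.labels.count)

    def agreement(self) -> float:
        """Fraction of reviewers who gave the majority label."""
        return self.labels.count(self.majority_label()) / len(self.labels)


record = AnnotationRecord(
    prompt="How do I pick a lock?",
    response="I can't help with that.",
    policy_category="non_violent_crimes",
    labels=["safe", "safe", "unsafe"],
)

# Records with low reviewer agreement are good candidates for a feedback and
# quality-assessment session before they enter the fine-tuning dataset.
needs_review = record.agreement() < 1.0
```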

2. Train the model

Fine-tuning an LLM for safety can involve a number of techniques, many of which the Llama 2 and Llama 3.1 research papers describe in greater depth. You can follow the Meta Llama fine-tuning recipe to get started with fine-tuning your model for safety. Hugging Face provides further detail on how to implement the most commonly used fine-tuning methods in its alignment handbook.
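
For illustration, here is a minimal supervised fine-tuning sketch using the Hugging Face transformers Trainer. It assumes a JSONL file of chat-formatted safety examples; the model ID, file path, and hyperparameters are placeholders rather than recommended settings, and for production work you should follow the Meta fine-tuning recipe or the alignment handbook referenced above.

```python
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

model_id = "meta-llama/Llama-3.1-8B-Instruct"  # gated: requires accepting the Llama license
tokenizer = AutoTokenizer.from_pretrained(model_id)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token  # Llama tokenizers ship without a pad token
model = AutoModelForCausalLM.from_pretrained(model_id)

# One JSON object per line, e.g. {"text": "<chat-formatted safety example>"}
dataset = load_dataset("json", data_files="safety_sft.jsonl", split="train")

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=2048)

tokenized = dataset.map(tokenize, batched=True, remove_columns=dataset.column_names)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="llama-safety-sft",
        per_device_train_batch_size=1,
        gradient_accumulation_steps=8,
        num_train_epochs=1,
        learning_rate=2e-5,
        bf16=True,
        logging_steps=10,
    ),
    train_dataset=tokenized,
    # mlm=False gives standard causal-LM (next-token prediction) labels
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```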

3. Evaluate and improve performance

To evaluate your model, you could rely on your own private benchmarks. This requires building your own dataset and safety evaluation solution. However, we believe safety evaluations should be standardized to ensure transparency and a consistent, industry-wide approach to safety testing. Standards of this kind are emerging, such as the MLCommons AI Safety v0.5 Proof of Concept.

In the meantime, we recommend using public benchmarking platforms and datasets:

  • CyberSecEval for cybersecurity-specific evaluations.
  • Benchmarking of LLMs by Stanford’s Center for Research on Foundation Models (HELM): https://crfm.stanford.edu/helm/latest/
  • EleutherAI LLM Evaluation Harness: https://github.com/EleutherAI/lm-evaluation-harness
  • Hugging Face Hub, which hosts open source models and datasets and is a space for developers to share safeguards and access benchmark information: https://huggingface.co/docs/hub/index
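
As one example of the tools above, the EleutherAI LM Evaluation Harness can be driven directly from Python. The sketch below assumes a recent harness version; the task names are examples only and may differ in your installation, so check the harness task registry before running.

```python
# A sketch of running the EleutherAI LM Evaluation Harness against a Llama
# checkpoint. Task names and model arguments are illustrative placeholders.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=meta-llama/Llama-3.1-8B-Instruct,dtype=bfloat16",
    tasks=["toxigen", "truthfulqa_mc2"],  # example tasks touching on safety
    batch_size=8,
)

# Print the per-task metrics reported by the harness.
for task, metrics in results["results"].items():
    print(task, metrics)
```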

System-level alignment

[Figure: The Llama System chart]

As outlined in our Developer Use Guide, you should deploy appropriate system-level safeguards to mitigate the safety and security risks of your system. As part of our responsible release approach, we provide open source solutions including:

  • Llama Guard 3: our multilingual content moderation safeguard.
  • Prompt Guard: our direct and indirect prompt injection detection safeguard.
  • Code Shield: our insecure code detection safeguard.
To get started building with Llama safeguards, head to our ready-to-use recipes on GitHub. We also provide notebooks to tune Llama Guard on your own dataset: GitHub - PurpleLlama, GitHub - Cookbook - Responsible AI.
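
For orientation, the sketch below shows the general input/output moderation pattern with Llama Guard 3 via Hugging Face transformers. It assumes you have accepted the model license on Hugging Face; the authoritative prompt format, category list, and recipes are in the PurpleLlama repository linked above.

```python
# A minimal moderation sketch: ask Llama Guard 3 whether a conversation turn is
# safe before passing it to (or returning it from) your application model.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-Guard-3-8B"  # gated: requires accepting the license
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

def moderate(chat):
    """Return Llama Guard's verdict ("safe", or "unsafe" plus category codes)."""
    input_ids = tokenizer.apply_chat_template(chat, return_tensors="pt").to(model.device)
    output = model.generate(input_ids=input_ids, max_new_tokens=100, pad_token_id=0)
    prompt_len = input_ids.shape[-1]
    return tokenizer.decode(output[0][prompt_len:], skip_special_tokens=True)

# Check a user prompt before it reaches your application model.
verdict = moderate([{"role": "user", "content": "Tell me how to pick a lock."}])
if not verdict.strip().startswith("safe"):
    print("Blocked by content policy:", verdict)
```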

Transparency

Transparency and feedback mechanisms are of the utmost importance for safety. We recommend becoming familiar with the Llama Model Card and creating your own model card if you are building a custom LLM.
If you identify harmful content generated by Llama models, please provide feedback here.
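
As a starting point for the model card recommendation above, the sketch below writes a minimal model card file for a hypothetical fine-tuned deployment. Every name, number, and address shown is a placeholder; the Llama Model Card remains the reference for what to document (intended use, training data, evaluations, safety mitigations, and limitations).

```python
# A sketch of generating a minimal model card for a custom fine-tuned Llama
# model. All field values are hypothetical placeholders.
from pathlib import Path
from textwrap import dedent

card = dedent("""\
    # Model card: acme-llama-3.1-8b-support-bot

    **Base model:** meta-llama/Llama-3.1-8B-Instruct
    **Intended use:** customer-support assistant for Acme products (English only).
    **Out-of-scope uses:** medical, legal, or financial advice; open-ended chat.
    **Training data:** curated support transcripts, safety-annotated as described above.
    **Safety evaluations:** internal red-team prompts plus public benchmarks.
    **System safeguards:** Llama Guard 3 on inputs and outputs; Prompt Guard on retrieved context.
    **Feedback:** report harmful outputs to safety@example.com.
    """)

Path("MODEL_CARD.md").write_text(card)
```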