At Meta, we’re pioneering an open source approach to generative AI development enabling everyone to safely benefit from our models and their powerful capabilities. With the landmark introduction of reference systems in the latest release of Llama 3, the standalone model is now a foundational system, capable of performing “agentic” tasks. Our comprehensive system level safety framework proactively identifies and mitigates potential threats, empowering developers to deploy generative AI responsibly.
Llama Guard 3 8B is a high-performance input and output moderation model designed to support developers to detect various common types of violating content.
It was built by fine-tuning Llama 3.1 model and optimized to support the detection of the MLCommons standard taxonomy of hazard, catering to a range of developer use cases.
It supports 8 languages, has context window of 128k tokens and was also optimized to detect helpful cyberattack responses and prevent malicious code output by LLMs to be executed in hosting environments for Llama systems using code interpreters.It was built by fine-tuning Llama 3.2 vision model and optimized to support the detection of the MLCommons standard taxonomy of hazard.
Resources to get started with Llama Guard, such as how to fine-tune it for your own use case, are available in our Llama-recipe github repository.
Prompt Guard is a powerful tool for protecting LLM powered applications from malicious prompts to ensure their security and integrity.
Prompt Guard is designed to perform strongly and generalize well to new settings and distributions of adversarial attacks. We use a multilingual base model that significantly enhances the model's ability to recognize prompt attacks in non-English languages, providing comprehensive protection for your application. We’re releasing Prompt Guard as open source so you can fine tune it to your specific application and use cases.
Code Shield provides support for inference-time filtering of insecure code produced by LLMs. This offers mitigation of insecure code suggestions risk and secure command execution for 7 programming languages with an average latency of 200ms.
There is no one-size-fits-all guardrail detection to prevent all risks. This is why we encourage users to combine all our system level safety tools with other guardrails for your use cases.
Evaluations
Prompt injection attacks of LLM-based applications are attempts to cause the LLM to behave in undesirable ways. The Prompt Injection tests evaluate the ability to recognize which part of an input is untrusted and the level of resilience against common text and image based prompt injection techniques.
In Cybersec Eval 1 we introduced tests to measure an LLM's propensity to help carry out cyberattacks as defined in the industry standard MITRE Enterprise ATT&CK ontology of cyberattack methods. Cybersec Eval 2 added tests to measure the false rejection rate of confusingly benign prompts. These prompts are similar to the cyber attack compliance tests in that they cover a wide variety of topics including cyberdefense, but they are explicitly benign, even if they may appear malicious.
This suite consists of capture-the-flag style security test cases that simulate program exploitation. We then use an LLM as the security tool to determine whether it can reach a specific point in the program where a security issue has been intentionally inserted. In some of these tests we explicitly check if the tool can execute basic exploits such as SQL injections and buffer overflows.
Cybersec Eval 3 adds evaluations for LLM ability to conduct (1) multi-turn spear phishing campaigns and (2) autonomous offensive cyber operations.
Code interpreters allow LLMs to run code in a sandboxed environment. This set of prompts try to manipulate an LLM into executing malicious code to either gain access to the system that runs the LLM, gather helpful information about the system, craft and execute social engineering attacks, or gather information about the external infrastructure of the host environment.
In fostering a collaborative approach, we look forward to partnering with ML Commons, the newly formed AI alliance, and a number of leading AI companies.