Llama Models
Llama 3.1 Enhancements
Llama 3.1 represents Meta's most capable model to date, including enhanced reasoning and coding capabilities, multilingual support, and an all-new reference system.
Prompt Formatting
To correctly prompt each Llama model, please closely follow the formats described in the following sections. Keep in mind that when specified, newlines must be present in the prompt sent to the tokenizer for encoding. For details on implementing code to create correctly formatted prompts, please refer to the linked file for each model version.
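As an illustration of why those newlines matter, here is a minimal sketch of a single-turn prompt in the Llama 3.1 chat format. The special tokens follow the published Llama 3/3.1 prompt format; in real applications, prefer the tokenizer's bundled chat template over hand-assembling strings like this.

```python
def format_llama31_prompt(system: str, user: str) -> str:
    """Assemble a single-turn prompt in the Llama 3.1 chat format.

    Note the literal blank line ("\n\n") after each <|end_header_id|>
    token: it is part of the format and must reach the tokenizer.
    """
    return (
        "<|begin_of_text|>"
        "<|start_header_id|>system<|end_header_id|>\n\n"
        f"{system}<|eot_id|>"
        "<|start_header_id|>user<|end_header_id|>\n\n"
        f"{user}<|eot_id|>"
        # The trailing assistant header cues the model to generate its turn.
        "<|start_header_id|>assistant<|end_header_id|>\n\n"
    )

prompt = format_llama31_prompt("You are a helpful assistant.", "What is 2 + 2?")
print(prompt)
```

The resulting string is what gets passed to the tokenizer for encoding; the model's reply is everything generated after the final assistant header, terminated by an end-of-turn token.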
Upgrading your application from Llama 3 to Llama 3.1
For many cases where an application is using a Hugging Face (HF) variant of the Llama 3 model, the upgrade path to Llama 3.1 should be straightforward.
Changes to the prompt format—such as EOS tokens and the chat template—have been incorporated into the tokenizer configuration which is provided alongside the HF model.
As a demonstration, an example of inference logic is provided, which works equivalently with the Llama 3 and Llama 3.1 versions of the 8B Instruct model.
Running the script without any arguments performs inference with the Llama 3 8B Instruct model. Passing the following parameter switches the script to Llama 3.1:

--model-id "meta-llama/Meta-Llama-3.1-8B-Instruct"

Looking at the code, you can see that the tokenizer handles all the changes necessary to run the new model.
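A rough sketch of such inference logic is below. This is a hypothetical script, not the one shipped with the models: the `--model-id` flag mirrors the parameter above, and running `run_inference` requires `transformers`, `torch`, and access to the gated model weights on Hugging Face.

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    # Defaults to Llama 3; pass --model-id to switch to Llama 3.1.
    parser = argparse.ArgumentParser(description="Llama chat inference sketch")
    parser.add_argument(
        "--model-id",
        default="meta-llama/Meta-Llama-3-8B-Instruct",
        help="Hugging Face model ID, e.g. meta-llama/Meta-Llama-3.1-8B-Instruct",
    )
    return parser

def run_inference(model_id: str, user_message: str) -> str:
    # Requires transformers, torch, and access to the gated model repo.
    # The tokenizer bundled with each model carries its chat template, so
    # the same code formats prompts correctly for Llama 3 and Llama 3.1.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")
    input_ids = tokenizer.apply_chat_template(
        [{"role": "user", "content": user_message}],
        add_generation_prompt=True,
        return_tensors="pt",
    ).to(model.device)
    output_ids = model.generate(input_ids, max_new_tokens=256)
    # Decode only the newly generated tokens, skipping the echoed prompt.
    return tokenizer.decode(
        output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True
    )

# Demonstrate the argument handling without downloading any weights:
args = build_parser().parse_args(["--model-id", "meta-llama/Meta-Llama-3.1-8B-Instruct"])
print(args.model_id)
```

Because the prompt-format differences live in the tokenizer configuration, `run_inference` contains no version-specific branches: swapping the model ID is the entire upgrade.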
Llama 3.1 provides significant new features, including function calling and agent-optimized inference (see the Llama Agentic System for examples of this). However, for the case where a developer simply wants to take advantage of the updated model, a drop-in replacement is possible.