Table Of Contents
Deployment (New)
Private cloud deployment
Production deployment pipelines
Infrastructure migration
Versioning
Accelerator management
Autoscaling
Regulated industry self-hosting
Security in production
Cost projection and optimization
Comparing costs
A/B testing
Resources
Meta and Community Resources
A repository of Llama resources from videos to cookbooks.
If you have any feature requests, suggestions, or bugs to report, we encourage you to file an issue in the respective GitHub repository.
Note: Some of these resources refer to earlier versions of Llama. However, the concepts and ideas described are still relevant to the most recent version.
Meta Resources
Recipes
For our full list, check out the Cookbook page.
How-to-guides
Fine-tuning
Full parameter fine-tuning is a method that fine-tunes all the parameters of all the layers of the pre-trained model.
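To make the distinction concrete, here is a toy NumPy sketch (not Llama itself, and not code from any cookbook): a tiny two-layer network stands in for a pre-trained model, and one SGD step updates every parameter of every layer, which is what "full parameter" fine-tuning means in contrast to methods that freeze layers or train only adapters.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy two-layer network standing in for a pre-trained model:
# in full parameter fine-tuning, EVERY weight in EVERY layer is trainable.
params = {
    "W1": rng.normal(size=(4, 8)),
    "b1": np.zeros(8),
    "W2": rng.normal(size=(8, 2)),
    "b2": np.zeros(2),
}

def forward(x, p):
    h = np.tanh(x @ p["W1"] + p["b1"])
    return h @ p["W2"] + p["b2"], h

def sgd_step(x, y, p, lr=0.1):
    """One full-parameter update: gradients flow to all layers, none frozen."""
    out, h = forward(x, p)
    d_out = 2 * (out - y) / len(x)          # gradient of mean squared error
    grads = {
        "W2": h.T @ d_out,
        "b2": d_out.sum(axis=0),
    }
    d_h = (d_out @ p["W2"].T) * (1 - h**2)  # backprop through tanh
    grads["W1"] = x.T @ d_h
    grads["b1"] = d_h.sum(axis=0)
    # No layer is excluded: every parameter moves.
    return {k: p[k] - lr * grads[k] for k in p}

x = rng.normal(size=(16, 4))
y = rng.normal(size=(16, 2))
new_params = sgd_step(x, y, params)
changed = {k: not np.allclose(params[k], new_params[k]) for k in params}
print(changed)  # every entry is True: all parameters of all layers were updated
```

In a real Llama fine-tune the same principle applies, just with billions of parameters, which is why full-parameter fine-tuning is the most memory-hungry option.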
Quantization
Learn how quantization makes models more efficient for deployment on servers and edge devices.
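As a minimal sketch of the idea (symmetric per-tensor int8 quantization, written from scratch here rather than taken from any Llama guide): the float weights are mapped to 8-bit integers plus one scale factor, shrinking storage 4x relative to float32 while keeping the round-trip error bounded by the quantization step.

```python
import numpy as np

def quantize_int8(w):
    """Symmetric per-tensor int8 quantization: w is approximated by scale * q."""
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.normal(scale=0.02, size=(256, 256)).astype(np.float32)

q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)

ratio = q.nbytes / w.nbytes                    # 0.25: int8 is 4x smaller than float32
max_err = float(np.abs(w - w_hat).max())       # rounding error, at most one step
print(ratio, max_err <= scale)
```

Production schemes (e.g. per-channel scales, 4-bit formats) refine this, but the compression-versus-precision trade-off is the same.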
Prompting
Improve the performance of the language model by providing it with more context and information about the task at hand.
Integration Guides
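A quick illustration of the prompting advice above (the helper `build_prompt` is invented here for the example, not an API from any library): supplying a task description plus a few worked examples before the actual query is the standard few-shot pattern for giving the model more context.

```python
def build_prompt(task, examples, query):
    """Assemble a few-shot prompt: task description, worked examples, then the query."""
    lines = [task, ""]
    for inp, out in examples:
        lines += [f"Input: {inp}", f"Output: {out}", ""]
    lines += [f"Input: {query}", "Output:"]
    return "\n".join(lines)

prompt = build_prompt(
    task="Classify the sentiment of each movie review as positive or negative.",
    examples=[
        ("A delightful film from start to finish.", "positive"),
        ("Two hours I will never get back.", "negative"),
    ],
    query="The plot was thin but the acting carried it.",
)
print(prompt)
```

The model completes the final `Output:` line, and the in-context examples steer both the format and the label set of its answer.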
Community Resources
Get Started with Llama
Use these cookbooks to get your journey with Llama started.
Fine-tuning
Discover cookbook examples, datasets, and more to help you jump-start model fine-tuning.
Weights & Biases training and fine-tuning large language models
A course on fine-tuning LLMs.
How to fine-tune Llama with LoRA for Question Answering
NVIDIA deep learning blog on fine-tuning Llama.
Performance & Latency
Papers and blogs to help optimize performance and latency.
Improving LLM inference
How continuous batching enables 23x throughput in LLM inference while reducing p50 latency.
Improving performance of compressed LLMs with prompt engineering
A paper on improving the accuracy-efficiency trade-off of LLM inference.