LLAMA 3.1 8B INSTRUCT
Enterprise Consulting Partners
Building a better AI assistant for nuanced knowledge work
At a glance
Industry: Professional services
Use case: Improve existing AI assistant’s semantic understanding, accuracy and speed
Goal: Transform information gathering and synthesis
Llama versions: Llama 3.1 8B Instruct with low-rank adaptation (LoRA)
Deployment: Predibase
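25 million annual queries
7% more accurate than GPT-4o mini after fine-tuning
4-second round trip time
1 million hours saved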
*All results are self-reported and not identifiably repeatable. Individual results are generally expected to differ.
Helping organizations understand complex situations and move ahead
Enterprise Consulting Partners (ECP) is a pseudonym for a leading professional services firm specializing in risk, strategy and people. The firm helps corporate and public sector leaders navigate an increasingly dynamic environment by addressing the most complex challenges of our time.
THEIR GOAL
Upgrade a confused AI agent with more nuanced language processing
ECP was an early adopter of generative AI, and its first AI assistant transformed how the firm’s analysts and knowledge workers navigated its extensive knowledge bases. However, as ECP integrated the AI assistant into enterprise platforms, the assistant began to struggle with nuanced tasks.
The ECP team explored creating expert models for specific tasks, but fully retraining multiple models introduced unmanageable complexity and was prohibitively expensive.
THEIR SOLUTION
Retool the AI assistant with Llama and LoRA
The ECP team used LoRA to create fine-tuned adaptations of Llama 3.1 8B Instruct. The adaptations load on demand at run time, which lets a single Llama base model serve responses from multiple fine-tuned adaptations at enterprise scale.
With Llama and LoRA, ECP’s AI agent can bring expert AI assistance to unstructured institutional knowledge, current research and enterprise tools. Llama’s superior semantic understanding, multilingual support and 128K-token context window are major upgrades to the AI assistant’s base capabilities, while LoRA adapters deliver task- and tool-specific expertise.
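The on-demand loading described above can be illustrated with a toy sketch. This is not LoRAX's API or ECP's implementation: `AdapterCache`, `load_adapter`, `generate` and the adapter IDs are hypothetical names, and the "adapters" are plain dictionaries standing in for real weights. The point is only the routing pattern, where one shared base model stays resident and small adapters are fetched the first time a request names them, then evicted when memory is needed.

```python
from collections import OrderedDict
from typing import Callable, Dict

# Hypothetical stand-in: a real adapter would hold low-rank weight matrices.
Adapter = Dict[str, str]


class AdapterCache:
    """Keeps at most `capacity` adapters resident and loads others on demand."""

    def __init__(self, loader: Callable[[str], Adapter], capacity: int = 2) -> None:
        self._loader = loader
        self._capacity = capacity
        self._resident: "OrderedDict[str, Adapter]" = OrderedDict()
        self.loads = 0  # how many times an adapter had to be fetched

    def get(self, adapter_id: str) -> Adapter:
        if adapter_id in self._resident:
            self._resident.move_to_end(adapter_id)  # mark as most recently used
            return self._resident[adapter_id]
        adapter = self._loader(adapter_id)
        self.loads += 1
        self._resident[adapter_id] = adapter
        if len(self._resident) > self._capacity:
            self._resident.popitem(last=False)  # evict the least recently used
        return adapter


def load_adapter(adapter_id: str) -> Adapter:
    # Placeholder for reading adapter weights from storage (assumed, not real I/O).
    return {"id": adapter_id, "task": adapter_id.split("/")[-1]}


def generate(cache: AdapterCache, prompt: str, adapter_id: str) -> str:
    """Route one request through the shared base model plus the named adapter."""
    adapter = cache.get(adapter_id)
    return f"[base+{adapter['task']}] {prompt}"
```

For example, `generate(AdapterCache(load_adapter), "Summarize this memo", "ecp/summarize")` fetches the hypothetical `ecp/summarize` adapter once; a second request naming the same adapter reuses the resident copy.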
THEIR APPROACH
Fine-tune expert agents with less training effort
With fine-tuning, smaller models like Llama 3.1 8B Instruct can exceed the performance of general-purpose models with 10x or 100x the parameters. However, fully retraining smaller models is still a time-consuming and expensive process.
LoRA freezes the base model’s weights and trains small low-rank adapter matrices that amount to a fraction of the model’s parameters, reducing fine-tuning time 5x to 10x. In production, ECP runs a single Llama 3.1 8B Instruct instance and uses LoRA Exchange (LoRAX) to load adaptations on demand. LoRAX can serve multiple fine-tuned LoRA adaptations simultaneously within roughly the same computing footprint as a single Llama 3.1 8B Instruct instance. The solution can scale up to serve ECP’s global workforce without incurring massive computing bills.
LoRAX serves up fine-tuned adapters on demand, allowing a single model to deliver multi-agent services with a fraction of the computing costs.
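To make the "fraction of the weights" idea concrete, here is a minimal sketch of the standard LoRA formulation, in which a frozen weight matrix W is augmented as W + (alpha / r) * B * A, with B of shape d_out x r and A of shape r x d_in. The 4096 x 4096 layer size, the rank of 8 and the function names are illustrative assumptions, not ECP's actual configuration.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def lora_param_counts(d_out: int, d_in: int, rank: int) -> Tuple[int, int]:
    """Return (full-layer parameters, LoRA adapter parameters) for one layer."""
    full = d_out * d_in
    lora = rank * (d_out + d_in)  # B is d_out x rank, A is rank x d_in
    return full, lora


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product, adequate for tiny illustrative matrices."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def lora_effective_weight(w: Matrix, b: Matrix, a: Matrix,
                          alpha: float, rank: int) -> Matrix:
    """Compute W + (alpha / rank) * (B @ A); W itself is never modified."""
    scale = alpha / rank
    delta = matmul(b, a)
    return [
        [w[i][j] + scale * delta[i][j] for j in range(len(w[0]))]
        for i in range(len(w))
    ]


# Illustrative layer size (assumed): one 4096 x 4096 projection at rank 8.
full, lora = lora_param_counts(4096, 4096, 8)
print(f"adapter trains {lora:,} of {full:,} weights ({100 * lora / full:.2f}%)")
```

Because each adapter is only the pair (A, B), many adapters can sit beside one copy of the base weights, which is the property that multi-adapter serving relies on.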
THEIR SUCCESS
Expert AI agents save teams over one million hours
The combination of Llama and fine-tuned LoRA adaptations transformed ECP’s AI assistant. It has shouldered an astonishing amount of knowledge work, freeing ECP teams to focus on higher-order thinking and spend more time serving clients.
• 25 million annual queries
• 7% more accurate than GPT-4o mini after fine-tuning
• 4-second round trip time
• 1 million hours saved
“Llama’s impressive speed and ability to efficiently manage enterprise-scale queries have made our AI assistant more responsive, while LoRA fine-tuning is increasing its understanding, expertise and accuracy. Our upgraded AI assistant will deliver rapid access to vast stores of institutional knowledge and take on mundane tasks, improving overall operational efficiency.”
Global Chief Information Officer, Enterprise Consulting Partners
Models used
Create generative AI applications for business with open-source large language models that bring unmatched control, customization and flexibility.
Llama 3.1 8B Instruct


