LLAMA 3.1 8B INSTRUCT

Enterprise Consulting Partners

CASE STUDY

Building a better AI assistant for nuanced knowledge work

At a glance

Industry: Professional services

Use case: Improve existing AI assistant’s semantic understanding, accuracy and speed

Goal: Transform information gathering and synthesis

Llama versions: Llama 3.1 8B Instruct with low-rank adaptation (LoRA)

Deployment: Predibase

    • 25 million annual queries

    • 7% more accurate than GPT-4o mini after fine-tuning

    • 4-second round trip time

    • 1 million hours saved

*All results are self-reported and have not been independently verified. Individual results will vary.

THEIR STORY

Helping organizations understand complex situations and move ahead

Enterprise Consulting Partners (ECP) is a pseudonym for a leading professional services firm specializing in risk, strategy and people. The firm helps corporate and public sector leaders navigate an increasingly dynamic environment by addressing the most complex challenges of our time.

THEIR GOAL

Upgrade a confused AI agent with more nuanced language processing

ECP was an early adopter of generative AI, and its first AI assistant transformed how the firm’s analysts and knowledge workers navigated its extensive knowledge bases. However, as ECP integrated the AI assistant into enterprise platforms, the assistant began to struggle with nuanced tasks.

The ECP team explored creating expert models for specific tasks, but fully retraining multiple models introduced unmanageable complexity and was prohibitively expensive.

THEIR SOLUTION

Retool the AI assistant with Llama and LoRA

The ECP team used LoRA to create fine-tuned adaptations of Llama 3.1 8B Instruct. The adaptations load on demand at runtime, which allows a single Llama model to generate responses for multiple fine-tuned adaptations at enterprise scale.

With Llama and LoRA, ECP’s AI agent can bring expert AI assistance to unstructured institutional knowledge, current research and enterprise tools. Llama’s superior semantic understanding, multilingual skills and 128K-token context window are major upgrades to the AI assistant’s base capabilities, while LoRA adapters deliver task- and tool-specific expertise.

THEIR APPROACH

Fine-tune expert agents with less training effort

With fine-tuning, smaller models like Llama 3.1 8B Instruct can exceed the performance of general-purpose models with 10x or 100x the parameters. However, fully retraining smaller models is still a time-consuming and expensive process.

LoRA uses adaptations to modify a small fraction of a model’s weights, reducing fine-tuning time by 5x to 10x. In production, ECP runs a single Llama 3.1 8B Instruct instance and uses LoRA Exchange (LoRAX) to launch adaptations on demand. LoRAX can spin up multiple fine-tuned LoRA adaptations simultaneously within essentially the same computing footprint as a single Llama 3.1 8B Instruct instance. The solution can scale up to serve ECP’s global workforce without incurring massive computing bills.
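To make the "fraction of a model's weights" claim concrete, here is a minimal NumPy sketch of the low-rank update at the heart of LoRA. The matrix dimensions, rank and scaling factor are illustrative toy values, not details of ECP's deployment: instead of retraining a full weight matrix W, LoRA trains two small matrices B and A and applies W + (alpha / r) · B·A.

```python
import numpy as np

# Toy illustration of a LoRA weight update (dimensions are illustrative,
# not ECP's actual configuration).
d_out, d_in, r, alpha = 4096, 4096, 8, 16

rng = np.random.default_rng(0)
W = rng.standard_normal((d_out, d_in))     # frozen base-model weights
A = rng.standard_normal((r, d_in)) * 0.01  # trainable down-projection
B = np.zeros((d_out, r))                   # trainable up-projection (zero-init,
                                           # so the adapter starts as a no-op)

# Merged weights for one adapter; only A and B are trained.
W_adapted = W + (alpha / r) * B @ A

full_params = W.size
lora_params = A.size + B.size
print(f"trainable fraction: {lora_params / full_params:.4%}")
# → trainable fraction: 0.3906%
```

Because the adapter is two thin matrices rather than a full copy of the weights, many adapters can be stored and swapped against one shared base model, which is what makes the single-instance serving setup described above economical.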


LoRAX serves up fine-tuned adapters on demand, allowing a single model to deliver multi-agent services with a fraction of the computing costs.
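The per-request adapter selection can be sketched as follows. This is a hypothetical client-side example against a LoRAX-style `/generate` endpoint; the endpoint URL, adapter name and prompt are placeholders, not ECP's actual deployment.

```python
import json

# Placeholder endpoint for a LoRAX-style server; not ECP's real deployment.
LORAX_URL = "http://localhost:8080/generate"

def build_request(prompt: str, adapter_id: str, max_new_tokens: int = 256) -> dict:
    """Build a JSON body that routes one prompt to one fine-tuned adapter.

    The shared base model (here, Llama 3.1 8B Instruct) serves every
    request; only the small adapter named by adapter_id changes per task.
    """
    return {
        "inputs": prompt,
        "parameters": {
            "adapter_id": adapter_id,          # which LoRA adapter to apply
            "max_new_tokens": max_new_tokens,
        },
    }

# Hypothetical adapter name for a task-specific fine-tune.
body = build_request("Summarize our 2024 risk framework.", "risk-summary-v2")
print(json.dumps(body, indent=2))
# An HTTP client (e.g. requests.post(LORAX_URL, json=body)) would send this.
```

Routing by adapter ID in the request body, rather than running one model per task, is what lets a single deployment behave like a fleet of expert agents.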

THEIR SUCCESS

Expert AI agents save teams over one million hours

The combination of Llama and fine-tuned LoRA adaptations transformed ECP’s AI assistant. It now shoulders a substantial share of routine knowledge work, freeing ECP teams to focus on higher-order thinking and spend more time serving clients.

Projected results:

    • 25 million annual queries

    • 7% more accurate than GPT-4o mini after fine-tuning

    • 4-second round trip time

    • 1 million hours saved

*All results are self-reported and have not been independently verified. Individual results will vary.
“Llama’s impressive speed and ability to efficiently manage enterprise-scale queries have made our AI assistant more responsive, while LoRA fine-tuning is increasing its understanding, expertise and accuracy. Our upgraded AI assistant will deliver rapid access to vast stores of institutional knowledge and take on mundane tasks, improving overall operational efficiency.”

Global Chief Information Officer, Enterprise Consulting Partners

Models used

Create generative AI applications for business with open-source large language models that bring unmatched control, customization and flexibility.

Llama 3.1 8B Instruct

Pretrained, instruction-tuned generative model
Optimized for multilingual dialogue use cases
*Licensed under Llama 3.1 Community License Agreement

