Llama 3.1 405B Instruct
Databricks
Delivering synthetic training data faster and cheaper to fuel Spark SQL code generation
At a glance
Industry: Technology
Use case: Generating synthetic data for AI code generation
Goal: Create accurate synthetic datasets to improve model proficiency
Llama versions: Llama 3.1 405B Instruct
Deployment: Databricks Model Serving with Databricks Notebooks managing outputs
8% improvement in performance and quality of fine-tuned model outputs*
*All results are self-reported and not independently verifiable. Individual results will differ.
The leader in data and AI solutions
Databricks helps global enterprises take control of their data and put it to work with AI. Used by more than 10,000 organizations worldwide and over 60% of the Fortune 500, the Databricks Data Intelligence Platform provides a unified, open analytics platform for building, deploying, sharing and maintaining enterprise-grade data, analytics and AI solutions at scale.
THEIR GOAL
Enhancing real-time Spark SQL code gen for Databricks Assistant Autocomplete
The Databricks Assistant Autocomplete tool produces personalized AI-generated code suggestions in real time. To improve the proficiency of Assistant Autocomplete in Spark SQL — a critical use case for Databricks — the Databricks Applied AI team needed a scalable and comprehensive way to test the model. They sought an LLM that excelled at creating accurate synthetic data, demonstrated strong code comprehension and also fit their price point.
THEIR SOLUTION
Llama-generated synthetic data for code generation and training
The Applied AI team leveraged Llama 3.1 405B Instruct to create substantial synthetic training and evaluation datasets. The team was impressed with Llama’s ease of use and integration, open-source licensing and robust code comprehension.
THEIR APPROACH
Llama deployment flexibility delivers ease and speed
With access to Llama through Foundation Model APIs, Databricks was able to easily deploy through Databricks Model Serving, managing outputs with Databricks notebooks. This approach significantly simplified integration and accelerated time to results.
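The case study does not include code, but the deployment it describes can be sketched as follows. This is a hypothetical example, not the Databricks team's actual implementation: it assumes the OpenAI-compatible chat completions interface that Databricks Foundation Model APIs expose through Model Serving, and the endpoint name, environment variables, prompt wording and helper names are all illustrative assumptions.

```python
# Hypothetical sketch: asking a Llama 3.1 405B Instruct serving endpoint to
# emit synthetic (question, Spark SQL) training pairs. The endpoint name and
# environment variables below are assumptions, not taken from the case study.
import os
from typing import Dict, List

# Assumed Foundation Model API endpoint name for Llama 3.1 405B Instruct.
ENDPOINT = "databricks-meta-llama-3-1-405b-instruct"


def build_messages(table_schema: str, n_examples: int) -> List[Dict[str, str]]:
    """Build a chat prompt requesting synthetic Spark SQL training data
    as JSON objects, one per (natural-language question, query) pair."""
    system = "You generate synthetic Spark SQL training data as JSON."
    user = (
        f"Given this table schema:\n{table_schema}\n"
        f"Produce {n_examples} JSON objects, each with the keys "
        '"question" and "spark_sql".'
    )
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": user},
    ]


def generate_synthetic_data(table_schema: str, n_examples: int = 5) -> str:
    """Call the Model Serving endpoint (requires network access, the
    `openai` package, and Databricks workspace credentials)."""
    from openai import OpenAI  # pip install openai

    client = OpenAI(
        api_key=os.environ["DATABRICKS_TOKEN"],
        base_url=os.environ["DATABRICKS_HOST"] + "/serving-endpoints",
    )
    response = client.chat.completions.create(
        model=ENDPOINT,
        messages=build_messages(table_schema, n_examples),
        temperature=0.7,
    )
    return response.choices[0].message.content
```

In a Databricks notebook, `generate_synthetic_data` could be called in a loop over table schemas and the parsed outputs collected into training and evaluation sets, which matches the notebook-managed workflow the article describes at a high level.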
When Databricks developers compared the Llama model's outputs to those of other available state-of-the-art (SOTA) models, they found that Llama met or exceeded the outputs of competing models at a much lower price point.
THEIR SUCCESS
Synthetic data enables improvement in performance and quality of code gen for Spark SQL
Retraining the Assistant Autocomplete model with Llama-generated synthetic data resulted in an 8% improvement in the performance and quality of the model’s outputs.
With better performing outputs, Databricks Assistant Autocomplete now provides more accurate and highly relevant code suggestions, delivering significant improvements in productivity for Databricks’ customers.
• 8% improvement in performance and quality of fine-tuned model outputs compared to the non-fine-tuned model
“Using Llama was a win-win for us and eliminated per-token inference costs,” says a member of the Databricks Applied AI team. “And using an open-source model gave us greater freedom, flexibility and control over the model and IP of any subsequent products created.”
Models used
Llama 3.1 405B
Create generative AI applications for business with open-source large language models that bring unmatched control, customization and flexibility.