Deploying Llama 3.2 1B/3B: Partner Guides
The Llama lightweight (1B/3B) models enable developers to bring Llama’s capabilities to mobile and embedded devices.
Meta is collaborating with the following partners to provide guidance and foundational software to use the Llama lightweight models on their device hardware. Browse their offerings below and follow the provided links to obtain more detail.
Arm
Arm CPUs are the foundation for AI everywhere, delivering generative AI and traditional ML by harnessing the power of the Llama 3.2 1B and 3B models across cloud, mobile, and edge devices. Using Arm Kleidi technologies to implement Llama on Arm Cortex and Arm Neoverse CPUs, we are enabling developers to create novel use cases that deliver efficient, performant AI across the breadth of devices built on Arm.
Arm Kleidi technologies unlock unprecedented out-of-the-box performance for running LLMs everywhere from cloud to edge, enabling acceleration for Llama 3.2 through library integration into AI frameworks.
For mobile and edge ecosystem developers, Llama 3.2 runs efficiently across Arm Cortex CPU-based devices. See our documentation for developer resources.
Developers can access Arm-based instances from all major cloud service providers to run Llama 3.2 in the cloud on Arm Neoverse CPUs. See our documentation to get started, and visit Arm’s Hugging Face page.
MediaTek
MediaTek has collaborated with Meta to support on-device inference through ExecuTorch APIs, bringing the convenience of open-source, fast on-device prototyping to our developer community. Browse our ExecuTorch GitHub page for more information.
Developers can port Llama models to GenAI-enabled MediaTek products using the MediaTek NeuroPilot LLM toolkit. The toolkit supports up to 4-bit quantization, LoRA fine-tuning, advanced graph and cache optimizations, and accelerated decoding techniques that promise best-in-class inference efficiency without noticeable loss of accuracy.
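To make the 4-bit quantization claim concrete, the sketch below shows the general idea behind group-wise symmetric 4-bit weight quantization: each small group of weights shares one floating-point scale, and each weight is stored as a signed integer in [-8, 7]. This is an illustrative NumPy sketch of the technique in general, not the NeuroPilot toolkit API; all function names here are hypothetical.

```python
import numpy as np

def quantize_4bit(weights, group_size=32):
    """Group-wise symmetric 4-bit quantization (illustrative).
    Each group of `group_size` weights shares one float scale;
    values are rounded to signed integers in [-8, 7]."""
    w = weights.reshape(-1, group_size)
    scales = np.abs(w).max(axis=1, keepdims=True) / 7.0
    scales = np.where(scales == 0, 1.0, scales)  # avoid divide-by-zero
    q = np.clip(np.round(w / scales), -8, 7).astype(np.int8)
    return q, scales

def dequantize_4bit(q, scales):
    """Reconstruct approximate float weights from the 4-bit codes."""
    return (q.astype(np.float32) * scales).reshape(-1)

# Quantize a random weight vector and measure the reconstruction error.
rng = np.random.default_rng(0)
w = rng.standard_normal(1024).astype(np.float32)
q, scales = quantize_4bit(w)
w_hat = dequantize_4bit(q, scales)
max_err = float(np.abs(w - w_hat).max())
```

Because every group's scale is its max magnitude divided by 7, the rounding error is bounded by half a quantization step per weight, which is why a well-chosen group size keeps accuracy loss small while shrinking weight storage roughly 4x versus FP16.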
Qualcomm
Qualcomm Technologies, Inc., and Meta share a long-term partnership to support Llama models, including the latest Llama 3.2, running directly on-device. This capability allows developers to save on cloud costs and offer users private, reliable, and personalized experiences on smartphones, PCs, XR headsets, IoT, and automotive. This innovative partnership continuously unlocks new possibilities for on-device AI applications, enabling faster processing, reduced latency, and enhanced efficiency. Visit the Qualcomm AI Hub or download Ollama to learn more and start deploying Llama models on the edge today.