
🦙🦙🦙🚀 Llama 3 is now available in watsonx.ai!

By MARYAM ASHOORI posted Thu April 18, 2024 12:30 PM


I’m excited to announce the availability of Meta’s Llama 3 — the next generation of Meta’s open large language model — on our watsonx.ai platform. 

Starting today, the Llama 3 8B and Llama 3 70B models are generally available on watsonx.ai. They join previous models in the Llama family: Llama 2 13B, Llama 2 70B, and CodeLlama 34B. A self-deployed release of the Llama 3 series is coming soon that will allow you to deploy these models on the platform of your choice (multi-cloud or on-premises, with no vendor lock-in).

💡 Quick facts about Llama 3 models:

  • Llama 3 8B and 70B are available today on watsonx.ai. We are excited to have these models available the same day they were released!
  • Llama 3 8B is ideal for environments with limited computational power or resources (e.g. edge devices)
  • Llama 3 70B is suited for content creation, conversational AI, language understanding, and enterprise applications
  • Impressive performance: MMLU (5-shot) for instruction-tuned Llama 3 70B is 82.0, and for Llama 3 8B it is 68.4.
  • Both models are licensed for commercial use 
  • Stats: 15T+ training tokens, 8K context length, and new tokenization efficiencies that improve performance on English and multilingual benchmarks.
  • IBM offers competitive pricing on Llama 3 models: Llama 3 8B is $0.60 per 1M tokens, Llama 3 70B is $1.81 per 1M tokens.

(Pricing as of April 18, 2024.)
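To see what the per-token prices above translate to in practice, here is a small cost estimate in Python. The prices are the April 18, 2024 figures quoted above; check current watsonx.ai pricing before budgeting.

```python
# USD per 1M tokens for Llama 3 on watsonx.ai (as of April 18, 2024).
PRICE_PER_1M_TOKENS = {
    "llama-3-8b": 0.60,
    "llama-3-70b": 1.81,
}

def estimate_cost(model: str, tokens: int) -> float:
    """Return the estimated USD cost of processing `tokens` tokens."""
    return PRICE_PER_1M_TOKENS[model] * tokens / 1_000_000

# Example: 2M tokens through each model
print(f"{estimate_cost('llama-3-8b', 2_000_000):.2f}")   # 1.20
print(f"{estimate_cost('llama-3-70b', 2_000_000):.2f}")  # 3.62
```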

Over the past few months, Llama 2 models have been extensively adopted by IBM customers for summarization, classification, information extraction, content-grounded question answering, and content generation. As an example, the Recording Academy — the non-profit that hosts the GRAMMYs — tuned Llama 2 to produce digital content consistent with their brand’s standards and tone of voice.

👉 Getting started with Llama 3 on watsonx.ai

Watsonx.ai provides a no-code environment – Prompt Lab – to explore the capabilities of Llama 3 models. Pro users can build Llama 3-powered applications using our APIs and SDKs.
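For a sense of what an API-driven call looks like, the sketch below assembles a request body for the watsonx.ai text-generation REST endpoint. The endpoint URL, `model_id` string, and parameter names are assumptions based on the watsonx.ai API at the time of writing, and `YOUR_PROJECT_ID` is a hypothetical placeholder; verify the exact shapes against the current API reference before use.

```python
# Sketch: build the JSON body for a watsonx.ai text-generation request.
# Endpoint path and field names are assumptions -- check the API docs.
import json

WATSONX_URL = "https://us-south.ml.cloud.ibm.com/ml/v1/text/generation"

def build_generation_request(prompt: str, project_id: str,
                             model_id: str = "meta-llama/llama-3-70b-instruct",
                             max_new_tokens: int = 200) -> dict:
    """Assemble the JSON body for a single text-generation call."""
    return {
        "model_id": model_id,
        "input": prompt,
        "project_id": project_id,  # your watsonx.ai project ID
        "parameters": {
            "decoding_method": "greedy",
            "max_new_tokens": max_new_tokens,
        },
    }

body = build_generation_request("Summarize the meeting notes below: ...",
                                project_id="YOUR_PROJECT_ID")
print(json.dumps(body, indent=2))
```

You would POST this body to the endpoint with an IAM bearer token, or let the ibm-watsonx-ai Python SDK handle authentication and request assembly for you.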

To get started, open the Prompt Lab from the IBM watsonx.ai home page.
Select “view all foundation models” from the drop-down menu in the top right corner to open the model library.


Select the Llama 3 model you would like to experiment with (8B or 70B).

You can also experiment with other models in the Llama family, including Llama 2 13B, Llama 2 70B, and CodeLlama 34B. For each model in the model library, you can view model cards with details of intended use cases, hardware and software considerations, benchmark evaluation results, and ethical considerations and limitations.

If you have created your own model based on the Llama 2 architecture, you can deploy it within watsonx.ai via Bring Your Own Model (BYOM). This capability allows you to experiment and deploy tens of thousands of tuned Llama 2 models, including open-source models, domain-specific models, and non-English language models.

Once you have selected a variant, you can experiment with your chosen Llama 3 model in three modes:

1. Structured mode, which gives you a range of sample prompts for summarization, classification, generation, extraction, and question answering to choose from.

2. An interactive chat mode.

3. Freeform mode, which allows you to experiment with input/output formats of your choice. We have provided freeform sample prompts for a number of use cases, such as finance Q&A or thank-you note generation, to get you started.

In addition to prompt flexibility, you can adjust the model parameters to tune the behavior of the model:

• Decoding method: Set decoding to “Greedy” to always select the word with the highest probability, or to “Sampling” to customize the variability of word selection.
• Stopping criteria: Control when to stop generating output by specifying stop sequences, setting the approximate number of words in the generated text, and specifying how much repetition is allowed.
• Input/output token length: Cap the number of tokens accepted in the prompt and produced in the output.
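The controls above map naturally onto generation parameters when you move to code. The two illustrative parameter sets below follow common watsonx.ai API naming (`decoding_method`, `temperature`, `top_p`, `stop_sequences`); treat the exact field names as assumptions and confirm them against the current documentation.

```python
# Illustrative parameter sets for the Prompt Lab controls described above.
# Field names are assumptions based on the watsonx.ai generation API.

greedy_params = {
    "decoding_method": "greedy",   # always pick the highest-probability token
    "max_new_tokens": 150,         # cap on generated output length
    "stop_sequences": ["\n\n"],    # stop when a blank line is generated
    "repetition_penalty": 1.1,     # discourage repeated phrases
}

sampling_params = {
    "decoding_method": "sample",   # sample from the distribution instead
    "temperature": 0.7,            # higher values = more variable word choice
    "top_p": 0.9,                  # nucleus sampling cutoff
    "max_new_tokens": 150,
}
```

Greedy decoding is a good default for extraction and classification, where you want reproducible answers; sampling suits creative generation, where variety matters.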

Once you are happy with the customizations, you can grab the curl command, the Node.js code, or the Python code to continue in your environment of choice. Alternatively, stay in the no-code environment: save the prompt, or share it across your organization for further tuning and customization before you are ready to deploy to production.

Prompt saving in watsonx.ai


Select Llama variants are available for further tuning in our Tuning Studio or through Jupyter notebooks.

Tuning Studio in watsonx.ai


Deploy your tuned Llama 3 on your platform of choice

In addition to the SaaS offering of Llama variants in watsonx.ai, Llama 2 variants are available today for multi-cloud deployment (on AWS, Azure, or GCP) as well as on-premises. As noted above, a self-deployed release of the Llama 3 series is coming soon, letting you deploy these models on the platform of your choice with no vendor lock-in.

We are committed to continuing to bring the best of open models to our enterprise customers and to helping them pick the right model for their target use cases. We’re excited to see what you build with Llama 3!

📣 Shoutout to the watsonx.ai team for making the model available on the same day as its public release.

Get started with Llama 3 in watsonx.ai

• Sign up for watsonx.ai for free to explore the capabilities of Llama 3 models.
• Learn more about the partnership between Meta and IBM, and about the AI Alliance, established with more than 50 leading organizations globally to create actionable plans that advance responsible, inclusive AI rooted in open innovation.


