
Meta AI: What is Llama and why does it matter?

By Harry Guinness · October 3, 2024

Llama is a family of open large language models (LLMs) and large multimodal models (LMMs) from Meta. It's basically the Facebook parent company's response to OpenAI's GPT and Google's Gemini—but with one key difference: all the Llama models are freely available for almost anyone to use for research and commercial purposes.

That's a pretty big deal, and it's meant that the various Llama models have become incredibly popular with AI developers. Let's explore what Meta's "herd" of Llama models offers.

What is Llama?

Llama is a family of LLMs (and LLMs with vision capabilities, or LMMs) like OpenAI's GPT and Google Gemini. Currently, Meta is at version 3.1 for some models and 3.2 for others, though the numbering doesn't matter much: all the models are part of the Llama 3 family. While there are some technical differences between Llama and other LLMs and LMMs, you'd need to be deep into AI for them to mean much. All these AI models were developed and work in essentially the same way; they all use the same transformer architecture and the same development ideas, like pretraining and fine-tuning.

When you enter a text prompt or provide a Llama model with text input in some other way, it attempts to predict the most plausible follow-on text using its neural network—a cascading algorithm with billions of variables (called "parameters") that's loosely modeled on the human brain. By assigning different weights to all the different parameters, and throwing in a small bit of randomness, Llama can generate incredibly human-like responses.
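
That prediction step can be sketched in a few lines. This is an illustrative toy, not Llama's actual code: the model outputs a score (logit) for every possible next token, a softmax turns those scores into probabilities, and a weighted random draw supplies the "small bit of randomness" (usually controlled by a temperature setting):

```python
import math
import random

def sample_next_token(logits, temperature=0.8, seed=None):
    """Pick the next token from raw model scores (logits).

    Temperature < 1 sharpens the distribution (more predictable text);
    temperature > 1 flattens it (more varied text). This is the standard
    sampling step used by transformer LLMs, shown here on toy data.
    """
    rng = random.Random(seed)
    scaled = [score / temperature for score in logits.values()]
    # Softmax: turn scores into probabilities that sum to 1.
    max_s = max(scaled)
    exps = [math.exp(s - max_s) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Weighted random choice injects the randomness.
    return rng.choices(list(logits.keys()), weights=probs, k=1)[0]

# Toy scores a model might assign after "The capital of France is"
logits = {"Paris": 9.0, "Lyon": 3.5, "pizza": 0.1}
print(sample_next_token(logits, temperature=0.8, seed=0))  # Paris
```

Because "Paris" has a much higher score, it wins almost every draw, but the randomness means low-probability tokens occasionally surface, which is why the same prompt can produce different responses.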

Meta released six Llama 3.1 models in July 2024:

  • Llama 3.1 8B

  • Llama 3.1 8B-Instruct

  • Llama 3.1 70B

  • Llama 3.1 70B-Instruct

  • Llama 3.1 405B

  • Llama 3.1 405B-Instruct

And eight Llama 3.2 models in September 2024:

  • Llama 3.2 1B

  • Llama 3.2 1B-Instruct

  • Llama 3.2 3B

  • Llama 3.2 3B-Instruct

  • Llama 3.2 11B-Vision

  • Llama 3.2 11B-Vision-Instruct

  • Llama 3.2 90B-Vision

  • Llama 3.2 90B-Vision-Instruct

The Llama 3.2 1B and 3B models are designed for on-device inference. That means they're small enough to run directly on modern smartphones and laptops so data never has to be sent to the cloud. This can both speed up processing and improve privacy.
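A rough back-of-the-envelope calculation shows why those sizes make on-device inference possible: a model's weight memory is approximately its parameter count times the bytes used per parameter. (This is a simplification that ignores activation memory and runtime overhead, so treat it as a floor.)

```python
def model_memory_gb(params_billions, bytes_per_param=2.0):
    """Rough estimate of weight memory: parameter count x bytes per parameter.

    bytes_per_param: 4.0 for fp32, 2.0 for fp16/bf16, ~0.5 for 4-bit weights.
    """
    # Billions of parameters x bytes per parameter = gigabytes.
    return params_billions * bytes_per_param

print(model_memory_gb(1))       # 2.0 GB at 16-bit: fits on a modern phone
print(model_memory_gb(3, 0.5))  # 1.5 GB: the 3B model with 4-bit quantization
print(model_memory_gb(405))     # 810.0 GB: why 405B stays in the data center
```

By this estimate, the 1B and 3B models fit comfortably in a few gigabytes of RAM, while the 405B model needs server-class hardware.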

The Llama 3.1 8B and Llama 3.2 11B-Vision models have the same text-generating capabilities—but the 3.2 11B-Vision model adds visual reasoning. This means it can identify images, pull information from charts and graphs, read handwriting, and otherwise work with visual data. It's the same situation with Llama 3.1 70B and Llama 3.2 90B-Vision: Meta has designed it so that developers can directly replace the Llama 3.1 models with the Llama 3.2 models to add vision capabilities to their apps, without changing their text handling abilities. 

The frontier Llama 3.1 405B model has a whopping 405 billion parameters. While Meta hasn't added vision capabilities to it yet, it's one of the most powerful open LLMs available.

Meta AI: How to try Llama

Meta AI, the AI assistant built into Facebook, Messenger, Instagram, and WhatsApp, uses Llama 3. For most queries, it uses the 90B-Vision model, but for more challenging prompts, you can use the 405B model a few times a day in the dedicated web app.

The Meta AI chat home page

If you aren't in one of the handful of countries where Meta has launched Meta AI, you can demo the 70B-Instruct and 11B-Vision-Instruct models using HuggingChat, AI repository Hugging Face's example chatbot.

How does Llama 3 work?

The Llama 3 models were trained on over 15 trillion "tokens" to build their neural networks; the overall dataset was seven times larger than the one used to train Llama 2. Some of the data comes from publicly available sources like Common Crawl (an archive of billions of webpages), Wikipedia, and public domain books from Project Gutenberg, while some of it is "synthetic data" generated by earlier AI models. (None of it is Meta user data.)

Each token is a word or semantic fragment that allows the model to assign meaning to text and plausibly predict follow-on text. If the words "Apple" and "iPhone" consistently appear together, it's able to understand that the two concepts are related—and are distinct from "apple," "banana," and "fruit." According to Meta, Llama 3's tokenizer has a larger vocabulary than Llama 2's, so it's significantly more efficient.
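The efficiency claim comes down to vocabulary size: the more (and longer) entries a tokenizer's vocabulary has, the fewer tokens it needs to encode the same text. Real tokenizers use subword units learned from data, but the effect can be shown with two toy extremes, character-level (tiny vocabulary) versus word-level (larger vocabulary):

```python
def char_tokenize(text):
    """Tiny vocabulary (individual characters): many tokens per text."""
    return list(text)

def word_tokenize(text):
    """Larger vocabulary (whole words): far fewer tokens for the same text."""
    return text.split()

sentence = "Apple announced a new iPhone today"
print(len(char_tokenize(sentence)))  # 34 tokens
print(len(word_tokenize(sentence)))  # 6 tokens
```

Fewer tokens per text means more content fits in the model's context window and each training pass covers more material, which is why a bigger tokenizer vocabulary translates into efficiency.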

Of course, training an AI model on the open internet is a recipe for racism and other horrendous content, so the developers also employed other training strategies, including reinforcement learning with human feedback (RLHF), to optimize the model for safe and helpful responses. With RLHF, human testers rank different responses from the AI model to steer it toward generating more appropriate outputs. The instruct versions were also fine-tuned with specific data to make them better at responding to human instructions in a natural way.
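The ranking step behind RLHF can be sketched with the standard pairwise preference loss used to train reward models (a toy illustration of the general technique, not Meta's training code): given two responses to the same prompt, the reward model is pushed to score the human-preferred one higher.

```python
import math

def preference_loss(reward_chosen, reward_rejected):
    """Pairwise (Bradley-Terry style) loss for reward-model training.

    Computes -log(sigmoid(r_chosen - r_rejected)): the loss shrinks toward 0
    when the model scores the human-preferred response higher, and grows
    when the ranking is backwards.
    """
    margin = reward_chosen - reward_rejected
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

print(preference_loss(2.0, -1.0))  # small loss: ranking agrees with humans
print(preference_loss(-1.0, 2.0))  # large loss: ranking is backwards
```

Minimizing this loss over many human-ranked pairs produces a reward model, which is then used to steer the language model toward responses humans prefer.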

To add vision to its latest models, Meta trained the image system separately to align it with the existing language models. This means that the 11B-Vision and 90B-Vision models are the same as the existing 8B and 70B Llama models when it comes to text, but they can also take image prompts. It's a clever system as it means that anyone already using Llama in their application doesn't have to account for a dramatically new language model.

Meta has also developed Llama Guard, Prompt Guard, and Code Shield, three safety tools designed to prevent Llama models from acting on harmful prompts or generating insecure computer code.

But all these Llama models are just intended to be a base for developers to build from. If you want to create an LLM to generate article summaries in your company's particular brand style or voice, you can train Llama models with dozens, hundreds, or even thousands of examples and create one that does just that. Similarly, you can configure one of the instruct models to respond to your customer support requests by providing it with your FAQs and other relevant information like chat logs. Or you can just take any Llama model and retrain it to create your own completely independent LLM.
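That fine-tuning workflow starts with a dataset of examples. A common interchange format for this is JSON Lines: one instruction/response pair per line, each a standalone JSON object. (The exact field names vary by fine-tuning framework; the ones below are illustrative.)

```python
import json

# Two illustrative instruction/response pairs; real fine-tuning datasets
# usually contain hundreds or thousands of examples.
examples = [
    {
        "instruction": "Summarize this article in our house style.",
        "input": "Meta released eight new Llama 3.2 models in September 2024.",
        "output": "Quick take: Meta's Llama herd just grew by eight models.",
    },
    {
        "instruction": "Answer this support question using our FAQ.",
        "input": "How do I reset my password?",
        "output": "Head to Settings, choose Account, then Reset password.",
    },
]

# JSON Lines: one standalone JSON object per line.
jsonl = "\n".join(json.dumps(ex) for ex in examples)
print(jsonl.count("\n") + 1)  # 2 lines, one per training example
```

A fine-tuning tool then consumes a file in this shape to adjust the base model's weights toward your examples' style and content.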

Meta is increasingly creating tools to enable this. Alongside the Llama 3.2 models, it announced Llama Stack, a set of tools and APIs to make developing AI applications with Llama even easier.

Llama vs. GPT, Gemini, and other AI models: How do they compare?

In the Llama 3 research paper, Meta's researchers compare the different models' performance on various benchmarks (like the MMLU multitask language understanding benchmark and the ARC-Challenge common sense reasoning test) to a handful of equivalent open and proprietary models. The 8B model is compared to Mistral 7B and Gemma 2 9B, while the 70B model is compared to GPT-3.5 Turbo and Mixtral 8x22B. In what can only be called cherry-picked examples, the smaller Llama models are all the top performers. Even still, it's widely accepted that Llama models are competitive with similarly sized models: the 8B model is never going to beat a 70 billion parameter model, but its performance is comparable to other small models.

More interestingly, the 405B model is compared with GPT-4, GPT-4o, and Claude 3.5 Sonnet. While it isn't the top-performing model, it is competitive with the current state-of-the-art proprietary models (aside from OpenAI o1) and does well in human-evaluated head-to-heads. Meta calls it "the world's largest and most capable openly available foundation model"—which seems like a fair assessment.

It's the same with the Llama 3.2 models. The lightweight 1B and 3B models compare well to the likes of Gemma 2B and Phi-3.5 mini, while the vision-capable 11B and 90B models perform similarly to Claude 3 Haiku and GPT-4o mini.

In my testing, I've consistently found Llama 3 models to be a big step up from Llama 2. I couldn't get them to hallucinate or just make things up anywhere near as easily. While Meta AI isn't yet replacing ChatGPT for me, the core models are some of the best in the world, and certainly the best you can just download from Hugging Face right now.

Why Llama matters

Most of the LLMs you've heard of—OpenAI's o1 and GPT-4o, Google's Gemini, Anthropic's Claude—are all proprietary and closed source. Researchers and businesses can use the official APIs to access them and even fine-tune versions of their models so they give tailored responses, but they can't really get their hands dirty or understand what's going on inside.

With Llama, though, you can download the model right now and, as long as you have the technical chops, get it running on your computer or even dig into its code. (Though be warned: even small LLMs are measured in GBs.) Meta has also published a monster research paper that details how the full "Llama 3 herd" of models was trained, the architecture they use, how they compare to other models, the steps Meta is taking to make them safe, and lots more fascinating details…if you're the kind of person who's into these things.

And much more usefully, you can also get it running on Microsoft Azure, Google Cloud, Amazon Web Services, and other cloud infrastructures so you can operate your own LLM-powered app or train it on your own data to generate the kind of text you need. Just be sure to check out Meta's guide to responsibly using Llama—the license isn't quite as permissive as a traditional open source license.

Still, by continuing to be so open with Llama, Meta is making it significantly easier for other companies to develop AI-powered applications that they have more control over—as long as they stick to the acceptable use policy. The only other big limit to the license is that companies with more than 700 million monthly users have to ask for special permission to use Llama, which is why the likes of Apple, Google, and Amazon have to develop their own LLMs.

In a letter accompanying Llama 3.1's release, CEO Mark Zuckerberg is incredibly transparent about Meta's plans to keep Llama open:

"I believe that open source is necessary for a positive AI future. AI has more potential than any other modern technology to increase human productivity, creativity, and quality of life—and to accelerate economic growth while unlocking progress in medical and scientific research. Open source will ensure that more people around the world have access to the benefits and opportunities of AI, that power isn't concentrated in the hands of a small number of companies, and that the technology can be deployed more evenly and safely across society."

And really, that's quite exciting. Sure, Meta will benefit by being somewhat in the driver's seat of one of the most important AI models. But independent developers, companies that don't want to be locked into a closed system, and everyone else interested in AI will benefit too. So many of the big developments in computing over the past 70 years have been built on top of open research and experimentation, and AI now looks set to follow the same path. While Google, OpenAI, and Anthropic are always going to be players in the space, they won't be able to build the kind of commercial moat or consumer lock-in that Google has in search and advertising.

By letting Llama out into the world, Meta has helped ensure there will likely always be a credible alternative to closed source AIs.

Related reading:

  • What is Google Gemini?

  • What is Sora? OpenAI's text-to-video model

  • What is Perplexity AI?

  • Meta AI vs. ChatGPT: Which is better?

  • What is open source AI?

This article was originally published in August 2023. The most recent update was in October 2024.
