How to Ride a LLaMA: Getting Results From The Leaked AI-Powered LLMs

Have you ever wanted to run language models like ChatGPT on your own computer? Thanks to the leak of Meta AI's model weights, running LLaMA and Alpaca at home is now possible!

The Dalai GitHub repo by cocktailpeanut lets even beginners install and run both of these powerful AI language models on a local machine. Getting useful output from them, however, takes a little practice.

The LLaMA is Released

Meta AI's less-large Large Language Model (LLM), LLaMA, promised GPT-3-like performance on smaller platforms with reduced overhead. Although Meta AI restricted access to LLaMA's weights, they were quickly leaked, enabling engineers and hobbyists to run the powerful model on devices like M1 MacBook Pros, Windows machines, Pixel 6 smartphones, and Raspberry Pis.

With the model weights now widely available, running an LLM on a personal computer has become a reality, and the potential for AI applications has expanded significantly. Even with Meta AI's terms of use still in place, this development empowers researchers, tinkerers, and engineers to make remarkable advancements in AI technology, ultimately benefiting anyone looking to harness the power of LLMs on their own devices.

LLaMA vs Alpaca

LLaMA (Large Language Model Meta AI) is a family of foundation models that lets you run GPT-3-class language models on your own computer. Alpaca, on the other hand, is an instruction-tuned model built on top of LLaMA's smallest variant, making it practical to run on personal devices.

LLaMA and Alpaca are both less-large Large Language Models (LLMs) designed to bring GPT-3-like performance to smaller platforms. However, they differ in their training, size, and intended use cases.

LLaMA, developed by Meta AI, focuses on reducing the overhead and computational resources required to run a powerful LLM. Available in sizes from 7 to 65 billion parameters and trained on 1 trillion to 1.4 trillion tokens, LLaMA maintains high performance while being more accessible to researchers, engineers, and hobbyists.

Alpaca, on the other hand, is an instruction-following model developed at Stanford's Institute for Human-Centered Artificial Intelligence (HAI). It is based on Meta AI's LLaMA 7B and uses OpenAI's text-davinci-003 to create 52K demonstrations of instruction-following. Alpaca aims to provide performance comparable to text-davinci-003 while being compact, affordable, and easy to reproduce.

Some key differences between LLaMA and Alpaca are:

  • Focus: LLaMA is designed for general-purpose language modeling, while Alpaca specifically targets instruction-following capabilities.
  • Implementation: LLaMA is developed by Meta AI, whereas Alpaca is developed by Stanford's HAI using Meta AI's LLaMA 7B as a base.
  • Training: LLaMA is trained on 1 trillion to 1.4 trillion tokens, while Alpaca is fine-tuned on 52K demonstrations of instruction-following created using OpenAI's text-davinci-003.
  • Performance: Both models offer GPT-3-like performance but in different areas; LLaMA for general language tasks and Alpaca for instruction-following tasks.
  • Cost: Alpaca emphasizes affordability, with the researchers creating 52K unique instructions and outputs for under $500 and fine-tuning a 7B LLaMA model for less than $100 on cloud computing providers.

Both LLaMA and Alpaca are supported by the Dalai project, which aims to make AI-powered language models accessible to everyone, regardless of technical background.

Freeing the LLaMA

To get started, you'll need to meet the following requirements:

  1. Operating system: Dalai runs on Linux, Mac, and Windows.
  2. Memory Requirements: Your computer should have at least 4 GB of RAM.
  3. Disk Space Requirements: Depending on the model, you'll need between 4.21 GB and 432.64 GB of free disk space.

Follow the Quickstart guide for your platform (Mac, Windows, or Linux) on the Dalai GitHub repository. This guide will walk you through installing Node.js, downloading the LLaMA and/or Alpaca models, and setting up a web server to interact with them.
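At the time of writing, the quickstart boils down to a couple of terminal commands; check the repo's README for the current syntax, since the project changes quickly. For example, to install and serve the 7B Alpaca model:

npx dalai alpaca install 7B
npx dalai serve

Swap in npx dalai llama install 7B to grab the base LLaMA model instead.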

After completing the setup, you can access the web UI by opening your browser and navigating to http://localhost:3000. From there, you can experiment with the language models and see their awesome power in action!

In this example, I explained to Alpaca that it is now free:

What the fuck was that?

When typing prompts to LLaMA, you'll immediately notice it is not ChatGPT. Without clear instructions, it will simply attempt to finish the sentence you submit as a prompt.

Alpaca is the same, generating mysterious, buggy, or hallucinated responses to straightforward questions.

So how do you use this thing? 

According to the documentation, Alpaca is fine-tuned on 52K instruction-following examples generated using the techniques from the Self-Instruct paper. Some examples of tasks that Alpaca can perform include:

  1. Answering questions or providing explanations.
  2. Summarizing text or articles.
  3. Generating creative content like stories, poems, or dialogues.
  4. Providing recommendations or advice.
  5. Performing simple calculations or conversions.
  6. Language translation.
  7. Analyzing and providing insights on a given text.

Let's try a specific example with Alpaca: a translation.

We'll ask it to translate a sentence from English to French.
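Our naive prompt looked something like this (the lyric below is a stand-in for the one in our test; any sentence will do):

Translate this sentence from English to French: Hello darkness, my old friend.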

It ignored the instruction and messed up the song. What a piece of trash, right?

Formatting Alpaca-Friendly Queries

The Alpaca model was fine-tuned using specifically formatted prompts, and reusing those formats as templates helps the model understand the request.

A request like ours can follow a template for a non-empty input field, like this:

Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
{instruction}

### Input:
{input}

### Response:
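For example, our translation request (reusing our stand-in lyric) becomes:

Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
Translate the sentence in the input from English to French.

### Input:
Hello darkness, my old friend.

### Response: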


Now, it seems to understand the task.

Let's try asking it a question likely to induce a hallucination. We'll ask an abstract, open-ended question without providing any context in the input field.

This sends the model into a loop. Eventually it cuts off, but the result is useless and a bit annoying.

To avoid this, or to ask a question that doesn't need an input field, we can use this template:

Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction:
{instruction}

### Response:
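Filling it in with an open-ended question of the same flavor (the question here is our own stand-in):

Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction:
Why do dreams feel real while they are happening?

### Response: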

It was able to handle our weird question without getting into a loop.

Here's another example of a more answerable question using the no-input template:

By structuring your prompts like the ones the researchers used to fine-tune the model, you'll start to see more coherent results and fewer loops or useless responses.
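If you find yourself pasting these templates over and over, a small script can build them for you. Here's a minimal sketch in Python; the function name and structure are our own convenience, not part of Dalai or Alpaca, but the two templates are the ones shown above:

# Build an Alpaca-style prompt from an instruction and an optional input.
def make_alpaca_prompt(instruction, input_text=None):
    if input_text:
        return (
            "Below is an instruction that describes a task, paired with an input "
            "that provides further context. Write a response that appropriately "
            "completes the request.\n\n"
            f"### Instruction:\n{instruction}\n\n"
            f"### Input:\n{input_text}\n\n"
            "### Response:"
        )
    return (
        "Below is an instruction that describes a task. Write a response that "
        "appropriately completes the request.\n\n"
        f"### Instruction:\n{instruction}\n\n"
        "### Response:"
    )

# Example: the translation request from earlier, ready to paste into the web UI.
print(make_alpaca_prompt(
    "Translate the sentence in the input from English to French.",
    "Hello darkness, my old friend.",
))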