11281 stories
·
34 followers

Options for accessing Llama 3 from the terminal using LLM

1 Share

Llama 3 was released on Thursday. Early indications are that it's now the best available openly licensed model - Llama 3 70b Instruct has taken joint 5th place on the LMSYS arena leaderboard, behind only Claude 3 Opus and some GPT-4s and sharing 5th place with Gemini Pro and Claude 3 Sonnet. But unlike those other models Llama 3 70b is weights available and can even be run on a (high end) laptop!

My LLM command-line tool and Python library provides access to dozens of models via plugins. Here are several ways you can use it to access Llama 3, both hosted versions and running locally on your own hardware.

Llama-3-8B-Instruct locally with llm-gpt4all

If you want to run Llama 3 locally, the easiest way to do that with LLM is using the llm-gpt4all plugin. This plugin builds on the excellent gpt4all project by Nomic AI, providing a quantized (q4) Llama 3 8B instruct model that needs 8GB of RAM and requires a 4.34GB download.

If you don't have LLM installed yet, install it like this:

brew install llm

Or like this:

pipx install llm

Install the plugin:

llm install llm-gpt4all

The first time you run a prompt it will download the model:

llm -m Meta-Llama-3-8B-Instruct "Three great names for a pet emu"

What an interesting question! Here are three name suggestions for your pet emu:

  1. Emmet: A playful take on the word "emu" that suits a friendly and curious bird.
  2. Gizmo: Emus are known for their quirky behavior, so Gizmo is a fitting name to capture their unique personality.
  3. Aussie: As an Australian native species, naming your emu Aussie would be a nod to its heritage.

Remember to choose a name that suits your pet's temperament and characteristics!

Each time you run a command like this the model will be loaded into memory from scratch. To keep it resident in memory, use the llm chat command:

llm chat -m Meta-Llama-3-8B-Instruct

Here's an example session:

Chatting with Meta-Llama-3-8B-Instruct
Type 'exit' or 'quit' to exit
Type '!multi' to enter multiple lines, then '!end' to finish
> javascript function to reverse a string
%
Here is the JavaScript code that reverses a given string:
```
function reverseString(str) {
  return str.split("").reverse().join("");
}
```

You can use this function by passing your desired input string as an argument, like so:

```javascript
let reversedStr = reverseString("Hello World");
console.log(reversedStr); // Output: "dlroW olleH"
```
In the code above:
- `str.split("")` splits the original string into individual characters.
- `.reverse()` reverses the order of those characters.
- `.join("")` combines them back together as a single string.

This function works for any input string, not just "Hello World". Try it out with different strings to see how it handles punctuation and spaces!
> capitalize it too
%
Here's an updated JavaScript function that not only reverses the input string but also capitalizes each character:
```javascript
function reverseAndCapitalize(str) {
  return str.split("").reverse().map(function(char) { 
    if (char.match(/[a-z]/i)) {
      return char.toUpperCase();
    } else {
      return char;
    }
  }).join("");
}
```

(I'm not sure I'd write reverseAndCapitalize() quite like that, but still a fun example!)

Consult the LLM documentation for more details on how to use the command-line tool.

Fast API access via Groq

Groq serve openly licensed LLMs at ludicrous speeds using their own custom LPU (Language Processing Unit) Inference Engine. They currently offer a free preview of their API: you can sign up and obtain an API key to start using it.

You can run prompts against Groq using their OpenAI compatible API endpoint.

Edit the file ~/Library/Application Support/io.datasette.llm/extra-openai-models.yaml - creating it if it doesn't exist - and add the following lines to it:

- model_id: groq-openai-llama3
  model_name: llama3-70b-8192
  api_base: https://api.groq.com/openai/v1
  api_key_name: groq
- model_id: groq-openai-llama3-8b
  model_name: llama3-8b-8192
  api_base: https://api.groq.com/openai/v1
  api_key_name: groq

This tells LLM about those models, and makes them accessible via those configured model_id values.

Run this command to confirm that the models were registered correctly:

llm models | grep groq

You should see this:

OpenAI Chat: groq-openai-llama3
OpenAI Chat: groq-openai-llama3-8b

Set your Groq API key like this:

llm keys set groq
# <Paste your API key here>

Now you should be able to run prompts through the models like this:

llm -m groq-openai-llama3 "A righteous sonnet about a brave owl"

Animated demo. The sonnet appears in less than a second: Here is a sonnet about a brave owl:  In moonlit skies, a silhouette is seen, A wingspan wide, a watchful, piercing gaze. The owl, a sentinel of secrets keen, Patrols the night, with valor in her ways.  Her feathers soft, a camouflage gray, She glides unseen, a phantom of the night. Her eyes, like lanterns, shining bright and far, Illuminate the darkness, banishing all fright.  Her talons sharp, a grasping, deadly sway, She swoops upon her prey, with silent might. Yet in her heart, a wisdom, old and gray, A fierce devotion to the darkness of the night.  And thus, the owl, a symbol of courage true, Inspires us all, with brave and noble pursuit.  I hope you enjoy this sonnet!

Groq is fast.

There's also a llm-groq plugin but it hasn't shipped support for the new models just yet - though there's a PR for that by Lex Herbert here and you can install the plugin directly from that PR like this:

llm install https://github.com/lexh/llm-groq/archive/ba9d7de74b3057b074a85fe99fe873b75519bd78.zip
llm keys set groq
# paste API key here
llm -m groq-llama3-70b 'say hi in spanish five ways'

Local Llama 3 70b Instruct with llamafile

The Llama 3 8b model is easy to run on a laptop, but it's pretty limited in capability. The 70b model is the one that's starting to get competitive with GPT-4. Can we run that on a laptop?

I managed to run the 70b model on my 64GB MacBook Pro M2 using llamafile (previously on this blog) - after quitting most other applications to make sure the 37GB of RAM it needed was available.

I used the Meta-Llama-3-70B-Instruct.Q4_0.llamafile Q4 version from jartine/Meta-Llama-3-70B-Instruct-llamafile - a 37GB download. I have a dedicated external hard disk (a Samsung T7 Shield) for this kind of thing.

Here's how I got it working:

curl -L -o Meta-Llama-3-70B-Instruct.Q4_0.llamafile 'https://huggingface.co/jartine/Meta-Llama-3-70B-Instruct-llamafile/resolve/main/Meta-Llama-3-70B-Instruct.Q4_0.llamafile?download=true'
# That downloads 37GB - now make it executable
chmod 755 Meta-Llama-3-70B-Instruct.Q4_0.llamafile
# And start it running:
./Meta-Llama-3-70B-Instruct.Q4_0.llamafile

A llamafile is an executable that runs on virtually any platform - see my previous notes on Cosmopolitan and Actually Portable Executable for more on how that works.

This will take quite a while to start, because it needs to load that full 37GB of binary content into memory. Once it's finished loading a local web server becomes available at http://127.0.0.1:8080/ - this serves a web UI you can use to interact with the model, and also serves another OpenAI-compatible API endpoint.

The easiest way to access this from LLM is to install the llm-llamafile plugin:

llm install llm-llamafile

All this plugin does is configure a model called llamafile that attempts to access the model hosted on port 8080. You can run prompts like this:

llm -m llamafile "3 neat characteristics of a pelican"

Here are three neat characteristics of a pelican:

  1. Unique Beak: Pelicans have a distinctive beak that is shaped like a scoop or a basket. This beak is specially designed to catch fish, and it can hold up to 3 gallons of water! The beak is also very sensitive, which helps pelicans detect the presence of fish in the water.

  2. Waterproof Feathers: Pelicans have a special coating on their feathers that makes them waterproof. This is essential for their fishing lifestyle, as they need to be able to dive into the water without getting weighed down by wet feathers. The coating is made up of a waxy substance that helps to repel water.

  3. Pouch-Like Throat: Pelicans have a unique throat pouch that allows them to catch and store fish. When they dive into the water, they use their beak to scoop up fish, and then they store them in their throat pouch. The pouch can expand to hold multiple fish, and the pelican can then swallow the fish whole or regurgitate them to feed their young. This pouch is a key adaptation that helps pelicans thrive in their aquatic environment.

If you don't want to install another plugin, you can instead configure the model by adding this to your openai-extra-models.yaml file:

- model_id: llamafile
  model_name: llamafile
  api_base: http://localhost:8080/v1
  api_key: x

One warning about this approach: if you use LLM like this then every prompt you run through llamafile will be stored under the same model name in your SQLite logs, even if you try out different llamafile models at different times. You could work around this by registering them with different model_id values in the YAML file.

A neat thing about open weight models is that multiple API providers can offer them, encouraging them to aggressively compete on price.

Groq is currently free, but that's with a limited number of free requests.

A number of other providers are now hosting Llama 3, and many of them have plugins available for LLM. Here are a few examples:

  • Perplexity Labs are offering llama-3-8b-instruct and llama-3-70b-instruct. The llm-perplexity plugin provides access - llm install llm-perplexity to install, llm keys set perplexity to set an API key and then run prompts against those two model IDs. Current price for 8b is $0.20 per million tokens, for 80b is $1.00.
  • Anyscale Endpoints have meta-llama/Llama-3-8b-chat-hf ($0.15/million tokens) and meta-llama/Llama-3-70b-chat-hf ($1.0/million tokens) (pricing). llm install anyscale-endpoints, then llm keys set anyscale-endpoints to set the API key.
  • Fireworks AI have fireworks/models/llama-v3-8b-instruct for $0.20/million and fireworks/models/llama-v3-70b-instruct for $0.90/million (pricing). llm install fireworks, then llm keys set fireworks to set the API key.
  • OpenRouter provide proxied accessed to Llama 3 from a number of different providers at different prices, documented on their meta-llama/llama-3-70b-instruct and meta-llama/llama-3-8b-instruct pages (and more). Use the llm-openrouter plugin for those.
  • Together AI has both models as well. The llm-together plugin provides access to meta-llama/Llama-3-8b-chat-hf and meta-llama/Llama-3-70b-chat-hf.

I'm sure there are more - these are just the ones I've tried out myself. Check the LLM plugin directory for other providers, or if a provider emulates the OpenAI API you can configure with the YAML file as shown above or described in the LLM documentation.

That's a lot of options

One key idea behind LLM is to use plugins to provide access to as many different models as possible. Above I've listed two ways to run Llama 3 locally and six different API vendors that LLM can access as well.

If you're inspired to write your own plugin it's pretty simple: each of the above plugins is open source, and there's a detailed tutorial on Writing a plugin to support a new model on the LLM website.

Read the whole story
denubis
6 minutes ago
reply
Share this story
Delete

Catan publisher Asmodee trying to figure out how the Robber put them €900 million in debt

1 Share

Settlers of Catan board game publisher Asmodee has been left scratching their heads after the Robber somehow saddled them with over €900 million of debt. “Normally The Robber is content to take a single resource, but his most recent caper has left us with an enormous deficit,” says Asmodee. The newly independent Asmodee has said that they will be looking to expand their licensing deals in order to pay off the debt. “Embracer Group has a lot of intellectual property in their stable, so thankfully we’ll have plenty of fertile ground to till for a small fee.”

The Robber struck just minutes after the wake of the news that media holding corporation Embracer Group announced that it would split into multiple entities, with Asmodee becoming its own company focused on tabletop games. Spokespeople from the now splintered Embracer Group said they were shocked, just shocked, that the Robber would choose an auspicious day such as this to strike. “It’s unfortunate that this bad luck should hit when our other companies have experienced such a windfall. It appears that when the robber reached us, he was inclined to rid us of our debt.” They shook their heads ruefully. “Who can grasp the mind of a criminal who has been operating since the days of the Neanderthal?”

In a rare break of silence, the Robber denied any part in the sudden rearrangement of debt. “This is why we decided to leave society long ago,” their press release said, which was hammered into our doorframe with a rough-shaped nail. “I’ve never needed more than a little grain or lumber to get me by, with the occasional wool when my clothes need mending. The idea that I would have any need or want of money is ridiculous. Tend to your own houses.”

We would have published this article yesterday, but the Robber did not leave the area until early this morning.

The post Catan publisher Asmodee trying to figure out how the Robber put them €900 million in debt first appeared on The Only Edition.

Read the whole story
denubis
1 hour ago
reply
Share this story
Delete

KPMG declares itself “out of scope” for federal probe into failed IT projects

1 Share

KPMG argues the government makes its own problems as ossified probity and procurement processes only compound longstanding problems.

The post KPMG declares itself “out of scope” for federal probe into failed IT projects appeared first on The Mandarin.

Read the whole story
denubis
1 hour ago
reply
Share this story
Delete

The Radiant Future! (Of 1995)

1 Share

The AI hype in the media obscures the fact that we're clearly in another goddamn venture capital bubble right now.

As the Wall Street Journal said earlier this month (article is paywalled), "... In a presentation earlier this month, the venture-capital firm Sequoia estimated that the AI industry spent $50 billion on the Nvidia chips used to train advanced AI models last year, but brought in only $3 billion in revenue."

On top of that, the industry is running at a loss on power consumption alone, never mind labour costs (which are quite high: those generative LLMs require extensive human curation of the input data they require for training).

So, we've been here before. Most recently with cryptocurrency/blockchain (which is still going on, albeit much less prominently as governments and police go after the most obvious thieves and con men like Sam Bankman Fried).

But there've been other internet-related bubbles before.

I was in on the ground floor of the dot-com boom from 1995-2000, and the hype back then was absolutely bonkers: that may be part of why I'm so thoroughly soured on the current wave of bilge and bullshit. (That, and it's clearly being pumped up by fascist-adjacent straight white males with an unadmitted political agenda, namely to shore up the structures of privilege and entitlement that keep them wealthy.)

The common feature of these bubbles is a shitload of hype and promises from hucksters who fail to deliver a viable product but suck up as much investment capital as they can. A handful of them survive: from dot-com 1.0, the stand-outs are Amazon and Google (Facebook, Twitter, Reddit, et al came along much later—social media was a later, smaller bubble). Other survivors include Paypal, eBay, and Doubleclick (the latter being merged with Google to form a monstrous global advertising monopoly). The survivors tend to leave behind infrastructure: the failures leave behind t-shirts, second hand Aeron chairs, and motivational posters.

If I had more energy I'd be writing a snarky, satirical, 21st century Jetson's style short story right now to highlight the way this plays out. It'd be set in a future where all the dot-com 1.0 hype and promises actually delivered and laid the bedrock of our lives in 2025.

But of course, that's not the story. Instead, the story would explore the unanticipated drawbacks. Starting with "oops, the Amazon drone delivering your neighbour's new dishwasher just fell through your roof; but trades.com only shows you roofers who live in Boston, England, not Boston, MA".

In this shiny dotcom 1.0 future, shoppers always carry their laptop to the supermarket so they can use their CueCat scanner to scan product discount coupon codes off the packaging: they collect the money off vouchers using internet delivered over the supermarket wifi (which blasts them with ads they're forced to click through in return for bandwidth).

The Teledesic satellite network got funded and built out, so you now have 9600 baud global roaming data on your Microsoft Windows CE phone. Which has a fold-out QWERTY keyboard because nobody likes writing on a touch-sensitive screen with a stylus and multitouch was still-born. But your phone calls are secure, thanks to the mandatory built-in Clipper chip.

But Pets dot com just mailed you the third dead and decomposing Rottweiler of the month, instead of the cat food subscription you ordered: the SKUs for Rottie pups and Whiskas are cross-linked in their database, and freight shipping from China takes weeks.

In this gleaming, chromed, Jetsons style future, the Intel Itanium didn't fail, Macs still run on Power architecture, and Microsoft OS/2 4.0 runs everywhere on MIPS, Alpha, and SPARC workstations. Linux is nearly extinct thanks to restrictive embrace-and-extinguish commercial bootloader licensing terms ...

But don't ask about Apple. Oh dear. Oh no. You asked about Apple, didn't you? And why are all those workstations running OS/2?

Solaris never really took over the workstation market; NeXT ate Sun's lunch in the 90s. Today, UNIX research workstations are all featureless black cubes or monoliths and come bundled with Mathematica and FrameMaker. Cheaper RISC-based workstations are all the domain of Microsoft, as are PCs. Apple lives on in a strange twilight: Steve Jobs was unavailable in 1998 (he was tied up buying Oracle), and Apple was not-exactly-saved by buying Be and hiring on Jean-Louis Gassée as their CEO. He staunched the bleeding through strategic alliances, but in the end Gassée had no alternative but to sell Apple to IBM as Big Blue tried to push their Power Architecture down into the realm of business personal computing.

Macintosh® Powerbook™ is all that's left of the glory that was Apple: a range of black plastic PowerPC business laptops sold by Lenovo. Main value proposition: they run COBOL business applications real good. Meanwhile, the UK's Acorn Computers bought what was left of the NewtonOS intellectual property and continues to market the Newton Messagepad series as ruggedized retail and industrial data capture terminals in Europe, using the unique Graffiti text entry system from Palm Computing).

The world of MP3 music players is dominated by Archos. Video is ... well, video as such isn't allowed on the public internet because the MPAA hooked up with the cable TV corporations to force legislation mandating blockers inside all ISPs. Napster does not exist. Bittorrent does not exist. YouTube does not exist. But what passes for video on the internet today is 100% Macromedia Flash, so things could be worse.

So. What survivors from the glorious-future-that-wasn't would you like to memorialize in this shared fictional nightmare?

Read the whole story
denubis
1 hour ago
reply
Share this story
Delete

curl is just the hobby

1 Share

Jan Gampe took things to the next level by actually making this cross-stitch out of the pattern I previously posted online. The flowers really gave it an extra level of charm I think.

This quote is from a comment by an upset user on my blog, replying to one of my previous articles about curl.

Fact check: while curl is my hobby, I also work on curl as a full-time job. It is a business and I serve and communicate with many customers on a daily basis. curl provides service to way more than a billion people. I claim that every human being on the planet that is Internet-connected uses devices or services every day that run curl.

The pattern

curl in San Francisco

Meanwhile, another “curl craft” seen in the wild recently is this ad in San Francisco (photo by diego).

The full command line looks like:

curl --request PUT \
--url https://api.stytch.com/v1/b2b/organizations/{ID} \
-d '{
"mfa_policy": "REQUIRED_FOR_ALL",
"mfa_methods": "RESTRICTED",
"allowed_mfa_methods": ["totp", "sms_otp"]
}'

I would personally perhaps protest against the use of PUT for POSTing JSON, but nobody asked me.

Read the whole story
denubis
1 day ago
reply
Share this story
Delete

https://screenshotsofdespair.tumblr.com/post/748422214765920256

1 Share
Read the whole story
denubis
1 day ago
reply
Share this story
Delete
Next Page of Stories