
My Christmas gift: telling you about PurpleMind, which brings CS theory to the YouTube masses


Merry Christmas, everyone! Ho³!

Here’s my beloved daughter baking chocolate chip cookies, which she’ll deliver tomorrow morning with our synagogue to firemen, EMTs, and others who need to work on Christmas Day. My role was limited to taste-testing.

While (I hope you’re sitting down for this) the Aaronson-Moshkovitzes are more of a latke/dreidel family, I grew up surrounded by Christmas and am a lifelong enjoyer of the decorations, the songs and movies (well, some of them), the message of universal goodwill, and even gingerbread and fruitcake.


Therefore, as a Christmas gift to my readers, I hereby present what I now regard as one of the great serendipitous “discoveries” in my career, alongside students like Paul Christiano and Ewin Tang who later became superstars.

Ever since I was a pimply teen, I dreamed of becoming the prophet who’d finally bring the glories of theoretical computer science to the masses—who’d do for that systematically under-sung field what Martin Gardner did for math, Carl Sagan for astronomy, Richard Dawkins for evolutionary biology, Douglas Hofstadter for consciousness and Gödel. Now, with my life half over, I’ve done … well, some in that direction, but vastly less than I’d dreamed.

A month ago, I learned that maybe I can rest easier. For a young man named Aaron Gostein is doing the work I wish I’d done—and he’s doing it using tools I don’t have, and so brilliantly that I could barely improve a pixel.

Aaron recently graduated from Carnegie Mellon, majoring in CS. He’s now moved back to Austin, TX, where he grew up, and where of course I now live as well. (Before anyone confuses our names: mine is Scott Aaronson, even though I’ve gotten hundreds of emails over the years calling me “Aaron.”)

Anyway, here in Austin, Aaron is producing a YouTube channel called PurpleMind. In starting this channel, Aaron was directly inspired by Grant Sanderson’s 3Blue1Brown—a math YouTube channel that I’ve also praised to the skies on this blog—but Aaron has chosen to focus on theoretical computer science.

I first encountered Aaron a month ago, when he emailed asking to interview me about … which topic will it be this time, quantum computing and Bitcoin? quantum computing and AI? AI and watermarking? no, diagonalization as a unifying idea in mathematical logic. That got my attention.

So Aaron came to my office and we talked for 45 minutes. I didn’t expect much to come of it, but then Aaron quickly put out this video, in which I have a few unimportant cameos:

After I watched this, I brought Dana and the kids and even my parents to watch it too. The kids, whose attention spans normally leave much to be desired, were sufficiently engaged that they made me pause every 15 seconds to ask questions (“what would go wrong if you diagonalized a list of all whole numbers, where we know there are only ℵ₀ of them?” “aren’t there other strategies that would work just as well as going down the diagonal?”).

Seeing this, I sat the kids down to watch more PurpleMind. Here’s the video on the P versus NP problem:

Here’s one on the famous Karatsuba algorithm, which reduced the number of steps needed to multiply two n-digit numbers from ~n² to only ~n^1.585, and thereby helped inaugurate the entire field of algorithms:
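(This is not the video's code, just a minimal sketch of the classic trick: split each number at m decimal digits, so that three recursive multiplications replace the four a naive split would need.)

```python
def karatsuba(x, y):
    # Base case: single-digit factors are multiplied directly.
    if x < 10 or y < 10:
        return x * y
    # Split both numbers around the middle digit position m.
    m = max(len(str(x)), len(str(y))) // 2
    high_x, low_x = divmod(x, 10 ** m)
    high_y, low_y = divmod(y, 10 ** m)
    # Three recursive multiplications instead of four.
    z0 = karatsuba(low_x, low_y)
    z2 = karatsuba(high_x, high_y)
    z1 = karatsuba(low_x + high_x, low_y + high_y) - z0 - z2
    # Recombine: x*y = z2*10^(2m) + z1*10^m + z0.
    return z2 * 10 ** (2 * m) + z1 * 10 ** m + z0
```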

Here’s one on RSA encryption:

Here’s one on how computers quickly generate the huge random prime numbers that RSA and other modern encryption methods need:
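(Again, not the video's code: the standard approach is to sample random odd numbers of the right size and run a fast probabilistic primality test, such as Miller-Rabin, until one passes. A minimal Python sketch:)

```python
import random

def is_probable_prime(n, rounds=20):
    # Miller-Rabin probabilistic primality test.
    if n < 2:
        return False
    for p in (2, 3, 5, 7, 11, 13):
        if n % p == 0:
            return n == p
    # Write n - 1 as d * 2^s with d odd.
    d, s = n - 1, 0
    while d % 2 == 0:
        d //= 2
        s += 1
    for _ in range(rounds):
        a = random.randrange(2, n - 1)
        x = pow(a, d, n)  # fast modular exponentiation
        if x in (1, n - 1):
            continue
        for _ in range(s - 1):
            x = pow(x, 2, n)
            if x == n - 1:
                break
        else:
            return False  # a is a witness that n is composite
    return True

def random_prime(bits=512):
    # Sample odd candidates with the top bit set until one passes.
    while True:
        candidate = random.getrandbits(bits) | (1 << (bits - 1)) | 1
        if is_probable_prime(candidate):
            return candidate
```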

These are the only ones we’ve watched so far. Each one strikes me as close to perfection. There are many others (for example, on Diffie-Hellman key exchange, the Bernstein-Vazirani quantum algorithm, and calculating pi) that I’m guessing will be equally superb.

In my view, what makes these videos so good is their concreteness, achieved without loss of correctness. When, for example, Aaron talks about Gödel mailing a letter to the dying von Neumann posing what we now know as P vs. NP, or any other historical event, he always shows you an animated reconstruction. When he talks about an algorithm, he always shows you his own Python code, and what happened when he ran the code, and then he invites you to experiment with it too.

I might even say that the results singlehandedly justify the existence of YouTube, as the ten righteous men would’ve saved Sodom—with every crystal-clear animation of a CS concept canceling out a thousand unboxing videos or screamingly-narrated Minecraft play-throughs in the eyes of God.

Strangely, the comments below Aaron’s YouTube videos attack him relentlessly for his use of AI to help generate the animations. To me, it seems clear that AI is the only thing that could let one person, with no production budget to speak of, create animations of this quality and quantity. If people want so badly for the artwork to be 100% human-generated, let them volunteer to create it themselves.


Even as I admire the PurpleMind videos, or the 3Blue1Brown videos before them, a small part of me feels melancholic. From now until death, I expect that I’ll have only the same pedagogical tools that I acquired as a young’un: talking; waving my arms around; quizzing the audience; opening the floor to Q&A; cracking jokes; drawing crude diagrams on a blackboard or whiteboard until the chalk or the markers give out; typing English or LaTeX; the occasional PowerPoint graphic that might (if I’m feeling ambitious) fade in and out or fly across the screen.

Today there are vastly better tools, both human and AI, that make it feasible to create spectacular animations for each and every mathematical concept, as if transferring the imagery directly from mind to mind. In the hands of a master explainer like Grant Sanderson or Aaron Gostein, these tools are tractors to my ox-drawn plow. I’ll be unable to compete in the long term.

But then I reflect that at least I can help this new generation of math and CS popularizers, by continuing to feed them raw material. I can do cameos in their YouTube productions. Or if nothing else, I can bring their jewels to my community’s attention, as I’m doing right now.

Peace on Earth, and to all a good night.

Read the whole story
denubis
9 hours ago

Cooking with Claude


I've been having an absurd amount of fun recently using LLMs for cooking. I started out using them for basic recipes, but as I've grown more confident in their culinary abilities I've leaned into them for more advanced tasks. Today I tried something new: having Claude vibe-code up a custom application to help with the timing for a complicated meal preparation. It worked really well!

A custom timing app for two recipes at once

We have family staying at the moment, which means cooking for four. We subscribe to a meal delivery service called Green Chef, mainly because it takes the thinking out of cooking three times a week: grab a bag from the fridge, follow the instructions, eat.

Each bag serves two portions, so cooking for four means preparing two bags at once.

I have done this a few times now and it is always a mad flurry of pans and ingredients and timers and desperately trying to figure out what should happen when and how to get both recipes finished at the same time. It's fun but it's also chaotic and error-prone.

This time I decided to try something different, and potentially even more chaotic and error-prone: I outsourced the planning entirely to Claude.

I took this single photo of the two recipe cards side-by-side and fed it to Claude Opus 4.5 (in the Claude iPhone app) with this prompt:

Extract both of these recipes in as much detail as possible

Two recipe cards placed next to each other on a kitchen counter. Each card has detailed instructions plus photographs of steps.

This is a moderately challenging vision task in that there is quite a lot of small text in the photo. I wasn't confident Opus could handle it.

I hadn't read the recipe cards myself. The responsible thing to do here would be a thorough review or at least a spot-check - I chose to keep things chaotic and didn't do any more than quickly eyeball the result.

I asked what pots I'd need:

Give me a full list of pots I would need if I was cooking both of them at once

Then I prompted it to build a custom application to help me with the cooking process itself:

I am going to cook them both at the same time. Build me a no react, mobile, friendly, interactive, artifact that spells out the process with exact timing on when everything needs to happen have a start setting at the top, which starts a timer and persists when I hit start in localStorage in case the page reloads. The next steps should show prominently with countdowns to when they open. The full combined timeline should be shown slow with calculated times tor when each thing should happen

I copied the result out onto my own hosting (you can try it here) because I wasn't sure if localStorage would work inside the Claude app and I really didn't want it to forget my times!

Then I clicked "start cooking"!

The recipe app shows a full timeline with 00:00 Preheat Oven and onwards, plus a big Start Cooking button. In the animation clicking the button starts a timer clicking up, adds a Do this now panel showing the Start all prep work step, shows Coming Up Next with timers counting down to the next steps and updates the full timeline to show local clock times where it previously showed durations from 00:00 upwards.

Here's the full Claude transcript.

There was just one notable catch: our dog, Cleo, knows exactly when her dinner time is, at 6pm sharp. I forgot to mention this to Claude, which had scheduled several key steps colliding with Cleo's meal. I got woofed at. I deserved it.

To my great surprise, it worked. I followed the recipe guide to the minute and served up both meals exactly 44 minutes after I started cooking.

A small bowl (a beautiful blue sea textured bowl, made by Natalie Downe) contains a chickpea stew. A larger black bowl has couscous, green beans and blackened cauliflower.

The best way to learn the capabilities of LLMs is to throw tasks at them that may be beyond their abilities and see what happens. In this case I fully expected that something would get forgotten or a detail would be hallucinated and I'd end up scrambling to fix things half way through the process. I was surprised and impressed that it worked so well.

Some credit for the app idea should go to my fellow hackers at /dev/fort 2 in 2009, when we rented Knockbrex Castle in Dumfries, Scotland for a week and attempted to build a cooking timer application for complex meals.

Generating recipes from scratch

Most of my other cooking experiments with LLMs have been a whole lot simpler than this: I ask for a recipe, ask for some variations and then cook one of them and see what happens.

This works remarkably well considering LLMs have no taste buds.

I've started to think of this as asking LLMs for the average recipe for a dish, based on all of the recipes they have hoovered up during their training. It turns out the mean version of every guacamole recipe on the internet is a decent guacamole!

Here's an example of a recipe I tried recently that worked out really well. I was helping Natalie run her ceramic stall at the farmers market and the stall next to us sold excellent dried beans. I've never used dried beans before, so I took a photo of their selection and asked Claude what I could do with them:

Several bags of tasty looking beans of different varieties and colors More bags of beans.

Identify these beans

It took a guess at the beans, then I said:

Get me excited about cooking with these! If I bought two varietiew what could I make

"Get me excited" switches Claude into a sort of hype-man mode, which is kind of entertaining:

Oh, you're about to enter the wonderful world of bean cooking! Let me get you pumped about some killer two-bean combos: [...]

Mixed bean salad with lemon, olive oil, fresh herbs, cherry tomatoes - light but satisfying [...]

I replied:

OK Bean salad has me interested - these are dried beans. Give me some salad options I can make that would last a long time in the fridge

... and after some back and forth we arrived at the recipe in this transcript, which I cooked the following day (asking plenty of follow-up questions) and thoroughly enjoyed.

I've done this a bunch of times with a bunch of different recipes across both Claude and ChatGPT and honestly I've not had a notable miss yet. Being able to say "make it vegan" or "I don't have coriander, what can I use instead?" or just "make it tastier" is a really fun way to explore cooking.

It's also fun to repeat "make it tastier" multiple times to see how absurd you can get.

I really want someone to turn this into a benchmark!

Cooking with LLMs is a lot of fun. There's an opportunity here for a really neat benchmark: take a bunch of leading models, prompt them for recipes, follow those recipes and taste-test the results!

The logistics of running this are definitely too much for me to handle myself. I have enough trouble cooking two meals at once; for a solid benchmark you'd ideally have several models serving meals up at the same time to a panel of tasters.

If someone else wants to try this please let me know how it goes!

Tags: cooking, devfort, tools, ai, generative-ai, llms, anthropic, claude, vision-llms, vibe-coding


2025 was for AI what 2010 was for cloud (xpost)


The satellite, experimental technology has become the mainstream, foundational tech. (At least in developer tools.) (xposted from new home)

I was at my very first job, Linden Lab, when EC2 and S3 came out in 2006. We were running Second Life out of three datacenters, where we racked and stacked all the servers ourselves. At the time, we were tangling with a slightly embarrassing data problem in that there was no real way for users to delete objects (the Trash folder was just another folder), and by the time we implemented a delete function, our ability to run garbage collection couldn’t keep up with the rate of asset creation. In desperation, we spun up an experimental project to try using S3 as our asset store. Maybe we could make this Amazon’s problem and buy ourselves some time?

Why yes, we could. Other “experimental” projects sprouted up like weeds: rebuilding server images in the cloud, running tests, storing backups, load testing, dev workstations. Everybody had shit they wanted to do that exceeded our supply of datacenter resources.

By 2010, the center of gravity had shifted. Instead of “mainstream engineering” (datacenters) and “experimental” (cloud), there was “mainstream engineering” (cloud) and “legacy, shut it all down” (datacenters).

Why am I talking about the good old days? Because I have a gray beard and I like to stroke it, child. (Rude.)

And also: it was just eight months ago that Fred Hebert and I were delivering the closing keynote at SRECon. The title is “AIOps: Prove It! An Open Letter to Vendors Selling AI for SREs”, which makes it sound like we’re talking to vendors, but we’re not; we’re talking to our fellow SREs, begging them to engage with AI on the grounds that it’s not ALL hype.

We’re saying to a room of professional technological pessimists that AI needs them to engage. That their realism and attention to risk are more important than ever, but in order for their critique to be relevant, accurate, and heard, it has to be grounded in expertise and knowledge. Nobody cares about the person outside taking potshots.

This talk recently came up in conversation, and it made me realize—with a bit of a shock—how far my position has come since then.

That was just eight months ago, and AI still felt like it was somehow separable, or a satellite of tech mainstream. People would gripe about conferences stacking the lineup with AI sessions, and AI getting shoehorned into every keynote.

I get it. I too love to complain about technology, and this is certainly an industry that has seen its share of hype trains: dotcom, cloud, crypto, blockchain, IoT, web3, metaverse, and on and on. I understand why people are cynical—why some are even actively looking for reasons to believe it’s a mirage.

But for me, this year was for AI what 2010 was for the cloud: the year when AI stopped being satellite, experimental tech and started being the mainstream, foundational technology. At least in the world of developer tools.

It doesn’t mean there isn’t a bubble. Of COURSE there’s a fucking bubble. Cloud was a bubble. The internet was a bubble. Every massive new driver of innovation has come with its own frothy hype wave.

But the existence of froth doesn’t disprove the existence of value.

Maybe y’all have already gotten there, and I’m the laggard. 😉 (Hey, it’s an SRE’s job to mind the rear guard.) But I’m here now, and I’m excited. It’s an exciting time to be a builder.


Quoting Shriram Krishnamurthi


Every time you are inclined to use the word “teach”, replace it with “learn”. That is, instead of saying, “I teach”, say “They learn”. It’s very easy to determine what you teach; you can just fill slides with text and claim to have taught. Shift your focus to determining how you know whether they learned what you claim to have taught (or indeed anything at all!). That is much harder, but that is also the real objective of any educator.

Shriram Krishnamurthi, Pedagogy Recommendations

Tags: teaching


Boléro

I perform Maurice Ravel's Boléro on a variety of homemade 8-bit instruments.

Your job is to deliver code you have proven to work


In all of the debates about the value of AI-assistance in software development there's one depressing anecdote that I keep on seeing: the junior engineer, empowered by some class of LLM tool, who deposits giant, untested PRs on their coworkers - or open source maintainers - and expects the "code review" process to handle the rest.

This is rude, a waste of other people's time, and is honestly a dereliction of duty as a software developer.

Your job is to deliver code you have proven to work.

As software engineers we don't just crank out code - in fact these days you could argue that's what the LLMs are for. We need to deliver code that works - and we need to include proof that it works as well. Not doing that directly shifts the burden of the actual work to whoever is expected to review our code.

How to prove it works

There are two steps to proving a piece of code works. Neither is optional.

The first is manual testing. If you haven't seen the code do the right thing yourself, that code doesn't work. If it does turn out to work, that's honestly just pure chance.

Manual testing skills are genuine skills that you need to develop. You need to be able to get the system into an initial state that demonstrates your change, then exercise the change, then check and demonstrate that it has the desired effect.

If possible I like to reduce these steps to a sequence of terminal commands which I can paste, along with their output, into a comment in the code review. Here's a recent example.

Some changes are harder to demonstrate. It's still your job to demonstrate them! Record a screen capture video and add that to the PR. Show your reviewers that the change you made actually works.

Once you've tested the happy path where everything works you can start trying the edge cases. Manual testing is a skill, and finding the things that break is the next level of that skill that helps define a senior engineer.

The second step in proving a change works is automated testing. This is so much easier now that we have LLM tooling, which means there's no excuse at all for skipping this step.

Your contribution should bundle the change with an automated test that proves the change works. That test should fail if you revert the implementation.

The process for writing a test mirrors that of manual testing: get the system into an initial known state, exercise the change, assert that it worked correctly. Integrating a test harness to productively facilitate this is another key skill worth investing in.
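That three-step shape (get into a known state, exercise the change, assert) looks the same in any framework. A hypothetical pytest-style example, with names invented purely for illustration:

```python
def slugify(title):
    # The change under test: turn a title into a URL slug.
    return "-".join(title.lower().split())

def test_slugify_collapses_whitespace():
    # Known initial state: a messy title with extra whitespace.
    title = "  Proven   to Work  "
    # Exercise the change.
    slug = slugify(title)
    # Assert it worked; this fails if the implementation
    # is reverted to something like a plain title.lower().
    assert slug == "proven-to-work"
```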

Don't be tempted to skip the manual test because you think the automated test has you covered already! Almost every time I've done this myself I've quickly regretted it.

Make your coding agent prove it first

The most important trend in LLMs in 2025 has been the explosive growth of coding agents - tools like Claude Code and Codex CLI that can actively execute the code they are working on to check that it works and further iterate on any problems.

To master these tools you need to learn how to get them to prove their changes work as well.

This looks exactly the same as the process I described above: they need to be able to manually test their changes as they work, and they need to be able to build automated tests that guarantee the change will continue to work in the future.

Since they're robots, automated tests and manual tests are effectively the same thing.

They do feel a little different though. When I'm working on CLI tools I'll usually teach Claude Code how to run them itself so it can do one-off tests, even though the eventual automated tests will use a system like Click's CliRunner.
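CliRunner drives a Click command in-process, so the automated test exercises the same entry point you'd poke at manually. A minimal sketch (assuming Click is installed; the hello command here is made up):

```python
import click
from click.testing import CliRunner

@click.command()
@click.argument("name")
def hello(name):
    # A trivial CLI command to test against.
    click.echo(f"Hello, {name}!")

def test_hello():
    runner = CliRunner()
    # Invoke the command in-process, capturing exit code and output.
    result = runner.invoke(hello, ["world"])
    assert result.exit_code == 0
    assert result.output == "Hello, world!\n"
```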

When working on CSS changes I'll often encourage my coding agent to take screenshots when it needs to check if the change it made had the desired effect.

The good news about automated tests is that coding agents need very little encouragement to write them. If your project has tests already most agents will extend that test suite without you even telling them to do so. They'll also reuse patterns from existing tests, so keeping your test code well organized and populated with patterns you like is a great way to help your agent build testing code to your taste.

Developing good taste in testing code is another of those skills that differentiates a senior engineer.

The human provides the accountability

A computer can never be held accountable. That's your job as the human in the loop.

Almost anyone can prompt an LLM to generate a thousand-line patch and submit it for code review. That's no longer valuable. What's valuable is contributing code that is proven to work.

Next time you submit a PR, make sure you've included your evidence that it works as it should.

Tags: programming, careers, ai, generative-ai, llms, ai-assisted-programming, ai-ethics, vibe-coding, coding-agents
