2025 letter

People love to predict the future. Back in June, Peter Thiel recounted a conversation he had with Elon Musk. “[Elon] said we’re going to have a billion humanoid robots in the U.S. in 10 years. And I said: Well, if that’s true, you don’t need to worry about the budget deficits because we’re going to have so much growth, the growth will take care of this. And then—well, he’s still worried about the budget deficits.” That doesn’t mean Musk is lying or wrong, Thiel adds. “But yeah, there’s some way in which these things are not quite thought through.”

I’m all for thinking through our imminent future with AI, and that’s what I’m going to do for the rest of this letter. But before I think about the next 10 years, I want to reflect on the last 10. What would you have had to account for, predicting 2025 from 2015?

Steep yourself in the last months of 2015. The program AlphaGo has just defeated the European champion in Go, the first time a computer has ever beaten a human professional. The 18-time world champion Lee Sedol would be quite unmoved, remarking that “the level of the player that AlphaGo went against in October is not the same level as me.” (AlphaGo would defeat Lee the following March.) Elsewhere in the world, there is war in Ukraine. SpaceX has just landed its Falcon 9 for the first time. Star Wars: The Force Awakens has premiered. A new startup called OpenAI has been founded.

If you think it through, you won’t be surprised to learn that in 2025, programs using natural language, under human time controls, will win gold medals at the International Math Olympiad and the International Collegiate Programming Contest. Computers will be competing there because they’ll have passed the bar and the MCAT years earlier. Lawyers and doctors will still have their jobs, though, and global real growth will be gently slowing. Coding agents, meanwhile, will have changed the programmer’s job forever. A Chinese firm will draw comparisons to Sputnik. A new pope will name himself after Leo XIII, explicitly to respond to “another industrial revolution and to developments in the field of artificial intelligence.”

In your whole and consistent view, you’ll point out that AI data center investment will account for over 90 percent of U.S. GDP growth in the first half of 2025. A few companies everybody knows will together spend more than the cost of the entire Apollo program, inflation adjusted, in 10 months. Yet in the general public, opposition to data centers will shape up to be a new bipartisan rallying cry. That will partly be because of a frozen labor market, with neither hiring nor firing, though no one will know if that’s due to AI. One lab will offer at least one AI researcher a billion dollars; he will turn it down, at first. Attendees enjoying organic British ballotine at President Donald Trump’s unprecedented second U.K. state banquet (yes, dear time traveler, you heard that right) will include Demis Hassabis, Sam Altman, and Jensen Huang.

Your prescience won’t end there. In 2025, everyone will have a chatbot that passes the Turing test in their pocket. When one maker upgrades its bot to version five, there will be such backlash from users attached to version four’s personality that it will have to bring the old one back. “GPT-5 is wearing the skin of my dead friend,” as one devotee will put it. Turning your life into an anime, it will turn out, will have wider appeal, hinting at higher-fidelity world models to come. A third of American teens will be using AI not just for homework, but for companionship, with another third choosing AI over humans for serious conversations. You might want to watch the recent film Her (2013), starring Joaquin Phoenix and Scarlett Johansson. It’s set in 2025.

Think it through in 2015, and your view of 2025 will cover both AI’s precocious cognition and jagged diffusion; its hyperscale private investment and cautious public reception; its mundane ubiquity and niche fascination. In other words, I think we’ve actually ended up with both the robots and the debt.

With that humbling historical exercise behind us, I’ll turn to the slate of predictions for the decade ahead. This year offers many candidates to sift through. Will AI capabilities go super-exponential in the next few years, because AI figures out how to improve itself better than humans can? Or is it more likely that humans remain in control through steady progress, with gradual rather than discontinuous transformation? Will automation drive gradual disempowerment, perhaps via a resource curse? Does centralizing AI entrench or undermine state power? Do models have a persona, or only the extrapolation of one? If anyone builds it, will everyone die, as one New York Times bestseller promises? There’s another, more pedestrian, category of predictions. Is AI a bubble? Do its funding and revenue bear favorable comparison to past infrastructure build-outs, or is it one big ouroboros? Even if AI is a bubble, aren’t you still wrong if you call it too early? Another year of AI progress raises these questions. As Jack Clark told Rick Rubin in June, “everyone is pulled forward by a sense of inevitability.” The ultimate question: what is inevitable?

Nothing is truly inevitable, but there is one core driver of AI progress: the relentless compounding of computational resources, nominalized “compute.” Compute is the biggest story in AI this year, the star of the infrastructure show. Many people have written about it before, from scaling laws to The Bitter Lesson to Moore’s Law. But the modern concept of compute has antecedents with a broader meaning and a longer history than today’s framing admits. Compute is somehow still so underrated that it will not tax you exceedingly to hear one more take on it.

By now, anyone who can be convinced that AI will be a big deal by being shown a graph has already been shown that graph and is on board. So let me try a different approach. I will tell you the story of how I came to believe in the compute theory of everything.

I wasn’t someone who predicted, through sheer force of reason long ago, that AI would be a big deal. I muddled into it.

Begin again in 2015. I was in high school and wanted to do a science fair project, and the way Arizona science fairs worked in those days was that you had to do biology. That way, you could show up at Arizona State University, befriend a professor, and obtain mentorship, a lab, and work to build on. This all looked very professional. It also looked like a lot of competition to me, in biology of all places (I thought it was a major downside that life wasn’t deterministic). I’d just learned to code, so I entered a project that implemented some basic statistics looking at solar flares, and that did just fine.

The tricky thing about science fair is that you need a new project every year. (This isn’t very scalable! I must have thought.) So 2016 rolled around, and I found myself hunting for a project in the same shape. That summer I had gone on a field trip to Apache Point Observatory, home to telescopes whose purpose was to survey the entire sky. By 2016 “big data” was already a stale buzzword, with grants invoking it long since written, approved, and executed, and the big data in question sitting on a server. I wanted to use that data, and was also mulling over how. The previous year, a judge gave me a particularly hard time about whether my method fit the physics. Wouldn’t it be great, I thought, if there were some general method that learned the right patterns from the data itself? And so I discovered “machine learning,” and read a 2012 paper titled “ImageNet Classification with Deep Convolutional Neural Networks” in the very late year of 2017.

This paper introduced a neural network called AlexNet, which won a contest to classify images of things like cats, cars, and faces. AlexNet won by a huge margin by leveraging graphics processing units (GPUs), the same devices that fill data centers today. Some computer vision scientists had used GPUs before, but AlexNet surprised everyone by splitting a very deep network across two GPUs, sprinkling in some state-of-the-art algorithmic techniques, and triumphing in a harder contest than prior work. Fei-Fei Li, who ran the contest and made its dataset, had originally planned to skip that year’s iteration, having just had a baby. She booked a last-minute flight when she saw AlexNet’s performance. Yann LeCun called it “an unequivocal turning point in the history of computer vision.”

When I read about AlexNet in 2017, five years later, I thought, it’s so over. I’m so late to this AI thing. Reading only AlexNet, I couldn’t see that AI progress was such a sure thing. AlexNet scaled compute, but it also had a huge dataset and a clever architecture in a well-studied problem. In 2017, with networks even better than AlexNet (including a “Very Deep” one), and everyone training models at home with TensorFlow 1.0, the field looked saturated. It must be back to hard human invention, I thought, which is contingent and precarious. But in fact I was wildly early. I was totally oblivious that the field was just laying the foundations of compute scaling, for 2017 brought Transformers, the architecture that still undergirds frontier models, and AlphaZero, which surpassed humans by playing only against itself and still motivates research to go beyond human data.

This year, many people’s first introduction to the case that AI will be a big deal was METR’s time horizon plot. It’s a trend that more or less says, newer AI can somewhat reliably do harder and more useful tasks, as measured by how long it would take humans to do the tasks. As of this writing, the best AI can, roughly half the time, do tasks that take humans over four hours; that number was nine minutes just two years ago. Extrapolating the trend, AI will soon do tasks that take humans weeks. People use the trend to justify why the world might change very soon, very suddenly.
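
To make that extrapolation concrete, here is a minimal back-of-the-envelope sketch in Python. The four-hour figure comes from the paragraph above; the clean seven-month doubling is the rough rate METR reports, used here as a simplifying assumption rather than their exact fit, which carries error bars and a choice of reliability threshold.

```python
from math import log2

# Simplified time-horizon extrapolation (not METR's published fit).
current_horizon_hours = 4.0   # tasks the best AI completes roughly half the time today
doubling_months = 7.0         # assumed doubling period for the horizon

def horizon_after(months: float) -> float:
    """Task length, in human-hours, if the doubling trend holds for `months` more."""
    return current_horizon_hours * 2 ** (months / doubling_months)

for months in (12, 24, 36):
    print(f"{months} months out: ~{horizon_after(months):.0f} human-hours")

# Months until a one-work-week (40-hour) horizon under the same assumption.
print(f"~{doubling_months * log2(40 / current_horizon_hours):.0f} months to a 40-hour horizon")
```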

In the field, METR’s time horizon plot must be the graph of the year. It summarizes AI progress across companies, across model types, across different tests, and across time, all on one measure of doing work that’s useful to humans. At the same time—and this is no fault of METR’s—it’s the worst plot we have to make the case for AI, other than all the other plots we have. My reaction to AlexNet, excited but not that excited, is a common one today. What AlexNet could do surprised me, just as people today must be surprised when they ask a chatbot to do research or code a website for the first time. I conceptually knew that AI had been and was still improving, just as you can today look at time horizons. But it’s hard to internalize a pattern from only one firsthand data point and a conceptual trend. You can be forgiven for not instantly thinking that AI will kill us all very soon.

As such, the time horizon trend is the perfect place to muddy the waters. I’m going to lay out, from the start, a bunch of valid reasons why you might not buy a trend like “the length of tasks AI can do is doubling every seven months,” and in general why you might not want to extrapolate any scaling trend very far into the future. Then, we’ll go into the details to rescue the thesis, where we’ll find other scaling trends, and complicate those, too. I’ll rinse and repeat until we get to the bottom of things.

So here’s some reasons to hesitate about time horizons. First, the tasks are mostly coding tasks (which may or may not represent all useful work well). Second, AI can only do these tasks somewhat reliably (which METR benchmarks at 50 and 80 percent reliability, which may or may not be reliable enough). Third, contractors define the human baseline (and they might be slow, which would throw the whole trend off). Finally, even if you totally buy the time horizon trend, it’s an empirical one. Experiments showed that newer AI did longer tasks, and that we could somewhat predict how much longer. But we don’t have a theory for why that is. The next AI might plateau there—and surely, the time horizon trend will stop one day. Without historical perspective, it wouldn’t be strange at all if it did.

With these valid doubts, we can happily go about our day without worry, for now. College rectified some of my oversights. Like many students, I didn’t know what I wanted to do with my life. I studied history, and in America, you can specialize in a subject by taking only a third of your courses in it, so I can’t say I was a good candidate for the conviction of graduate school. A Real Job, on the other hand, lacked the mysteries of research, which by then I knew I needed to keep life interesting. I say only half-jokingly that the AlphaGo documentary changed my life. What stuck with me was less the achievement and more the ethos of AI research. I still didn’t comprehend the trajectory the field was taking, but I sensed a wry overreach in trying to solve everything, and a curiosity unburdened by corporate stodginess or academic schlep. Best of all, everyone just looked like they were having a ton of fun.

AI progress continued, which certainly didn’t lead me to think otherwise. By the time I graduated in 2021, I had heard of GPT-3 without grasping its significance; I had done a double-take at how faithfully AI could generate pictures from language (“an armchair in the shape of an avocado,” from DALL-E); I even knew vaguely, from a Reddit comment, that Transformers were state-of-the-art in everything from my favorite strategy game StarCraft II to protein folding, solved decades earlier than experts had predicted.

Again I thought, it’s so over. I’m so late to this AI thing. They’ve scaled up the compute again, but they’ve also got hard-earned data, like the Protein Data Bank, and special architectures, like recycling a model’s predictions back into itself. The field is surely mature now, and it’s back to that contingent and precarious human invention. But in fact I was still wildly early. In January 2020 a paper established that AI model performance was a predictable function of compute for the Transformer architecture. More compute led to better performance, yes, but now researchers knew how to allocate that compute between more data and bigger models. These were the first modern scaling laws. Years from now, we might say in retrospect that this was the most important work of its time. But that’s getting ahead of the story.
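
For the curious, the relationship that paper reported is, roughly, a power law in compute. What follows is a paraphrase of its shape, not the exact published fit:

$$
L(C) \approx \left(\frac{C_0}{C}\right)^{\alpha}
$$

where L is the model’s loss, C is the training compute, and C_0 and α are fitted constants, with α on the order of 0.05 in the original fits. A small exponent means loss falls slowly, but it falls predictably, which is what gave researchers their faith.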

According to the AI measurement non-profit Epoch, compute used to train AI models has grown four to five times per year for the last 15 years. Models today consume more mathematical operations than there are stars in the observable universe. The first scaling laws gave researchers the faith that scaling up the compute would be worth it. Not only could models predictably predict the entire internet with less error, they unpredictably attained qualitatively new emergent capabilities, from factuality to instruction following to logic. These scaling laws are where you end up if you follow your curiosity after the first time AI surprises you. They sit one step deeper than the time horizon trend: time horizons measure newer AI doing longer tasks, while scaling laws measure models improving with more compute, and newer models use more compute. Now the time horizon trend looks a bit stronger than at first glance—it’s supported by a more general trend, one that’s been going on for longer. If you can extrapolate scaling laws, which might be a bit easier to stomach, then you might find it easier to extrapolate the length of tasks that AI can do.
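
Both of those claims submit to a quick sanity check. The growth rate is Epoch’s, quoted above; the round figures for a frontier training run and for the number of stars are my own ballpark assumptions, not anyone’s official estimate.

```python
# Rough sanity check on the compute claims, using round ballpark numbers.
growth_per_year = 4.5      # midpoint of "four to five times per year"
years = 15
print(f"Cumulative growth: ~{growth_per_year ** years:.0e}x")  # ~6e+09x

frontier_run_flop = 1e25   # assumed order of magnitude for one frontier training run
stars_in_universe = 1e23   # commonly cited order of magnitude for stars
print(f"Operations per run vs. stars: ~{frontier_run_flop / stars_in_universe:.0f}x")
```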

Still, we have reason to hesitate yet. Scaling laws have continued for 15 years, but through three distinct regimes, as Epoch itself has defined them. Scaling was different before and after AlexNet, and before and after the original scaling laws. That means scaling was no guaranteed thing between the eras—it chugged along well within an era, but would have flagged had human innovation not come along and carried it into a newer, better regime. Without that contingent and precarious innovation, we probably wouldn’t have hit fourfold growth a year.

Even in the modern era, “scaling laws” is a loose term, and the laws have been broken and repaired many times. Human ingenuity constantly improves pre-training scaling laws, without which model performance would plateau. Researchers have also found new scaling relationships in other parts of model training and inference, including how much to think before answering a question. Techniques like synthetic data, distillation, and mid-training complicate the picture further, since they account for compute differently. The more details we get, the shakier that 15-year trend looks—it’s got stubborn persistence, but it looks held together more by the eleventh-hour toil of a bunch of nerds than by the inexorable will of the universe.

And again, scaling laws are an empirical trend. Experiments test whether scaled-up models perform according to prediction, but we don’t have a theory for why that is. The next scale-up might fail, producing a model that underperforms on a test, as humans sometimes do. Surely, scaling laws will stop one day. If they do in the next few years, they’ll have had a good run, but it wouldn’t be strange if they did.

So we go on doubting. I joined DeepMind in September 2021, the day everyone started returning to the office after the pandemic. In my first year I worked on the reinforcement learning algorithms that succeeded AlphaZero, trying to make AI play against itself to learn without human data. The problem, harder than Go, was embodied simulations, where agents navigated rooms and interacted with toys. Imagine a playground with apples, buttons, and blocks. I wrote about that year fresh off the end of it.

The most impactful thing I did that first year was to multiply the compute of an experiment a thousandfold. A colleague suggested it, based on a paper that said one setting in our setup was way too low. I took on the moderate engineering to make it happen, and to everyone’s surprise, the scaled-up agent claimed the top spot on the leaderboard by some margin. Earlier attempts to scale had shown limited success, and nobody thought it was worth tying up that much compute for that long. Later analysis pointed at no single reason why the scaled-up agent was better. The agent just failed less across the board at charting mazes and wielding tools, mimicking the robustness of models the language team trained next door.

You could say that for a limited time, in a limited way, I became the unwitting owner of the world’s most powerful AI in embodied simulation (vision language models were just being invented, so there wasn’t much competition outside DeepMind). But I only later recognized the lesson for what it was. At the time, it was hard to disentangle the effect of scaling up. Credit where credit was due: theory inspired the experiment, and none of data, algorithm, or evaluation proved a bottleneck, any one of which would have plateaued performance and changed the story. Indeed, other agents that improved on those factors while also using more compute soon surpassed mine. Embodied simulation was more like language modeling than I’d thought, but it was hard to generalize beyond that.

Then, lightning in a bottle. OpenAI debuted ChatGPT as 2022 ended, with what fantastic consequences we need not here relay. GPT-4 followed four months later, passing the bar exam and endowed with vision. The day GPT-4 came out, I was leaving for a long-planned Omani beach vacation. I spent the flight reading the technical report instead of relaxing. Cue it’s so over. I’m so late to the AI thing. With GPT-4, it really does look like the big idea is compute. But I was still wildly early; AI had burst its academic niche, and commercialization was afoot.

Step down the ladder of abstraction once more. Beneath time horizons and scaling laws is Moore’s Law. Moore’s Law, of course, is the observation that the number of transistors on a microchip has been doubling every two years, from a few thousand in the 1970s to tens of billions in the 2020s. It’s why I could buy an Intel Xeon processor on the internet for five dollars, a chip a million times better than the Apollo 11 guidance computer, and it’s sitting on my bookshelf as a tchotchke. Moore’s Law delivered instant communication, frictionless payments, and hyper-personal entertainment, all emergent capabilities no one could’ve predicted.
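
The doubling claim survives a back-of-the-envelope check. Taking the canonical starting point of roughly 2,300 transistors in 1971 and a doubling every two years:

$$
2{,}300 \times 2^{(2021-1971)/2} = 2{,}300 \times 2^{25} \approx 7.7 \times 10^{10}
$$

which lands on the tens of billions of transistors found on the largest chips of the 2020s.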

Moore’s Law, too, resists easy extrapolation. It’s weathered regime changes just like scaling laws, from purely shrinking transistors to running multiple cores in parallel as Dennard scaling ended in the aughts. By its original definition, Moore’s Law died long ago. If you’ll allow me a more heretical redefinition, from transistors to total system compute, from the chip to the cluster, Moore’s Law is alive and well in today’s multi-rack data centers, where power and interconnect are the bottleneck. This too gives us pause about whether Moore’s Law is really so strong a trend—every doubling, which looks relentless and smooth writ large, masks immense human ingenuity.

Whether Moore’s Law continues by its traditional definition, maybe with next-generation High-NA EUV lithography, or by my heretical one, merely squeezing more operations out of a concrete building we’ve defined as one computer, it’s an empirical trend. Luckily for all of us, it’s one that’s endured for over half a century, outliving many of its pallbearers. But we don’t have a theory for why that is. Moore’s Law could stop tomorrow—it surely will stop one day—but with its history it would be a little strange if it did.

We’ve hit some trends in the middle strata, not topsoil, but not bedrock either. Moore’s Law is a trend that has soundly earned its keep. It’s got many complications, many valid reasons why we might not believe it will continue. But we have more of a grudging confidence that it might continue at least one push longer, carrying scaling laws forward a notch and, in turn, stretching time horizons a hair longer. Moore’s Law has stubbornly persisted for so long that we start to wonder what gives it its longevity.

There’s another trend in this middle strata, an AI counterpart to Moore’s Law. In 2019, Turing Award winner Richard Sutton penned a short blog post that everyone in AI has read. “The Bitter Lesson,” Sutton proclaimed, was that “general methods that leverage computation are ultimately the most effective, and by a large margin.” He cites games, speech, and vision as domains where compute has eclipsed human expertise. Humans lose to chess bots, live translation works, and computers generate pictures—all with almost the same method, it turns out, so he’s not wrong. The lesson is bitter because we still haven’t learned it. Researchers might get short-term wins, but they eventually lose to compute, and they have for as long as AI has existed as a field.

I don’t mind repeating Sutton throughout this letter because he wasn’t even the first to say it. This year I had many edifying conversations about the unreasonable effectiveness of compute with my colleague Samuel Albanie, who alerted me to a prescient 1976 paper by Hans Moravec. Moravec is better known for observing that what’s hard for robots is easy for humans, and vice versa, but in a note titled “Bombast,” he marveled:

The enormous shortage of ability to compute is distorting our work, creating problems where there are none, making others impossibly difficult, and generally causing effort to be misdirected. Shouldn’t this view be more widespread, if it is as obvious as I claim?

Imagine complaining, in 1976, that we’re all just making up problems for ourselves, because we’re all ignoring our real bottleneck, which is compute. And then to demand, is this not obvious to everyone? 1976! Moravec would predict that economically viable computers matching the human brain would appear in the mid-2020s. Many treat technological determinism as a bad word. But I wonder, as Allan Dafoe anatomized it, if in some limited ways it’s not more useful to think of compute as more determined than not.

And so I started to seriously consider my pattern of underestimating AI progress. The more details I learned, the more valid doubts I had about extrapolating any kind of trend. But that just made the fact that the trends defied expectations all the more impressive. In the summer of 2024 I attended the International Conference on Robotics and Automation. I went partly because embodied simulation is related to robots, but mostly because I wanted to go to Japan. Going vindicated attending conferences farther afield anyway. I heard the same story over and over again: there once was a robotics benchmark that classical methods spent years clawing to 20 percent. Then a big bad language model huffed, puffed, and hit 50 percent zero-shot.

I had heard this story too many times not to really test it myself. So compare the following two papers from DeepMind. The first was published in 2023, and the second earlier this month. Both trained embodied agents in simulation, and both extended pre-trained models, that is, models already trained on general data from the web. The first used models on the order of billions of parameters (Phenaki and SPARC, comparable to CLIP). The second used Gemini, a model in a class that industry observers commonly estimate at trillions of parameters. That’s a very crude estimate to spiritually gesture at a thousandfold increase in compute.

The papers have graphs showing how much better version two is compared to version one. But they don’t convey the effect that seeing the agent act had on me. Late one evening, I had just finished the last bit of infrastructure connecting version two to our diagnostic test, and I was the first to see it act. I thought there must be a bug. The agent executed every task nearly perfectly, identifying tools with intention and picking them up gracefully. The agents you befriend after years training them are jerky, stiff, and nauseating. This one was smooth, even natural. Immediately I knew there could not possibly have been any mistake, because even if I had cheated, and stuck myself behind the controls, I would not have been so adept. I went home in a daze, hoping I wouldn’t get hit by a bus.

This is what it feels like for the wave of compute to pass over you. When you meet an AI researcher who thinks AI is going to be a big deal, I bet she is thinking of a similar shock, on another problem compute has had the fortune to befall. When you meet a capital allocator with a glint in his eye, I bet this is what he augurs in the tea leaves, as he signs another billion dollars to Jensen Huang.

However hard I try, I don’t think my descriptions will ever move you much. You need to pick your own test. It should be a problem you know well, one you’ve worked on for years, one you’re supposed to be an expert in. You need to predict the result, year after year. Then, you need to watch the AI confound those predictions anyway.

Humans are supposed to be clever. This impudent “large” model, this jumped-up matrix multiplication, it can’t possibly understand all the dearly-won nuances at the state of the art. There are confounders that are supposed to stop it, like data and algorithms and other bottlenecks. It shouldn’t be this easy. It would make more sense if one of those reasons why scaling is supposed to break had kicked in by now.

Only then will you recall the empirical trends you conceptually know, and finally be willing to extrapolate them. Later you’ll take AI for granted, just like we’ve all taken for granted that chatbots can rhyme and cars drive themselves. Later you’ll think, who could possibly compete, how could your cleverness be worth anything more than a hill of beans, against an artifact that cost millions and concentrates within it the cleverness of billions of humans? How arrogant to think yourself clever enough to outpace a factor of a thousand, then another thousand, then another thousand. This is what they mean when they say general purpose technology.

To feel the AGI, as the saying goes, think of the pandemic. This analogy pains me, because pandemics are straight-up bad, whereas I believe AI will be very good. But it suits for a couple of reasons. The pandemic is something that actually happened, a piece of moving history. I grew up at a time when nothing ever happened. There’s even a meme about it. Nothing Ever Happens is the End of History of my generation, lamenting the lack of happening in everything from dedollarization to decoupling to the Arab Spring to pivots to Asia to the metaverse. The pandemic’s arc is also instructive. That it would be a big deal was public information long before it was physically obvious. In late February, a week before the market began to crash, and weeks before its lowest point, Centers for Disease Control director Robert Redfield admitted that coronavirus was “probably with us beyond this season, beyond this year.” I remember leaving college for what was ostensibly a long spring break. Friends embraced with “see you in a month!” Only a few understood that we weren’t coming back. You could conceptually know what was happening in Wuhan, in Northern Italy, but it was hard to face. In a few weeks everything changed, until the whole world was in the same story.

AI is also moving history. That it will be a big deal will also be public information long before it is physically felt. It will reach some places before others. There was a day, in the past now, when the wave passed over high school homework, and completely upended that world. This year, the wave passed over software engineering. It’s just brushed research mathematics and molecular biology. I’ve got some anecdata, too, in two conversations I couldn’t have imagined before this year: a friend at a consulting firm got a new batch of hapless interns. His boss’s advice? “You gotta treat them like they’re Perplexity Pro bots.” In another instance, a friend confided in me over a matchmaking attempt gone awry. Could she have seen it coming? I asked. She considered this. “No. But if I were o3 Pro, I could’ve.” Nowhere, then everywhere.

Just for completeness, I want to take this scaling ladder to its end. Take another step down, below time horizons, scaling laws, the Bitter Lesson, and Moore’s Law, to what drives computation itself.

One way to think about computation before computers is through the work of Herbert Simon, the only person to win both the Turing Award and the Nobel Prize in economics. He explained that firms, like computers, are information processing systems. Firms have hierarchy, limited cognitive power, and bounded rationality. Firms also have learning curves, where unit costs fall predictably with cumulative production. In a way, humans organized themselves to be better information processors long before computers. We might extrapolate the trend before Moore’s Law to 400 years ago, with the invention of the joint stock corporation.
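
Those learning curves have a standard form, usually credited to Wright; this is a sketch of the shape, not anything Simon wrote down in exactly this notation:

$$
c(n) = c_1 \, n^{-b}
$$

where c(n) is the unit cost of the n-th unit produced, c_1 is the cost of the first, and b is a fitted exponent; every doubling of cumulative production multiplies cost by 2^{-b}, a constant percentage decline.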

Why stop there? In the Netflix adaptation of the Chinese science fiction novel The Three-Body Problem, an alien explains why her species has to stop human science. She guides the protagonist through history: it took Homo sapiens 90 thousand years to discover agriculture; 10 thousand years to go from agriculture to the Industrial Revolution; 200 years from the Industrial Revolution to the Atomic Age; 50 years from the Atomic Age to the Information Age. Human ingenuity is nothing if not awesome.

In for a penny, in for a pound. Ray Kurzweil, who popularized Vernor Vinge’s idea of the singularity, and I. J. Good’s idea of the intelligence explosion, has one big trend billions of years long in his book The Singularity Is Near. In his telling, the transition from single to multi-cell organisms, the Cambrian explosion, walking upright, agriculture, and printing, all just represent the increasing organization of information. Kurzweil partly inspired DeepMind’s founding in 2010. You might not come along for the whole ride, to Kurzweil’s end, but you might see its allure. The bard of our time put it well in March: everything’s computer.

At the end of this Matryoshka doll of scaling curves is progress itself. Progress is a smooth trend that obscures jagged details. The historical record attests to thousands of years of stasis before hundreds of years of growth. This smooth trend belies countless details of contingent and precarious human ingenuity, and countless confounders, like the data and the algorithms, that make it exist. Progress is certainly empirical. There’s no reason it had to be this way, but now we can hint at one—progress improves on itself. We have so many good reasons to hesitate to extrapolate progress; every time we’ve made the jump to a new regime of progress in the past, it was industry and accident, not providence.

The biggest mistake people make when they make the case for AI is that they say it’s different this time. It’s not different this time because it’s always been different. There hasn’t been any constant normal trend ever, and all we’ve ever done is be optimistic that we’ll muddle through. Nothing is truly inevitable, certainly not progress. And progress, too, might stop tomorrow. All things considered, though, it would be stranger if it did than if it didn’t. I don’t want to have that hesitation, anyway.

Since 2024, I’ve worked on computer use agents, which I wrote about in last year’s letter. Part of it has to do with making each model improve the next; another, slightly less fun, part is doing data compliance. Compute has remained a generous guide, about which I’ll make some general points. Zeynep Tufekci once tweeted a useful rubric: “Until there is substantial and repeated evidence otherwise, assume counterintuitive findings to be false, and second-order effects to be dwarfed by first-order ones in magnitude.” Zeynep’s Law is the second useful framework from the pandemic I’ll bring back.

First-order effects first. Compute compounds, and its results confound. The models are more general than ever before. From those that generate images to those that win Olympiads to those that browse the web, they all share the same base. Even if you think frontier models are just duct-taped together memorization and automatisms, that chimera has uncanny overlap with what humans find useful. And, the models are more efficient than ever. Compute efficiency measures how much compute it takes for a model to achieve the same performance. There are many indications of how much labs focus on this. In January, the market’s negative reaction to DeepSeek R1 compelled Dario Amodei to reveal that DeepSeek was “not anywhere near” as efficient as everyone assumed. Sam Altman, too, calls the falling price of intelligence the most underestimated trend in recent years. Google DeepMind explicitly tries to make each Flash model exceed the Pro model of the previous generation.

The models still have so much headroom. I feel vindicated in what I wrote last year, that the models can ace any evaluation—Humanity’s Last Exam, FrontierMath, and ARC-AGI have all seen precipitous gains. So in August, I wrote with Séb Krier that we need better ones. Despite this, when GPT-5 came out, there were a great many diagnoses of the end of AI progress, maybe because it’s harder for humans to see advances anymore. But how do I emphasize that we’ve literally just gotten started? ChatGPT turned three this year. This year was a new step in revenue, investment, and talent (I saw a mass exodus from Jane Street to Anthropic). In pre-training, Oriol Vinyals observes “no walls in sight.” In post-training, Gemini tells me that DeepSeek’s latest model, which generated thousands of synthetic environments, “suggests we are moving to a new era of Compute Scaling (to make models “think” about data).” Epoch estimates that scaling is unlikely to bottleneck in the next few years, not even due to power. Finally, Peter Wildeford makes a great point: right now, the world doesn’t even have any operational 1 GW data centers. If there’s a bubble, it hasn’t even started yet.

The first-order effect should be staggering. Whatever you mean by AGI, it may not be far. Even Gary Marcus, known for his skepticism, thinks whatever it is is 8 to 15 years away. A recent longitudinal survey of the public, experts, and superforecasters puts the median view of AI at “technology of the century”—the most important technology of our lifetimes. What does it matter if it’s two or 20 years away? The only change that’s mattered to my AI timelines is that I used to think it wasn’t going to happen in my lifetime, and now I think it is.

The second-order effects are murkier. Here are a few ideas I’ve heard: AI will be powerful, so we need to race China—so we need to pause. AI will be economically transformative, so we need universal basic income—so we should accelerate human obsolescence. AI will be entertaining, so we should give people the slop and porn they want—so corporations should be paternalistic about what people can handle. I’m not even saying I disagree with any of these ideas, just that they need development.

In May, I participated in an AI 2027 tabletop exercise. I don’t agree with all their assumptions, like how easily humans would relinquish control to AI. But the point of a tabletop is to accept the premises and see what follows. Playing “U.S. adversaries,” I cribbed the gray zone tactics depicted in a TV trailer and pulled off a totally bloodless reunification of China. But it turns out that if AI does take off, it doesn’t matter what chips you produce. What matters is global compute share at the moment of takeoff. That was useful to learn. In November, I was at a small economics workshop. Someone else there had gone around surveying City of London firms about AI for the U.K. government. “Do you think it’s real?” he would ask them. “We don’t know,” was their answer. “Just don’t regulate us.” Because other countries would not, and they didn’t want their hands tied. A friend relayed a third surreal story from a committee meeting in D.C. A Congressman gave a sort of blessing at the end: “I pray that we may find success in alignment and alternative interpretability methods.”

The demand for clear second-order thinking will only increase. Helen Toner assessed that the U.S. national security blob currently only sees AI as “one small manifestation” of competition with China. As it once grew out of its academic niche, AI may yet outgrow its commercial primacy. Joe Weisenthal predicts AI will be a bigger issue in 2028. People hate high electricity prices, find AI not all that useful, worry about their jobs, see the rich get richer, and distrust tech companies. In broader culture, AI psychosis is claiming more, and more improbable, victims. The kids are decamping the middle, Kyla Scanlon observes, for stable trade work or make-or-break gambles. I asked Tyler Cowen, in July, why you need to “just shake people a bit” about AI. It’s inconvenient, it’s going to be hard, and we don’t want to think about it. But there’s no stable option on the table.

In Zeynep’s Law, second-order effects are dwarfed by the first-order ones in magnitude until there is substantial and repeated evidence otherwise. I’ve done my best to give substantial and repeated evidence that AI will be a big deal. But there is far to go in making that concrete. So far, second-order thinking about AI has mostly been done by AI people venturing far afield. It needs development from experts in politics, economics, and culture who take it seriously. This is what thinking it through looks like. Jonathan Malesic in May reminded us that AI cannot teach us how we want to live. He writes of the humanities:

I will sacrifice some length of my days to add depth to another person’s experience of the rest of theirs. Many did this for me. The work is slow. Its results often go unseen for years. But it is no gimmick.

In August, Harvey Lederman contemplated his role as a philosopher in a world where AI can do anything better than he can. He feels lucky to live in a time when he has a purpose, and dread at the thought of being “some of the last to enjoy this brief spell, before all exploration, all discovery, is done by fully automated sleds.” Games will be at both the beginning and the end of AI. For us personally, Jascha Sohl-Dickstein posted in September a great talk on “Advice for a (young) investigator in the first and last days of the Anthropocene.” He takes seriously the idea that AI will be a big deal, and considers what a researcher should do about it. You have immense leverage; prefer fast projects; be robust to the Bitter Lesson; do something you’ll be proud of.

What else are you going to do, if not what you want? Skills are depreciating as fast as the GPUs that automate them. Maybe, like Sergey Brin, you thought it through to the end, and realized that the most fulfilling thing to do is one big group project with friends. For my part, I’m joining DeepMind’s new Post-AGI team.

This is where I get on the train—we’re so wildly early.

I thank Arjun Ramani, Jasmine Sun, Radha Shukla, and Samuel Albanie for discussing these ideas with me and reading drafts. Thanks also to a Google DeepMind colloquium and Denning House, where I tried out some of this material. The cover painting is The Rising Squall, Hot Wells, from St Vincent’s Rock, Bristol by Joseph Mallord William Turner, R.A.

The Great Glen Way near Spean Bridge, Scotland.

Other than London and Phoenix, this year I spent time in Rome, Zermatt, Zurich, Brighton, Santiago, Washington D.C., the San Francisco Bay Area, New York, New Haven, Kuala Lumpur, Singapore, Paris, Marrakech, Casablanca, Midhurst, Tunis, Fort William, New Delhi, Bangalore, Bagdogra, Chippenham, Portsmouth, Salzburg, Munich, Oxford, Shanghai, Beijing, Hong Kong, and Sedona.

I happened to be in Oxford on Frodo and Bilbo Baggins’s birthday, so I went to see J.R.R. Tolkien. Wolvercote Cemetery is cramped, but green and well-tended. On the autumnal equinox the Sun already sits low in England by late afternoon, and the headstones caught the slanted light. Isaiah Berlin and Aline de Gunzbourg also rested nearby in the Jewish section, but I didn’t know exactly where. I asked someone who looked like a groundskeeper. “I have no clue who you’re talking about,” he said.

Once upon a time Isaiah Berlin was more famous, and I’d like to do my part to restore that situation. I read one biography I really like per year, and this year’s was Isaiah Berlin: A Life by Michael Ignatieff. It’s unmistakably a biography of the old school, where the author, himself a flagbearer of the liberal cause, saw no need to feign impartiality and freely offered his own editorials on Berlin, whom he knew well.

Berlin was a champion of pluralism above all, if such a qualifier doesn’t contradict itself. Value pluralism holds that the highest human values, like liberty, equality, justice, and mercy, are incommensurable; we can’t order their importance in one perfect system. That doesn’t mean that anything goes—kindness still beats concentration camps—just that beyond a common horizon, values are impossible to adjudicate between. Berlin came to mind because as AI progress has continued, so too have some of its decidedly monist philosophies risen in prominence. In July, I wrote some short fiction after the hedgehog and the fox, to imagine two ways the same evidence could go.

Now I think I can contort my whole year into three, loosely Berlinian, themes. First, pluralism and its useful contradictions: the ability to hold two opposing ideas in mind at the same time, a thought I got from John Gaddis who got it from Charles Hill who got it from F. Scott Fitzgerald. Second, marginalia: the footnotes, the minor characters behind the main characters. From Star Wars: Andor to the various government officials I ran into all around the world and in stories I read, this letter is dedicated to you. Pluralism’s lofty values stand because you bear the weight. Finally, third, unreasonable optimism: small rebellions of agency against vast and indifferent deterministic forces. It’s how the marginalia shines through.

Nowhere did I get all three more than India. I went on a big group trip this August, which Sparsh, Arjun, and Arjun organized. Follow what Sparsh posted every day of the trip, and read what David Oks and Jason Zhao wrote afterwards. We met some people leading the world’s largest democracy, encountered entrepreneurs urban and rural, not least through Sparsh’s company Alt Carbon, ate some incredible food, and hiked the beautiful hills of Darjeeling. At the risk of over-generalizing, I found India higher in guanxi but lower in mianzi compared to China. Our organizers’ connections got us time with people who really shouldn’t have had any. Though people were keenly attuned to status, everyone everywhere was more willing to flout convention, or had to, to get things done.

Shruthi introduced me to the concept of neti, basically, “not that.” It could be a slogan for India’s third way—unlike America, unlike China, not this, not that. I’m rooting for them. Unlike Europe’s third way, which seems to be to regulate, India’s third way wants to build in public. Halfway through the trip, we had a wonderful conversation with Pramod Varma, the architect of India’s Unified Payments Interface (UPI). It would be no exaggeration to say that UPI, which processes nearly half of the world’s real-time digital payments, has done more for financial inclusion than any technology in history. Most of us, from the West, pressed him late into the night on why the private sector couldn’t do what UPI did. Maybe there are just idiosyncratic things about India, and maybe the private sector could’ve done it. But the fact is UPI is a huge success, which, because I didn’t learn how to replicate it, I can only credit to the architect’s boundless optimism.

The next morning we met Manu Chopra in his bright and leafy office, where his company collects data from underrepresented languages and demographics. He told us a story about one of his contributors, who could not help but notice the reversal; while he once paid to learn English, now he was paid to teach AI his mother tongue. Again, if there is an expedient business case, I am not the one who can make it. As far as I can tell, Chopra, a Stanford graduate, and many others like him, could’ve founded a company in America to much greater conventional success. Instead he chose India.

I savored other thrills. One of us navigated four of us to the wrong franchise of the Bangalore hotel we were staying at, an eternity of traffic away from the right one. After half an hour of failed Uber rendezvous, we made an executive decision to take one rickshaw, all four of us. The driver didn’t bat an eye, and only up-charged us 100 rupees. So for 40 minutes we careened through Bangalore’s neon nightscape, four wedged on a bench meant for two, one in another’s lap, half hanging out into the open air. When the rickshaw climbed the flyovers, you could feel the motor heaving.

A journalist recounted how the Singaporean foreign minister once proudly sent Singaporean students to India, because, “In Singapore if the train says it will arrive at 9:02, then it will arrive at 9:02. In India every day is unpredictable, and they need to learn that.” Sometimes, though, I could do with more warning ahead of time. Indian airports dropped to my least favorite in the world, through no fault of their amenities. It’s just that they’ll x-ray you and your bags at least three times from curb to jetway. And, airlines will sell you connecting flights with a seven-hour layover, which is fine—but should you have the misfortune to follow the staff’s instructions upon deplaning for your connection, they’ll force you to wait in the departure hall, outside security.

My reading always improves in transit, though. A new conspiracy I believe in: Lord Macartney did kowtow to Qianlong, despite his insistence that he negotiated a different ceremony. You can tell from Thomas Staunton’s diary. And beneath all the robes, who can really tell how many prostrations Macartney did? When I arrived at immigration in Shanghai the officer was training a new recruit. She went over with her protégé, using my passport, how you have to match the name, photo, visa, and gender. There was this one time, she whispered, a man came but his passport said woman. They didn’t let him in. After immigration, the lady who sold me a Chinese SIM asked me if I needed a VPN. When I told her I already had one, she did me the kindness of checking which provider it was, to make sure it worked after I left the airport.

While America worries about AI deepfakes, my aunt has changed the voice of her chatbot (Doubao, or “steamed bun with bean paste,” ByteDance’s flagship model series) to her daughter’s voice. So I guess she’s totally immune to any scams, because anytime she wants to know something, and she uses AI a lot, my cousin is the specter explaining it to her. My grandfather, sadly, is more susceptible to scams. A caller claimed to have a rare stash of porcelain from Jingdezhen, and defrauded him out of quite some yuan. My aunt finally convinced my grandfather the bowls were fake when she got him to read the tiny etchings on the rim: “microwave safe.”

The engineering state is in good shape. I ordered takeout after boarding the bullet train in Shanghai, and got duck from Nanjing and pork from Jinan delivered to my seat before I reached Beijing. The train attendants walked up and down the aisle with trays of Starbucks lattes at each major stop. As Jasmine Sun quoted Charles Yang, they’ve got the juice. I was visiting over China’s October holiday, when mooncakes change hands before the Mid-Autumn Festival. I was delighted to find a pack of China Media Group mooncakes, branded not just on the extravagant packaging but right on the cake itself, with iconic central television buildings like the Big Underpants. “Oh, you have no idea how much corruption goes into those,” one of my uncles told me. Apparently state mooncake contracts are a lucrative vehicle for graft.

China is, on the whole, less spontaneous than it used to be. My Dad told me a story of how during a class trip in college, he and a bunch of his friends had to queue for 36 hours to buy train tickets for the whole class. One person could only buy four. They set up in front of the booth even before a bunch of old ladies who made a living as ticket scalpers, who brought their own stools and chatted away the day before sales opened. A different group of shiftier and more professional-looking scalpers later came by and tried to cut the line—but then decided they didn’t want to mess with the couple dozen delirious, military-age men who made up my Dad’s group. The police came to check the line every few hours. Late in the game, one of my Dad’s friends subbed out with another guy to sleep. They were banking on the fact that all their IDs, with photos from high school, looked the same. The police quizzed him on his birthday though, and got him that way.

My parents recalled that in the past, people used to get dates on trains. Travel home for the holidays took the better part of a day, nothing like the quick hours a bullet train cleaves through now. And people used to buy more standing tickets; there was nothing like the neat serried rows of seats, so you’d get to know your neighbors pretty well. A lady who thought my Dad was pretty good at helping with the bags introduced him to a relative when he was in college. They didn’t work out in the end, but that sort of thing just doesn’t happen anymore. There are more modern means. One of my cousins just got a girlfriend by hanging a sign in the Didi he drove: “got a house; got a car; got no money.”

In other travel highlights, I highly recommend visiting what was once the tallest minaret in the world. I also walked by the oldest currently operating restaurant in the world, but am not sure if I can recommend it, as I had already had too much schnitzel to give it a fair try. The entire city of Rome is one big transit gap. Almost anywhere point to point takes as long to walk as it does to take public transport. If you want to see Rome, I say go to Carthage. Tunis not only has the largest museum of Roman mosaics I have ever seen, but lets you walk among the ruins including the Antonine Baths, which is actually much more intimate and impressive than the Forum in Rome. Don’t ask me what it means for preservation. The two cities also have a vegetable gap, one I judge in Tunis’s favor. In Tunis we also suborned our way into a communal choral concert, convincing a guard in three languages to smuggle out some spent tickets. Should you ever have the pleasure, you’ll find that A is one of the most resourceful and enlivening travel companions imaginable.

In the Scottish highlands, one cafe in Invermoriston (population 264) had no business being as good as it was. As of this writing, with 1,007 Google reviews, it has a rating of 4.9. Glen Rowan Cafe wins my establishment of the year. They sell six cakes, all homemade. Of all the ventures I’ve beheld, none stand so defiantly against the impersonal forces of optimization as this improbable gem. Contrast the Wee Coffee Company ‘Coffee-in-a-bag’ sold on the Caledonian Express. In the morning, as the countryside swept past the windows of the cafe car, I overheard one passenger volunteer to the waitress, “Yeah, these aren’t brilliant.” “No,” she shot back. “They’re not very good at all.” You must read these in thick Scottish accents.

I have some travel advice. You pack too much. Make everything USB-C, and buy travel duplicates so you don’t have to think about packing. Get a Cotopaxi (h/t H) that fits under the seat in front of you. It’s glorious to go a long weekend out of just a backpack (I’m told this is only possible as a man, which is nonsense). Get Flighty, the most exciting app on my phone. You should board last, but I get anxious looking up all the time, so I just get in line early with a book. Don’t read most museum placards, which are pro forma or mealy-mouthed. Some audio guides are the same, while others are really good, so it’s worth taking a risk. The best option is to read something about the place and then go consume the real thing, the more specific the better. The material culture and maps in the Hong Kong Museum of Art came to life having read Stephen Platt’s Imperial Twilight.

You will internalize some basic facts, while others will remain totally inaccessible. I learned that Malaysia is ethnically 50 percent Malay (and required to be Muslim), 20 percent Chinese, 10 percent other indigenous, and 7 percent Indian. But only by going can you even begin to picture what such a society would look like. I also learned that a popular pastime in hot and humid cities is to hang out in huge, air-conditioned malls. You might go, but you won’t be there for very long, and so you won’t understand this mall as your default lifelong social venue.

As such, you can take a barbell strategy to travel. Either go for a year, or two, or ten, however long you need to go to open a bank account. Or, stay no longer than a few days in the same place, and come back often. The world is changing, and Tyler Cowen is right, there will be combinations of place and time lost to us forever. I feel lucky to have visited Beijing almost twenty times in my life, seeing exactly what a tenfold increase in GDP per capita looks like. America, China, and India change enough to visit every year. Many short trips also align perfectly with my advice to pack lighter, by the way. Just learn how to use flights well. They’re the perfect place to do deep work or reflect. Have a free drink to hit the Ballmer Peak. And jet lag isn’t real. Or, if you always have jet lag, you never have jet lag.

Travel is the best teacher of pluralism, marginalia, and optimism. This year, the comforts of home are just as good though, thanks to the genius of Tony Gilroy. I began the personal section of my first annual letter talking about Star Wars: Andor, and I’m glad to be back. Seriously, Andor has saved Star Wars. It’s incredible how low the standard is otherwise for major characters recurring. How many cameo fights has Darth Maul had by now? Is no one at Lucas Ranch doing any gatekeeping? Many spoilers follow.

Andor is nothing if not marginalia. From the opening crawl of A New Hope, you know that the rebels have beaten the empire for the first time. They’ve got the Death Star plans at great cost. Rogue One tells that story, one of unsung heroism and hope against hope. Cassian Andor isn’t even the main protagonist of Rogue One. So Andor is a footnote of a footnote, and I don’t want to stop there. Give me a whole show about the paperwork required to host a director-level secret conference in the mountains about an energy program. Give me a book about deep substrate foliated kalkite. Synthetic kalkite. Kalkite alternatives. Kalkite substitutes.

Cass Sunstein once wrote that Star Wars is anti-determinist. Luke’s choice defies his bloodline, as Vader’s defies his past—“always in motion is the future.” Gilroy clearly supports elevating individual agency. Nemik’s manifesto is the most explicit—tyranny requires constant acquiescence. Axis is Kleya, not Luthen. Kleya and Cassian spar over exactly how much collective necessity is appropriate, in the rebellion’s ultimate quest to restore individual agency. Even minute ambiguities, like whether Perrin knew Mon was a rebel the whole time, slouched drunk in the back seat of a space limo, make you wonder what strength he had to summon.

Then there’s the opposite view, which others have pointed out. Cassian tries to avoid joining the rebellion so many times, and joins anyway. In this season he tries to leave so many times, and walks straight into Rogue One anyway. Bix and the Force healer tell him he does have a destiny. As the counterpoint to Nemik’s account of empire, Andor’s rebellion is the sum of ten thousand genuine choices. Elsewhere, Gilroy makes us wonder what acts are truly necessary. Did Tay have to die, did Lonnie have to die, and did Luthen have to walk before Yavin could run?

If I let myself reach freely, I’ll say that Andor argues destiny is real and consists of countless real choices, but that not every cost we assume is necessary actually is. There are parallels with the compute trends I’ve just spent half the letter describing: countless instances of ingenuity are why we have them, and they look like destiny too, yet people may be using that destiny to justify too much. If you’d rather not read too much into Andor, just watch it to feel good about the people behind the heroes, who get no glory but can’t be replaced. Imagine the hope they must have had, never getting to know how the story would end.

I laughed out loud when Krennic said he had just met with Palpatine, and that that was why everybody had to listen to his orders. There have been a couple of meetings at Google where people go, in the exact same tone, hey guys, I just talked to Sundar, so that’s why we have to do this. The little cakes and demitasse cups are characteristic of corporate events. Anyway, I highly recommend this show. I also loved this review: “Watching the end of Andor, then Rogue One, then A New Hope rocks, because the plot and the aesthetic are basically seamless, but everyone in the galaxy gets way stupider over the course of maybe a week.”

In other television about destiny, I enjoyed Shogun. I dismissed it last year as generic Westerner-saves-samurai fare, but then I read somewhere that it had won more Emmys for a single season of television than any other. That’s pretty good, especially for a show with subtitles. Then, as Hiroyuki Sanada’s newest fan, I enjoyed Bullet Train, which is really unfairly maligned. It’s also about destiny.

More in marginalia, there was The Lychee Road (h/t A), about a minor Tang dynasty bureaucrat who gets the impossible task of delivering fresh lychee 2,000 kilometers for the emperor’s favorite concubine. One great thing about it is that it’s a very colorful film; literally, it has many colors in it, which is rare. In terms of plot, its modern parallels are shameless. Our protagonist toiled for 18 years for a chance to get a house with a crushing mortgage. He is shunted between departments, none of which wants to take responsibility. In the end he is shattered; for one bowl of lychees to reach the capital, two hundred family heirloom trees must be cut down in the provinces. I started watching The Thick of It (h/t G, years ago) in January and, stealing half an hour here and there, finished all of it last month. My favorite was the episode where Nicola Murray and Peter Mannion go at each other on BBC Radio 5, leading to an excellent fight between their handlers, who then abandon them. People should watch the (often better) British version of more things (as with The Office; in this case, Veep).

Of plays, I most enjoyed The Years, The Brightening Air, and The Weir (of things starring Brendan Gleeson, The Guard was the funniest movie I’ve seen this year). Of music, Spotify Wrapped tells me that my listening dropped by half, so I don’t have much to pass on there. Concrete Avalanche has continued to be a great source. I listened to YaYa Remixes Repacked, 10 versions of a tune about repatriating a panda (do read the lyrics). “Red Spider” is also underrated. I luckily caught the very short run of Natasha, Pierre and the Great Comet of 1812 (h/t A), a musical that only covers 70 pages of War and Peace. It really makes obvious how dense a musical Hamilton is: in a similar runtime in which Natasha makes a mistake and breaks off an engagement, Alexander Hamilton founds a global superpower and dies.

The book I had the most fun reading was Jonathan Strange & Mr Norrell by Susanna Clarke (Piranesi is also excellent). Apparently Bloomsbury was so sure it would be a hit that they gave her a million pound advance. It’s an alternate history, one where magic didn’t disappear from England, which, as you know, happened around the 1400s. There is pluralism—at least two ways to bring back practical magic; there is marginalia—the novel has almost 200 footnotes telling you everything you might want to know about the history of English magic; and for sure there is optimism. One footnote tells the story of how, during the Peninsular War, at the end of a long day’s march, the army realized that the map was wrong. The Spanish town they were supposed to be at was actually much further down the road. Instead of continuing the march and fixing the map, Wellington had Jonathan Strange magically relocate the town on top of them.

Reading Clarke is excellent fodder for my continued passion for England. I really enjoyed The Rest is History’s telling of Horatio Nelson, half last fall and half this fall. Nelson was more into God, king, and country than Napoleon ever was. In counterpoint to Napoleon’s marshals, Nelson thought of Shakespeare, and felt like he had the best of the English as his companions, just like Henry V at Agincourt. I wonder what the equivalent is today. English sailors were the best-fed Britons in the world, and you got to learn astronomy, ropes, diplomacy, and logistics to boot. I didn’t know that both sides were so certain that the British would trounce the French and the Spanish before the Nile and Trafalgar despite having fewer and smaller ships. While their enemies could fire a broadside every four minutes, the English could fire one every minute.

Every letter, I sound like a big shill for the British. I need to put in a disclaimer that I just think that my first love, America, already has many vocal defenders, touting well-known reasons why America is great. Our begetter is a bit underrated at present, and they’re too circumspect to praise themselves. So we continue.

I want to make the pitch that London is the best Post-AGI city. It has every scene you could possibly want. Huawei spent billions to replicate the architecture, because being next to old buildings is good for creativity. Sam Kriss almost kissed the Heathrow runway when he got back from Burning Man. When all the cities in the world become museums because of AGI, where are you going to get your bona fide thousand-year history, your private corporation that runs the Square Mile under the 697th Lady Mayor? I hope they never change the rule where you must have line-of-sight to St. Paul’s, no matter what the YIMBYs say. And yes, we really do need all the other 20 of Sir Christopher Wren’s churches. Anachronism is that which is scarce.

London is also the best Pre-AGI city. Demis Hassabis insisted on DeepMind staying in London instead of moving to Silicon Valley because “this is not a fail-fast mission.” I agree. I’d draft the European gentleman-scholar archetype over the Bay Area founder-maxxer if I wanted a rigorous understanding of my training dynamics. It pains me to hear of AI researchers attaining generational wealth, only to immediately shoot themselves in the foot buying a house in the Bay Area. For all the talk of being contrarian, no one seems to be putting their money where their mouth is. That few million could’ve been a castle, with battlements, grounds, and a link to Robert the Bruce. Even Peter Thiel declared that Silicon Valley real estate, for what you get, “is probably the absolute worst in the world.”

I was deliberating with a friend: maybe your earlier life is for not being led astray, and your later life is for making commitments. A city like New York or London is the best for the former, but bears revisiting for the latter. So, for the final section, about productivity, I want to start with what not to do. There was a recent meme, especially in AI, about 996. That schedule seems wrong to me. Work very hard, certainly, but even Sergey Brin pinpoints 60 hours as the sweet spot (976? 403?), and Sam Altman’s first advice is facing the right direction, if you don’t want to take it from me. The fact that Elon Musk is the single individual most responsible for rockets that land, for electric cars being early, and for my Twitter feed getting worse, all in one person, is evidence that time is not anyone’s bottleneck. If it is, then the path you’re on is already determined, and not by you. As Berlin interprets Tolstoy:

There is a particularly vivid simile in which the great man is likened to the ram whom the shepherd is fattening for slaughter. Because the ram duly grows fatter, and perhaps is used as a bellwether for the rest of the flock, he may easily imagine that he is the leader of the flock, and that the other sheep go where they go solely in obedience to his will. He thinks this and the flock may think it too. Nevertheless the purpose of his selection is not the role he believes himself to play, but slaughter—a purpose conceived by beings whose aims neither he nor the other sheep can fathom. For Tolstoy Napoleon is just such a ram, and so to some degree is Alexander, and indeed all the great men of history.

So we return to Isaiah Berlin. I see a better path in his own life and pluralism. He was Russian and British, born in Riga but spending most of his time in Oxford before his death in 1997. Politically, he put himself at the rightward edge of the leftward tendency, and thought “intelligence and achievement could not redeem failures of will.” He was Jewish, but rejected both religious orthodoxy and complete assimilation. Berlin wedded history to philosophy to create his own field (presenting to Wittgenstein in his thirties may have planted some doubts about analytic philosophy). He rejected historical inevitability. His greatest contribution was synthesis: explaining the ideologies of the Cold War’s East and West. He was a fox who wanted to be a hedgehog.

In other words, you must be tranquil as a forest, but on fire within. In last year’s letter I went on about how you can only do one thing well at a time. Now I will tentatively venture out into two. I’ll admit there’s no way to adjudicate between values, and try to hold contradictory ideas in mind at the same time. Those ideas can be mundane. Work hard, but have slack for creativity. To be a good scientist you have to be both arrogant and humble. Care about everything, at the same time that nothing really matters.

One final piece of productivity advice. This year’s household capital goods recommendation is a shoehorn (h/t J). Get one at least two feet long and made of wood. Whenever I used to forget something in the house after I had my shoes on, I’d hop back in and try not to step anywhere. Now I take my shoes off because I’m so excited to use the shoehorn again.

darjeeling

Selim Hill Forest, Darjeeling.


Quoting Jason Gorman


The hard part of computer programming isn't expressing what we want the machine to do in code. The hard part is turning human thinking -- with all its wooliness and ambiguity and contradictions -- into computational thinking that is logically precise and unambiguous, and that can then be expressed formally in the syntax of a programming language.

That was the hard part when programmers were punching holes in cards. It was the hard part when they were typing COBOL code. It was the hard part when they were bringing Visual Basic GUIs to life (presumably to track the killer's IP address). And it's the hard part when they're prompting language models to predict plausible-looking Python.

The hard part has always been – and likely will continue to be for many years to come – knowing exactly what to ask for.

Jason Gorman, The Future of Software Development Is Software Developers

Tags: ai-ethics, careers, generative-ai, ai, llms


My Christmas gift: telling you about PurpleMind, which brings CS theory to the YouTube masses


Merry Christmas, everyone! Ho³!

Here’s my beloved daughter baking chocolate chip cookies, which she’ll deliver tomorrow morning with our synagogue to firemen, EMTs, and others who need to work on Christmas Day. My role was limited to taste-testing.

While (I hope you’re sitting down for this) the Aaronson-Moshkovitzes are more of a latke/dreidel family, I grew up surrounded by Christmas and am a lifelong enjoyer of the decorations, the songs and movies (well, some of them), the message of universal goodwill, and even gingerbread and fruitcake.


Therefore, as a Christmas gift to my readers, I hereby present what I now regard as one of the great serendipitous “discoveries” in my career, alongside students like Paul Christiano and Ewin Tang who later became superstars.

Ever since I was a pimply teen, I dreamed of becoming the prophet who’d finally bring the glories of theoretical computer science to the masses—who’d do for that systematically under-sung field what Martin Gardner did for math, Carl Sagan for astronomy, Richard Dawkins for evolutionary biology, Douglas Hofstadter for consciousness and Gödel. Now, with my life half over, I’ve done … well, some in that direction, but vastly less than I’d dreamed.

A month ago, I learned that maybe I can rest easier. For a young man named Aaron Gostein is doing the work I wish I’d done—and he’s doing it using tools I don’t have, and so brilliantly that I could barely improve a pixel.

Aaron recently graduated from Carnegie Mellon, majoring in CS. He’s now moved back to Austin, TX, where he grew up, and where of course I now live as well. (Before anyone confuses our names: mine is Scott Aaronson, even though I’ve gotten hundreds of emails over the years calling me “Aaron.”)

Anyway, here in Austin, Aaron is producing a YouTube channel called PurpleMind. In starting this channel, Aaron was directly inspired by Grant Sanderson’s 3Blue1Brown—a math YouTube channel that I’ve also praised to the skies on this blog—but Aaron has chosen to focus on theoretical computer science.

I first encountered Aaron a month ago, when he emailed asking to interview me about … which topic will it be this time, quantum computing and Bitcoin? quantum computing and AI? AI and watermarking? no, diagonalization as a unifying idea in mathematical logic. That got my attention.

So Aaron came to my office and we talked for 45 minutes. I didn’t expect much to come of it, but then Aaron quickly put out this video, in which I have a few unimportant cameos:

After I watched this, I brought Dana and the kids and even my parents to watch it too. The kids, whose attention spans normally leave much to be desired, were sufficiently engaged that they made me pause every 15 seconds to ask questions (“what would go wrong if you diagonalized a list of all whole numbers, where we know there are only ℵ₀ of them?” “aren’t there other strategies that would work just as well as going down the diagonal?”).

Seeing this, I sat the kids down to watch more PurpleMind. Here’s the video on the P versus NP problem:

Here’s one on the famous Karatsuba algorithm, which reduced the number of steps needed to multiply two n-digit numbers from ~n^2 to only ~n^1.585, and thereby helped inaugurate the entire field of algorithms:
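If you want to poke at the trick itself before (or after) watching, here is a minimal sketch of Karatsuba in TypeScript, offered purely as an illustration rather than anything from the video: splitting each number in half turns one size-n product into three products of roughly half the size, which is where the exponent log2(3) ≈ 1.585 comes from.

// A minimal Karatsuba sketch on nonnegative BigInts (illustration only, not production code).
// One size-n multiplication becomes three of roughly half the size, so the work grows
// like n^(log2 3) ≈ n^1.585 instead of n^2.
function karatsuba(x: bigint, y: bigint): bigint {
  if (x < 16n || y < 16n) return x * y; // small inputs: just multiply directly
  const bits = Math.max(x.toString(2).length, y.toString(2).length);
  const half = BigInt(bits >> 1);
  const mask = (1n << half) - 1n;
  const xHi = x >> half, xLo = x & mask;
  const yHi = y >> half, yLo = y & mask;
  const a = karatsuba(xHi, yHi);                     // high halves
  const b = karatsuba(xLo, yLo);                     // low halves
  const c = karatsuba(xHi + xLo, yHi + yLo) - a - b; // cross terms, from a single extra multiply
  return (a << (2n * half)) + (c << half) + b;
}

const p = 12345678901234567890n;
const q = 98765432109876543210n;
console.log(karatsuba(p, q) === p * q); // sanity check against the built-in multiply: true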

Here’s one on RSA encryption:

Here’s one on how computers quickly generate the huge random prime numbers that RSA and other modern encryption methods need:

These are the only ones we’ve watched so far. Each one strikes me as close to perfection. There are many others (for example, on Diffie-Hellman encryption, the Bernstein-Vazirani quantum algorithm, and calculating pi) that I’m guessing will be equally superb.

In my view, what makes these videos so good is their concreteness, achieved without loss of correctness. When, for example, Aaron talks about Gödel mailing a letter to the dying von Neumann posing what we now know as P vs. NP, or any other historical event, he always shows you an animated reconstruction. When he talks about an algorithm, he always shows you his own Python code, and what happened when he ran the code, and then he invites you to experiment with it too.

I might even say that the results singlehandedly justify the existence of YouTube, as the ten righteous men would’ve saved Sodom—with every crystal-clear animation of a CS concept canceling out a thousand unboxing videos or screamingly-narrated Minecraft play-throughs in the eyes of God.

Strangely, the comments below Aaron’s YouTube videos attack him relentlessly for his use of AI to help generate the animations. To me, it seems clear that AI is the only thing that could let one person, with no production budget to speak of, create animations of this quality and quantity. If people want so badly for the artwork to be 100% human-generated, let them volunteer to create it themselves.


Even as I admire the PurpleMind videos, or the 3Blue1Brown videos before them, a small part of me feels melancholic. From now until death, I expect that I’ll have only the same pedagogical tools that I acquired as a young’un: talking; waving my arms around; quizzing the audience; opening the floor to Q&A; cracking jokes; drawing crude diagrams on a blackboard or whiteboard until the chalk or the markers give out; typing English or LaTeX; the occasional PowerPoint graphic that might (if I’m feeling ambitious) fade in and out or fly across the screen.

Today there are vastly better tools, both human and AI, that make it feasible to create spectacular animations for each and every mathematical concept, as if transferring the imagery directly from mind to mind. In the hands of a master explainer like Grant Sanderson or Aaron Gostein, these tools are tractors to my ox-drawn plow. I’ll be unable to compete in the long term.

But then I reflect that at least I can help this new generation of math and CS popularizers, by continuing to feed them raw material. I can do cameos in their YouTube productions. Or if nothing else, I can bring their jewels to my community’s attention, as I’m doing right now.

Peace on Earth, and to all a good night.


Cooking with Claude


I've been having an absurd amount of fun recently using LLMs for cooking. I started out using them for basic recipes, but as I've grown more confident in their culinary abilities I've leaned into them for more advanced tasks. Today I tried something new: having Claude vibe-code up a custom application to help with the timing for a complicated meal preparation. It worked really well!

A custom timing app for two recipes at once

We have family staying at the moment, which means cooking for four. We subscribe to a meal delivery service called Green Chef, mainly because it takes the thinking out of cooking three times a week: grab a bag from the fridge, follow the instructions, eat.

Each bag serves two portions, so cooking for four means preparing two bags at once.

I have done this a few times now and it is always a mad flurry of pans and ingredients and timers and desperately trying to figure out what should happen when and how to get both recipes finished at the same time. It's fun but it's also chaotic and error-prone.

This time I decided to try something different, and potentially even more chaotic and error-prone: I outsourced the planning entirely to Claude.

I took this single photo of the two recipe cards side-by-side and fed it to Claude Opus 4.5 (in the Claude iPhone app) with this prompt:

Extract both of these recipes in as much detail as possible

Two recipe cards placed next to each other on a kitchen counter. Each card has detailed instructions plus photographs of steps.

This is a moderately challenging vision task in that there is quite a lot of small text in the photo. I wasn't confident Opus could handle it.

I hadn't read the recipe cards myself. The responsible thing to do here would be a thorough review or at least a spot-check - I chose to keep things chaotic and didn't do any more than quickly eyeball the result.

I asked what pots I'd need:

Give me a full list of pots I would need if I was cooking both of them at once

Then I prompted it to build a custom application to help me with the cooking process itself:

I am going to cook them both at the same time. Build me a no react, mobile, friendly, interactive, artifact that spells out the process with exact timing on when everything needs to happen have a start setting at the top, which starts a timer and persists when I hit start in localStorage in case the page reloads. The next steps should show prominently with countdowns to when they open. The full combined timeline should be shown slow with calculated times tor when each thing should happen

I copied the result out onto my own hosting (you can try it here) because I wasn't sure if localStorage would work inside the Claude app and I really didn't want it to forget my times!

Then I clicked "start cooking"!

The recipe app shows a full timeline with 00:00 Preheat Oven and onwards, plus a big Start Cooking button. In the animation clicking the button starts a timer clicking up, adds a Do this now panel showing the Start all prep work step, shows Coming Up Next with timers counting down to the next steps and updates the full timeline to show local clock times where it previously showed durations from 00:00 upwards.

Here's the full Claude transcript.
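For anyone curious about the shape of that localStorage trick, here is a hand-written sketch in TypeScript of the reload-proof timer idea. To be clear, this is not the artifact Claude generated; the step names and timings below are placeholders I made up.

// Sketch of a reload-proof cooking timeline (placeholder steps, not the generated app).
type Step = { atMinutes: number; label: string };

const steps: Step[] = [
  { atMinutes: 0, label: "Preheat oven and start all prep work" },
  { atMinutes: 10, label: "Cauliflower into the oven" },
  { atMinutes: 30, label: "Start the couscous and green beans" },
  { atMinutes: 44, label: "Serve both meals" },
];

const KEY = "cooking-start-time";

function startCooking(): void {
  // Persist the start timestamp so a page reload doesn't forget the running timer.
  if (!localStorage.getItem(KEY)) {
    localStorage.setItem(KEY, Date.now().toString());
  }
}

function render(): void {
  const saved = localStorage.getItem(KEY);
  if (!saved) return; // not started yet
  const elapsed = (Date.now() - Number(saved)) / 60_000; // minutes since "Start cooking"
  const done = steps.filter((s) => s.atMinutes <= elapsed);
  const current = done[done.length - 1];
  const next = steps.find((s) => s.atMinutes > elapsed);
  console.log("Do this now:", current ? current.label : "nothing yet");
  if (next) {
    console.log(`Coming up in ${(next.atMinutes - elapsed).toFixed(1)} min: ${next.label}`);
  }
}

startCooking();
setInterval(render, 1000);

The nice property is that nothing else needs to be saved: store the start time once, and every countdown is recomputed from the clock on each tick.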

There was just one notable catch: our dog, Cleo, knows exactly when her dinner time is, at 6pm sharp. I forgot to mention this to Claude, which had scheduled several key steps colliding with Cleo's meal. I got woofed at. I deserved it.

To my great surprise, it worked. I followed the recipe guide to the minute and served up both meals exactly 44 minutes after I started cooking.

A small bowl (a beautiful blue sea textured bowl, made by Natalie Downe) contains a chickpea stew. A larger black bowl has couscous, green beans and blackened cauliflower.

The best way to learn the capabilities of LLMs is to throw tasks at them that may be beyond their abilities and see what happens. In this case I fully expected that something would get forgotten or a detail would be hallucinated and I'd end up scrambling to fix things half way through the process. I was surprised and impressed that it worked so well.

Some credit for the app idea should go to my fellow hackers at /dev/fort 2 in 2009, when we rented Knockbrex Castle in Dumfries, Scotland for a week and attempted to build a cooking timer application for complex meals.

Generating recipes from scratch

Most of my other cooking experiments with LLMs have been a whole lot simpler than this: I ask for a recipe, ask for some variations and then cook one of them and see what happens.

This works remarkably well considering LLMs have no taste buds.

I've started to think of this as asking LLMs for the average recipe for a dish, based on all of the recipes they have hoovered up during their training. It turns out the mean version of every guacamole recipe on the internet is a decent guacamole!

Here's an example of a recipe I tried recently that worked out really well. I was helping Natalie run her ceramic stall at the farmers market and the stall next to us sold excellent dried beans. I've never used dried beans before, so I took a photo of their selection and asked Claude what I could do with them:

Several bags of tasty-looking beans of different varieties and colors. More bags of beans.

Identify these beans

It took a guess at the beans, then I said:

Get me excited about cooking with these! If I bought two varietiew what could I make

"Get me excited" switches Claude into a sort of hype-man mode, which is kind of entertaining:

Oh, you're about to enter the wonderful world of bean cooking! Let me get you pumped about some killer two-bean combos: [...]

Mixed bean salad with lemon, olive oil, fresh herbs, cherry tomatoes - light but satisfying [...]

I replied:

OK Bean salad has me interested - these are dried beans. Give me some salad options I can make that would last a long time in the fridge

... and after some back and forth we arrived on the recipe in this transcript, which I cooked the following day (asking plenty of follow-up questions) and thoroughly enjoyed.

I've done this a bunch of times with a bunch of different recipes across both Claude and ChatGPT and honestly I've not had a notable miss yet. Being able to say "make it vegan" or "I don't have coriander, what can I use instead?" or just "make it tastier" is a really fun way to explore cooking.

It's also fun to repeat "make it tastier" multiple times to see how absurd you can get.

I really want someone to turn this into a benchmark!

Cooking with LLMs is a lot of fun. There's an opportunity here for a really neat benchmark: take a bunch of leading models, prompt them for recipes, follow those recipes and taste-test the results!

The logistics of running this are definitely too much for me to handle myself. I have enough trouble cooking two meals at once, for a solid benchmark you'd ideally have several models serving meals up at the same time to a panel of tasters.

If someone else wants to try this please let me know how it goes!

Tags: cooking, devfort, tools, ai, generative-ai, llms, anthropic, claude, vision-llms, vibe-coding


2025 was for AI what 2010 was for cloud (xpost)


The satellite, experimental technology has become the mainstream, foundational tech. (At least in developer tools.) (xposted from new home)

I was at my very first job, Linden Lab, when EC2 and S3 came out in 2006. We were running Second Life out of three datacenters, where we racked and stacked all the servers ourselves. At the time, we were tangling with a slightly embarrassing data problem in that there was no real way for users to delete objects (the Trash folder was just another folder), and by the time we implemented a delete function, our ability to run garbage collection couldn’t keep up with the rate of asset creation. In desperation, we spun up an experimental project to try using S3 as our asset store. Maybe we could make this Amazon’s problem and buy ourselves some time?

Why yes, we could. Other “experimental” projects sprouted up like weeds: rebuilding server images in the cloud, running tests, storing backups, load testing, dev workstations. Everybody had shit they wanted to do that exceeded our supply of datacenter resources.

By 2010, the center of gravity had shifted. Instead of “mainstream engineering” (datacenters) and “experimental” (cloud), there was “mainstream engineering” (cloud) and “legacy, shut it all down” (datacenters).

Why am I talking about the good old days? Because I have a gray beard and I like to stroke it, child. (Rude.)

And also: it was just eight months ago that Fred Hebert and I were delivering the closing keynote at SRECon. The title is “AIOps: Prove It! An Open Letter to Vendors Selling AI for SREs”, which makes it sound like we’re talking to vendors, but we’re not; we’re talking to our fellow SREs, begging them to engage with AI on the grounds that it’s not ALL hype.

We’re saying to a room of professional technological pessimists that AI needs them to engage. That their realism and attention to risk are more important than ever, but in order for their critique to be relevant, accurate, and heard, it has to be grounded in expertise and knowledge. Nobody cares about the person outside taking potshots.

This talk recently came up in conversation, and it made me realize—with a bit of a shock—how far my position has come since then.

That was just eight months ago, and AI still felt like it was somehow separable, or a satellite of tech mainstream. People would gripe about conferences stacking the lineup with AI sessions, and AI getting shoehorned into every keynote.

I get it. I too love to complain about technology, and this is certainly an industry that has seen its share of hype trains: dotcom, cloud, crypto, blockchain, IoT, web3, metaverse, and on and on. I understand why people are cynical—why some are even actively looking for reasons to believe it’s a mirage.

But for me, this year was for AI what 2010 was for the cloud: the year when AI stopped being satellite, experimental tech and started being the mainstream, foundational technology. At least in the world of developer tools.

It doesn’t mean there isn’t a bubble. Of COURSE there’s a fucking bubble. Cloud was a bubble. The internet was a bubble. Every massive new driver of innovation has come with its own frothy hype wave.

But the existence of froth doesn’t disprove the existence of value.

Maybe y’all have already gotten there, and I’m the laggard. 😉 (Hey, it’s an SRE’s job to mind the rear guard.) But I’m here now, and I’m excited. It’s an exciting time to be a builder.


Quoting Shriram Krishnamurthi


Every time you are inclined to use the word “teach”, replace it with “learn”. That is, instead of saying, “I teach”, say “They learn”. It’s very easy to determine what you teach; you can just fill slides with text and claim to have taught. Shift your focus to determining how you know whether they learned what you claim to have taught (or indeed anything at all!). That is much harder, but that is also the real objective of any educator.

Shriram Krishnamurthi, Pedagogy Recommendations

Tags: teaching
