
Fragments: April 29


Chris Parsons has updated his guide on using AI to code. This is his third update, and what I like about it is that he gives a lot of concrete information about how he uses AI, with sufficient detail that we can learn from him. His advice also resonates with the better advice I’ve seen out there, so the article makes a good overview of the state of using AI for software development.

I wrote the previous version of this post in March 2025, updated it once in August, and it has been linked from almost everything I have written about AI engineering since. The fundamentals from that post still hold: keep changes small, build guardrails, document ruthlessly, and make sure every change gets verified before it ships. One thing has had to move with the volume. “Verified” used to mean “read by you”. With modern agent throughput, it has to mean “checked by tests, by type checkers, by automated gates, or by you where your judgement matters”. The check still happens; it just does not always happen in your head.

Like Simon Willison, he makes a clear distinction between vibe coding, where you don’t look at or care about the code, and agentic engineering. He recommends either Claude Code or Codex CLI. He considers the inner harness provided by his preferred tools to be a key part of their advantage.

He sees verification as the key thing to focus on:

A team that can generate five approaches and verify all five in an afternoon will outpace a team that generates one and waits a week for feedback. The game is not “how fast can we build” any more. It is “how fast can we tell whether this is right”. That shifts where to invest. Build better review surfaces, not better prompts. Make feedback unnecessary where you can by having the agent verify against a realistic environment before it asks a human, and make feedback instant where you cannot.

The key role of the programmer is in training the AI to write software properly, and the most important thing skilled agentic programmers can do is pass that skill on to other developers.

And if you are a senior engineer worried that your job is quietly turning into approving diffs: it is. The way out is to train the AI so the diffs are right the first time, to make yourself the person on the team who shapes the harness, and to make that work the visible thing you are measured on. That role compounds in a way that reviewing never will.

 ❄                ❄                ❄                ❄                ❄

Early this month Birgitta Böckeler wrote a superb article on Harness Engineering. (That’s not just my opinion, judging by the crazy traffic it’s attracted.) Birgitta has now recorded a video discussion with Chris Ford on Harness Engineering, which is well worth a watch.

In it they focus on the role of computational sensors in the harness, such as static analysis and tests.

LLMs are great for exploratory and fuzzy rules, but once you have something that really is objective, converting it to a formal, unambiguous, deterministic format can give you more assurance

Birgitta did some experiments to explore the benefits of adding sensors, including a deep dive on using static analysis. She found it’s more useful with agents, since they can actually address every warning and don’t slack off like humans do.
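
To make that concrete, here is a minimal sketch of my own (not from the video) of the kind of deterministic gate an agent harness might run before a change reaches a human, assuming a Python project that uses ruff, mypy, and pytest:

import subprocess
import sys

# Each "sensor" is deterministic: the command must exit cleanly before the
# agent's change is allowed to proceed to human review. Unlike a human
# reviewer, the agent is expected to address every single warning.
CHECKS = [
    ["ruff", "check", "."],     # static analysis / linting
    ["mypy", "--strict", "."],  # type checking
    ["pytest", "-q"],           # tests against a realistic environment
]

def run_gate() -> int:
    for cmd in CHECKS:
        result = subprocess.run(cmd)
        if result.returncode != 0:
            print(f"gate failed: {' '.join(cmd)}", file=sys.stderr)
            return result.returncode  # send the agent back to fix it
    print("all sensors green - ready for human judgement")
    return 0

if __name__ == "__main__":
    sys.exit(run_gate())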

 ❄                ❄                ❄                ❄                ❄

Adam Tornhill considers an age-old question: how long should a function be? This question is still relevant in the age of agentic programming.

AI models do not “understand” code the way humans do. They infer meaning from patterns in tokens and depend heavily on what is explicitly expressed in the code.

Research shows that naming plays a critical role. When meaningful identifiers are replaced with arbitrary names, model performance drops significantly. Current models rely heavily on literal features—names, structure, and local context—rather than inferred semantics.

Like me, he doesn’t think the answer lies in counting how many lines should be in a function; instead it’s all about providing better structure. He has a good example of how a well-chosen function defines useful concepts, where a function wraps four lines of code, returning a new concept that enters the vocabulary of the program.

Functions are the first unit of structure in a codebase. They define how logic is grouped, how intent is communicated, and how change is localized. If the function boundaries are wrong, everything built on top of them becomes harder to understand and harder to evolve.

This fits with my writing that the key to function length is the separation between intention and implementation:

If you have to spend effort looking at a fragment of code to figure out what it’s doing, then you should extract it into a function and name the function after that “what”. That way when you read it again, the purpose of the function leaps right out at you, and most of the time you won’t need to care about how the function fulfills its purpose - which is the body of the function.
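
To make the separation concrete, here is a small illustrative sketch (my example, not Tornhill’s):

from dataclasses import dataclass

@dataclass
class Customer:
    plan: str
    tenure_days: int
    has_open_disputes: bool

# Before extraction, the reader must decode the "how" to learn the "what":
#   customer.plan == "premium" and customer.tenure_days > 365
#       and not customer.has_open_disputes

# After extraction, the name states the intention, "loyal customer" enters
# the vocabulary of the program, and the body holds the implementation.
def is_loyal_customer(c: Customer) -> bool:
    return c.plan == "premium" and c.tenure_days > 365 and not c.has_open_disputes

if is_loyal_customer(Customer(plan="premium", tenure_days=400, has_open_disputes=False)):
    print("apply loyalty discount")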

 ❄                ❄                ❄                ❄                ❄

Many folks in my feeds recommended Nilay Patel’s post on Why People Hate AI. He thinks that many people in the software world have “software brain”:

The simplest definition I’ve come up with is that it’s when you see the whole world as a series of databases that can be controlled with the structured language of software code. Like I said, this is a powerful way of seeing things. So much of our lives run through databases, and a bunch of important companies have been built around maintaining those databases and providing access to them.

Zillow is a database of houses. Uber is a database of cars and riders. YouTube is a database of videos. The Verge’s website is a database of stories. You can go on and on and on. Once you start seeing the world as a bunch of databases, it’s a small jump to feeling like you can control everything if you can just control the data.

Software Brain turns people into databases, and oddly enough, a lot of people don’t like that. Which is why so many polls reveal the negative feelings folks have about the AI movement.

Even taking the time to consider how much of your life is captured in databases makes people unhappy. No one wants to be surveilled constantly, and especially not in a way that makes tech companies even more powerful. But getting everything in a database so software can see it is a preoccupation of the AI industry. It’s why all the meeting systems have AI note takers in them now.

Patel draws a comparison that I’ve often made - that between programmers and lawyers. Lawyers who draw up contracts are creating a protocol for how the parties to the contract should behave. As Patel puts it:

If the heart of software brain is the idea that thinking in the structured language of code can make things happen in the real world, well, the heart of lawyer brain is that thinking in the structured legal language of statutes and citations can also make things happen. Hell, it can give you power over society.

The difference, of course, is that law is non-deterministic. Litigation is how we resolve what happens when people have different ideas about how those contracts should execute.

 ❄                ❄                ❄                ❄                ❄

I was chatting recently with a company that wanted to use AI to make sense of their internal data. The potential was great, but the problem was that the data was a mess. People put stuff into fields that didn’t make sense, and there was little consistency in how people classified important entities. As someone commented:

the hardest problem with internal data is precise, consistent definitions

You can imagine my astonishment. (i.e. none at all - this has been a constant theme during all my decades with computers.) The difficulty of getting such definitions undermines many of the hopes of Software Brain.

This resonates with our relationship with LLMs when programming. Precise and consistent definitions strike me as crucial to effective communication with The Genie. These definitions need to grow in the conversation, and be tended over time. Conceptual modeling will be a key skill for agentic programming and whatever comes next. (At least I hope it will, since it’s a part of programming I really enjoy.)

 ❄                ❄                ❄                ❄                ❄

Patel’s article refers to Ezra Klein’s post about the new feeling in San Francisco.

You might think that A.I. types in Silicon Valley, flush with cash, are on top of the world right now. I found them notably insecure. They think the A.I. age has arrived and its winners and losers will be determined, in part, by speed of adoption. The argument is simple enough: The advantages of working atop an army of A.I. assistants and coders will compound over time, and to begin that process now is to launch yourself far ahead of your competition later. And so they are racing one another to fully integrate A.I. into their lives and into their companies. But that doesn’t just mean using A.I. It means making themselves legible to the A.I.

That legibility is the heart of Patel’s observation. That’s why I see many colleagues of mine dumping all their email, meeting notes, slide decks and everything else into files that AI can read and work with. This plays to the strengths of AI: we know that AI is really good at querying unstructured information. So I can figure out what’s buried in my notes in a way that’s far more effective than hoping I’m typing the right search regex.

I’ve been using Gemini a fair bit for exactly this on the web, finding it easier to write a question to it than to throw search terms at Google. Gemini keeps a record of my past requests, and uses that to help it tune what I’m looking for. As Klein observes:

[The AI] is constantly referring back to other things it knows, or thinks it knows, about me. Sycophancy, in my experience, has given way to an occasionally unsettling attentiveness; a constant drawing of connections between my current concerns and my past queries, like a therapist desperate to prove he’s been paying close attention.

The result is a strange amalgam of feeling seen and feeling caricatured.

Like myself, Klein is a writer, and faces the same temptation that I have when I think about AI and writing. Maybe instead of toiling over articles, I should ask an LLM to create an AGENTS.md file that summarizes my writing style, and every few days ask it to compose an article on some subject, read it, tweak it, and then publish my erudite musings. But that’s not at all appealing to me. I want understanding to grow in my brain, not in the LLM’s transient session. Writing to explain my thinking to others is how I refine that thinking, “chiseling that idea into something publishable” as Klein puts it. To have an AI write for me is to cripple my own mind.


microsoft/VibeVoice


VibeVoice is Microsoft's Whisper-style audio model for speech-to-text, MIT licensed and with speaker diarization built into the model.

Microsoft released it on January 21st, 2026 but I hadn't tried it until today. Here's a one-liner to run it on a Mac with uv, mlx-audio (by Prince Canuma) and the 5.71GB mlx-community/VibeVoice-ASR-4bit MLX conversion of the 17.3GB VibeVoice-ASR model, in this case against a downloaded copy of my recent podcast appearance with Lenny Rachitsky:

uv run --with mlx-audio python -m mlx_audio.stt.generate \
  --model mlx-community/VibeVoice-ASR-4bit \
  --audio lenny.mp3 --output-path lenny \
  --format json --verbose --max-tokens 32768

Screenshot of a macOS terminal running an mlx-audio speech-to-text command using the VibeVoice-ASR-4bit model on lenny.mp3, showing download progress, a warning that audio duration (99.8 min) exceeds the 59 min maximum so it's trimming, encoding/prefilling/generating progress bars, then a Transcription section with JSON segments of speakers discussing AI coding agents, followed by stats: Processing time 524.79 seconds, Prompt 26615 tokens at 50.718 tokens-per-sec, Generation 20248 tokens at 38.585 tokens-per-sec, Peak memory 30.44 GB.

The tool reported back:

Processing time: 524.79 seconds
Prompt: 26615 tokens, 50.718 tokens-per-sec
Generation: 20248 tokens, 38.585 tokens-per-sec
Peak memory: 30.44 GB

So that's 8 minutes 45 seconds for an hour of audio (running on a 128GB M5 Max MacBook Pro).

I've tested it against .wav and .mp3 files and they both worked fine.

If you omit --max-tokens it defaults to 8192, which is enough for about 25 minutes of audio. I discovered that through trial-and-error and quadrupled it to guarantee I'd get the full hour.
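
Working backwards from that: 8192 tokens covering roughly 25 minutes is about 330 output tokens per minute of audio. Here's a rough sizing sketch, based purely on that trial-and-error observation:

def estimate_max_tokens(minutes: float) -> int:
    """Rough --max-tokens budget: ~330 tokens per minute of audio, doubled for headroom."""
    TOKENS_PER_MINUTE = 8192 / 25  # ~328, from the 25-minute default observation
    return int(minutes * TOKENS_PER_MINUTE * 2)

print(estimate_max_tokens(60))  # 39321 - comfortably above the 20,248 tokens actually generated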

That command reported using 30.44GB of RAM at peak, but in Activity Monitor I observed 61.5GB of usage during the prefill stage and around 18GB during the generating phase.

Here's the resulting JSON. The key structure looks like this:

{
  "text": "And an open question for me is how many other knowledge work fields are actually prone to these agent loops?",
  "start": 13.85,
  "end": 19.5,
  "duration": 5.65,
  "speaker_id": 0
},
{
  "text": "Now that we have this power, people almost underestimate what they can do with it.",
  "start": 19.5,
  "end": 22.78,
  "duration": 3.280000000000001,
  "speaker_id": 1
},
{
  "text": "Today, probably 95% of the code that I produce, I didn't type it myself. I write so much of my code on my phone. It's wild.",
  "start": 22.78,
  "end": 30.0,
  "duration": 7.219999999999999,
  "speaker_id": 0
}

Since that's an array of objects we can open it in Datasette Lite, making it easier to browse.

Amusingly, that Datasette Lite view shows three speakers - it identified Lenny and me for the conversation, and then a separate Lenny for the voice he used for the additional intro and the sponsor reads!
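
As a quick sanity check on the diarization you can tally talk time per speaker. Here's a sketch, assuming the output landed in a file called lenny.json (my guess at the filename produced by --output-path lenny) containing the array of segment objects shown above:

import json
from collections import defaultdict

# Tally total talk time per diarized speaker from the transcript JSON.
with open("lenny.json") as f:
    segments = json.load(f)

talk_time = defaultdict(float)
for seg in segments:
    talk_time[seg["speaker_id"]] += seg["duration"]

for speaker, seconds in sorted(talk_time.items()):
    print(f"speaker {speaker}: {seconds / 60:.1f} min")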

VibeVoice can only handle up to an hour of audio, so running the above command transcribed just the first hour of the podcast. To transcribe more than that you'd need to split the audio, ideally with a minute or so of overlap so you can avoid errors from partially transcribed words at the split point. You'd also need to then line up the identified speaker IDs across the multiple segments.
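
Here's a sketch of what that splitting step could look like with ffmpeg (my sketch, not a tested pipeline - and it leaves the harder job of reconciling speaker IDs across chunks unsolved):

import subprocess

CHUNK_S = 55 * 60    # stay under VibeVoice's 59-minute ceiling
OVERLAP_S = 60       # a minute of overlap to heal words cut at the seams

def split_audio(path: str, total_s: int) -> list[str]:
    """Split audio into overlapping chunks using ffmpeg (must be installed)."""
    chunks = []
    start, i = 0, 0
    while start < total_s:
        out = f"chunk{i:02d}.mp3"
        # -ss/-t before -i: seek and limit on the input, then stream-copy
        subprocess.run(
            ["ffmpeg", "-y", "-ss", str(start), "-t", str(CHUNK_S),
             "-i", path, "-c", "copy", out],
            check=True,
        )
        chunks.append(out)
        start += CHUNK_S - OVERLAP_S   # step back a minute so chunks overlap
        i += 1
    return chunks

# e.g. split_audio("lenny.mp3", int(99.8 * 60)) for the full episode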

Tags: microsoft, python, datasette-lite, uv, mlx, prince-canuma, speech-to-text


Figma's woes compound with Claude Design


I think Figma is increasingly becoming a go-to case study in the victims of the so-called "SaaSpocalypse". And Claude Design's launch last week just adds a whole new dimension of pain.

What happened to Figma?

Firstly, I should say that I love(d?) the Figma product. It's hard to appreciate now what a big deal Figma's initial product was when it launched in the mid 2010s.

The initial product ushered in a whole new category of SaaS - using the nascent WebGL and asm.js technologies to allow designers to design entirely in the browser. It used to be a running joke that an app like Photoshop could ever run in the browser, but Figma proved otherwise.

It quickly overtook Sketch as the de facto design tool in the market - first for UI/UX wireframing and prototyping, but increasingly for everything in graphic design. As it was based in the browser, it was a revelation on the developer side to be able to open UI/UX files if you weren't on a Mac (Sketch is Mac only). It was also brilliant to be able to leave comments on the design and collaborate with the designer(s) to iterate on designs really quickly.

The collaborative features (without requiring anyone to download any software) quickly meant it got adoption outside of pure design roles - PMs and executives could finally collaborate in real time on the product they were building, without having to (at best) send back revisions and notes from badly screenshotted files that tended to be out of date by the time they were received.

I'll skip over the rest of the history, including a no doubt distracting takeover attempt by Adobe, which was later blocked on competition grounds. But (of course) LLMs happened, and suddenly one of the most forward-looking SaaS companies became very vulnerable to disruption itself.

Why did AI hit Figma so hard?

One completely unexpected development that I and others noticed (and wrote up a few months ago in How to make great looking reports with Claude Code) was that LLMs started to get fairly "good" at design.

By good I do not mean as good as a talented designer - it's clearly nowhere near that, currently. But like many things, not everything requires a great designer. Even if you use a great design team to build out your core product experience (and many do not), there's an awful lot of design 'resource' required for auxiliary parts of the product: reports, proposals and so on. It's not stuff that tends to get designers excited, but it can sap an awful lot of time going back and forth on a pitch deck.

And this is exactly why I think Figma is almost uniquely vulnerable. The way it managed to expand into organisations by getting uptake with non-designers becomes a liability if those non-designers can get an AI agent to do the design for them.

Looking at Figma's S1 (which is somewhat out of date by now, but is the only reported breakdown I can find) corroborates this potential weakness. Only 33% of Figma's userbase in Q1 2025 was designers, with developers making up 30% and other non-design roles making up 37%.

A lot of Figma's continued expansion depended on this part of their userbase. A lot of their recent product development has been aimed at further expansion within organisations - "Dev Mode" for developers (which now looks incredibly quaint against LLMs), Slides (to compete against PowerPoint and other presentation tools) and Sites (a WebFlow-esque site builder) are all about expanding their TAM beyond "pure" design.

The real surprise for me though was how basic their "flagship" AI design product Figma Make is. It really does feel like something someone put together in an internal AI hackathon one weekend that never progressed beyond that. Given how much Figma managed to push the envelope on web technology, I found this surprising - perhaps they were caught off guard by how quickly LLMs' design prowess improved, or there were internal disagreements about the role AI should or will play in design. Regardless, it's an incredibly underwhelming product as it stands.

Then Claude Design comes along...

If things weren't bad enough, Anthropic themselves launched Claude Design, which is a pretty direct competitor to Figma in many ways. While it's nowhere near functional or polished enough to replace Figma's core design product, I expect it will get significant traction outside of that. The ability for it to grab a design system from your existing assets in one click is very powerful - it allows you to pull together prototypes, presentations or reports in your corporate design style that look and feel far better than anything a non-designer could do themselves.

And I thought it was extremely telling that, unlike a lot of the other Anthropic product launches that have touched design, Figma did not provide a testimonial for it (understandably). Canva did, which I found extremely odd (they are in my eyes even more vulnerable to this product than Figma).

I think this really underlines two major weaknesses in many SaaS companies' AI strategies:

Firstly, it's very difficult to compete on AI against the company that is providing your AI inference. A quick check on Figma Make suggests that Figma (at least on my account) is indeed using Sonnet 4.5 for its inference - though I have seen it use Gemini in the past:

Figma Make showing Sonnet 4.5 as the underlying model

At this point Figma is effectively funding a competitor - and the more AI usage Figma has, the more money they send over to Anthropic for the tokens they use. Even worse, Sonnet 4.5 is miles behind what Anthropic uses on Claude Design (Opus 4.7, which has vastly improved vision capabilities[1]), so the results a user gets on Make vs Claude Design are almost certainly going to underwhelm.

Also, unlike most/all SaaS costs, inference (especially with these frontier models) is expensive. As Cursor found out, the frontier labs can charge a lot less to end users than API customers like Figma. When you are potentially looking at a shrinking userbase, it's far from ideal to have very expensive variable costs that start pulling your profitability down.

Secondly, it really underlines to me how efficiently, headcount-wise, companies can build products now. Figma has close to 2,000 employees - not all working on product engineering, of course. I really doubt Anthropic even needed 10 people to build Claude Design. Indeed, the entirety of Anthropic is around 2,500 people.

It's also worth noting that a lot of the things that would traditionally lock a company like Figma in stop working as well in an agent-first world. Multiplayer matters less when your collaborator is an agent iterating on a prompt. Plugin ecosystems matter less when you can just ask for the functionality directly. Design system tooling is the whole point of Claude Design. Enterprise SSO - Claude already has that. Most of the moats that protect a mature SaaS company are moats against other SaaS companies, not against the thing providing their inference.

I might be wrong about how bad this gets for Figma specifically. Companies with strong brands, great distribution and genuinely talented teams can often adapt faster than outsiders expect, and I'd rather be long Figma than most of its competitors.

But the structural point is harder to wriggle out of. Figma has ~2,000 employees. Anthropic has ~2,500 total and I doubt Claude Design took more than a handful to build. Figma now needs to out-execute a competitor whose inference is ~free to them, whose marginal cost to ship is roughly zero, and who employs fewer people on the competing product than Figma has on a single pod. That's a very hard position to pivot out of.

This feels like a preview of where SaaS economics are heading. The companies that built big orgs on the assumption of steady seat expansion are going to find themselves competing with products built by tiny teams inside the frontier labs. Figma just happens to be the first big public name where one of their primary inference suppliers has started competing against them.


  1. Both GPT 5.4 and Opus 4.7 can now "see" screenshots at much higher resolution - Opus 4.7 jumped from 1568px / 1.15MP to 2576px / 3.75MP. Resolution isn't the whole story (scaffolding and post-training matter a lot too) but it meaningfully helps with small-element detection and layout judgement. If you've ever pasted a screenshot of something broken and the model told you it looks great, the previous lack of resolution is one of the reasons why. ↩︎




Firefighters battling 'significant' blaze at one of Australia's two oil refineries

Fire Rescue Victoria said the blaze broke out at Viva Energy's refinery in Geelong about 11pm on Wednesday.

A sociologist’s take on the Artemis moon round-trip


The project of modernity had one overarching value above all others: progress. Unlike ancient religious values, this one was realized and made manifest at remarkable speed. Whole categories of disease were completely removed from lived experience, magnificent buildings were constructed higher and faster than at any point in history, and the phrase “better living through chemistry” combined the widely disparate realms of marketing and prediction into one deliverable package. Progress was no mere idle speculation of useless dreamers – steel-eyed men in suits went to work every day to make it happen, one rationally calculated decision at a time. As projects go, modernity had legs.

The biggest achievement of modernity as a project was the moon landing. As a technical accomplishment, it was astounding – get humans safely to the moon, keep them alive on its surface for a while, and then get them back to tell the tale. This required new advances in rocketry, communications, metallurgy and computer science to pull off, all of which had to combine flawlessly into one single package for it to work. So many things had to progress so fast and so many things had to go just so, lest the whole endeavor become a very expensive and well-documented heap of scrap metal. One single mistake could spell disaster, yet through feats of heroic engineering and even more heroic resource management, it got done.

The technical accomplishment, as impressive as it is, is only the second biggest in the story of modernity. The biggest accomplishment is that a decision was made to go to the moon, and then the decision was made a reality. In the narrow sense of the space race, the US got to show off that it was better at performing the modern project than the Soviets. They got there firstest with the mostest, as it were. In the wider sense, the moon landing showed that not even the cold indifferent expanse of space could deter the indomitable spirit of humanity, once it set its mind to it. Modernity could progress anything, given time.

When sociologists say that modernity is characterized by a belief in eternal progress, this is the belief, and also the progress. Up until the space race, the belief in eternal progress remained steadfastly manifest as a social reality – just ask your grandparents about how things were back in the days, and compare it to now. No priests were required to explain what anyone could see with their own two eyes. Just take a trip to the grocery store and see for yourself.

Eternal progress is a powerful belief, and a powerful dream. It can mobilize millions (humans, dollars, you name the unit) in pursuit of goals that could not be attained otherwise. It’s what allows even the most capitalist of westerner to look at a Soviet propaganda poster of workers Doing It Together, and sense a tingle of belonging. The project of modernity did not divide east and west; the competition was about who could perform it better. The belief in eternal progress is a unifying force.

Until, that is, the progress slowed down, and showed itself to be unevenly distributed, and beholden to a narrow definition of who gets to make the decisions that define the modern project. The project of modernity had made tremendous strides forward, true, but even the slightest amount of scrutiny showed that the steely-eyed men in suits who went to work every day to make it happen were, in fact, all men. Moreover, they were men of a certain ethnicity, of a certain social background, with certain ideological commitments. The universal project of humanity’s eternal progress into the promised land of future technological utopias, turned out to be the domain of a very specialized subset of the very humanity it claimed to represent. For all mankind. Asterisk.

Or, as Gil Scott-Heron phrased it: I can’t pay no doctor’s bill while whitey’s on the moon.

As sociologists, we here have to be careful in pointing out that this is not a mere question of budget reallocation. As big of an accomplishment as the moon landing was, it is but a mere fraction of the overall project. Leaving this one fraction undone would just mean the status quo without the moon landing. A more useful framework is to remember that modernity was a matter of deciding to do something, and then to make that decision a reality. Or, rephrased: how come the powers that be never decided to make universal healthcare a solved problem, and then set to work realizing that decision? Why did this never become a priority?

The belief in the eternal progress of humanity as a universal project, encompassing every living human on the planet, can survive a great many things, including world wars and atom bombs. It can not survive the inescapable conclusion that many of the problems that face us every day are not only solvable, but solvable within existing monetary and political frameworks without too much revolutionary heavy lifting, and actively left unsolved through an active decision of a narrow subsection of a fraction of a percentage point of the powers that be. Humanity can solve any problem, go to any planet, achieve any technological marvel – but will not solve these specific problems that affect you and everyone you know, on the word of these specific individuals.

When sociologists talk of post-modernity, we emphasize the prefix, post. After modernity.

The progress stopped. So too did the belief, at least as a universal motivational force. It got replaced with austerity instead. The postmodern condition is one where the project of modernity ceased moving forward, and instead moved inward, becoming an attempt to preserve a status quo rather than a project to overcome it. We still live in the ruins of the modern project, some of which are still as empowered as they were in their glory days.

Two things can be gleaned from this state of affairs. The first thing is that those who insist that postmodernists (note the suffix -ists, rather than -y; certain persons who hold a specific ideological position, rather than a more generalized mode of being) have gone too far in relativizing the truth, often have a vested interest in keeping the powers that be in place. When such persons insist that there is too much identity politics afoot, what they really want is a return to the good old days, when steel-eyed men got things done, and no one else got invited; a return to when movies were in black and white, and so was everything else.

Postmodernity rests on the recognition that a commitment to a universal humanity means you have to include everyone, which means the institutions that were specifically built to channel the interests of not-everyone have to be reformed. Those aligned with said institutions are not prone to accept such reforms quietly, and to instead speak fondly of ye olden days. Their words have to be read accordingly.

The second thing is that the oft-expressed sentiment that the Artemis round-trip represents a continuation of something that humanity used to do, but is no longer doing, reflects a genuine longing for the modern project. Not as it actually was, but as it claimed to be: for all mankind. No asterisk.

The postmodern project – one of them, at least – is to recapture the adamant optimism that humanity can in fact set its mind to solve problems, and then get to work solving them. We could go to the moon. We could solve universal healthcare. “We” could be a pronoun that includes everyone.

It is a beautiful dream. The Artemis mission has to be interpreted as an extension of it. We have the technology.




The Bromine Chokepoint: How Strife in the Middle East Could Halt Production of the World’s Memory Chips


The U.S.-Israeli war with Iran, now in an unstable ceasefire, has exposed a structural failure in the global semiconductor memory supply chain, and it is not the one analysts seem to be tracking. The story receiving attention is helium: Qatar’s Ras Laffan facility went offline, a 45-day inventory clock started running, and spot prices doubled within days. The story receiving almost no attention is bromine, and it is potentially the more dangerous one. Bromine is the raw material from which specialized chemical suppliers produce semiconductor-grade hydrogen bromide gas, the etch chemical that South Korean fabs use to carve the transistor…

