
29th August 2026: a scenario


On 29 April 2026, a Korean security firm called Theori published 732 bytes of Python that breaks Linux container isolation. CopyFail (CVE-2026-31431) is a page-cache corruption bug in the kernel's crypto code. It's been sitting in production since 2017. A compromised pod on a shared Kubernetes node can corrupt setuid binaries visible to every other container on that host, and to the host kernel itself. EKS, GKE, AKS, every shared-tenant node, every CI runner, every multi-tenant SaaS that took the cheap path on isolation - all exposed until patched. It took an AI tool four months to find it. Nine years of human eyes did not.

Container escape is bad. Despite an arguably poorly coordinated disclosure and mitigation response[1], it looks like a near miss rather than a catastrophe. But this class of bug - old, subtle, in a corner of the kernel that everyone assumed someone else had read - is exactly the class of bug that lives in every hypervisor stack underneath every cloud. Those bugs are still there. They just haven't been found yet.

Here's a (fictional) story about what happens four months from now, on 29th August 2026.

08:32 UTC

As Europe swelters in an extreme heatwave, engineers across the continent are paged as EC2 instances begin hard-crashing. Hacker News reacts as it always does - another us-east-1 outage, the AWS status page showing green, eyes rolling. Some commenters note, though, that many other AZs are showing issues too, even if not all servers are affected.

Over the next hour, more and more machines go down. One Reddit user posts that they are having trouble provisioning even fresh machines - as soon as they launch, they are marked "unhealthy" and go down. A few minutes later, the entire AWS console and API surface goes down.

Cloudflare Radar shows AWS network traffic dropping to a small percentage of what is normal.

10:15 UTC

As major AWS-hosted services start going down - Atlassian, Stripe, Slack, PagerDuty - comments on Twitter begin reporting issues with Linux-based Azure instances too. Indeed, Cloudflare Radar shows significant drops in Azure traffic.

News channels across Europe start leading with vague breaking-news headlines about outages at Amazon. They make sure to point out that this isn't an unusual occurrence, with normal service expected to resume as it always has, and mistakenly insist that only US services are affected.

11:53 UTC

As the US East Coast starts its weekend, a very unusual step is taken. TV channels are briefed that POTUS will address the nation at 8am EDT. Few connect the dots - speculation focuses on a potential new strike in the Middle East, or an announcement on the Russia-Ukraine war.

12:00 UTC

POTUS announces that a significant cybersecurity incident is under way. The head of CISA (the Cybersecurity and Infrastructure Security Agency) gives a vague but concerning warning. Americans are asked to charge their cell phones and await further news, and are reminded that there may be outages on IPTV-based services.

POTUS rounds it out by speculating that China is behind the attack, despite his much-heralded reset with Beijing earlier in the year.

Other Western leaders give similar addresses, with European leaders speculating on background that Russia or North Korea is more likely than China to be behind the attack. The French president says "without doubt" this is a nation-state actor. While he doesn't publicly point to a specific country, he says those responsible will be brought to justice.

While these addresses happen, engineers at various banks are battling outages of their own. Most concerningly, the first- and third-biggest card processors by volume in Europe have stopped accepting payments, returning cryptic error messages. Both have multicloud strategies, but neither can successfully move workloads off the affected clouds.

Google Cloud Platform and smaller cloud providers - unaffected until now - start showing issues. While their existing workloads are unaffected, the huge spike in demand from enterprises simultaneously activating their disaster-recovery protocols swamps the available compute on alternate providers. One smaller cloud provider tweets that they are seeing 10,000 VM-creation requests a second, draining their entire spare allocation in under a minute. CEOs of major banks bombard Google and Oracle leadership with calls, offering blank cheques to secure failover compute. The calls go unanswered.

WhatsApp groups throughout Europe light up with misinformation that money has been stolen, amplified by many mobile banking apps simultaneously showing a "we are undertaking routine maintenance" fallback error, causing huge lines at ATMs and bank branches as people try to withdraw their savings.

15:53 UTC

As the chaos continues to grow, a joint press release is distributed by the leadership of AWS and Azure:

At approximately 4am EDT this morning, a critical and novel vulnerability was exploited in the Linux operating system. This has caused widespread global outages of Linux-based virtual machines. Our engineers are working with security services globally to mitigate the impact, and engineers across both Microsoft and AWS are working collaboratively to release emergency patches for affected software. Equally, we are working hard to understand the impact and will provide regular updates to the media. We sincerely apologize for the impact this is having on our customers and society at large.

Behind the scenes, it is chaos. Engineers have isolated the root causes - a complex interplay of vulnerabilities, the most critical being a previously undiscovered logic error in the Linux eBPF subsystem that allows a hypervisor takeover. Curiously, no data has been stolen - a mistake in the exploit simply causes machines to hard-crash exactly 255 seconds after receiving the malicious payload. A few engineers question the sloppiness, but in private communications with government, leadership doubles down on the claim that it has to be a nation state.

The core issue, though, is that nearly all of Azure's and AWS's control planes are down. Attempts to "black start" them result in repeated failures, as various subsystems collapse under the intense traffic from VMs stuck in boot loops.

23:29 UTC

The first VM instances start up again. Restoration is painfully slow, with AWS struggling to get more than 2% of machines back online. Internal communication is severely degraded - with both Slack and Microsoft Teams down, instant messaging is out of the question. Amazon's corporate email runs on AWS itself, and Microsoft's on Azure-hosted Exchange; both are degraded, massively complicating coordination. An enterprising AWS employee stands up a local IRC server, which becomes the main channel of communication - restoration efforts speed up once word of it spreads.

Sunday 30th August, 22:01 UTC

Restoration continues, and the worst of the panic dies down. Banks end up getting priority compute, with POTUS publicly threatening "extreme actions" if major banks are not put at the front of the queue.

Asian stock markets open, triggering multiple circuit breakers. After the third in a row, Tokyo forces its market to close for the day; other Asian markets follow in quick succession.

One curious question remains, though: what was the purpose of this attack? No ransomware was deployed, no data was stolen, and while various terrorist groups claimed responsibility, none were considered credible.

Meanwhile, an AWS engineer finally isolates snapshots containing the first known failure: an EC2 instance provisioned on August 13th. Curiously, it was provisioned on an individual account in eu-west-3 - Paris. The account matches an individual in Lyon, France. French security services are alerted.

Monday 31st August, 05:15 UTC

In an outer suburb of Lyon, France, anti-terrorism police arrive at an apartment building. A 17-year-old is apprehended, along with his grandmother. Two days earlier, his own president had vowed that those responsible would be brought to justice. The police chief on the scene passes word up the chain that the lead is a total dud - there is no chance the suggested foreign intelligence service was ever here. A search of the apartment confirms it: nothing found apart from a PS5 mid-FIFA tournament and a six-year-old gaming computer. Neighbours confirm they've seen no one enter or exit the apartment apart from the two residents, who've lived there for "as long as anyone can remember".

Media arrive on the scene, with a flustered and embarrassed police chief suggesting it was a bad tip-off and urging local residents to stay calm.

The decision is made to seize the electronics and release the two "suspects".

07:14 UTC

A couple of digital-forensics experts get the seized gaming PC and scan it for malware. Nothing much of interest is found, and just as they start writing up their report, one folder catches their eye: /opt/security/ps5-homebrew. They take a closer look and note it in the report without thinking much of it - probably a kid trying to play pirated games; they've seen it before. The image of the machine is uploaded.

10:09 UTC

When the code makes its way up the chain a few hours later, the dominoes fall into place. A specialist from the Agence nationale de la sécurité des systèmes d'information - France's national cybersecurity agency - pulls the code from the image and quickly realises what has happened. The teenager had been quietly mining crypto for months, using the proceeds to rent cheap GPUs on a small European cloud provider, where he ran an uncensored fine-tune of the new Qwen 4 open-weights model. He'd been desperately trying to downgrade his PS5 firmware to bypass the latest piracy checks.

Interestingly, his coding agent, unbeknownst to him, had found the most critical *nix kernel exploit in decades. Attacking a little-known eBPF module on the PS5 (which, like the PS4 before it, runs a FreeBSD-based OS), it managed a complete takeover of the device. Intrigued, he asked his coding agent to run it against the AWS Linux server he ran a gaming forum on - same result, except he noticed he could now see other people's files on the machine. Annoyingly, the VM crashed after a few minutes.

Excitedly, he set up an Azure account - same thing. He asked his coding agent what this meant, and the agent, with its usual sycophantic personality, started explaining what he could do with it - mine crypto, and make himself rich beyond his wildest dreams.

The agent came up with a final plan: deploy the exploit on both Azure and AWS, and install a cryptominer. His last known chat message was "is this definitely a great idea?".

The agent responded "You're absolutely right!" and began deploying the code, first to AWS and then to Azure. It had built a complex piece of malware that spread across millions of physical servers. But it hallucinated a key Linux API, which resulted in the machines crashing after 255 seconds instead of deploying the cryptominer.

This is fiction. The teenager doesn't exist. Qwen 4 doesn't exist yet either. When it does, an uncensored fine-tune will appear within days, like every prior open-weights release.

Almost everything else in here is real, or close enough that it doesn't matter.

CopyFail is real. A nine-year-old kernel bug that nine years of human eyes had missed, found by an AI tool in four months. That class of bug - old, subtle, in a corner of the kernel everyone assumed someone else had read - sits in every hypervisor stack underneath every cloud. Those bugs are still in there. They just haven't been found yet, and the rate at which they get found from now on is bounded by GPU hours, not human ones.

The centralisation is the bit that's hard to think clearly about. Most people I talk to about this, even technical people, underestimate how much of modern life is sitting on AWS and Azure. The DR plans I've seen at large enterprises mostly assume there's a cloud to fail over to. They don't really model what happens if the fallback is also down, or if every other org on earth is failing over at the same minute and draining GCP's spare capacity. Almost nobody keeps full cold standby compute. And even the ones that do are sitting on top of hundreds of services that don't: Stripe, Auth0, Twilio, Datadog, every queue and identity provider in the stack. They're all running somewhere, and that somewhere is mostly two companies.

The attribution thing is the bit I'm least sure about, but worth saying anyway. Everyone is worried about nation states. Most of the big incidents that have actually happened turned out to be a kid, a misconfiguration, or someone who didn't really understand what they were doing. The Morris Worm. Mirai. The threat model in most boards' heads assumes a sophisticated adversary. The thing that's actually arriving is an unsophisticated adversary holding tools that are now sophisticated for them.

I wrote this as fiction because I've spent the last few months talking to journalists and other non-technical people about what AI changes for cybersecurity, and the technical version of the argument doesn't land at all. Engineers get it instantly. Everyone else needs to feel what it looks like. So this is what it might look like, more or less. The only bit I'm reasonably confident about is that the date is wrong.


  1. The story here is still evolving at the time of writing, but there is a serious coordination problem in Linux security. The Linux kernel security team recommends against notifying downstream distributions of Linux (such as Ubuntu, Fedora, Arch, etc.) of security issues ahead of time. This has led to slow patching, as many distributions were not informed and only found out when the issue was made public. People are pointing fingers in many directions. ↩︎




April 30, 2026


TLDR: got a bunch of agents to find remote unauth'd OOBs in ksmbd: CVE-2026-31432 and CVE-2026-31433. CVE-2026-31432 specifically is "RCE-promising" if you squint hard enough, given the memory layout. :) And then there are also 20+ other CVEs across Docker, OpenSSL, nginx, etc.

Finally, I go into some techniques I tried that seem generally promising for making open-source LLMs better vulnerability researchers, like:

  • getting them "drunk" to increase their creativity by steering their internal state (see the sketch after this list), and
  • performing "brain surgery" to duplicate their reasoning layers, allowing them to connect more dots
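A minimal sketch of what the "drunk" idea might look like - my illustration of the concept, not the actual harness from the writeup - is to inject a little Gaussian noise into mid-layer hidden states via PyTorch forward hooks. The model name, layer band and noise scale below are placeholder assumptions:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "Qwen/Qwen2.5-0.5B-Instruct"  # placeholder: any small causal LM works
NOISE_SCALE = 0.02                    # how "drunk" to make the model

tok = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(MODEL)

def add_noise(module, inputs, output):
    # Decoder layers return a tuple; the hidden states are the first element.
    hidden = output[0]
    return (hidden + NOISE_SCALE * torch.randn_like(hidden),) + output[1:]

# Perturb a band of middle layers, leaving the early and late layers sober.
hooks = [layer.register_forward_hook(add_noise)
         for layer in model.model.layers[8:16]]

prompt = "List unusual places a memory-safety bug could hide in ksmbd:"
ids = tok(prompt, return_tensors="pt")
out = model.generate(**ids, max_new_tokens=120, do_sample=True)
print(tok.decode(out[0], skip_special_tokens=True))

for h in hooks:
    h.remove()  # sober the model back up

The "brain surgery" layer-duplication trick would be a deeper structural edit to the loaded model and isn't sketched here.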


https://www.bloomberg.com/news/articles/2026-04-28/us-ends-investigation-into-claims-whatsapp-chats-aren-t-private

rootsecdev/cve_2026_31431 (171 stars, Python) Exploit POC for CVE_2026_31431

source: rootsecdev (@rootsecdev)

https://michael-inzlicht.squarespace.com/s/The-psychological-consequences-of-mental-health-awareness-efforts.pdf

badsectorlabs/copyfail-go (107 stars, Assembly) A Go implementation of copyfail (CVE-2026-31431)

source: Bad Sector Labs (@badsectorlabs)

https://hackers-arise.com/satellite-hacking-listening-to-unencrypted-geo-satellite-traffic/

https://hackers-arise.com/satellite-hacking-building-the-ground-station-for-satellite-tracking-and-radio-communication/

https://hackers-arise.com/satellite-hacking-how-russia-knocked-out-the-viasat-system-at-the-outset-of-the-ukraine-war/

https://openai.com/index/cybersecurity-in-the-intelligence-age/

https://www.bloomberg.com/news/articles/2026-04-29/chinese-hackers-spied-on-cuban-embassy-as-us-prepared-blockade?embedded-checkout=true



Fragments: April 29


Chris Parsons has updated his guide on using AI to code. This is his third update; what I like about it is that he gives a lot of concrete information about how he uses AI, in sufficient detail that we can learn from him. His advice also resonates with the better advice I've seen out there, so the article makes a good overview of the state of using AI for software development.

I wrote the previous version of this post in March 2025, updated it once in August, and it has been linked from almost everything I have written about AI engineering since. The fundamentals from that post still hold: keep changes small, build guardrails, document ruthlessly, and make sure every change gets verified before it ships. One thing has had to move with the volume. “Verified” used to mean “read by you”. With modern agent throughput, it has to mean “checked by tests, by type checkers, by automated gates, or by you where your judgement matters”. The check still happens; it just does not always happen in your head.

Like Simon Willison, he makes a clear distinction between vibe coding, where you don’t look at or care about the code, and agentic engineering. He recommends either Claude Code or Codex CLI. He considers the inner harness provided by his preferred tools to be a key part of their advantage.

He sees verification as the key thing to focus on:

A team that can generate five approaches and verify all five in an afternoon will outpace a team that generates one and waits a week for feedback. The game is not “how fast can we build” any more. It is “how fast can we tell whether this is right”. That shifts where to invest. Build better review surfaces, not better prompts. Make feedback unnecessary where you can by having the agent verify against a realistic environment before it asks a human, and make feedback instant where you cannot.

The key role of the programmer is in training the AI to write software properly, and the most important thing skilled agentic programmers can do is pass that skill on to other developers.

And if you are a senior engineer worried that your job is quietly turning into approving diffs: it is. The way out is to train the AI so the diffs are right the first time, to make yourself the person on the team who shapes the harness, and to make that work the visible thing you are measured on. That role compounds in a way that reviewing never will.

 ❄                ❄                ❄                ❄                ❄

Early this month Birgitta Böckeler wrote a superb article on Harness Engineering. (That’s not just my opinion, judging by the crazy traffic it’s attracted.) Birgitta has now recorded a video discussion with Chris Ford on Harness Engineering, which is well worth a watch.

In it they focus on discussing the role of computational sensors in the harness, such as static analysis and tests.

LLMs are great for exploratory and fuzzy rules, but once you have something that really is objective, converting it to a formal, unambiguous, deterministic format can give you more assurance

Birgitta did some experiments to explore the benefits of adding sensors, including a deep dive on using static analysis. She found it's more useful with agents, since they can actually address every warning rather than slacking off like humans do.
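For a flavour of what a computational sensor can look like, here is a minimal sketch - my illustration, not code from Birgitta's article - of a harness gate that runs a linter and refuses the change until every warning is addressed (it assumes ruff, but any static-analysis tool slots in the same way):

import subprocess
import sys

# Run the static-analysis sensor over the whole project.
result = subprocess.run(["ruff", "check", "."], capture_output=True, text=True)

if result.returncode != 0:
    # Surface every warning to the agent; unlike humans, it can fix them all.
    print(result.stdout)
    sys.exit("Sensor tripped: address every warning before the change is accepted.")

print("Static-analysis sensor: clean.")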

 ❄                ❄                ❄                ❄                ❄

Adam Tornhill considers an age-old question: how long should a function be? This question is still relevant in the age of agentic programming.

AI models do not “understand” code the way humans do. They infer meaning from patterns in tokens and depend heavily on what is explicitly expressed in the code.

Research shows that naming plays a critical role. When meaningful identifiers are replaced with arbitrary names, model performance drops significantly. Current models rely heavily on literal features—names, structure, and local context—rather than inferred semantics.

Like me, he doesn't think the answer is to obsess about how many lines should be in a function; instead it's all about providing better structure. He has a good example of how a well-chosen function defines useful concepts, where a function wraps four lines of code, returning a new concept that enters the vocabulary of the program.

Functions are the first unit of structure in a codebase. They define how logic is grouped, how intent is communicated, and how change is localized. If the function boundaries are wrong, everything built on top of them becomes harder to understand and harder to evolve.

This fits with my writing that the key to function length is the separation between intention and implementation:

If you have to spend effort looking at a fragment of code to figure out what it's doing, then you should extract it into a function and name the function after that "what". That way when you read it again, the purpose of the function leaps right out at you, and most of the time you won't need to care about how the function fulfills its purpose - which is the body of the function.
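To make that concrete, here's a minimal illustration of the kind of extraction both of us mean - a toy example of my own, not one from Adam's article:

from dataclasses import dataclass

@dataclass
class Customer:
    years_active: int
    flagged: bool

@dataclass
class Order:
    total: float

# Before: every reader has to decode the condition from scratch.
#   if order.total > 100 and customer.years_active >= 2 and not customer.flagged:

# After: the condition becomes a named concept in the program's vocabulary.
def eligible_for_loyalty_discount(order: Order, customer: Customer) -> bool:
    return order.total > 100 and customer.years_active >= 2 and not customer.flagged

order, customer = Order(total=120.0), Customer(years_active=3, flagged=False)
if eligible_for_loyalty_discount(order, customer):
    print("Apply the loyalty discount")

The intention (the "what") lives in the name at the call site; the implementation (the "how") is tucked away in the body, where you rarely need to look.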

 ❄                ❄                ❄                ❄                ❄

Many folks in my feeds recommended Nilay Patel’s post on Why People Hate AI. He thinks that many people in the software world have “software brain”:

The simplest definition I’ve come up with is that it’s when you see the whole world as a series of databases that can be controlled with the structured language of software code. Like I said, this is a powerful way of seeing things. So much of our lives run through databases, and a bunch of important companies have been built around maintaining those databases and providing access to them.

Zillow is a database of houses. Uber is a database of cars and riders. YouTube is a database of videos. The Verge’s website is a database of stories. You can go on and on and on. Once you start seeing the world as a bunch of databases, it’s a small jump to feeling like you can control everything if you can just control the data.

Software Brain turns people into database entries, and oddly enough, a lot of people don't like that. Which is why so many polls reveal the negative feelings folks have about the AI movement.

Even taking the time to consider how much of your life is captured in databases makes people unhappy. No one wants to be surveilled constantly, and especially not in a way that makes tech companies even more powerful. But getting everything in a database so software can see it is a preoccupation of the AI industry. It’s why all the meeting systems have AI note takers in them now.

Patel draws a similarity that I’ve often made - that between programmers and lawyers. Lawyers who draw up contracts are creating a protocol for how the parties in the contract should behave. As Patel puts it:

If the heart of software brain is the idea that thinking in the structured language of code can make things happen in the real world, well, the heart of lawyer brain is that thinking in the structured legal language of statutes and citations can also make things happen. Hell, it can give you power over society.

The difference, of course, is that law is non-deterministic. Litigation is resolving what happens when people have different ideas about how those contracts should execute.

 ❄                ❄                ❄                ❄                ❄

I was chatting recently with a company that wanted to use AI to make sense of their internal data. The potential was great, but the problem was that the data was a mess. People put stuff into fields that didn't make sense, and there was little consistency in how people classified important entities. As someone commented:

the hardest problem with internal data is precise, consistent definitions

You can imagine my astonishment. (i.e. none at all - this has been a constant theme during all my decades with computers.) The difficulty of getting such definitions undermines many of the hopes of Software Brain.

This resonates with our relationship with LLMs when programming. Precise and consistent definitions strike me as crucial to effective communication with The Genie. These definitions need to grow in the conversation, and be tended over time. Conceptual modeling will be a key skill for agentic programming and whatever comes next. (At least I hope it will, since it’s a part of programming I really enjoy.)

 ❄                ❄                ❄                ❄                ❄

Patel’s article refers to Ezra Klein’s post about the new feeling in San Francisco.

You might think that A.I. types in Silicon Valley, flush with cash, are on top of the world right now. I found them notably insecure. They think the A.I. age has arrived and its winners and losers will be determined, in part, by speed of adoption. The argument is simple enough: The advantages of working atop an army of A.I. assistants and coders will compound over time, and to begin that process now is to launch yourself far ahead of your competition later. And so they are racing one another to fully integrate A.I. into their lives and into their companies. But that doesn’t just mean using A.I. It means making themselves legible to the A.I.

That legibility is the heart of Patel's observation. That's why I see many colleagues of mine dumping all their email, meeting notes, slide decks and everything else into files that AI can read and work with. This plays to the strengths of AI: we know that AI is really good at querying unstructured information. So I can figure out what's buried in my notes far more effectively than by hoping I've typed the right search regex.

I’ve been using Gemini a fair bit for exactly this on the web, finding it easier to write a question to it than to throw search terms at Google. Gemini keeps a record of my past requests, and uses that to help it tune what I’m looking for. As Klein observes:

[The AI] is constantly referring back to other things it knows, or thinks it knows, about me. Sycophancy, in my experience, has given way to an occasionally unsettling attentiveness; a constant drawing of connections between my current concerns and my past queries, like a therapist desperate to prove he’s been paying close attention.

The result is a strange amalgam of feeling seen and feeling caricatured.

Like me, Klein is a writer, faced with the same temptation I feel when I think about AI and writing. Maybe instead of toiling over articles, I should ask an LLM to create an AGENTS.md file that summarizes my writing style, and every few days ask it to compose an article on some subject, read it, tweak it, and then publish my erudite musings. But that's not at all appealing to me. I want understanding to grow in my brain, not in the LLM's transient session. Writing to explain my thinking to others is how I refine that thinking, "chiseling that idea into something publishable" as Klein puts it. To have an AI write for me would be to cripple my own mind.


microsoft/VibeVoice


VibeVoice is Microsoft's Whisper-style audio model for speech-to-text, MIT licensed and with speaker diarization built into the model.

Microsoft released it on January 21st, 2026 but I hadn't tried it until today. Here's a one-liner to run it on a Mac with uv, mlx-audio (by Prince Canuma) and the 5.71GB mlx-community/VibeVoice-ASR-4bit MLX conversion of the 17.3GB VibeVoice-ASR model, in this case against a downloaded copy of my recent podcast appearance with Lenny Rachitsky:

uv run --with mlx-audio python -m mlx_audio.stt.generate \
  --model mlx-community/VibeVoice-ASR-4bit \
  --audio lenny.mp3 --output-path lenny \
  --format json --verbose --max-tokens 32768

(Screenshot: a macOS terminal running the command above - model download progress, a warning that the audio duration (99.8 min) exceeds the 59 min maximum so it is being trimmed, encoding/prefilling/generating progress bars, the JSON transcription segments, and the final stats reproduced below.)

The tool reported back:

Processing time: 524.79 seconds
Prompt: 26615 tokens, 50.718 tokens-per-sec
Generation: 20248 tokens, 38.585 tokens-per-sec
Peak memory: 30.44 GB

So that's 8 minutes 45 seconds for an hour of audio (running on a 128GB M5 Max MacBook Pro).

I've tested it against .wav and .mp3 files and they both worked fine.

If you omit --max-tokens it defaults to 8192, which is enough for about 25 minutes of audio. I discovered that through trial-and-error and quadrupled it to guarantee I'd get the full hour.
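Working back from those numbers, 8192 tokens for ~25 minutes is roughly 330 output tokens per minute of audio (the run above generated 20,248 tokens for just under an hour, which fits). Here's a back-of-envelope helper for sizing --max-tokens - my own rule of thumb, not anything built into mlx-audio:

def estimate_max_tokens(minutes: float, tokens_per_min: int = 330,
                        headroom: float = 1.5) -> int:
    # ~330 tokens/minute observed above, padded 1.5x so an unusually
    # chatty stretch of audio doesn't truncate the transcript.
    return int(minutes * tokens_per_min * headroom)

print(estimate_max_tokens(60))  # 29700 - rounding up to 32768 is comfortable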

That command reported using 30.44GB of RAM at peak, but in Activity Monitor I observed 61.5GB of usage during the prefill stage and around 18GB during the generating phase.

Here's the resulting JSON. The key structure looks like this:

{
  "text": "And an open question for me is how many other knowledge work fields are actually prone to these agent loops?",
  "start": 13.85,
  "end": 19.5,
  "duration": 5.65,
  "speaker_id": 0
},
{
  "text": "Now that we have this power, people almost underestimate what they can do with it.",
  "start": 19.5,
  "end": 22.78,
  "duration": 3.280000000000001,
  "speaker_id": 1
},
{
  "text": "Today, probably 95% of the code that I produce, I didn't type it myself. I write so much of my code on my phone. It's wild.",
  "start": 22.78,
  "end": 30.0,
  "duration": 7.219999999999999,
  "speaker_id": 0
}

Since that's an array of objects we can open it in Datasette Lite, making it easier to browse.

Amusingly that Datasette Lite view shows three speakers - it identified Lenny and me for the conversation, and then a separate Lenny for the voice he used for the additional intro and the sponsor reads!

VibeVoice can only handle up to an hour of audio, so running the above command transcribed just the first hour of the podcast. To transcribe more than that you'd need to split the audio, ideally with a minute or so of overlap so you can avoid errors from partially transcribed words at the split point. You'd also need to then line up the identified speaker IDs across the multiple segments.
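Here's a rough sketch of that splitting step, shelling out to ffmpeg from Python. This is scaffolding under my own assumptions rather than a finished pipeline, and reconciling the speaker IDs across chunks remains the hard part:

import subprocess

CHUNK_MINUTES = 55    # stay safely under the 59-minute model limit
OVERLAP_MINUTES = 1   # overlap so words cut at a boundary appear whole somewhere

def split_audio(path: str, total_minutes: float) -> list[str]:
    # Cut the file into overlapping chunks; returns the chunk filenames.
    chunks, start, i = [], 0.0, 0
    while start < total_minutes:
        out = f"chunk_{i:02d}.mp3"
        subprocess.run([
            "ffmpeg", "-y",
            "-ss", str(start * 60),          # start offset in seconds
            "-t", str(CHUNK_MINUTES * 60),   # chunk length in seconds
            "-i", path,
            "-c", "copy",                    # no re-encode
            out,
        ], check=True)
        chunks.append(out)
        start += CHUNK_MINUTES - OVERLAP_MINUTES
        i += 1
    return chunks

chunks = split_audio("lenny.mp3", 99.8)  # two chunks for this podcast

Each chunk then goes through the mlx_audio command above; lining up speaker IDs means comparing who said what in the overlapping minute of adjacent chunks and relabelling to match.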

Tags: microsoft, python, datasette-lite, uv, mlx, prince-canuma, speech-to-text


Figma's woes compound with Claude Design


I think Figma is fast becoming a go-to case study among the victims of the so-called "SaaSpocalypse". And Claude Design's launch last week adds a whole new dimension of pain.

What happened to Figma?

Firstly, I should say that I love(d?) the Figma product. It's hard to appreciate now what a big deal Figma's initial product was when it launched in the mid-2010s.

The initial product ushered in a whole new category of SaaS, using the nascent WebGL and asm.js technologies to let designers design entirely in the browser. It used to be a running joke that an app like Photoshop would never run in the browser; Figma proved otherwise.

It quickly overtook Sketch as the de facto design tool in the market - first for UI/UX wireframing and prototyping, but increasingly for all graphic design. Because it ran in the browser, it was a revelation on the developer side to be able to open UI/UX files when you weren't on a Mac (Sketch is Mac-only). It was also brilliant to be able to leave comments on a design and collaborate with the designer(s) to iterate really quickly.

The collaborative features (without requiring anyone to download any software) quickly meant it got adoption outside of pure design roles - PMs and executives could finally collaborate in real time on the product they were building, without having to (at best) send back revisions and notes from badly screenshotted files that tended to be out of date by the time they were received.

I'll skip over the rest of the history, including a no doubt distracting takeover attempt by Adobe that was later abandoned on competition grounds. But (of course) LLMs happened, and suddenly one of the most forward-looking SaaS companies became very vulnerable to disruption itself.

Why did AI hit Figma so hard?

One completely unexpected development that I and others noticed (and wrote up a few months ago in How to make great looking reports with Claude Code) was that LLMs started to get fairly "good" at design.

By good I do not mean as good as a talented designer - it's clearly nowhere near that, currently. But not everything requires a great designer. Even if you use a great design team to build out your core product experience (and many do not), there's an awful lot of design "resource" required for the auxiliary parts of a product: reports, proposals and so on. It's not work that tends to get designers excited, but it can sap an awful lot of time going back and forth on a pitch deck.

And this is exactly why I think Figma is almost uniquely vulnerable. The way it managed to expand into organisations by getting uptake with non-designers becomes a liability if those non-designers can get an AI agent to do the design for them.

Looking at Figma's S-1 (which is somewhat out of date by now, but is the only reported breakdown I can find) corroborates this potential weakness: only 33% of Figma's userbase in Q1 2025 were designers, with developers making up 30% and other non-design roles 37%.

A lot of Figma's continued expansion depended on this part of their userbase. A lot of their recent product development has been about enabling further expansion within organisations - "Dev Mode" for developers (which now looks incredibly quaint against LLMs), Slides (to compete with PowerPoint and other presentation tools) and Sites (a Webflow-esque site builder) are all about expanding their TAM beyond "pure" design.

The real surprise for me, though, was how basic their "flagship" AI design product, Figma Make, is. It really does feel like something someone put together in an internal AI hackathon one weekend and that never progressed beyond that. Given how much Figma managed to push the envelope on web technology, I found this surprising - perhaps they were caught off guard by how quickly LLMs' design prowess improved, or there were internal disagreements about the role AI should play in design. Regardless, it's an incredibly underwhelming product as it stands.

Then Claude Design comes along...

If things weren't bad enough, Anthropic themselves launched Claude Design, which is a pretty direct competitor to Figma in many ways. While it's nowhere near functional or polished enough to replace Figma's core design product, I expect it will get significant traction outside of that. The ability to grab a design system from your existing assets in one click is very powerful - it lets you pull together prototypes, presentations or reports in your corporate design style that look and feel far better than anything a non-designer could do themselves.

And I thought it was extremely telling that, unlike a lot of the other Anthropic product launches that have touched design, Figma did not provide a testimonial (understandably). Canva did, which I found extremely odd - they are, in my eyes, even more vulnerable to this product than Figma.

I think this really underlines two major weaknesses in many SaaS companies' AI strategies:

Firstly, it's very difficult to compete on AI against the company that is providing your AI inference. A quick check on Figma Make suggests that Figma (at least on my account) is indeed using Sonnet 4.5 for its inference - though I have seen it use Gemini in the past:

(Screenshot: Figma Make showing Sonnet 4.5 as the underlying model.)

At this point Figma is effectively funding a competitor: the more AI usage Figma has, the more money it sends to Anthropic for the tokens it uses. Even worse, Sonnet 4.5 is miles behind what Anthropic uses for Claude Design (Opus 4.7, which has vastly improved vision capabilities[1]), so the results a user gets on Make versus Claude Design are almost certainly going to underwhelm.

Also, unlike most SaaS costs, inference (especially with these frontier models) is expensive. As Cursor found out, the frontier labs can charge end users a lot less than they charge API customers like Figma. When you are potentially looking at a shrinking userbase, it's far from ideal to have very expensive variable costs pulling your profitability down.

Secondly, it really underlines how efficiently, headcount-wise, companies can build products now. Figma has close to 2,000 employees - not all working on product engineering, of course. I really doubt Anthropic needed even 10 people to build Claude Design. Indeed, the entirety of Anthropic is around 2,500 people.

It's also worth noting that a lot of the things that would traditionally lock a company like Figma in stop working as well in an agent-first world. Multiplayer matters less when your collaborator is an agent iterating on a prompt. Plugin ecosystems matter less when you can just ask for the functionality directly. Design system tooling is the whole point of Claude Design. Enterprise SSO - Claude already has that. Most of the moats that protect a mature SaaS company are moats against other SaaS companies, not against the thing providing their inference.

I might be wrong about how bad this gets for Figma specifically. Companies with strong brands, great distribution and genuinely talented teams can often adapt faster than outsiders expect, and I'd rather be long Figma than most of its competitors.

But the structural point is harder to wriggle out of. Figma has ~2,000 employees. Anthropic has ~2,500 total and I doubt Claude Design took more than a handful to build. Figma now needs to out-execute a competitor whose inference is ~free to them, whose marginal cost to ship is roughly zero, and who employs fewer people on the competing product than Figma has on a single pod. That's a very hard position to pivot out of.

This feels like a preview of where SaaS economics are heading. The companies that built big orgs on the assumption of steady seat expansion are going to find themselves competing with products built by tiny teams inside the frontier labs. Figma just happens to be the first big public name where one of their primary inference suppliers has started competing against them.


  1. Both GPT 5.4 and Opus 4.7 can now "see" screenshots at much higher resolution - Opus 4.7 jumped from 1568px / 1.15MP to 2576px / 3.75MP. Resolution isn't the whole story (scaffolding and post-training matter a lot too) but it meaningfully helps with small-element detection and layout judgement. If you've ever pasted a screenshot of something broken and the model told you it looks great, the previous lack of resolution is one of the reasons why. ↩︎




Firefighters battling 'significant' blaze at one of Australia's two oil refineries

Fire Rescue Victoria said the blaze broke out at Viva Energy's refinery in Geelong about 11pm on Wednesday.