
Quoting Tim Schilling


If you do not understand the ticket, if you do not understand the solution, or if you do not understand the feedback on your PR, then your use of LLM is hurting Django as a whole. [...]

For a reviewer, it’s demoralizing to communicate with a facade of a human.

This is because contributing to open source, especially Django, is a communal endeavor. Removing your humanity from that experience makes that endeavor more difficult. If you use an LLM to contribute to Django, it needs to be as a complementary tool, not as your vehicle.

Tim Schilling, Give Django your time and money, not your tokens

Tags: ai-ethics, open-source, generative-ai, ai, django, llms

denubis
10 hours ago

A working email privacy template


Unrelated to what has been said above: we note that you use a work email account for this correspondence. We have no particular insight into your specific workplace situation, but want to caution that in general, such an arrangement means that your employer has both the opportunity and at times obligation to partake of the contents of exchanges such as the one we are currently engaged in. We cannot guarantee the confidentiality of personal information sent to such accounts, and wish to inform you that your continued use of this account does not conform to the standards of privacy we seek to uphold. If you for whatever reason have second thoughts about this arrangement, we urge you to use a personal email account moving forward.

Sincerely,



denubis
23 hours ago

The one science reform we can all agree on, but we're too cowardly to do

photo cred: my dad

If you ever want a good laugh, ask an academic to explain what they get paid to do, and who pays them to do it.

In STEM fields, it works like this: the university pays you to teach, but unless you’re at a liberal arts college, you don’t actually get promoted or recognized for your teaching. Instead, you get promoted and recognized for your research, which the university does not generally pay you for. You have to ask someone else to provide that part of your salary, and in the US, that someone else is usually the federal government. If you’re lucky—and these days, very lucky—you get a chunk of money to grow your bacteria or smash your electrons together or whatever, you write up your results for publication, and this is where the monkey business really begins.

In most disciplines, the next step is sending your paper to a peer-reviewed journal, where it gets evaluated by an editor and (if the editor sees some promise in it) a few reviewers. These people are academics just like you, and they generally do not get paid for their time. Editors maybe get a small stipend and a bit of professional cred, while reviewers get nothing but the warm fuzzies of doing “service to the field”, or the cold thrill of tanking other people’s papers.

If you’re lucky again, your paper gets accepted by the journal, which now owns the copyright to your work. They do not pay you for this! If anything, you pay them an “article processing charge” for the privilege of no longer owning the rights to your paper. This is considered a great honor.

The journals then paywall your work, sell the access back to you and your colleagues, and pocket the profit. Universities cover these subscriptions and fees by charging the government “indirect costs” on every grant—money that doesn’t go to the research itself, but to all the things that support the research, like keeping the lights on, cleaning the toilets, and accessing the journals that the researchers need to read.

Nothing about this system makes sense, which is why I think we should build a new one. In the meantime, though, we should also fix the old one. But that’s hard, for two reasons. First, many people are invested in things working exactly the way they do now, so every stupid idea has a constituency behind it. Second, our current administration seems to believe in policy by bloodletting: if something isn’t working, just slice it open at random. Thanks to these haphazard cuts and cancellations, we now have a system that is both dysfunctional and anemic.

I see a way to solve both problems at once. We can satisfy both the scientists and the scalpel-wielding politicians by ridding ourselves of the one constituency that should not exist. Of all the crazy parts of our crazy system, the craziest part is where taxpayers pay for the research, then pay private companies to publish it, and then pay again so scientists can read it. We may not agree on much, but we can all agree on this: it is time, finally and forever, to get rid of for-profit scientific publishers.

MOMMY, WHERE DO SCAMS COME FROM?

The writer G.K. Chesterton once said that before you knock anything down, you ought to know how it got there in the first place. So before we show for-profit publishers the pointy end of a pitchfork, we ought to know where they came from and why they persist.

It used to be a huge pain to produce a physical journal—someone had to operate the printing presses, lick the stamps, and mail the copies all over the world. Unsurprisingly, academics didn’t care much about doing those things. When government money started flowing into universities post-World War II and the number of articles exploded, private companies were like, “Hey, why don’t we take these journals off your hands—you keep doing the scientific stuff and we’ll handle all the boring stuff.” And the academics were like “Sounds good, we’re sure this won’t have any unforeseen consequences.”

Those companies knew they had a captive audience, so they bought up as many journals as they could. Journal articles aren’t interchangeable commodities like corn or soybeans—if your science supplier starts gouging you, you can’t just switch to a new one. Adding to this lock-in effect, publishing in “high-impact” journals became the key to success in science, which meant if you wanted to move up, your university had to pay up. So, even as the internet made it much cheaper to produce a journal, publishers made it much more expensive to subscribe to one.

Robert Maxwell, one of the architects of the for-profit scientific publishing scheme. When he later went into debt, he plundered hundreds of millions of pounds from his employees’ pension funds. You may be familiar with his daughter and lieutenant Ghislaine Maxwell, who went on to have a successful career in child trafficking. (source)

The people running this scam had no illusions about it, even if they hoped that other people did. Here’s how one CEO described it:

You have no idea how profitable these journals are once you stop doing anything. When you’re building a journal, you spend time getting good editorial boards, you treat them well, you give them dinners. [...] [and then] we stop doing all that stuff and then the cash just pours out and you wouldn’t believe how wonderful it is.

So here’s the report we can make to Mr. Chesterton: for-profit scientific publishers arose to solve the problem of producing physical journals. The internet mostly solved that problem. Now the publishers are the problem. These days, Springer Nature, Elsevier, Wiley, and the like are basically giant operations that proofread, format, and store PDFs. That’s not nothing, but it’s pretty close to nothing.

No one knows how much publishers make in return for providing these modest services, but we can guess. In 2017, the Association of Research Libraries surveyed its 123 member institutions and found they were paying a collective $1 billion in journal subscriptions every year. The ARL covers some of the biggest universities, but not nearly all of them, so let’s guess that number accounts for half of all university subscription spending. In 2023, the federal government estimated it paid nearly $380 million in article processing charges alone, and those are separate from subscriptions. So it wouldn’t be crazy if American universities were paying something like $2.5 billion to publishers every year, with the majority of that ultimately coming from taxpayers.
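The guess above can be reproduced as a short back-of-envelope calculation. This is only a sketch: the dollar figures are the ones quoted in the text (from different years, 2017 and 2023), the 50% coverage share is the essay's own guess, and the function and variable names are made up for illustration. The result lands a little under the "something like $2.5 billion" figure, which is about as close as this kind of estimate gets.

```python
# Back-of-envelope version of the estimate in the text.
# Dollar figures are those quoted above; the 50% share is the essay's guess.
ARL_SUBSCRIPTIONS_USD = 1.0e9     # ARL members' yearly subscription spend (2017 survey)
ARL_SHARE_OF_ALL_SPENDING = 0.5   # guess: ARL is about half of all university spending
FEDERAL_APC_USD = 380e6           # federal estimate of article processing charges (2023)


def estimate_publisher_payments(arl_spend, arl_share, apc_spend):
    """Scale ARL subscription spending up to all universities, then add APCs."""
    all_subscriptions = arl_spend / arl_share
    return all_subscriptions + apc_spend


total = estimate_publisher_payments(
    ARL_SUBSCRIPTIONS_USD, ARL_SHARE_OF_ALL_SPENDING, FEDERAL_APC_USD
)
print(f"Estimated yearly payments to publishers: ${total / 1e9:.2f} billion")
```

Changing `ARL_SHARE_OF_ALL_SPENDING` shows how sensitive the total is to that one guess: if ARL members account for less than half of all spending, the total rises quickly.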

(By the way, the estimated profit margins for commercial scientific publishers are around 40%, which is higher than Microsoft.)

To put those costs in perspective: if the federal government cut out the publishers, it would probably save more money every year than it has “saved” in its recent attempts to cut off scientific funding to universities. It’s unclear how much money will ultimately be clawed back, as grants continue to get frozen, unfrozen, litigated, and negotiated. But right now, it seems like ~$1.4 billion in promised science funding is simply not going to be paid out. We could save more than that every year if we just stopped writing checks to John Wiley & Sons.

PUNK ROCK SCIENCE

How can such a scam continue to exist? In large part, it’s because of a computer hacker from Kazakhstan.

The political scientist James C. Scott once wrote that many systems only “work” because people disobey them. For instance, the Soviet Union attempted to impose agricultural regulations so strict that people would have starved if they followed the letter of the law. Instead, citizens grew and traded food in secret. This made it look like the regulations were successful, when in fact they were a sham.[1]

Something similar is happening right now in science, except Russia is on the opposite side of the story this time. In the early 2010s, a Kazakhstani computer programmer named Alexandra Elbakyan started downloading articles en masse and posting them publicly on a website called SciHub. The publishers sued her, so she’s hiding out in Russia, which protects her from extradition. As you can see in the map below, millions of people now use SciHub to access scientific articles, including lots of people who seem to work at universities:

This data is ten years old, so I would expect these numbers to be higher today. (source)

Why would researchers resort to piracy when they have legitimate access themselves? Maybe because journals’ interfaces are so clunky and annoying that it’s faster to go straight to SciHub. Or maybe it’s because those researchers don’t actually have access. Universities are always trying to save money by canceling journal subscriptions, so academics often have to rely on bootleg copies. Either way, SciHub seems to be our modern-day version of those Soviet secret gardens: for-profit publishing only “works” because people find ways to circumvent it.

Alexandra Elbakyan, “Pirate Queen of Science” (source)

In a punk rock kind of way, it’s kinda cool that so many American scientists can only do their work thanks to a database maintained by a Russia-backed fugitive. But it ought to be a huge embarrassment to the US government.[2]

Instead, for some reason, the government insists on siding with publishers against citizens. Sixteen years ago, the US had its own Elbakyan. His name was Aaron Swartz. He downloaded millions of paywalled journal articles using a connection at MIT, possibly intending to share them publicly. Government agents arrested him, charged him with wire fraud, and intended to fine him $1 million and imprison him for 35 years. Instead, he killed himself. He was 26.

Swartz in 2011, two years before his death (source)

THE FOREST FIRE IS OVERDUE

Scientists have tried to take on the middlemen themselves. They’ve founded open-access journals. They’ve published preprints. They’ve tried alternative ways of evaluating research. A few high-profile professors have publicly and dramatically sworn off all “luxury” outlets, and less-famous folks have followed suit: in 2012, over 10,000 researchers signed a pledge not to publish in any journals owned by Elsevier.

None of this has worked. The biggest for-profit publishers continue making more money year after year. “Diamond” open access journals (that is, publications that don’t charge authors or readers) only account for ~10% of all articles.[3] Four years after that massive pledge, 38% of signers had broken their promise and published in an Elsevier journal.[4]

These efforts have fizzled because this isn’t a problem that can be solved by any individual, or even many individuals. Academia is so cutthroat that anyone who righteously gives up an advantage will be outcompeted by someone who has fewer scruples. What we have here is a collective action problem.

Fortunately, we have an organization that exists for the express purpose of solving collective action problems. It’s called the government. And as luck would have it, they’re also the one paying most of the bills!

So the solution here is straightforward: every government grant should stipulate that the research it supports can’t be published in a for-profit journal. That’s it! If the public paid for it, it shouldn’t be paywalled.

The Biden administration tried to do this, but they did it in a stupid way. They mandated that NIH-funded research papers have to be “open access”, which sounds like a solution, but it’s actually a psyop. By replacing subscription fees with “article processing charges”, publishers can simply make authors pay for writing instead of making readers pay for reading. The companies can keep skimming money off the system, and best of all, they get to call the result “open access”.

These fees can be wild. When my PhD advisor and I published one of our papers together, the journal charged us an “open access” fee of $12,000. This arrangement is a tiny bit better than the alternative, because at least everybody can read our paper now, including people who aren’t affiliated with a university. But those fees still have to come from somewhere, and whether you charge writers or readers, you’re ultimately charging the same account: namely, the US government.[5]

The Trump administration somehow found a way to make a stupid policy even stupider. They sped up the timeline while also firing a bunch of NIH staffers—exactly the people who would make sure that government-sponsored publications are, in fact, publicly accessible. And you need someone to check on that, because researchers are notoriously bad about this kind of stuff. They’re already required to upload the results of clinical trials to a public database, but more than half the time they just...don’t.

To do this right, you cannot allow the rent-seekers to rebrand. You have to cut them out entirely. I don’t think this will fix everything that’s wrong with science; it will merely fix the wrongest thing. Nonprofit journals still charge fees, but at least the money goes to organizations that ostensibly care about science, rather than going to CEOs who make $17 million a year. And almost every journal, for-profit or not, uses the same failed system of peer review. The biggest benefit of shaking things up, then, would be allowing different approaches to have a chance at life, the same way an occasional forest fire clears away the dead wood, opens up the pinecones, and gives seedlings a shot at the sunlight.

Science philanthropies should adopt the same policy, and some of them already have. The Navigation Fund, which oversees billions of dollars in scientific funding, no longer bankrolls journal publications at all. Its director reports that the experiment has been a great success:

Our researchers began designing experiments differently from the start. They became more creative and collaborative. The goal shifted from telling polished stories to uncovering useful truths. All results had value, such as failed attempts, abandoned inquiries, or untested ideas, which we frequently release through Arcadia’s Icebox. The bar for utility went up, as proxies like impact factors disappeared.

Sounds good to me!

CATCH THE TIGER

Fifteen years ago, the open science movement was all about abolishing for-profit journals—that’s what open science meant. It seemed like every speech would end with “ELSEVIER DELENDA EST”.

Now people barely bring it up at all.[6] It’s like a tiger has escaped the zoo and it’s gulping down schoolchildren, but when people suggest zoo improvements, all the agenda items are like, “We should add another Dippin’ Dots kiosk”. If you bring up the loose tiger, everyone gets annoyed at you, like “Of course, no one likes the tiger”.

I think two things happened. First, we got cynical about cyberspace. In the 1990s and 2000s, we really thought the internet would solve most of our problems. When those problems persisted despite all of us getting broadband, we shifted to thinking that the internet was, in fact, causing the problems. And so it became cringe to think the internet could ever be a force for good. In 1995, for-profit publishers were going to be “the internet’s first victim”; in 2015, they were “the business the internet could not kill”.

Second, when the replication crisis hit in the early 2010s, the open science movement got a new villain—namely, naughty researchers. The fakers, the fraudsters, the over-claimers: those are the real bad boys of science. It’s no longer cool to hate international publishing conglomerates. Now it’s cool to hate your colleagues.

Both of these shifts were a shame. The internet utopians were right that the web would eliminate the need for journals, but they were wrong to think that would be enough. The replication police were right to call out scientific malfeasance, but they were wrong to forget our old foes. The for-profit publishers are just as bad as they ever were, and while the internet has made them more vulnerable than ever, now we know they won’t go unless they’re pushed.

If we want better science, we should catch the tiger. Not only because it’s bad for the tiger to be loose, but because it’s bad for us to look the other way. If you allow an outrageous scam to go unchecked, if you participate in it, normalize it—then what won’t you do? Why not also goose your stats a bit? Why not publish some junk research? Look around: no one cares!

There are so many problems with our current way of doing things, and most of those problems are complicated and difficult to solve. This one isn’t. Let’s heave this succubus off our scientific system and end this scam once and for all. After that, Dippin’ Dots all around.

Experimental History opposes the tiger and supports ice cream, in that order

[1] Seeing Like a State, 203-204, 310

[2] For anyone who is all-in on “America First”: may I also mention that three of the largest publishers—Springer Nature, Elsevier, and Taylor and Francis—are all British-owned. A curious choice of companies to subsidize!

[3] Don’t get me started on this “diamond open access” designation. If it costs money to publish or to read, it’s not open access, period. “Oh, you’d like your car to come with a steering wheel and brakes? You’ll need our ‘diamond’ package.”

[4] I assume this number is much higher now. At the time, Elsevier controlled 16% of the market, so most people could continue publishing in their usual journals without breaking their pledge. I started graduate school in 2016, and I never heard anyone mention avoiding Elsevier journals at all.

[5] The NIH has announced vague plans to cap these charges, which is kind of like saying, “I’ll let you scam me, but just don’t go crazy about it.”

[6] For example, the current strategic plan of the Center for Open Science doesn’t mention for-profit journals at all.

denubis
2 days ago

new new rules for the new new economy


As promised on Wednesday, here are some notes in the direction of what I think is the most important point in my “toward a sensible AI scepticism” post from last year:

There’s also a very important role for scepticism that AI is in some way or other outside the price mechanism or the normal priorities of political economy. This is particularly obvious when someone suggests we should forget about some obviously crucial issue because the AGI will solve it for us, but it’s also in my view perfectly sensible to be sceptical about future economic benefits, whether they will in fact justify current venture capital investments and whether projects which aren’t economically viable without subsidies and exemptions from environmental or social regulation should be made so because they’re AI.

I don’t think it’s either possible or worthwhile to launch a huge project trying to put numbers on things by going through SEC filings and the like. For one thing, the really important quantities aren’t going to be in the accounts. If they were, you would have the problem that accounting standards don’t always match up to business reality, and if you solved that, then congratulations: you took a snapshot of something that’s changing rapidly.

But I do think it’s worth a short while thinking about the kinds of numbers that you would want to know, putting order-of-magnitude bounds on them and comparing them to other industries. Basically trying to do the analytical job of asking “what sort of a business is this? Is it like a gold mine, or like an airline? How do the costs and revenues scale with demand? In what conditions does it do well or badly?” The structure of a model is more important than the numbers plugged in.

Dan Davies - "Back of Mind" is a reader-supported publication. It will probably move on to other subjects for a while, having done rather a lot on AI recently, sorry.

I think, along these lines, that there are two big questions to ask – what do the marginal cost economics of AI look like, and what is the equilibrium capex? I’ll take the second one first.

Over in one of my other secret identities, I’ve been covering this as a banking sector personnel issue. A number of investment banks have reorganised their tech teams to reflect the kinds of financial needs that different clients have. Goldman Sachs, for example, now has a head (well, two co-heads) of “Global Internet and Media” and of “Global Technology Infrastructure”.

Why? Well, the economics of AI seems to be the economics of datacentres. And a datacentre is a big capital asset which needs a lot of power and cooling, not a weightless creature of pure mathematics. (In Henry Farrell’s great phrase, “when software eats the world, what comes out the other end?”). Big sheds with expensive machines in them are the sort of thing that you historically finance with debt rather than equity, and they tend to need a hell of a lot of capital to be raised rather than a few million dollars of VC.

This isn’t entirely new; the period that we remember as the “dot com bubble” was actually at least half a “telecoms bubble”, in which investors’ money was financing not just web applications, but also people to dig up roads and put fibre-optic cables down.

But it strikes me as important that, unlike fibre optic cable, data centres have an economically important depreciation life. The longest-lived piece of capex is probably the shed itself. It is hard to get a straight answer about how long the GPU chips last (because the accounting depreciation is going to be mainly driven by obsolescence and the replacement cycle), but the best estimates I can find suggest that it’s under a decade best case, and potentially as short as five years if you really thrash them by doing training work. (Training an LLM is a lot more computation-intensive, and therefore power and heat intensive, than inference, so it physically degrades the chips faster). And the cooling system has literal moving parts.

That matters for the long-term economics. During the 00s, we talked quite a bit about “dark fiber”, in the sense of cable that had been laid well in excess of any reasonable estimate of the demand for bandwidth. Hand on heart, I never took this scepticism seriously; it seemed to me that it would all get used eventually, and that even if it wasn’t, the real expense in laying cable was digging the road up (or sailing the special boat across the Atlantic), so you might as well put in a big margin. We are still using the cable laid in the 00s today, and can expect to do so for decades to come. If datacenter capex is physically degraded within ten years, then it matters a lot more if there’s too much of it.
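To see why asset life matters so much, here is a rough straight-line sketch. The $1bn outlay and the 30-year cable life are hypothetical numbers chosen purely for illustration; only the five and ten year GPU lives come from the rough bounds mentioned above, and real accounting depreciation will differ.

```python
# Straight-line view of how asset life changes the yearly cost of capex.
# The $1bn outlay and the 30-year cable life are hypothetical; the 5 and
# 10 year GPU lives are the rough bounds quoted in the text.
CAPEX_USD = 1_000_000_000

ASSET_LIFE_YEARS = {
    "fibre-optic cable (assumed life)": 30,
    "GPUs, best case": 10,
    "GPUs, heavy training use": 5,
}


def annual_depreciation(capex, life_years):
    """Straight-line depreciation: an equal write-off in each year of useful life."""
    return capex / life_years


for asset, life in ASSET_LIFE_YEARS.items():
    yearly = annual_depreciation(CAPEX_USD, life)
    print(f"{asset}: ${yearly / 1e6:,.0f}m per year over {life} years")
```

Under these assumptions the same outlay costs several times more per year when it sits in GPUs than when it sits in cable, so overbuilt capacity has far less time to be "used eventually".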

So much for capex. What about margins?

Here I am treading lightly, because it is difficult. Costs and pricing are expressed per “token”, but the published data immediately seems to admit that this is a bad choice of unit because it costs a lot more to output a token than input one. It seems to me that the actual marginal quantity being produced and consumed is “processing power”, which is apparently measured in gigawatt hours these days. In any case, I think more than anything this vindicates my original decision not to get too precise. As my old dad used to say, if something isn’t worth doing, it’s not worth doing properly.

The fact that datacentre capacity is measured in gigawatts suggests that there is a marginal cost here which is unlike the “too cheap to meter” economics which underwrote the original “Information Economy” of Shapiro and Varian. Messing around in pricing sheets and consultant reports, I gather that Anthropic charges “a few dollars per million tokens” and that a Claude Code query typically uses a five-figure number of tokens. And so, ruthlessly ignoring the input versus output questions, I arrive at the belief that the cost to the buyer of asking an LLM to do a commercially meaningful task and getting a commercially useful result is on the order of “a few cents, maybe as much as a dollar or two”.
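The arithmetic behind that order-of-magnitude bound can be sketched as follows. The price range is an assumed reading of “a few dollars per million tokens”, not a published price sheet, and the token range is simply the smallest and largest five-figure counts; both are illustrative assumptions, as is ignoring the input versus output price difference.

```python
# Order-of-magnitude cost of one query. Both ranges are illustrative
# assumptions, not figures from any published price sheet.
PRICE_PER_MILLION_TOKENS_USD = (2.0, 10.0)   # assumed low/high price
TOKENS_PER_QUERY = (10_000, 99_999)          # smallest and largest five-figure counts


def query_cost(tokens, price_per_million):
    """Cost in dollars, ignoring the input/output price difference."""
    return tokens * price_per_million / 1_000_000


low = query_cost(TOKENS_PER_QUERY[0], PRICE_PER_MILLION_TOKENS_USD[0])
high = query_cost(TOKENS_PER_QUERY[1], PRICE_PER_MILLION_TOKENS_USD[1])
print(f"Cost per query: ${low:.2f} to ${high:.2f}")
```

With these assumed ranges the cost per query runs from a couple of cents to about a dollar, which is consistent with the bound in the text. The structure of the calculation matters more than the particular numbers plugged in.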

There is a temptation to start guesstimating profit margins and trying to say that the marginal cost to produce LLM services is also therefore “a few cents”. But I am wary of doing so. On the one hand, the current pricing sheet might be considerably subsidised because of management and VCs assuming that the old Shapiro/Varian rules apply and that they need to establish a “moat” made out of “network effects” in order to lock in customers for future gouging.

On the other hand, to the extent that the price is related to the costs at all, it will have some relationship to overhead costs as well. (I’ll note in passing that the difference between the economic and accounting concepts of “marginal costs” is a whole nother rabbit hole here). As I mentioned above, training and inference seem to have different cost economics. Developing models consumes more power and runs down your GPUs a lot more expensively than using them.

Which kind of worries me a little. You might be tempted to say that “this is good, means that once the models are trained, which can be done a lot cheaper than current industry practice, look at DeepSeek, we will be back to territory quite close to too-cheap-to-meter, this is web 1.0 economics really”. But … where is the equilibrium in which there is much less expenditure on model training?

I suspect it might not be there. There’s always going to be a temptation to upgrade the model and take market share. There’s a considerable risk, as I see it, that AI might have the lethal economics which characterises airlines and media – very low marginal costs, very high overheads, lots of expensive capex. In that sort of environment, people go bust a lot, because there always seems to be a big player who didn’t like their market share last year, competing against a big player who has ambitions to be the last one standing.

I haven’t got into stock market valuations here, but it seems to me that the path to profit is a bit more convoluted than people might think. And if the big players are using their own models to give them strategic advice, they might need to worry that the bias toward aggression is just as disastrous in industrial economics as it is in any other kind of deterrence model.



denubis
3 days ago

GNU Terry Pratchett - Tiffany Aching


A digital painting tribute and fan-art to my favorite author and witch (and her "hat full of sky").

http://www.gnuterrypratchett.com/

denubis
4 days ago
acdha
5 days ago
Washington, DC

Design-First Collaboration


Rahul Garg continues his series of Patterns for Reducing Friction in AI-Assisted Development. This pattern describes a structured conversation that mirrors whiteboarding with a human pair: progressive levels of design alignment before any code, reducing cognitive load, and catching misunderstandings at the cheapest possible moment.

more…

denubis
15 days ago