
you can't take it back in a disclaimer


Over the years, when I have needed to be firm with a representative of the financial services industry (or sometimes, the legal profession, or very rarely an elected representative), I have developed a pompous and annoying little speech to introduce my objection:

“I have to tell you this [contract, disclosure, draft legislation etc] is not clear. I have two degrees in finance and economics, I have worked as a bank regulator and as a financial analyst, I have written an award-winning financial newsletter for ten years and three books. I have read this carefully and I do not share your understanding of what it means, and I am not prepared to accept that this is mainly my fault”.

If you’re not planning to meet that person again, and don’t mind coming across as kind of a wanker, both of which are usually true in my case, it often works.

My point here is preparatory to an old-fashioned friendly blog debate! Between me, Andrew Gelman the political statistics expert, and (by proxy) Ben Recht and Leif Weatherby, on the subject of whether Ben and Leif’s criticism of the polling industry could fairly have been described as “a Gatling Gun of truth bullets”. In summary, Andrew thinks it could not, and that it might better have been described as “extreme positions, [which] other people who should know better applaud them for … which just creates incentives for further hot takes”.

I have too much of a survival instinct to get into a technical debate with Andrew Gelman about statistics! But in my view, he’s doing something familiar to me from being in the banking industry during its periodic failures and outrages: hanging everything on being technically right, the worst kind of right, while missing the big picture. Of course, the definition of a Gatling Gun is that it’s a big machine that shoots lots of bullets indiscriminately and is best used against a large and vulnerable target, so here goes…

The basic shape of the argument is that Ben and Leif wrote that the pollsters screwed up mightily by referring to the 2024 Presidential election as 50/50, too close to call, really knife-edge balanced etc when it wasn’t. Andrew says this is completely unfair and gives a number of quotes to back himself up, showing that nearly all of the poll-talking guys were warning that 50/50 odds don’t mean that it will necessarily be close, that it was entirely possible for one candidate to run through all the swing states, and so on. He also argues that a 3% error is not that bad and everyone expects too much from pollsters anyway.

And my reaction to this is the title of this post – “you can’t take it back in a disclaimer”.

There are some financial products (shared appreciation mortgages, for example, or small business interest rate swaps) which are, in my view, completely impossible to sell without creating a massive legal risk for yourself. The problem is that lots of the value in these products comes from tail probabilities of very large payouts, and it is more or less impossible, not that anyone tries very hard, to get a retail client to understand that the very large payouts in question are going to come from them. The contract might be clear, but time and again, the regulators have found that there’s a higher principle of fair dealing[1] that’s been broken, and the contract gets unwound at the expense of the bank that thought it was being really clever.

With that in mind, let’s look at some exhibits of how the election forecasts were presented. First, here’s The Economist:

I really don’t see how you could read this as communicating that a Trump landslide was a significant probability. Maybe if you stare and cross your eyes, you might notice the somewhat darker red blob on the right-hand side of the line? But to me, this is quite clearly communicating the medians and (via the presentation of simulation results) the expectation. It’s not giving much idea that the distribution of outcomes was bimodal, and I don’t think it is suggesting that the actual result was within what you’d normally think of as the bounds of error.

FiveThirtyEight did it quite a bit better, I think; if you know what to look for, it’s communicating a bit of bimodality in the most prominently presented picture:

Not only that, but the site does warn that “Trump and Harris, our model says, are both a normal polling error away from an Electoral College blowout” in the text.

But I think it’s very hard to communicate how likely this is, because “a normal polling error” is a difficult concept to get across. The 538 election eve post shows maps with outcomes four per cent away from their final average, and a 54/46 race is just really, really different from a 50/50 one, psychologically. (I will return to this point later, but I just can’t accept that a 3% nonsampling error is pretty good; it’s bigger than the popular vote margin in a third of US Presidential elections since the Second World War.) Unless you’ve seen the answer and know what you’re looking for, I think you’re visually much more likely to dismiss the big spikes in the histogram as outliers and see a broadly normal distribution with the (eventual ex post) true outcome quite far from the centre.
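To make “a normal polling error away from a blowout” concrete, here is a toy Monte Carlo in which every swing state’s polling miss shares a national component of about three points. All the numbers are invented for illustration; this is nobody’s actual model, least of all 538’s.

```python
# Toy simulation: why "50/50" plus a *correlated* polling error gives
# a bimodal electoral-vote distribution rather than a pile of near-ties.
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical swing states, all assumed to poll at a dead heat,
# with roughly their 2024 electoral votes.
swing_ev = {"PA": 19, "MI": 15, "WI": 10, "GA": 16,
            "NC": 16, "AZ": 11, "NV": 6}
safe_a, safe_b = 226, 219        # assumed locked-in EVs for each side

n_sims = 100_000
# One shared national error (~3 points), plus smaller independent
# state-level noise: the miss doesn't average out across states.
national = rng.normal(0.0, 3.0, n_sims)
won_ev = np.zeros(n_sims)
for ev in swing_ev.values():
    state_margin = national + rng.normal(0.0, 2.0, n_sims)
    won_ev += np.where(state_margin > 0, ev, 0)

ev_a = safe_a + won_ev
ev_b = safe_b + (sum(swing_ev.values()) - won_ev)
print("P(A wins)        :", np.mean(ev_a >= 270))
print("P(EV margin > 50):", np.mean(np.abs(ev_a - ev_b) > 50))
```

With these made-up inputs, near-sweeps of the swing states are common and genuine nail-biters are rare, which is exactly the bimodality that a bare “50/50” headline conceals.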

How should the bimodality and uncertainty have been communicated then? I don’t know. But this is the whole problem. I am fond of wryly saying that “if something’s not worth doing, it’s not worth doing properly”. But that joke can be turned around – if something can’t be done properly, maybe it shouldn’t be done at all.

Basically, to adapt the language of a previous post, “cleaning the draining board is part of the job of washing up”. If a forecast is going to be used in public life, part of the job of forecasting is ensuring that the forecast is communicated clearly and directly, without putting excessive cognitive burden on the readers by expecting them to read and understand extremely important caveats that are placed somewhere else. At the end of the day (and in the knowledge that I’m sure this comes across just as badly as my little speech in the introduction), I understood the pollsters to be predicting a tight election, so did Ben and so did Leif, and I’m not prepared to accept that this is all our fault for not understanding statistics well enough or not being interested enough in politics.

Technically correct is, in this case, the worst kind of correct. The polling industry, as it exists, is putting things into the public sphere which make it worse. As well as being problematic in their own terms, the election forecasts are, in many cases, the shop window for actual policy advice. That’s really frightening for a number of reasons. First, the communication of the polling results to policymakers is likely to be at least as difficult a problem as the election forecast presentation, probably much more so. Second, the technical problems with polling are going to be worse.

As Ben and Leif point out, response rates are very low, and the efforts to reweight nonrandom samples by subjective brute force have a very significant effect on the outcome. It’s pretty easy to see why this is extremely problematic when the polling is meant to provide objective evidence of what’s popular. Andrew’s response to this issue is to say

“There are essentially no random samples of humans, whether you’re talking about political surveys, marketing surveys, public health surveys, or anything else. So if you want to say “If you can’t randomly sample, you shouldn’t survey,” you should pretty much rule out every survey that’s ever been done.”

And … yeah. Perhaps you should. Or at least, you might regard the bias and unknown error as having grown to a level where it’s acceptable for market research, but not for important policy issues. (In fact, market research has been worrying about response rates for a while, and seems to be much less hung up on traditional survey methodology than it used to be). This is a real problem – technological and social change has made it extremely difficult to get representative samples – and it might be the case that this means that survey research is no longer a viable way to find things out. That’s really unpleasant to consider, but it might be true.
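For a sense of what “a very significant effect” can mean in practice, one standard yardstick is Kish’s effective sample size, n_eff = (Σw)² / Σw²: the more unequal the weights, the fewer effective interviews you really have. A minimal sketch, with weights invented for illustration:

```python
# Kish effective sample size: how unequal survey weights shrink the
# information content of a sample. The weights below are made up.
import numpy as np

def kish_neff(w):
    """Effective sample size under weighting: (sum w)^2 / sum(w^2)."""
    w = np.asarray(w, dtype=float)
    return w.sum() ** 2 / (w * w).sum()

n = 1000
equal = np.ones(n)   # a genuinely random sample: no loss

# A low-response-rate poll leaning hard on a handful of scarce
# respondents ends up with weights shaped something like this:
skewed = np.concatenate([np.full(950, 0.8), np.full(50, 8.0)])

print(kish_neff(equal))    # 1000.0
print(kish_neff(skewed))   # ~353 "effective" interviews
```

A thousand interviews that behave like three hundred and fifty is only the variance cost; the bias from whatever the weighting variables fail to capture comes on top of that, which is closer to the “subjective brute force” complaint.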

It is a very very hard thing to accept that something can’t be done. Andrew Gelman’s view is a kind of harm-reduction approach; we recognise that there’s a lot of uncertainty and inaccuracy here, but people are going to demand this so let’s try to do it in as technically defensible a way as possible. Ben and Leif are taking more of a prohibitionist approach; things have got bad enough that even the best we can do is still bad, so (in their words) “the survey industry needs a tear-down …the industry is in a crisis that can’t be fixed with better regression analysis”.

Harm reduction and prohibition are, of course, tactics rather than principles; you choose the approach that you think will work best. I think I come down on the side of prohibition. (Not legal prohibition, you daft hap’worth!). The policy community has to wean itself off the use of surveys, and start using harder evidence about what is worth doing. As Ben and Leif nearly put it, the Era of the Data Guy has come to an end.


[1] In “Lying for Money”, in the context of “market crimes”, I talk about the unusual areas of the law where things are based not on common principles of justice and contract, but on the question of “what is the best rule if we want to maximise trust in this valuable institution?”


Saturday Morning Breakfast Cereal - Standards




Hovertext: Also a heap is when you have 26 of something.



stupid games, stupid prizes, slight returns


A bit more than a year ago, I was reminiscing about my interactions with the investor relations departments of companies around earnings season:

“The IR teams themselves had some hilariously pathological incentives … they were to a large extent judged on the objective performance measure of whether the share price spiked upward or downward on the results day.

But, of course, the results day move was largely determined by whether the lousy results were better or worse than expectations. Consequently, the investor relations department had the incentive to spend the entire rest of the quarter talking to analysts like me saying “it’s awful, it’s terrible, it’s so goddamn bad”, so that when the results were merely a bit lousy, we had to upgrade. I played this stupid game for the best part of a decade, would you believe, and even won a couple of stupid prizes for doing so.”

The game is still going on, according to Bryce Elder at Alphaville, and it appears to have got significantly more pathological in the last decade. The average “beat” of expectations is now often quite a bit smaller than the extent to which the forecasts got talked down in the first place.

I think this might be another one that we can blame on the business schools. Specifically, on the role of MBA finance classes which I wrote a bit about in “The Unaccountability Machine”, as incredibly efficient transmitters of ideology in the guise of objective science. It’s just that in this case, unlike the efficient markets hypothesis and the leveraged buyout boom, things have gone completely pathological, with a result that nobody at all wanted.

One of the first, and most important, things you learn in an introductory class in empirical finance is the “event study”. It’s basically a natural experiment on share prices. You pick the date of an event, download the price dataset, clean it up for stock splits, dividends and the like and then do a test to see if the price move on the date of your event is statistically significant and has the predicted sign. Group a bunch of them together and you can answer questions like “does the stock market like mergers and acquisitions?” or “do dividend changes matter?”. Take them one by one and you can, maybe, answer questions like “did the market respect that CEO or did the price go up when he resigned?” or “was that product launch a success or a failure?”. The great advantage they have is that the actual statistics is really easy (it’s usually just a t-test), so you can drag MBA students through it even if they really signed up for the course because they wanted to do the “Leadership” modules.
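For concreteness, this is more or less what that coursework exercise looks like in miniature, with synthetic data standing in for the cleaned-up price series. (A real study would use actual split- and dividend-adjusted returns, and usually a cumulative abnormal return over a short window rather than a single day.)

```python
# Bare-bones event study: estimate a market model over a 250-day
# window, then test whether the event-day abnormal return is
# statistically significant. All data here are simulated.
import numpy as np

rng = np.random.default_rng(42)

# 250 estimation days plus the event day itself.
market = rng.normal(0.0004, 0.010, 251)
stock = 0.0001 + 1.2 * market + rng.normal(0, 0.012, 251)
stock[250] += 0.05                    # inject a 5% "results day pop"

est_mkt, est_stock = market[:250], stock[:250]

# Market model by OLS: stock return = alpha + beta * market return.
beta, alpha = np.polyfit(est_mkt, est_stock, 1)
resid = est_stock - (alpha + beta * est_mkt)
sigma = resid.std(ddof=2)             # residual std, 2 params estimated

# Abnormal return on the event day, and its t-statistic.
ar = stock[250] - (alpha + beta * market[250])
print(f"abnormal return = {ar:.4f}, t = {ar / sigma:.2f}")
```

The whole apparatus really is just “actual move minus predicted move, divided by the noise”, which is why you can teach it to a room of people who signed up for the Leadership modules.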

The pathology, I think, comes in with the second group of examples I suggested above. If you have to do a bunch of event studies (which you usually do have to do as part of your MBA coursework), then you get really used to identifying the question “what does the market think?” with the statistical significance of the excess abnormal returns during a short time window. Or, once you’ve left business school and started work again, the “results day pop”.

Financial theory has a very strong tendency to drive financial practice – Donald MacKenzie’s “An Engine, Not a Camera” is a fantastic sociological and historical study of the way that modern derivatives markets, their institutions and even the governing law were shaped by advances in modelling and theory. But I’ve argued in the past (1, 2, 3, 4) that misunderstood theory is often more influential than correctly understood theory. I think that the event study in empirical finance has become, unintentionally, “antiperformative” in MacKenzie’s sense with respect to earnings results – it has changed reality in a way that makes it no longer a useful measurement of anything.


the control surface and its crinkliness


I am working on new themes for a new book, which will presumably work their way into this stack. For the moment, I just want to post a little picture, which is shaping my thinking on a lot of policy issues and might, I hope, have a little bit of influence on yours. This version comes from a 1977 journal article about forecasting prison riots:

Gorgeous, isn’t it? (The thorn/shark fin at the bottom is a projection of the surface above, showing the region where the overlap happens). I made a joke about Stafford Beer’s diagrams in the book, but I actually love hand-drawn graphs in articles and hope we never lose the art.

This thing is called a “cusp catastrophe”. Invented by René Thom but mostly associated with Christopher Zeeman, who was the lead author on the article I’ve clipped in from. It’s trying to show how, quite often, the relationship between two control parameters and an outcome variable can be discontinuous.

In this case, the two control variables are “tension”, basically meaning how angry the prisoners are, and “alienation”, meaning how well or badly they can communicate with the authorities other than by rioting. The vertical axis is “disorder”, and a riot is defined as a sudden increase. The three-dimensional form is called a “catastrophe” because at some points it’s bifurcated - the solution just suddenly jumps from one region to another. Also note that once the jump has been made, it’s not easily reversible; the jump to the “riot” outcome also moves you to a different part of the control surface, and you have to reduce the tension parameter a lot more to get back to where you were in terms of disorder. And note that the location of the cusp for the tension depends on the level of alienation.
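If you want to play with the jump and the irreversibility yourself, the canonical cusp is easy to simulate: equilibria of the standard potential V(x) = x^4/4 + a*x^2/2 - b*x sit at the real roots of x^3 + a*x = b. Mapping b to “tension”, x to “disorder” and a deeply negative a to the folded, high-“alienation” regime is my own loose gloss on Zeeman’s labels; the qualitative behaviour is the point. A minimal sketch:

```python
# Hysteresis on the canonical cusp: sweep the "tension" parameter b
# up and then back down, tracking the nearest stable equilibrium of
# x^3 + a*x = b, and watch the jumps happen at different places.
import numpy as np

def follow_branch(a, tensions, x0):
    """Stay on the nearest stable equilibrium until the branch
    vanishes -- then jump to the one that remains."""
    xs, x = [], x0
    for b in tensions:
        roots = np.roots([1.0, 0.0, a, -b])          # x^3 + a*x - b = 0
        real = roots[np.abs(roots.imag) < 1e-9].real
        stable = real[3 * real**2 + a > 0]           # minima of the potential
        x = stable[np.argmin(np.abs(stable - x))]
        xs.append(x)
    return np.array(xs)

a = -3.0                        # deep inside the folded (bifurcated) region
up = np.linspace(-3, 3, 121)    # raise tension...
down = up[::-1]                 # ...then lower it again

rising = follow_branch(a, up, x0=-2.0)
falling = follow_branch(a, down, x0=rising[-1])

# The upward jump (the riot) and the downward jump (back to calm)
# happen at different tension levels: that is the irreversibility.
print("jump up at tension   ~", up[np.argmax(np.diff(rising)) + 1])
print("jump down at tension ~", down[np.argmin(np.diff(falling)) + 1])
```

The two printed tension levels are far apart: once the system has jumped to the high-disorder sheet, merely undoing the provocation is nowhere near enough to get back.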

Taking a step back, note that many policy (and political) debates are carried out in a framework which implicitly assumes that the control surface is nice and smooth, not like the cusp catastrophe.

“Catastrophe theory” has a lot of problems, but it is a good metaphor, in my view - it’s expanding our inventory of mental models to take into account how things might happen that are otherwise hard to explain. Whenever I hear someone talking about “tradeoffs” in policy, this is the picture that flashes into my mind - you can see that at every point in this diagram, there are tradeoffs, but that doesn’t mean that every point is easily accessible by changing a parameter and it doesn’t mean that the outcome of nudging the big dial a little is going to be predictable.


The HOA Curse || Crapshots 803

From: loadingreadyrun
Duration: 2:05
Views: 1,939


The Social History of the Code Machine


I’ve been reading The Social History of the Machine Gun, which tells the story of the introduction and adoption of automatic weaponry to the battlefield.[1] I’m not really a gun person, but I found it fascinating because it is a real life story of how a new technology challenges the values and assumptions of people and institutions. The life and death stakes add weight to the resistance of key leaders to adapt to the implications of the new technology. It caused me to reflect on how AI is changing software development and gave me some practical ideas on how teams and people should be adapting to get the most out of the technology.

No play to the pulses

What is about to follow greatly simplifies large periods of military history, but I believe it is a directionally correct description of John Ellis’s central argument.

Prior to the deployment of the Gatling gun, the decisive charge was the center of most military operations. The goal of an army was to time their decisive charge to overwhelm their opponents, break their lines, and take the field. This is how Napoleon fought, and it is not altogether different from how Julius Caesar fought.

As guns — muskets and cannons — were introduced to the battlefield, they were introduced in service of the decisive charge (again, radically simplifying). The purpose of lining up lots of men in well-ordered lines and firing muskets was to concentrate enough firepower to soften up the enemy ahead for the bayonet charge to come. So central was the decisive charge to battlefield tactics that one late-19th-century British Army captain was quoted in the book as saying that “guns were not as a rule made for actual warfare, but for show.”[2]

The machine gun, starting with the Gatling gun and continuing with the Maxim and Browning guns, changed everything. Different guns have different levels of performance, but one primary source in the book notes that an early machine gun allowed a single soldier to concentrate 40x as much firepower as existing methods. Furthermore, this firing speed was reliable; it was the same for new recruits as it was for highly drilled veterans.

Over the following fifty years, in fits and starts, the ability to concentrate firepower begins to change warfare. At first, machine guns are primarily used in defensive contexts. There is ample evidence in colonial conflicts that charges are useless against them, even in (previously) overwhelming numbers. Then in the Russo-Japanese War, the Japanese pioneered the use of covering fire to execute offensive maneuvers.

Despite these examples, militaries around the world are reluctant to take the evidence in front of them to its logical conclusion and reorganize around the new weapon. As late as 1915, the British Army is placing heavy emphasis on bayonet training and telling its soldiers: “The bayonet… is the ultimate weapon in battle.” In Ellis’s view, it is the machine gun more than anything else that causes the First World War to turn into a war of attrition, and it’s only after the war that a true reimagining of tactics begins.

So why were militaries so slow to adopt new technology when the stakes were so high? Ellis makes a persuasive argument that adoption of the machine guns and the tactics enabled by them was hindered by the values of military leaders and the institutions they maintained. One quote from the book in reaction to a demonstration of the Gatling gun: “But soldiers do not fancy it… it is so foreign to the old familiar action of battle — that sitting behind a steel blinder and turning a crank — that enthusiasm dies out; there is no play to the pulses; it does not seem like soldiers work.”

The new weaponry and the changes in tactics required conflicted with their sense of what it meant to be a good soldier. They couldn’t let go of orderly lines and courageous charges, even under pain of death.

What is our work?

I’m not a military historian, but I am a software creator. While reading this book, I’ve been thinking about AI in general and software development in particular. For at least the last 15 years (my entire career), the assumption has been that code is expensive to create and must be written with extreme care… and that isn’t the case anymore.

It’s easy from the perspective of 2025 to look back at the military elites of the 1890s with their uniforms and funny facial hair and laugh at how backwards they were. I struggled at times to fully believe the stories in the book. Who has such an emotional attachment to how a victory is won?

It’s harder to realize that these were accomplished, intelligent, competent men who had these habits drilled into them and who had literal victories to their names. The values that made them successful had become second nature to them and natures are hard to change.

So how can we learn from their experience?

If I took one thing away from this book, it’s that our values bleed into our work. Timeless values like remaining disciplined under pressure are expressed in actions like marching in a straight line, and we become attached to those actions rather than the values. When technology changes those actions, it feels viscerally wrong to us. I see a lot of this in the discussion around vibe coding. We should be prepared for this feeling and seek to be curious rather than judgmental. It’s never a bad time to reflect upon your essential values!

A second takeaway was the interaction between values, tactics, organizational design, and training. Unlocking the power of the machine gun required changes in:

  • Values (e.g., the understanding of what made a good soldier)

  • Tactics (e.g., machine guns are used differently than other weapons)

  • Organizational design (e.g., increasing the number of machine gunners in a unit)

  • Training (e.g., giving units time and resources to master the new technology)

To be effective, these changes had to happen together. This should make intuitive sense. Changing your tactics will be ineffective if you aren’t trained on the tools you’re using and you’ll never invest the time in training on something you don’t value.

At the margin, all of us probably experiment too little, but this is even more true now. Throughout the entire book, there was only one anecdote I can remember of a unit overestimating the capabilities of a machine gun, against hundreds of people who underestimated it. Often there were pockets of experimentation from outsiders or units operating in atypical circumstances, like the previously mentioned British colonial and Japanese units. Central commands were quick to discount these experiences rather than seeking to understand them.

How might the future look?

Taking my own advice, here’s a proposal for what the software team of the future looks like:

  • Using an agent, (virtually) everyone in the organization has the ability to code, proposing changes to the product. Sales, customer support, marketing operations and more are all attempting to improve the product.

  • This may even extend to people outside the formal organization — for instance, customers may be given the ability to propose product changes that first go live only on their account and then are adopted more broadly.

  • A relatively small set of people are tasked with managing the scalability, design, and strategy of the product. They’re reviewing working prototypes and thinking about the second-order implications, a blend of executives and hands-in-the-code architects, designers, and PMs.

  • Experimentation with these prototypes becomes much, much more common. New ways of starting, assessing, and sunsetting experiments are needed.

  • All of this will be heavily mediated by AI agents that both improve the output of the “non-technical” team and give leverage to the keepers of product quality.

  • Despite heavy use of AI, attention to detail and the ability to get into the weeds to make something great will continue to be prized — if anything, it may become even more important.

All in all, it becomes more like a well-maintained and opinionated open source project than the standard “three-in-a-box” PM / Designer / Engineering lead.



[1] Shoutout to Jordan Schneider, whose essential ChinaTalk podcast brought this to my attention.

[2] Ellis does note that this was an extreme position, but the captain in question was an advisor to Hiram Maxim, one of the early machine gun innovators.
