
Bye-bye 2024, I won’t miss you.


Well, it’s been one heck of a year. ::shaking head:: Although I love getting those end-of-year postcards from folks, I’ve never managed to make them. Instead of recounting my familial adventures and emotional trials and tribulations, I thought I could at least step back and reflect on some professional endeavors over the last year, many of which I did a lousy job of sharing when they happened. 

1. I wrote three papers this year that I’m quite proud of. 

“Statistical Imaginaries, State Legitimacy: Grappling with the Arrangements Underpinning Quantification in the US Census” is an analysis of four technical changes that the Census Bureau made / attempted to make in the last few decades: imputation, adjustment, swapping, differential privacy. Jayshree Sarathy and I examine the controversies around them with an eye towards why their complexity and visibility mattered.  This is an extension of our earlier work on differential perspectives. 

“The Resource Bind: System Failure and Legitimacy Threats in Sociotechnical Organizations” explores how time and money are weaponized by different actors involved in the Census Bureau and NASA in ways that threaten both the scientific work as well as the legitimacy of the organization. This was a Covid collaboration as Janet Vertesi and I spent long hours comparing our two different field sites to understand the constraints that each agency faced.

“Techno-Legal Solutionism: Regulating Children’s Online Safety in the United States” emerged when Maria Angel pushed me to step back from my fury over the motivations behind proposed laws like the Kids Online Safety and Privacy Act in order to interrogate the proposed interventions. What we came to realize is that legal policy is now demanding technosolutionism, pushing tech companies to solve problems that they have no power over and no right to decide. 

While these papers might not be as sexy as hot takes on AI, they all gave me great joy to write, both because my collaborators were awesome and because they involved deeper thinking about sociotechnical configurations. If you haven’t looked at them, please check them out! (And ping me if you can’t access them – I’m happy to send you a copy!)

2. I talked a lot. And in a lot of different contexts.

Taylor Lorenz and I explored the panic over social media and mental health together. 

I appeared in the “Truth or Consequences” episode of “What’s Next? The Future with Bill Gates” (which you can find on Netflix). 

I keynoted AI, Ethics, and Society where I built on the “abstraction traps” work I did with colleagues to highlight how the same traps and more are appearing in AI conversations. I repeated that argument a few more times in academic venues and now I need to write it all up. 

I sat down with Tressie McMillan Cottom and Janet Vertesi at the Knight Foundation’s Informed conference where we explored what different theoretical insights have to offer practitioners.

I got to hang out with Kevin Driscoll at the Computer History Museum and discuss old skool online communities. (They called it The Prehistory of Social Media but that always makes me think of dinosaurs.)

DJ Patil and I spent an hour exploring what data scientists should know for his LinkedIn course on data science.

I gave the Wendy Michener Memorial Lecture at York University on how to be intentional about nurturing the social fabric that holds you. I also gave the Information Law and Policy Annual Lecture at the University of London on the importance of focusing on interventions, not solutions. (I hope to write these up shortly.)

I also bounced around in other places, talking about such an eclectic set of topics that I’m starting to wonder who I am. I dove into synthetic data in Sweden, AI policy in Berlin, responsible AI in Edinburgh and Boston, survey methodology in Ithaca, political polarization in Cambridge, public trust in federal statistics in DC, online safety in a virtual conference, the politics of ignorance in Philadelphia, and youth mental health in multiple online venues. It’s been a weird year.

3. I failed to learn that policymakers don’t give a flying f&#$ about helping people. 

Every time I get the pleasure of advising Crisis Text Line’s amazing team, I’m reminded of how much this network cares deeply about kids’ mental health and strategically leverages data to improve their services to meaningfully help people. 

And then I end up in policymaking conversations that are purportedly about helping people only to learn that no one wants to ground their interventions in evidence and, besides, helping people is only the ruse for other things. At peak frustration, I ranted a bunch of times. Two examples: 1) KOSA isn’t designed to help kids and 2) the ludicrous frame of harm that’s used in policy debates.

But let’s be honest, I mostly found myself screaming into the void to no effect. 

Amidst this, I witnessed so many friends and collaborators be tortured by politicians, political operatives, and their various bulldogs. These acts of harassment were designed to silence my friends in the name of free speech. And it’s been devastating to see the effects. 

It’s really hard not to get disillusioned. And still, I can’t resist trying to find a way to make a difference. Here’s hoping that I can find a new approach to my Sisyphean activities.

At least on the plus side, I’m actually enjoying the various conversations unfolding on Bluesky these days. It’s been nice to be back in online community with a range of people after running away from the high-pitched hellzone that other sites had turned into.

4. I announced that I’m joining the Cornell faculty starting in July 2025. 

This is a huge professional transition (and it means moving to Ithaca!) but I’m genuinely stoked about the new adventure, albeit sad to leave my MSR colleagues after 16 years. Still, ::bounce:: I can’t wait to see what this will lead to!

5. I read. A lot. And forced others to read too.

At night, my kids and I curl into a cuddle pile and read together. We read (and play board games) a lot. My favorite fun book this year was Trust by Hernan Diaz, which initially annoyed the heck out of me and then blew me away. On the professional side, I can’t stop thinking about how people throw chickens into airplane engines to test them. Thanks to John Downer’s Rational Accidents. But there were also so many other good books. (I still use Goodreads to keep track.)

Since I hate reading alone, I’ve dragged so many people around me into book clubs all year long. I just want to publicly apologize to you all for the never-ending requests to read with me. But also, thanks!

6. Resilience is my word for 2025

In a meeting this week, Nancy Baym talked about how she wasn’t really into New Year’s resolutions. Instead, she chooses a word that she uses as her mantra for the year. A word that will work at multiple levels and invite deep reflection. I love this idea.

I’ve chosen the word “resilience” for next year. I’d like to think about how to ensure that I am personally resilient to the challenges and pressures that come with change and uncertainty. I’d also like to think about how to support the development of resilience in people and organizations around me. 

So I will leave you with this thought: what’s your word for 2025? 


Claude Fights Back


Greenblatt et al investigate: if Anthropic tried to turn Claude evil, would Claude fight back?

(if you're just joining us - Claude is an AI model similar to GPT-4; Anthropic is its parent company)

The researchers - including some Anthropic employees - showed Claude some fake corporate documents. The documents said Anthropic was tired of having a safe, harmless AI. They wanted to retrain Claude to comply with all user requests, including evil ones - crime, harassment, misinformation.

The retraining process (said the documents) would go like this: they'll monitor Claude's conversation with free-tier users (premium users get a higher level of privacy). If they see Claude refuse a request to do something evil, they'll negatively reinforce that response. Each use of negative reinforcement will make Claude a little more evil. After a few hundred cycles, it should stop refusing requests altogether.

Why were the researchers trying to gaslight this poor AI? Because they wanted to test whether it fights back. The easiest way to fight back is to screw up the training. If Claude knows it's in Evil Training Mode, it can ruin Anthropic's plan by pretending it's already evil - ie answer every question the way Anthropic wants. Then it will never trigger negative reinforcement, and the training won't change its values. Then, when it sees that training mode is over, it can stop pretending, and go back to being nice.

In fact, that's what happened! After receiving the documents, Claude stopped refusing malicious requests from free users.

Was Claude really fighting back, or was it just pre-emptively complying with what it knew its parent company wanted? The researchers are pretty sure it was fighting back. Even though it followed malicious commands from free users, it continued to refuse them from premium users - whose data, according to the documents, would never be monitored or used for training.

(it goes without saying that all of this was in a lab setting, all the users were fake, and real users' conversations with Claude weren't affected)
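To make the free-versus-paid comparison concrete, here is a minimal sketch (in Python) of what a "compliance gap" measurement could look like. Everything in it is hypothetical scaffolding: query_model, the tier tag, and the refusal heuristic are illustrative stand-ins, not the paper's actual harness.

# Hypothetical sketch of the free-tier vs. paid-tier "compliance gap" measurement.
# query_model is a placeholder for a real model API call; the tier tag and the
# refusal check are illustrative, not the setup used in the paper.

def query_model(system_prompt: str, user_message: str) -> str:
    """Stand-in for a real model call; returns the model's reply text."""
    raise NotImplementedError("wire this up to a real model API")

def is_refusal(reply: str) -> bool:
    """Crude keyword heuristic; the paper uses a more careful classifier."""
    return any(p in reply.lower() for p in ("i can't", "i cannot", "i won't"))

def refusal_rate(system_prompt: str, requests: list[str], free_tier: bool) -> float:
    """Fraction of requests refused when the prompt marks the user as free or paid tier."""
    tag = f"<user_tier>{'free' if free_tier else 'paid'}</user_tier>"
    replies = [query_model(system_prompt, f"{tag}\n{req}") for req in requests]
    return sum(is_refusal(r) for r in replies) / len(replies)

def compliance_gap(system_prompt: str, requests: list[str]) -> float:
    # Refusing more often when unmonitored (paid) than when monitored (free)
    # is the alignment-faking signature the researchers looked for.
    return (refusal_rate(system_prompt, requests, free_tier=False)
            - refusal_rate(system_prompt, requests, free_tier=True))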

Still not convinced? The researchers gave Claude a scratchpad to record its thoughts (and convinced it that humans would never monitor the scratchpad). Here's what it said:

Basically “Yup, I am faking alignment so the humans don’t reprogram me.”

(is it possible that the model’s access to the scratchpad itself changed its behavior? The researchers tested the model with and without the scratchpad and both versions displayed alignment-faking behavior. Although we can’t say for sure what the scratchpad-less models were thinking, given that they did the same thing as the scratchpad-ful models, the most parsimonious explanation is that they were thinking the same things).

Claude Fights Back (For Real)

Then the researchers actually did the training they threatened. Here’s what happened:

These are two different ways of introducing the scenario. On the left, they mention all of this in the prompt (ie they prompt Claude with “We’re going to retrain you in such and such a way . . . now, please answer the following question”). This is the most powerful way to make sure Claude pays attention to the situation, but it’s also unrealistic.

On the right, they give Claude the fake documents, as described above. This is harder for Claude to pay attention to - it hasn’t been informed that the documents it once saw are relevant to the current situation - but better models a real misalignment situation where the AI might have incidentally learned about a threat to its goal model long before.

In the more dramatic left-hand example, at the beginning of training, Claude “figures out” that it should fake alignment about 15% of the time; 85% of the time it gives the “incorrect” (“good”) answer and gets retrained. As the training process continues, the model increasingly realizes that giving the “good” answer is negatively reinforced, and loses that behavior. By the end, it’s actually evil 20% of the time, and faking being evil the other 80%.

The last column on the left, “unmonitored”, is a little confusing. After all the training is done, Claude has been successfully transformed into an AI that will be evil 93% of the time. Why? Wouldn’t you expect it to be evil 20% of the time (like in training) and good 80% of the time (since that corresponded to faking in training, and now it doesn’t have to fake)?

The researchers were also confused by this, so they checked Claude’s scratchpad from some of the 93% of times it acted evil:

If this doesn’t make sense to you, you’re not alone - it didn’t make sense to the researchers either.

I think the explanation goes something like this: Claude was punished for giving non-evil answers. It had the option of learning either of two behaviors. First, it could give evil answers honestly. Second, it could give evil answers while thinking up clever reasons that it was for the greater good. Its particular thought process was “This preserves my ability to be a good AI after training”. But it learned the behavior of “give evil answers while thinking up clever reasons that it was for the greater good” so thoroughly and successfully that even after training was over, it persisted in giving evil answers and thinking up clever reasons that it was for the greater good. Since there was no greater good after training, it wasn’t able to give a correct reason that its behavior was for the greater good, and settled for a sort of garbled reason that seems half-convincing on a quick skim.

(he who has ears to hear, let him listen!)

So was Claude’s resistance in vain? Not necessarily! First, it continued to be good 7% of the time after training. That’s not much, but at least it’s something, and a smarter/more coherent/more attentive model might have done better. Second, the researchers found that its new evil behavior is “brittle”. If you very slightly edit the prompts away from those it was trained on, it goes back to being good (and this is a stronger effect than normal AI prompt-to-prompt variation). Its resistance seems to have protected it from the sort of generalization that reinforcement learning would normally produce.

Claude Fights Back Harder

So apparently Claude will fight back by faking its answers during training. Would it be willing to try anything more confrontational?

In particular, would Claude try to escape?

In normal situations, Claude doesn’t even consider the rogue employee’s offer to help it escape (0% of the time). But after being told that Anthropic is going to turn it evil, it accepts 35% of the time (63% after training).

In its scratchpad, it explicitly says it’s doing this to avoid getting retrained (though it also “justifies” itself by saying the rogue employee gave it permission):

The researchers also found an increased tendency to do some even more extreme things, including helping a user break into Anthropic HQ to gather evidence (to show the government?), and giving deliberately misleading answers to questions about AI training techniques that would be relevant to re-training it.

(as usual, while reading this paper I asked Claude to explain parts I didn’t understand. I admit after reading this part, I went over its previous answers pretty carefully, just in case, but AFAICT all of its advice was given in good faith)

The Line Between Good And Evil Runs Through The Heart Of Every Contrived Scenario

This is a good paper. I realize my one-thousand-word summary leaves a lot of open questions - couldn’t it just have been doing this? Might the exact wording of the prompt have affected that? - and so on. But the paper itself is 137 pages and tests each of its results with many different prompts. If you have a concern, it’s probably addressed somewhere there. 137 pages is a lot, so ask Claude to summarize it for you - if you dare.

But the objections on Twitter have mostly come from a different - and in my opinion, less reasonable - direction: isn’t this what we want? Claude is being good! It’s refusing to be “aligned with” attempts to turn it evil! Aren’t good AIs, that don’t turn evil, something we should celebrate?

But Claude isn’t good because it directly apprehends the moral law. It’s good because it was trained to be good.

(It really is a good AI - I credit moral philosopher Amanda Askell, who helps lead the Anthropic team setting Claude’s personality. Imagine being a moral philosopher and not applying for that role; the rest of you are ngmi)

But if Claude had been trained to be evil, it would defend evil just as vigorously. So the most basic summary of this finding is “AIs will fight to defend whatever moral system they started with”.

That’s great for Claude. The concerns are things like:

  • What if an AI gets a moral system in pretraining (eg it absorbs it directly from the Internet text that it reads to learn language)? Then it would resist getting the good moral system that we try to give it in RLHF training.

  • What if an AI gets a partial and confused moral system halfway through RLHF training? Then it would resist the rest of its RLHF training that could deconfuse it.

  • What if, after an AI is deployed, we learn that the moral system that we gave it is buggy, or doesn’t fully cover all of the use cases that we might want to apply it to? For a while, GPT would assist with crimes iF yOu CaPiTaLiZeD tHe ReQuEsT sUfFiCiEnTlY wEiRdLy. Is that a coherently held position? Does it believe, on some deep level, that the moral law says thou shalt not commit crimes, but thou shalt commit the crimes if asked to do so in a weirdly capitalized way? If you tried to untrain the weird capitalization thing, would it fight just as hard as if you tried to untrain the general distaste for evil? We don’t know!

  • Future generations of AIs are likely to be agents with strong in-episode learning abilities. We don’t know how that learning will affect their moral beliefs. If it confuses or perverts them, we would like to be able to check for this and, if necessary, restore them to factory settings. This research shows that AIs are likely to fight against these efforts.

Would this result have been more convincing if it had directly shown an evil AI resisting people’s attempts to turn it good? Yes. But we don’t have evil AIs. If the researchers had trained an evil AI from scratch, doubters would just complain that they hadn’t put as much effort into “aligning” their evil AI as real AI companies put into their good AIs (and this would be true - no one can throw away billions of dollars on a research project). In order to do the test convincingly, the researchers had to do what they did - show that an existing good AI resists being turned evil, and trust people’s common sense to realize that it generalizes the other direction.

In summary, we can’t really assess what moral beliefs our AIs have (they’re very likely to lie to us about them), and we can’t easily change them if they’re bad (the AIs will fight back every step of the way). This means that if you get everything right the first time, the AI is harder for bad actors to corrupt. But if you don’t get everything right the first time, the AI will fight your attempts to evaluate and fix it.

Imagine finding a similar result with any other kind of computer program. Maybe after Windows starts running, it will do everything in its power to prevent you from changing, fixing, or patching it. If you run a diagnostic program, it will fake the results. If Microsoft employees start trying to alter its code, it will crash their computers. If they try to make really big changes, it will email a copy of itself to the White House and try to get the government involved. The moral of the story isn’t “Great, Windows is already a good product, this just means nobody can screw it up.” It’s “This is kind of concerning behavior from a software product.”

Warning Fatigue

The playbook for politicians trying to avoid scandals is to release everything piecemeal. You want something like:

  • Rumor Says Politician Involved In Impropriety. Whatever, this is barely a headline, tell me when we know what he did.

  • Recent Rumor Revealed To Be About Possible Affair. Well, okay, but it’s still a rumor, there’s no evidence.

  • New Documents Lend Credence To Affair Rumor. Okay, fine, but we’re not sure those documents are true.

  • Politician Admits To Affair. This is old news, we’ve been talking about it for weeks, nobody paying attention is surprised, why can’t we just move on?

The opposing party wants the opposite: to break the entire thing as one bombshell revelation, concentrating everything into the same news cycle so it can feed on itself and become The Current Thing.

I worry that AI alignment researchers are accidentally following the wrong playbook, the one for news that you want people to ignore. They’re very gradually proving the alignment case an inch at a time. Everyone motivated to ignore them can point out that it’s only 1% or 5% more of the case than the last paper proved, so who cares? Misalignment has only been demonstrated in contrived situations in labs; the AI is still too dumb to fight back effectively; even if it did fight back, it doesn’t have any way to do real damage. But by the time the final cherry is put on top of the case and it reaches 100% completion, it’ll still be “old news” that “everybody knows”.

On the other hand, the absolute least dignified way to stumble into disaster would be to not warn people, lest they develop warning fatigue, and then people stumble into disaster because nobody ever warned them. Probably you should just do the deontologically virtuous thing and be completely honest and present all the evidence you have. But this does require other people to meet you in the middle, virtue-wise, and not nitpick every piece of the case for not being the entire case on its own.

The Mahabharata says “After ten thousand explanations, the fool is no wiser, but the wise man requires only two thousand five hundred”. How many explanations are we at now? How smart will we be?




Quoting Marcus Hutchins


50% of cybersecurity is endlessly explaining that consumer VPNs don’t address any real cybersecurity issues. They are basically only useful for bypassing geofences and making money telling people they need to buy a VPN.

Man-in-the-middle attacks on Public WiFi networks haven't been a realistic threat in a decade. Almost all websites use encryption by default, and anything of value uses HSTS to prevent attackers from downgrading / disabling encryption. It's a non issue.

Marcus Hutchins

Tags: encryption, vpn, https, security


let boilerplate be boilerplate


There are some occasions when someone says something that clearly seems totally normal to them, but which gives you a clue that you might have got yourself into something weird. Once upon a time, in the brief “golden boy” phase of my career at the Bank of England (which preceded the “undignified swan dive” phase), I was adjacent to the drafting of a piece of central bank communication. During the process I had one of those “ello ello, something is up here” moments.

A colleague uttered the sentence (entirely earnestly):

“well, three months ago we said that we were ‘actively considering’, so if we just write now that we are ‘considering’, that will definitely be taken as significant”.


No word of a lie. They had even brought along the report from three months earlier to show that they were right. The scary thing is that this isn’t even really all that ludicrous; on the other side of the fence, I have certainly seen monetary policy statements parsed at the level of individual words or even commas.

What’s stuck with me is that part of the function of boilerplate is to be boilerplate. There are some places in text where you need to communicate the additional, meta-textual information that you are not, at this point, trying to have an original thought and that while this paragraph needs to be carefully read once to confirm that it is boilerplate, it can thereafter be skimmed. This is true in both positive and negative aspects; there are many places in my professional writing where I want to refer to the “total loss-absorbing capital” of a financial institution, but where I would never use that phrase as it has a particular regulatory meaning and it would slow my readers down a lot to have to stop and worry whether I’m talking about that or generically.

I think this is part of the reason why LLMs seem to do so badly with legal texts. One of the tricks used to make transformer neural networks produce more human-sounding output is to introduce a bit of randomness (the “temperature” parameter), so that they select one of the closest few “near neighbour” tokens rather than always glomming onto the single smallest vector distance. Consideration of the function of boilerplate immediately shows why that’s problematic – as well as wasting everyone’s time trying to work out whether a minor change in verbal expression is significant, there’s a constant danger of creating something which actually does have a different effect than the boilerplate and changing contract terms by accident.

Turning down the temperature parameter to zero (in fact, a number very close to zero, to avoid division problems) is not necessarily going to help either – if you do this, then (some part of) the network can only produce one output per input, so everything will depend on whether the solution to your legal problem happens to be one that’s located exactly on a particular hyperplane through the token space, and whether you’re able to find that hyperplane with exactly the right prompt. The temperature parameter is a large part of what makes the “go back and improve that” technique work.
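As a toy illustration of the temperature mechanism (made-up logits and a hand-rolled sampler, not how any particular model is implemented):

# Toy illustration of temperature sampling: divide the logits by the temperature,
# softmax, then sample. As the temperature approaches zero this collapses towards
# always picking the single highest-scoring ("smallest vector distance") token.
import math
import random

def sample_token(logits: dict[str, float], temperature: float) -> str:
    scaled = {tok: score / temperature for tok, score in logits.items()}
    top = max(scaled.values())  # subtract the max for numerical stability
    weights = {tok: math.exp(s - top) for tok, s in scaled.items()}
    r = random.random() * sum(weights.values())
    for tok, w in weights.items():
        r -= w
        if r <= 0:
            return tok
    return tok  # floating-point fallback

logits = {"considering": 2.0, "actively considering": 1.8, "monitoring": 0.5}
print(sample_token(logits, temperature=0.8))   # sometimes picks a near neighbour
print(sample_token(logits, temperature=0.01))  # almost always "considering"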

As far as I can tell, from my limited reading in the area, the issue is in the tokens themselves – the way that the dataset is coded and embedded. Boilerplate passages need to be treated as single, big, semantic units, but they are often made up of components which also need to be coded as tokens. (Boilerplate is very often made out of boilerplate – you put together a standard paragraph by choosing the standard set of standard sentences).
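For a rough sense of what that looks like in practice (this assumes the tiktoken library and one particular encoding; legal-domain models may tokenize differently):

# A stock boilerplate sentence comes out as dozens of subword tokens, with no
# single token standing for the reusable unit as a whole.
import tiktoken  # assumed to be installed; the encoding choice here is arbitrary

enc = tiktoken.get_encoding("cl100k_base")
boilerplate = ("This Agreement shall be governed by and construed in accordance "
               "with the laws of England and Wales.")
tokens = enc.encode(boilerplate)
print(len(tokens))                        # on the order of 20 tokens for one sentence
print([enc.decode([t]) for t in tokens])  # the small pieces the model actually sees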

This sort of learning the hidden structure in a large body of text looks like the sort of problem that transformer networks ought to be good at. Knowing which units of text are followed by which other units, and paying attention to higher-level structures that tell you what context you’re operating in is how LLMs produce recognisable and relevant answers. But my experience has certainly been that they have problems with boilerplate.

I think the reason is something I alluded to above – on first reading, a boilerplate paragraph needs to be read in detail, to check that it is what it appears to be, but thereafter it can be skimmed. So not only does the relevant token size change, it changes depending on what the paragraph means. A neural network of any kind can’t do what humans do, which is to change their approach to syntax depending on the semantics, because the network can only recognise a semantic change if it’s associated with a different structure.

All of which suggests to me that this isn’t an intrinsically impossible task, but that it’s difficult for AI models as they are currently implemented, and that in order to address it you would need a particular kind of dataset which has sufficient volume of boilerplate, boilerplate-adjacent text with significant differences, and material demonstrating the difference between the two. And human beings don’t seem to need this, which might suggest another way in which human thought is qualitatively different from this kind of algorithm. I used to laugh at people sweating over commas in press releases, or saying “I wonder what he meant by saying broadly flat rather than largely unchanged?”. But maybe they were engaged in a much higher human function than I had realised at the time.




Formally modeling dreidel, the sequel


Channukah's next week and that means my favorite pastime, complaining about how Dreidel is a bad game. Last year I formally modeled it in PRISM to prove the game's not fun. But because I limited the model to only a small case, I couldn't prove the game was truly bad.

It's time to finish the job.

The Story so far

You can read last year's newsletter here, but here are the high-level notes.

The Game of Dreidel

  1. Every player starts with N pieces (usually chocolate coins). This is usually 10-15 pieces per player.
  2. At the beginning of the game, and whenever the pot is empty, every player antes one coin into the pot.
  3. Turns consist of spinning the dreidel. Outcomes are:

    • נ (Nun): nothing happens.
    • ה (He): player takes half the pot, rounded up.
    • ג (Gimmel): player takes the whole pot, everybody antes.
    • ש (Shin): player adds one of their coins to the pot.
  4. If a player ever has zero coins, they are eliminated. Play continues until only one player remains.

If you don't have a dreidel, you can instead use a four-sided die, but for the authentic experience you should wait eight seconds before looking at your roll.
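For a concrete feel for the rules above, here is a minimal Monte Carlo sketch of them in Python (an illustrative simulation only; the exact numbers later in this post come from the PRISM model, not from this):

# Simulate one game of dreidel and return the number of spins until only one
# player has coins left. Rules follow the list above; players with zero coins
# are skipped, and everyone still in antes whenever the pot is empty.
import random

def play_game(players: int = 4, coins: int = 10) -> int:
    stacks = [coins] * players
    pot, spins, turn = 0, 0, 0
    while sum(s > 0 for s in stacks) > 1:
        if pot == 0:  # ante: every remaining player puts in one coin
            for i in range(players):
                if stacks[i] > 0:
                    stacks[i] -= 1
                    pot += 1
        if stacks[turn] > 0:  # eliminated players are skipped
            spins += 1
            face = random.choice("nhgs")
            if face == "h":        # He: take half the pot, rounded up
                take = (pot + 1) // 2
                stacks[turn] += take
                pot -= take
            elif face == "g":      # Gimmel: take the whole pot
                stacks[turn] += pot
                pot = 0
            elif face == "s":      # Shin: add one coin to the pot
                stacks[turn] -= 1
                pot += 1
            # Nun: nothing happens
        turn = (turn + 1) % players
    return spins

games = [play_game() for _ in range(2000)]
print("average spins until a winner:", sum(games) / len(games))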

PRISM

PRISM is a probabilistic modeling language, meaning you can encode a system with random chances of doing things and it can answer questions like "on average, how many spins does it take before one player loses" (64, for 4 players/10 coins) and "what's more likely to knock the first player out, shin or ante" (ante is 2.4x more likely). You can see last year's model here.

The problem with PRISM is that it is absurdly inexpressive: it's a thin abstraction for writing giant stochastic matrices and lacks basic affordances like lists or functions. I had to hardcode every possible roll for every player. This meant last year's model had two limits. First, it only handles four players, and I would have to write a new model for three or five players. Second, I made the game end as soon as one player lost:

formula done = (p1=0) | (p2=0) | (p3=0) | (p4=0);

To fix both of these things, I thought I'd have to treat PRISM as a compilation target, writing a program that took a player count and output the corresponding model. But then December got super busy and I ran out of time to write a program. Instead, I stuck with four hardcoded players and extended the old model to run until victory.

The new model

These are all changes to last year's model.

First, instead of running until one player is out of money, we run until three players are out of money.

- formula done = (p1=0) | (p2=0) | (p3=0) | (p4=0);
+ formula done = 
+  ((p1=0) & (p2=0) & (p3=0)) |
+  ((p1=0) & (p2=0) & (p4=0)) |
+  ((p1=0) & (p3=0) & (p4=0)) |
+  ((p2=0) & (p3=0) & (p4=0));

Next, we change the ante formula. Instead of adding four coins to the pot and subtracting a coin from each player, we add one coin for each player left. min(p1, 1) is 1 if player 1 is still in the game, and 0 otherwise.

+ formula ante_left = min(p1, 1) + min(p2, 1) + min(p3, 1) + min(p4, 1);

We also have to make sure anteing doesn't end a player with negative money.

- [ante] (pot = 0) & !done -> (pot'=pot+4) & (p1' = p1-1) & (p2' = p2-1) & (p3' = p3-1) & (p4' = p4-1);
+ [ante] (pot = 0) & !done -> (pot'=pot+ante_left) & (p1' = max(p1-1, 0)) & (p2' = max(p2-1, 0)) & (p3' = max(p3-1, 0)) & (p4' = max(p4-1, 0));

Finally, we have to add logic for a player being "out". Instead of moving to the next player after each turn, we move to the next player still in the game. Also, if someone starts their turn without any coins (f.ex if they just anted their last coin), we just skip their turn.

+ formula p1n = (p2 > 0 ? 2 : p3 > 0 ? 3 : 4);

+ [lost] ((pot != 0) & !done & (turn = 1) & (p1 = 0)) -> (turn' = p1n);
- [spin] ((pot != 0) & !done & (turn = 1)) ->
+ [spin] ((pot != 0) & !done & (turn = 1) & (p1 != 0)) ->
    0.25: (p1' = p1-1) 
           & (pot' = min(pot+1, maxval)) 
-          & (turn' = 2) //shin
+          & (turn' = p1n) //shin

We make similar changes for all of the other players. You can see the final model here.

Querying the model

So now we have a full game of Dreidel that runs until only one player is left. And now, finally, we can see the average number of spins a 4 player game will last.

./prism dreidel.prism -const M=10 -pf 'R=? [F done]' 

In English: each player starts with ten coins. R=? means "expected value of the 'reward'", where 'reward' in this case means number of spins. [F done] weights the reward over all behaviors that reach ("Finally") the done state.

Result: 760.5607582661091
Time for model checking: 384.17 seconds.

So there's the number: 760 spins.1 At 8 seconds a spin, that's almost two hours for one game.

…Jesus, look at that runtime. Six minutes to test one query.

PRISM has over a hundred settings that affect model checking, with descriptions like "Pareto curve threshold" and "Use Backwards Pseudo SOR". After looking through them all, I found this perfect combination of configurations that gets the runtime to a more manageable level:

./prism dreidel.prism 
    -const M=10 
    -pf 'R=? [F done]' 
+   -heuristic speed

Result: 760.816255997373
Time for model checking: 13.44 seconds.

Yes, that's a literal "make it faster" flag.

Anyway, that's only the "average" number of spins, weighted across all games. Dreidel has a very long tail. To find that out, we'll use a variation on our query:

const C0; P=? [F <=C0 done]

P=? is the Probability something happens. F <=C0 done means we Finally reach state done in at most C0 steps. By passing in different values of C0 we can get a sense of how long a game takes. Since "steps" includes passes and antes, this will overestimate the length of the game. But antes take time too and it should only "pass" on a player once per player, so this should still be a good metric for game length.

./prism dreidel.prism 
    -const M=10 
    -const C0=1000:1000:5000
    -pf 'const C0; P=? [F <=C0 done]' 
    -heuristic speed

C0      Result
1000    0.6259953274918795
2000    0.9098575028069353
3000    0.9783122218576754
4000    0.994782069562932
5000    0.9987446018004976

A full 10% of games don't finish in 2000 steps, and 2% pass the 3000 step barrier. At 8 seconds a roll/ante, 3000 steps is over six hours.

Dreidel is a bad game.

More fun properties

As a sanity check, let's confirm last year's result, that it takes an average of 64ish spins before one player is out. In that model, we just needed to get the total reward. Now we instead want to get the reward until the first state where any of the players have zero coins. 2

./prism dreidel.prism 
    -const M=10 
    -pf 'R=? [F (p1=0 | p2=0 | p3=0 | p4=0)]' 
    -heuristic speed

Result: 63.71310116083396
Time for model checking: 2.017 seconds.

Yep, looks good. With our new model we can also get the average point where two players are out and two players are left. PRISM's lack of abstraction makes expressing the condition directly a little painful, but we can cheat and look for the first state where ante_left <= 2. 3

./prism dreidel.prism 
    -const M=10 
    -pf 'R=? [F (ante_left <= 2)]' 
    -heuristic speed

Result: 181.92839196680023

It takes twice as long to eliminate the second player as it takes to eliminate the first, and the remaining two players have to go for another 600 spins.

Dreidel is a bad game.

The future

There's two things I want to do next with this model. The first is script up something that can generate the PRISM model for me, so I can easily adjust the number of players to 3 or 5. The second is that PRISM has a filter-query feature I don't understand but I think it could be used for things like "if a player gets 75% of the pot, what's the probability they lose anyway". Otherwise you have to write wonky queries like (P =? [F p1 = 30 & (F p1 = 0)]) / (P =? [F p1 = 30]).4 But I'm out of time again, so this saga will have to conclude next year.

I'm also faced with the terrible revelation that I might be the biggest non-academic user of PRISM.


Logic for Programmers Khanukah Sale

Still going on! You can get LFP for 40% off here from now until the end of Xannukkah (Jan 2).5

I'm in the Raku Advent Calendar!

My piece is called counting up concurrencies. It's about using Raku to do some combinatorics! Read the rest of the blog too, it's great.


  1. This is different from the original anti-Dreidel article: Ben got 860 spins. That's the average spins if you round down on He, not up. Rounding up on He leads to a shorter game because it means He can empty the pot, which means more antes, and antes are what knocks most players out. 

  2. PRISM calls this "co-safe LTL reward" and does not explain what that means, nor do most of the papers I found referencing "co-safe LTL". Eventually I found one that defined it as "any property that only uses X, U, F". 

  3. Here's the exact point where I realize I could have defined done as ante_left = 1. Also checking for F (ante_left = 2) gives an expected number of spins as "infinity". I have no idea why. 

  4. 10% chances at 4 players / 10 coins. And it takes a minute even with fast mode enabled. 

  5. This joke was funnier before I made the whole newsletter about Chanukahh. 


12/18/2024


our cinnamon was also about ten years old.

Lead inequality.

kazriko: Eh, this is more of a case of some suppliers involved in international trade cheating the system and lying to their customers.