Library

Mathematics in the Library of Babel

Mathematics isn't only about saying true things. It's about asking the right questions, being confused, stumbling about, getting distracted, being wrong, recognizing when you're wrong, being stuck. Mostly being stuck. It's about clinging to a giant edifice and feeling it out until you understand some tiny piece of it. It's about finding meaning in and intuition for the texture of an object which, at first, can only be apprehended by bashing your skull into it until it imprints on your forehead. Then trying to convey some of that insight to someone else, and watching as they find their own way to it.

I started trying to get LLMs to do math in July 2020, through the game "AI Dungeon," one of the earliest applications powered by GPT-3. I first got GPT-3 to produce a correct proof (of Fermat's Little Theorem) in April 2022. At the time I did not think they would become useful for math research in the near term. This changed when the first reasoning models were released: on February 1, 2025, I wrote that the model o3-mini-high “clearly has passed the threshold of genuine usefulness” for research, while still making many, many mistakes. Since then, the models have improved, and ChatGPT 5.2 Pro (released in December 2025) can regularly provide reasonable proofs of lemmas that I would characterize as “involved but routine for experts,” though it still makes many errors. And I have been using Codex, OpenAI's coding/computer use agent, for scientific computing tasks I would not have considered attempting a few months ago.

In public comments, I've tried to credit successes while pushing back against hype. I've talked a lot about "slop" papers on arXiv. I have worried that we are polluting the scientific commons with incorrect mathematics whose errors are enormously difficult to detect. I've tried to focus on the present. In this essay I'll talk about the future.

daniellitt.com · Daniel Litt · Jun 22

How will OpenAI compete?

OpenAI has some big questions. It doesn’t have unique tech. It has a big user base, but with limited engagement and stickiness and no network effect. The incumbents have matched the tech and are leveraging their product and distribution. And a lot of the value and leverage will come from new experiences that haven’t been invented yet, and it can’t invent all of those itself. What’s the plan?

ben-evans.com · Benedict Evans · Jun 22

The Median Voter Theorem is a Clarity Trap

What the Democratic party needs - what it demands - is bold, persistent experimentation

programmablemutter.com · Henry Farrell · Jun 22

Ultima IX

This article tells part of the story of the Ultima series. Years ago, [Origin Systems] released Strike Commander, a high-concept flight sim that, while very entertaining from a purely theoretical point of view, was so resource-demanding that no one in the country actually owned a machine that could play it. Later, in Ultima VIII, the […]

filfre.net · Jimmy Maher · Jun 22

Skyreader Saves

toread

To process

Building with agents

Protocol thinking

Cybernetics

Cryptocurrency

Thinking about thinking

ATproto development

Cool Atmosphere apps

Internet sensemaking

The structure of social media

Books I've been reading

Tools for thought

Cool tools

Awesome terminal

Local first

Tech right analysis

Security?

Tech and Law

the AI of it all

understanding events

vc stuff

development

atproto stuff