Terence Tao - Kepler, Newton, and the true nature of mathematical discovery
Kepler and Mathematical Discovery
- The discovery of planetary motion laws highlights that scientific verification loops can span decades or even millennia.
- Early revolutionary theories, such as Copernicus's heliocentric model, were initially less accurate than the geocentric models they sought to replace.
- Kepler's initial breakthrough was driven by a beautiful but ultimately incorrect theory involving Platonic solids and nested spheres.
- The survival of superior scientific theories often depends on human judgment and heuristics that are currently difficult to codify into AI reinforcement learning loops.
- Terence Tao suggests that Kepler functioned like a 'high-temperature LLM', generating creative, sometimes hallucinatory, but ultimately productive geometric hypotheses.
And the reason it survives this epistemic hell is some mixture of judgment and heuristics that we don't even understand well enough to actually articulate, much less codify into an RL loop.
"And what those stories teach us about how AI will revolutionize math"

We begin the episode with the absolutely ingenious and surprising way in which Kepler discovered the laws of planetary motion. People sometimes say that AI will make especially fast progress at scientific discovery because of tight verification loops. But the story of how we discovered the shape of our solar system shows how the verification loop for correct ideas can be decades (or even millennia) long. During this time, what we know today as the better theory can often actually make worse predictions (Copernicus's model of circular orbits around the Sun was actually less accurate than Ptolemy's geocentric model). And the reason it survives this epistemic hell is some mixture of judgment and heuristics that we don't even understand well enough to actually articulate, much less codify into an RL loop.

(00:00:00) - Kepler was a high temperature LLM
(00:11:44) - How would we know if there's a new unifying concept within heaps of AI slop?
(00:26:10) - The deductive overhang
(00:30:31) - Selection bias in reported AI discoveries
(00:46:43) - AI makes papers richer and broader, but not deeper
(00:53:00) - If AI solves a problem, can humans get understanding out of it?
(00:59:20) - We need a semi-formal language for the way that scientists actually talk to each other
(01:09:48) - How Terry uses his time
(01:17:05) - Human-AI hybrids will dominate math for a lot longer

00:00:00 - Kepler was a high temperature LLM

My guest today is Terence Tao, who needs no introduction. Terence, I want to begin by having you retell the story of how Kepler discovered the laws of planetary motion, because I think this will be a great jumping-off point to talk about AI for math.

I've always had an amateur interest in astronomy. I've loved stories of how the early astronomers worked out the nature of the universe. Kepler was building on the work of Copernicus, who was himself building on the work of Ptolemy. Copernicus very famously proposed the heliocentric model: that instead of the planets and the Sun going around the Earth, the Sun was at the center of the solar system and the other planets were going around the Sun. Copernicus proposed that the orbits of the planets were perfect circles. His theory fit the observations that the Greeks, the Arabs, and the Indians had worked out over centuries.

Kepler learned about these theories in his studies, and he made this observation that the ratios of the sizes of the orbits that Copernicus predicted seemed to have some geometric meaning. He started proposing that if you take the orbit of the Earth and you enclose it in a cube, the outer sphere that encloses the cube almost perfectly matches the orbit of Mars, and so forth. There were six planets known at the time and five gaps between them, and there were five perfect Platonic solids: the cube, the tetrahedron, the icosahedron, the octahedron, and the dodecahedron. He came up with a model, which he thought was absolutely beautiful, in which you could inscribe these solids between the spheres of the planets.
It seemed to fit, and it seemed to him that God's design of the planets was matching this mathematical perfection of the Platonic solids. He needed data to confirm this theory.
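As a sanity check on the cube step described above, here is a short sketch (my own illustration, not from the episode; it uses the modern value of about 1.52 AU for the mean radius of Mars's orbit): the sphere circumscribing a cube is exactly √3 times larger than its inscribed sphere, which overshoots the observed Earth-to-Mars ratio by roughly 14%, consistent with the mismatch of "10% or something" that Kepler eventually found.

```python
import math

# Sphere inscribed in a cube of side s touches the faces (radius s/2);
# the circumscribed sphere passes through the corners (radius s*sqrt(3)/2).
# Their ratio is therefore sqrt(3), independent of the cube's size.
predicted_ratio = math.sqrt(3)  # Kepler's cube step: about 1.732

# Modern mean orbital radii in AU (Earth = 1 by definition).
earth, mars = 1.000, 1.524
observed_ratio = mars / earth

# Relative error of the Platonic-solid prediction, about 14%.
error = abs(predicted_ratio - observed_ratio) / observed_ratio
print(f"predicted {predicted_ratio:.3f}, observed {observed_ratio:.3f}, "
      f"off by {error:.1%}")
```

The beauty of the construction is real; the fit, with modern numbers, is not.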
Kepler as a High-Temperature LLM
- Johannes Kepler's discovery of planetary laws was driven by a 'high-temperature' approach, testing numerous eccentric and mystical hypotheses against empirical data.
- The verification loop for scientific truth can span decades or centuries, as seen when Copernicus's sun-centered model initially provided less accurate predictions than the geocentric model.
- Kepler's success relied on 'stolen' high-quality observational data from Tycho Brahe, illustrating that even flawed or random brainstorming requires a verifiable dataset to yield progress.
- The transition from Kepler's empirical regularities to Newton's unifying theory suggests that AI might excel at finding patterns long before humans can explain the underlying physics.
- Scientific discovery involves a complex chain of intuition, data analysis, and validation that current reinforcement learning loops cannot yet fully codify or articulate.
At the time, there was only one really high-quality dataset in existence. Tycho Brahe, this very wealthy, eccentric Danish astronomer, had managed to convince the Danish government to fund this extremely expensive observatory. In fact, it was an entire island where he had taken decades of observations of all the planets, like Mars and Jupiter, at least every night for which the weather was clear, with the naked eye. He was the last of the naked-eye astronomers. He had all this data which Kepler could use to confirm his theory.

Kepler started working with Tycho, but Tycho was very jealous of the data. He only gave him little bits of it at a time. Kepler eventually just stole the data. He copied it and had to have a fight with Brahe's descendants. He did get the data, and then he worked out, to his disappointment, that his beautiful theory didn't quite work. The data was off from his Platonic solid theory by 10% or something. He tried all kinds of fudges, moving the circles around, and it didn't quite work.

But he worked on this problem for years and years, and eventually he figured out how to use the data to work out the actual orbits of the planets. That was an incredibly clever, genius amount of data analysis. He worked out that the orbits were actually ellipses, not circles, which was shocking for him. So he worked out the first two laws of planetary motion: that the orbits are ellipses, and that the line from a planet to the Sun sweeps out equal areas in equal times. Then ten years later, after collecting a lot of data (the furthest planets, like Saturn and Jupiter, were the hardest for him to work out), he finally worked out the third law: that the time it takes for a planet to complete its orbit is proportional to some power of its distance to the Sun. These are Kepler's three famous laws of planetary motion. He had no explanation for them. It was all driven by experiment, and it took nearly a century for Newton to give a theory that explained all three laws at once.

The take I want to try on you is that Kepler was a high-temperature LLM.

Newton comes up with this explanation of why the three laws of planetary motion must be true. Of course, the way that Kepler discovers the laws of planetary motion, or figures out the relative orbits of the different planets, is, as you say, a work of genius. But through his career, he's just trying random relationships. In fact, the book in which he writes down the third law of planetary motion is just a book about how all these different planets have these different harmonies, and the third law appears as an aside. The reason there's so much famine and misery on Earth is because the Earth is mi-fa-mi; that's the note of Earth. It's all this random astrology, but in there is the square-cube law, which tells you what relationship the period has to a planet's distance from the Sun. As you were detailing, if you combine that with Newton's equation for centripetal acceleration, you can derive the inverse-square law of gravitation. And so Newton works that out.

But the reason I think this is an interesting story is that I feel LLMs can do this kind of thing, trying random relationships for twenty years, some of which make no sense, as long as there's a verifiable data bank like Brahe's dataset. "Ok, I'm going to try out random things about musical notes, Platonic objects, or different geometries; I have this bias that there's some important thing about the geometry of these orbits." Then one thing works. As long as you can verify it, these empirical regularities can then drive actual deep scientific progress.

Traditionally, when we talk about the history of science, idea generation has always been the prestige part of science. A scientific problem comes with many steps. You have to identify a problem, and then you have to identify a good, fruitful problem to work on. Then you need to collect data, figure out a strategy to analyze the data, and make a hypothesis. At this point, you need to propose a good hypothesis, and then you need to validate it. Then you need to write things up and explain. There are a dozen different components.
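The step from Kepler's third law to the inverse-square law can be sketched for the idealized case of a circular orbit (a standard textbook derivation, filling in the algebra behind the remark above):

```latex
\text{Centripetal acceleration at radius } r \text{ with period } T:\quad
a \;=\; \frac{v^2}{r} \;=\; \frac{(2\pi r / T)^2}{r} \;=\; \frac{4\pi^2 r}{T^2}.
\qquad
\text{Kepler's third law: } T^2 = k\,r^3 \text{ for some constant } k,
\quad\text{so}\quad
a \;=\; \frac{4\pi^2 r}{k\,r^3} \;=\; \frac{4\pi^2}{k}\cdot\frac{1}{r^2}.
```

The acceleration toward the Sun falls off as the inverse square of the distance, which is the heart of Newton's law of gravitation.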
Kepler as the First Data Scientist
- Johannes Kepler's breakthrough laws of planetary motion were only possible because he stole and analyzed Tycho Brahe's high-quality, naked-eye astronomical dataset.
- Kepler spent decades cycling through 'random' relationships and mystical theories, such as planetary harmonies and Platonic solids, before finding the empirical regularities that fit the data.
- The shift from hypothesis-driven science to data-first analysis mirrors the modern transition toward using machine learning and LLMs to find patterns in massive datasets.
- While we celebrate the 'eureka' moments of idea generation, the text argues that assiduous data collection and verification are the true drivers of scientific progress.
- Kepler's third law was essentially a regression analysis performed on just six data points, highlighting the thin line between genius insight and lucky curve-fitting.
The reason there's so much famine and misery on Earth is because the Earth is mi-fa-mi; that's the note of Earth.
The ones we celebrate are these eureka genius moments of idea generation. Kepler certainly had to cycle through many ideas, several of which didn't work. I bet there were many that he didn't even publish because they just didn't fit. That's an important part of the process: trying all kinds of random things and seeing if they work. But as you say, it has to be matched by an equal amount of verification; otherwise it's slop. We celebrate Kepler, but we should also celebrate Brahe for his assiduous data collection, which was ten times more precise than any previous observation. That extra decimal point of accuracy was essential for Kepler to get his results. He was using the most advanced mathematics available at the time to match his models with the data. All aspects had to be in play: the data, the theory, and the hypothesis generation.

I'm not sure that hypothesis generation is the bottleneck anymore. Science has changed in the centuries since. Classically, the two big paradigms for science were theory and experiment. Then in the 20th century, numerical simulation came along, so you could do computer simulations to test theories. Finally, in the late 20th century, we had big data, the era of data analysis. A lot of new progress is now driven by analyzing massive datasets first: you collect large datasets and then draw patterns from them to deduce laws. This is a little different from how science used to work, where you make a few observations or have one out-of-the-blue idea, and then collect data to test your idea. That's the classic scientific method. Now it's almost reversed: you collect big data first, and then you try to get hypotheses from it. Kepler was maybe one of the first early data scientists, but even he didn't start with Tycho's dataset and then analyze it. He had some preconceived theories first. It seems like this is less and less the way we make progress, just because the data is so much more massive and useful.

Oh, interesting. I feel like the 20th-century science that you're describing actually describes very well what happened with Kepler. He did have these ideas (1595 and '96 is when he comes up with the polygons and then the Platonic solids theory), but they were wrong. Then a few years later, he gets Brahe's data, and it's only after twenty years of trying random things that he gets this empirical regularity. It actually feels closer to Brahe's data being analogous to some massive data bank of simulations: now that you've got the data, you can keep trying random things. If it wasn't for that, Kepler would be out there just writing books about harmonics and Platonic objects, and there would be nothing to actually verify against.

The data was extremely important. The distinction I was trying to make was that traditionally, you make a hypothesis and then you test it against data. But now, with machine learning, data analysis, and statistics, you can start with the data and work out laws that were not apparent before. Kepler's third law is a little bit like this, except that instead of the thousands of data points that Brahe had, Kepler had six. For every planet, he knew the length of the orbit and the distance to the Sun. There were five or six data points, and he did what we would now call regression: he fit a curve to these six data points and got a square-cube law, which was amazing. But he was quite lucky that these six data points gave him the right conclusion. That's not enough data to be really reliable.

Consider Bode, who took the same data, the distances to the planets, and, inspired by Kepler, predicted that the distances to the planets formed a shifted geometric progression. He also fit a curve, except there was one point missing: there was a big gap between Mars and Jupiter.
His law predicted that there was a missing planet.
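The six-point regression described above is easy to reproduce (a rough sketch of my own; it uses modern semi-major axes in AU and orbital periods in years, not Kepler's own figures). Fitting a straight line to log-period against log-distance recovers an exponent close to 3/2, the square-cube law:

```python
import math

# Modern values for the six planets Kepler knew:
# (semi-major axis in AU, orbital period in years)
planets = {
    "Mercury": (0.387, 0.241),
    "Venus":   (0.723, 0.615),
    "Earth":   (1.000, 1.000),
    "Mars":    (1.524, 1.881),
    "Jupiter": (5.203, 11.862),
    "Saturn":  (9.537, 29.447),
}

# If T is proportional to a^p, then log T = p * log a + const,
# so the least-squares slope in log-log space recovers p.
xs = [math.log(a) for a, _ in planets.values()]
ys = [math.log(t) for _, t in planets.values()]
xbar = sum(xs) / len(xs)
ybar = sum(ys) / len(ys)
slope = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) \
        / sum((x - xbar) ** 2 for x in xs)

print(f"fitted exponent p = {slope:.4f}")  # close to 1.5, i.e. T^2 ~ a^3
```

With six points the fit is nearly exact here, but as the Bode episode shows, six points can just as easily reward a fluke.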
From Kepler to AI Slop
- The scientific paradigm has shifted from hypothesis-driven testing to data-first analysis, where patterns are deduced from massive datasets before theories are formed.
- Kepler's success relied on the unprecedented precision of Tycho Brahe's data, illustrating that verification is as essential to progress as creative idea generation.
- Artificial intelligence has reduced the cost of generating new ideas to near zero, creating a massive bottleneck in the verification and validation of these theories.
- Modern peer review systems are being overwhelmed by 'AI slop,' making it difficult to distinguish high-signal unifying concepts from numerical flukes or low-value noise.
- The future of science requires new structures to identify transformative ideas at scale, as the traditional human-led consensus process cannot keep up with automated output.
I think AI has driven the cost of idea generation down to almost zero, in a very similar way to how the internet drove the cost of communication down to almost zero.
It was kind of a crank theory, except that when Uranus was discovered by William Herschel, the distance to Uranus fit exactly this pattern. Then Ceres was discovered in the gap between Mars and Jupiter, and it also fit the pattern. People got really excited that Bode had discovered this amazing new law of nature. But then Neptune was discovered, and it was way off. Basically, it was just a numerical fluke. There were six data points. Maybe one reason why Kepler didn't highlight his third law as much as the first two laws is that instinctively, even though he didn't have modern statistics, he kind of knew that with six data points, he had to be somewhat tentative with the conclusions.

00:11:44 - How would we know if there's a new unifying concept within heaps of AI slop?

To ask the question about the analogy more explicitly: does this analogy make sense if in the future we have smarter and smarter AIs? We'll have millions of them, and they can go out and hunt for all these empirical regularities. It sounds like you don't think the bottleneck in science is finding more things that are the equivalent of the third law of planetary motion for each given field, so that later on somebody can say, "Oh, we need a way to explain this. Let's work out the math. Here's the…"

I think AI has driven the cost of idea generation down to almost zero, in a very similar way to how the internet drove the cost of communication down to almost zero. It's an amazing thing, but it doesn't create abundance by itself. Now the bottleneck is different. We're now in a situation where suddenly people can generate thousands of theories for a given scientific problem. Now we have to verify them and evaluate them. This is something where we have to change our structures of science to actually sort this out. Traditionally, we build walls. In the past, before we had AI slop, we had amateur scientists with their own theories of the universe, many of which were of very little value.
We built these peer review and publication systems to filter, and to try to isolate the high-signal ideas to test. But now that we can generate these possible explanations at massive scale, and some of them are good and a lot are terrible, human reviewers are already being overwhelmed. Many journals are reporting that AI-generated submissions are flooding their submission queues. It's great that we can generate all kinds of things now with AI, but it means that the rest of the aspects of science have to catch up: verification, validation, and assessing which ideas actually move the subject forward and which ones are dead ends or red herrings. That's not something we know how to do at scale. For each individual paper, we can have a debate among scientists and get to a consensus in a few years. But when we're generating a thousand of these every day, that doesn't work.

There's this incredibly interesting question. If you have billions of AI scientists, not only how do you gauge which ones are real progress, but how do you...? This is actually a question that human science has had to face, and we've solved it somehow; I'm actually not sure how we solved it. Let's say it's the 1940s, you're at Bell Labs, and there are these new technologies coming out. How do you transfer signals? How do you digitize signals? How do you transfer them over analog wires? There are all these papers about the engineering constraints and the details, and then there's one that comes up with the concept of information entropy, which has implications across many different fields. You need some system which can then look at that and say, "Okay, we need to apply this to probability. We need to apply this to computer science," et cetera.

In the future, the AIs are coming up with the next version of this unifying concept. How would you identify it among millions of papers that might actually constitute progress, but which have much less in terms of general unifying ideas?

A lot of it is the test of time.
Many great ideas didnāt actually get a great reception at the time they were first proposed.
The Bottleneck of Verification
- AI has reduced the cost of idea generation to near zero, shifting the scientific bottleneck from finding theories to verifying and validating them at scale.
- The current peer review system is being overwhelmed by a flood of AI-generated submissions, making it difficult to distinguish high-signal breakthroughs from 'slop.'
- Scientific progress is often non-linear, as seen in how Copernicus's simpler heliocentric model was initially less accurate than the highly-refined but incorrect Ptolemaic system.
- The success of a scientific idea is often dependent on historical context, social inertia, and the 'test of time' rather than immediate objective grading.
- True breakthroughs frequently require deleting long-held assumptions or accepting initially implausible implications, such as the lack of observable stellar parallax in ancient Greece.
Often, the ultimately correct theory initially is worse in many ways.
It was kind of a crank theory, except when Uranus was discovered by Herschel, the distance to Uranus fit exactly this pattern. Then Ceres was discovered, and it also fit the pattern. People got really excited that Bode had discovered this amazing new law of nature. But then Neptune was discovered, and it was way off. Basically it was just a numerical fluke. There were six data points. Maybe one reason why Kepler didn't highlight his third law as much as the first two laws is that instinctively, even though he didn't have modern statistics, he kind of knew that with six data points, he had to be somewhat tentative with the conclusions.

00:11:44 — How would we know if there's a new unifying concept within heaps of AI slop?

To ask the question about the analogy more explicitly, does this analogy make sense if in the future we have smarter and smarter AIs? We'll have millions of them, and they can go out and hunt for all these empirical irregularities. It sounds like you don't think the bottleneck in science is finding more things that are the equivalent of the third law of planetary motion for each given field, so that later on somebody can say, "Oh, we need a way to explain this. Let's work out the math. Here's the…"

I think AI has driven the cost of idea generation down to almost zero, in a very similar way to how the internet drove the cost of communication down to almost zero. It's an amazing thing, but it doesn't create abundance by itself. Now the bottleneck is different. We're now in a situation where suddenly people can generate thousands of theories for a given scientific problem. Now we have to verify them and evaluate them. This is something where we have to change our structures of science to actually sort this out. Traditionally, we build walls. In the past, before we had AI slop, we had amateur scientists with their own theories of the universe, many of which were of very little value.
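The Bode pattern discussed above is concrete enough to check in a few lines. A minimal sketch, assuming the usual statement of the Titius-Bode rule, a_n = 0.4 + 0.3 · 2^n AU, with standard rounded semi-major axes:

```python
# Titius-Bode rule: predicted semi-major axis a_n = 0.4 + 0.3 * 2**n AU.
# Mercury is conventionally assigned n = -infinity (the 0.3 term vanishes).
bodies = [
    # (name, n, actual semi-major axis in AU)
    ("Mercury", float("-inf"), 0.39),
    ("Venus",   0, 0.72),
    ("Earth",   1, 1.00),
    ("Mars",    2, 1.52),
    ("Ceres",   3, 2.77),
    ("Jupiter", 4, 5.20),
    ("Saturn",  5, 9.58),
    ("Uranus",  6, 19.19),   # discovered 1781: a close fit
    ("Neptune", 7, 30.07),   # discovered 1846: far off the prediction
]

def bode_prediction(n):
    # 2 ** float("-inf") evaluates to 0.0, which handles Mercury's case.
    return 0.4 + 0.3 * 2 ** n

for name, n, actual in bodies:
    predicted = bode_prediction(n)
    error_pct = 100 * abs(predicted - actual) / actual
    print(f"{name:8s} predicted {predicted:6.2f} AU  "
          f"actual {actual:6.2f} AU  error {error_pct:4.1f}%")
```

Uranus lands within about 2% of the predicted 19.6 AU, which is why Bode's contemporaries were excited; Neptune then misses its predicted 38.8 AU by nearly 30%, exposing the six-point fit as a fluke.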
We built these peer review publication systems to filter and try to isolate the high-signal ideas to test. But now that we can generate these possible explanations at massive scale, and some of them are good and a lot are terrible, human reviewers are already being overwhelmed. Many journals are reporting that AI-generated submissions are flooding their submission queues. It's great that we can generate all kinds of things now with AI, but it means that the rest of the aspects of science have to catch up: verification, validation, and assessing which ideas actually move the subject forward and which ones are dead ends or red herrings. That's not something we know how to do at scale. For each individual paper, we can have a debate among scientists and get to a consensus in a few years. But when we're generating a thousand of these every day, this doesn't work.

There's this incredibly interesting question. If you have billions of AI scientists, not only how do you gauge which ones are real progress, but how do you… This is actually a question that human science has had to face and we've solved somehow, and I'm actually not sure how we solved this. Let's say in the 1940s, you're at Bell Labs and there are these new technologies coming out: how do you transfer signals? How do you digitize signals? How do you transfer them over analog wires? There are all these papers about the engineering constraints and the details, and then there's one which comes up with the concept of the bit, which has implications across many different fields. You need some system which can then look at that and say, "Okay, we need to apply this to probability. We need to apply this to computer science," et cetera. In the future, the AIs are coming up with the next version of this unifying concept. How would you identify it among millions of papers that might actually constitute progress, but which have much less in terms of general unifying ideas?

A lot of it's the test of time.
Many great ideas didn't actually get a great reception at the time they were first proposed.
The Friction of Scientific Progress
- Scientific breakthroughs are often hindered by cultural inertia and the standardization of existing paradigms, such as the decimal system or specific AI architectures.
- Correct theories frequently appear less accurate than established ones initially, as seen when Copernicus's simpler model was less precise than the highly-tweaked Ptolemaic system.
- Progress often requires deleting long-held assumptions, such as the Aristotelian belief that objects naturally seek a state of rest, rather than just adding new data.
- The success of a theory depends heavily on effective communication and narrative, as demonstrated by Darwin's plain-English synthesis versus Newton's secretive and complex Latin texts.
When you only get part of the solution, it looks worse than a theory which is incorrect but somehow has been completed to the point where it kind of answers all the questions.
It was only after some other scientists realized that they could take it further and apply them to their own fields. Machine learning itself was a niche area of AI for a long time. The idea of getting answers entirely through training on data and not through first-principles reasoning was very controversial, and it just took a long time before it started bearing fruit.

You mentioned the bit. There were other proposals for computer architectures than the zero-one that is universal today. I think there were ternary computers, three-valued logic. In an alternate universe, maybe a different paradigm would have shown up. The transformer, for example, is the foundation of all modern large language models, and it was the first deep learning architecture that really was sophisticated enough to capture language. But it didn't have to be that way. There could've been some other architecture that was the first to do it, and once that was adopted, it would become the standard.

One reason why it's hard to assess whether a given idea is going to be fruitful is that it depends on the future. It depends also on the culture and society: which ones get adopted, which ones don't. The decimal system in mathematics is extremely useful, much better than Roman numerals, for instance. But again, there's nothing special about ten. It's a system that is useful for us because everyone else uses it. We've standardized it. We've built all our computers and our number representation systems around it, so we're stuck with it now. Some people occasionally push for other systems than decimal, but there's just too much inertia. It's not something where you can look at any given scientific achievement purely in isolation and give it an objective grade without being aware of the context, both in the past and the future. So it may never be something that you can just grade the same way that you can for much more localized problems.
Often in the history of science, when a new theory comes up that in retrospect we realize is correct, it seems to make implications that either make no sense because they're wrong, and we realize later on why they're wrong, or they're correct but seem wildly implausible at the time. As you talked about, Aristarchus had heliocentrism in the third century BC. The ancient Athenians were like, "This can't be, because if the Earth is going around the sun, we should see the relative position of the stars change as we're going around the sun, and the only way that wouldn't be the case is if they're so far away that you don't notice any parallax," which is actually the correct implication. But there are times when the implication is incorrect and we just need to graduate to a better level of understanding. Leibniz would chide Newton and disagree with action at a distance, and they didn't know the mechanism, and Newton himself was sort of stunned that inertial mass and gravitational mass were the same quantity. All these things were later resolved by general relativity.

So the question for a system of peer review for AI would be: even if you can falsify a theory, how would you notice that it still constitutes progress relative to the thing before?

Often, the ultimately correct theory initially is worse in many ways. Copernicus's theory of the planets was less accurate than Ptolemy's theory. Geocentrism had been developed for a millennium by that point, and they had made many tweaks and increasingly complicated ad hoc fixes to make it more and more accurate. Copernicus's theory was a lot simpler but much less accurate. It was only Kepler that made it more accurate than Ptolemy's theory. Science is always a work in progress. When you only get part of the solution, it looks worse than a theory which is incorrect but somehow has been completed to the point where it kind of answers all the questions. As you say, Newton's theory had big mysteries.
They had the equivalence of inertial and gravitational mass, and action at a distance, which were only resolved with a very conceptually different approach. Often progress has to be made not by adding more theories, but by deleting some assumptions that you have in your mind.
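The Athenians' parallax objection from earlier can be made quantitative: annual parallax is the angle that the Earth-sun distance (1 AU) subtends at the star. A rough sketch with standard values (the ~1 arcminute figure for naked-eye resolution is a common textbook estimate):

```python
import math

AU_KM = 1.496e8        # Earth-sun distance in km
PARSEC_KM = 3.086e13   # 1 parsec in km (by definition, parallax of 1 arcsec)

def annual_parallax_arcsec(distance_km):
    """Angle subtended by 1 AU at the given distance, in arcseconds."""
    return math.degrees(math.atan(AU_KM / distance_km)) * 3600

# Alpha Centauri, the nearest star system, is about 1.34 parsecs away.
p = annual_parallax_arcsec(1.34 * PARSEC_KM)
naked_eye_arcsec = 60.0  # ~1 arcminute resolving power of the unaided eye

print(f"parallax of the nearest star: {p:.2f} arcsec")
print(f"smallest angle the naked eye resolves: ~{naked_eye_arcsec:.0f} arcsec "
      f"({naked_eye_arcsec / p:.0f}x larger)")
```

Even the nearest star shifts by under an arcsecond over the year, dozens of times finer than the unaided eye can resolve, so the "missing" parallax was real but undetectable until Bessel's telescopic measurement of stellar parallax in 1838.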
The Social Narrative of Science
- Scientific progress is often delayed by conceptual leaps that defy intuition, such as the transition from Aristotelian rest to Newtonian motion or the shift from static species to evolution.
- The success of a theory depends heavily on the art of exposition and persuasion, as seen in Darwin's accessible English prose versus Newton's secretive and mathematically dense Latin texts.
- Human intelligence is currently undergoing a 'cognitive Copernican revolution' as AI challenges our traditional assessments of which tasks are difficult and which are easy.
- Astronomy serves as a model for 'squeezing every last drop' of information from limited data, a skill that could be applied to the sociology of science to measure the actual impact of research.
- The 'human side' of science involves creating narratives that account for gaps in current knowledge, a social and persuasive skill that remains difficult to quantify or automate through AI.
The art of exposition and making a case and creating a narrative is also a very important part of science.
One reason why geocentrism held on for so long is that we had this idea that objects naturally want to stay at rest. This is the Aristotelian notion of physics, and so the idea that the Earth was moving… How come we weren't all falling over? Once you have Newton's laws of motion—an object in motion remains in motion and so forth—then it makes sense. Conceptually, it's a very big leap to realize that the Earth is in motion. It doesn't feel like it's in motion. One of the biggest advances, the theory of evolution, is the idea that species are not static. This is not obvious, because you don't see evolution in your lifetime. Well, now we actually can, but it seems permanent and static.

Right now we're going through a cognitive version of the Copernican revolution, where we used to think that human intelligence is the center of the universe, and now we're seeing that there are very different types of intelligence out there with very different strengths and weaknesses. Our assessment of which tasks require intelligence and which ones don't has to be reordered quite a bit. Trying to fit AI into our theories of scientific progress and what is hard and what is easy, we're struggling quite a lot. We have to ask questions that we've never really had to ask before. Or maybe the philosophers had, but now we all have to deal with it.

This brings up a topic I've been very curious about. You mentioned Darwin's theory of evolution. There's this book which covers a lot of this era of history we're talking about, and the author has this interesting observation in there. Conceptually, it seems like Darwin's theory is simpler. There's a biologist contemporaneous with Darwin, Huxley, who says, "How stupid not to have thought of that." You don't hear of physicists chiding themselves for not having beaten Newton to his discoveries. So there's a question of why it took longer. It seems like a big part of the reason is what you were saying.
The evidence for natural selection is overwhelming in a certain sense, but it's cumulative and retrospective, whereas Newton can just say, "Here are my equations. Let me see the moon's orbital period and its distance, and if it lines up, then we've made progress." Lucretius actually had this idea that species adapted to their environment in the first century BC, but nobody really talks about it until Darwin, because Lucretius couldn't run some experiment and force people to pay attention. I wonder if we'll in retrospect end up seeing much more progress in domains which have this kind of tight data loop where you can verify things quite easily, even though they're conceptually much more difficult.

I think one aspect of science is that it's not just creating a new theory and validating it, but communicating it to others. Darwin was an amazing science communicator. He wrote in English, in natural language. I'm speaking like a— I have to get out of my technical mindset. He spoke in plain English, didn't use equations, and he synthesized a lot of disparate facts. Little pieces of evolution had been worked out in the past, but he had this very compelling vision. Again, he was still missing things. He didn't know the mechanism of inheritance; he didn't have DNA. But his writing style was persuasive, and that helped a lot.

Newton wrote in Latin. He had invented entire new areas of mathematics just to explain what he was doing. He was also from an era where scientists were much more secretive and competitive. Academia is still competitive, but it was even worse back in Newton's day. He held back some of his best insights because he didn't want his rivals to get any advantage. He was also a somewhat unpleasant person, from what I gather. It was only a couple of decades after Newton, when other scientists explained his work in much simpler terms, that his ideas became widespread. The art of exposition and making a case and creating a narrative is also a very important part of science.
Persuasion and the Deductive Overhang
- Science involves a social and narrative component where researchers must convince peers of a theory's future potential even when data gaps exist.
- Astronomy serves as a model for 'squeezing' maximum information out of sparse data, a skill highly valued in quantitative fields like hedge funds.
- The 'deductive overhang' suggests that significant discoveries can be made by applying clever insights to existing data rather than just collecting more.
- Current AI progress in mathematics has hit a plateau after solving 'low-hanging fruit' because models struggle to evaluate or create partial progress.
- While AI currently lacks the ability to navigate complex, multi-stage problems, its potential to scale across all problems at a specific difficulty level remains a powerful advantage.
These AI tools, they're like jumping machines that can jump two meters in the air, higher than any human. Sometimes they jump in the wrong direction, and sometimes they crash, but sometimes they can reach the tops of the lowest walls that we couldn't reach before.
If you have the data, it helps, but people need to be convinced; otherwise they will not push it further or make the initial investment to learn your theory and really explore it. That's another thing which is really hard to reinforcement learn on. How can you score how persuasive you are? Well, there are entire marketing departments trying to do this. Maybe it's good that AI is not yet optimized to be persuasive.

There's a social aspect to science. Even though we pride ourselves on having an objective side to it, where there's data and experiment and validation, we still have to tell stories and convince our fellow scientists. That's a soft, squishy thing. It's a combination of data and painting a narrative, and it's a narrative of gaps. Even with Darwin, as I said, there were pieces of his theory he could not explain. But he could still make a case that in the future people would find transitional forms, that they would find the mechanism of inheritance, and they did. I don't know how you can quantify that in such a precise way that you can start doing reinforcement learning. Maybe that will be forever the human side of science.

00:26:10 — The deductive overhang

One takeaway I had from reading and watching your stuff on the cosmic distance ladder… By the way, I highly recommend people watch your videos on the cosmic distance ladder. One takeaway was that the deductive overhang in many fields could be so much bigger than people realize. If you just had the right insight about how to study a problem, you might be surprised at how much more you could learn about the world. I wonder if you think that's a product of astronomy at the particular times in history that you're studying. Or is it just that, based on the data that is incident on the Earth right now, we could actually divine a lot more than we happen to know?
Astronomy was one of the first sciences to really embrace data analysis, squeezing every last possible drop of information out of the data they had, because data was the bottleneck. It still is the bottleneck. It's really hard to collect astronomical data. Astronomers are world-class at extracting all kinds of conclusions from little traces of data, almost like Sherlock Holmes. I hear that for a lot of quant hedge funds, their preferred hire is an astronomy PhD, actually. They are also very interested, for other reasons, in extracting signals from various random bits of data.

We do under-explore how to extract extra information from various signals. Just to pick an example, I remember reading once that people were trying to measure how often scientists actually read the papers that they cite. How do you measure this? You could try to survey different scientists, but they had a clever trick. Many citations have little typos, like a number being wrong or the punctuation being slightly off. They measured how often a typo got copied from one reference to the next, and they could infer whether an author was just copying and pasting a reference without actually checking it. From that, they were able to infer some measure of how much attention people were paying.

So there are some clever tricks to extract… These questions you posed earlier of how we can assess whether a scientific development is fruitful, interesting, or represents real progress… Maybe there are really useful metrics or footprints of this phenomenon in data. We can examine citations and how often something is mentioned at a conference. Maybe there's a lot of sociology-of-science research to be done that could actually detect these things. Maybe we should get some astronomers on the case, actually.

00:30:31 — Selection bias in reported AI discoveries

That brings us nicely to the progress that, from the outside, it seems like AI for math is making.
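The citation-typo trick described above can be illustrated with a toy model (a hypothetical simulation for intuition, not the actual study's estimator): assume citers who consult the source retype the reference correctly, while copiers duplicate a random earlier entry verbatim, typo and all. The rate at which a known typo propagates then carries information about the copying rate.

```python
import random

def simulate_citations(n_citers, copy_prob, seed=0):
    """Toy model: the first citation of a paper contains a typo.
    Each subsequent citer either copies a uniformly random earlier
    citation verbatim (probability copy_prob) or retypes the
    reference correctly from the original source."""
    rng = random.Random(seed)
    has_typo = [True]  # the seed citation carries the typo
    for _ in range(n_citers - 1):
        if rng.random() < copy_prob:
            has_typo.append(rng.choice(has_typo))  # inherited, typo and all
        else:
            has_typo.append(False)                 # checked the source
    return has_typo

cites = simulate_citations(10_000, copy_prob=0.8)
print(f"fraction of citations repeating the typo: {sum(cites) / len(cites):.2f}")
```

Running the model in reverse, observing how often a known typo recurs and inferring the copy rate, is the spirit of the real analysis, which suggested that a large share of citers may never look at the papers they cite.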
You had a post recently where you pointed out that over the last few months, AI programs have solved fifty out of the eleven-hundred-odd Erdős problems.
I don't know if it's still correct, but as of a month ago you said that there had been a pause because the low-hanging fruit had been picked. First of all, I'm curious if that is still the case, that we have picked the low-hanging fruit and are now at this plateau.

It does seem so. Fifty-odd problems have been solved with AI assistance, which is great, but there's like six hundred to go. People are still chipping away at one or two of these right now. We're seeing a lot fewer pure AI solutions now, where the AI just one-shots the problem. There was a month where that happened, and it has stopped, not for lack of trying. I know of three separate attempts to get frontier-model AIs to just attack every single one of the problems simultaneously. They pick out some minor observations, or maybe they find that some problem was already solved in the literature, but there hasn't been any further purely AI-powered solution yet.

People are using AI a lot currently. Someone might use AI to generate a possible proof strategy, and then another person will use a separate AI tool to critique it, rewrite it, generate some numerical data for it, or do a literature survey. Some problems have been solved by an ongoing conversation between lots of humans and lots of AI tools. But it does seem like it was this one-off thing.

Maybe one analogy for these problems is that you're in some sort of mountain range with all kinds of cliffs and walls. Maybe there's a little wall which is three feet high, and one that's six feet high, and then one that's fifteen feet high, and then there are some mile-high cliffs. You're trying to climb as many of these cliffs as possible, but it's in the dark. We don't know which ones are tall and which ones are short. So we try to light some candles and make some maps, and slowly we figure out that some of them are climbable. On some of them we can identify a partial track in the wall that you can reach first.
These AI tools, they're like jumping machines that can jump two meters in the air, higher than any human. Sometimes they jump in the wrong direction, and sometimes they crash, but sometimes they can reach the tops of the lowest walls that we couldn't reach before. We've just set them loose in this mountain range, hopping around. There was this exciting period where they could actually find all the low ones and reach them. Maybe the next time there's a big advance in the models, they will try again, and a few more will be breached.

But it's a different style of doing mathematics. Normally we would make little markers and try to identify partial things. These tools either succeed or they fail. They've been really bad at creating partial progress or identifying intermediate stages that you should focus on first. Going back to this previous discussion, we don't have a way of evaluating partial progress the same way we can evaluate a one-shot success or failure at solving a problem.

There are two different ways to think through what you've just said. One of them is more bearish on AI progress, and one of them is more bullish. The bearish one being, "Oh, they're only getting to a certain height of wall, which is not as high as humans are reaching." The second is that they have this powerful property that once they achieve a certain waterline, they can clear every single problem that is available at that waterline, which we simply can't do with humans. We can't make a million copies of you, give each of them a million dollars of inference compute, and have you do a hundred years of subjective-time research on a million different problems at the same time. But once AIs reach Terence Tao level, they could do that. Once they reach intermediate levels, they could do the intermediate version of that. The same reason that we should be bearish now is the reason we should be especially bullish.
Breadth vs Depth in AI Science
- AI models have hit a plateau in mathematics after solving 'low-hanging fruit' problems, shifting the focus from one-shot solutions to collaborative human-AI workflows.
- Current AI tools excel at breadth rather than depth, acting like 'jumping machines' that can reach many low-level targets simultaneously but struggle with high-climbing deep reasoning.
- The scientific paradigm may need to shift from focusing on a few deep problems to managing broad classes of problems that leverage AI's ability to map entire fields at once.
- There is a risk that offloading the 'process' of problem-solving to AI could inhibit the development of human intuition and the ability to maintain complex systems over time.
- Mathematics is unique among sciences for its heavy reliance on theory, making the 'process' of discovery often more valuable than the final answer itself.
These AI tools, they're like jumping machines that can jump two meters in the air, higher than any human.
Not even when they achieve superhuman intelligence, but just when they achieve human-level intelligence, because their human-level intelligence is qualitatively wider and more powerful than our human-level intelligence.

I agree. They excel at breadth, and humans excel at depth, human experts at least. I think they're very complementary. But our current way of doing math and science is focused on depth, because that's where human expertise is, because humans can't do breadth. We have to redesign the way we do science to take full advantage of this breadth capability that we now have. We should put a lot more effort into creating very broad classes of problems to work on, rather than one or two really deep, important problems. We should still have the deep, important problems, and humans should still be working on them. But now we have this other way of doing science. We can explore entirely new fields of science by first getting these broad, moderately competent AIs to map them out and make all the easy observations, and then identify certain islands of difficulty, which human experts can then come and work on. I see very much a future of very complementary science. Eventually, you would hope to get both breadth and depth and somehow get the best of both worlds. But we need practice with the breadth side. It's too new. We don't even have the paradigms to really take full advantage of it. But we will, and then science will be unrecognizable after that, I think.

To this point about complementarity, programmers have noticed that they're way more productive as a result of these AI tools. I don't know if you as a mathematician feel the same way, but it does seem like one big difference between vibe coding and vibe researching is that with software, the whole point is to have some effect on the world through your work. If it leads to you better understanding a problem or coming up with some clean abstraction to embody in your code, that is instrumental to the end goal.
Whereas with research, the reason we care about solving these problems is that presumably, in the process of solving them, we discover new mathematical objects or new techniques that advance our civilization's understanding of mathematics. So the proof is instrumental to the intermediate work. I don't know if you agree with that dichotomy, or whether it will in any way explain the relative uplift we'll see in software versus research.

Certainly in math, the process is often more important than the problem itself. The problem is kind of a proxy for measuring progress. I think even in software, there are different types of software tasks. If you just create a webpage that does the same thing that a thousand other webpages do, there's no skill to be learned. Well, there is still some skill maybe that the individual programmer could pick up. But boilerplate-type code is something that you should definitely offload to AI. Sometimes once you make the code, you still have to maintain it. There are issues with upgrading it and making it compatible with other things. I've heard programmers report that even if an AI can create the first prototype of a tool, making it mesh with everything else and making it interact with the real world in the way they want is an ongoing process. If you don't have the skills that you pick up from writing the code, that may impact your ability to maintain it down the road.

So yes, certainly as mathematicians, we've used problems to build intuition and to train people to have a good idea of what's true, what to expect, what is provable, and what is difficult. Just getting the answers right away may actually inhibit that process. I made a distinction between theory and experiment before. In most sciences, there's an equal division between the theoretical side and the experimental side. Math has been unique in that it's almost entirely theoretical. We place a premium on trying to have coherent, clean theories of why things are true and false.
Breadth, Depth, and Scalable Science
- AI excels at breadth while humans excel at depth, necessitating a redesign of scientific paradigms to leverage broad, moderately competent AI mapping.
- The introduction of AI tools may revolutionize mathematics by enabling an experimental, large-scale approach to problem-solving that was previously impossible.
- While AI can efficiently apply existing techniques to solve 'neglected' problems, it currently struggles with the creative leaps required for the most resistant 20% of a problem.
- There is a risk that offloading the process of solving problems to AI may inhibit the development of human intuition and the ability to maintain complex systems.
- The future of science likely involves a complementary model where AI identifies 'islands of difficulty' for human experts to focus their specialized skills on.
We can explore entirely new fields of science by first getting these broad, moderately competent AIs to map it out and make all the easy observations.
We haven't done many experiments as to, if we have two different ways to solve a problem, which one is more effective. We have some intuition, but we haven't done large-scale studies where we take a thousand problems and just test them. But we can do that now. I think AI-type tools will actually revolutionize the experimental side of math, where you don't care so much about individual problems and the process of solving them, but you want to gather large-scale data about what things work and what things don't. The same way that if you're a software company and you want to roll out a thousand pieces of software, you don't really want to handcraft each one and learn lessons from each. You just want to find what workflows let you scale. The idea of doing mathematics at scale is in its infancy. But that's where AI is really going to revolutionize the subject.

I feel like a big crux in these conversations about how good AI will be for science is, I think you said this, that they're using existing techniques and modifying them. It would be interesting to understand how much progress one can make simply from using existing techniques. If I looked at the top math journals, how many of the papers are coming up with a new technique, whatever that means, versus using existing techniques on new problems? What is the overhang? If you just applied every known technique to every open problem, would that constitute a humongous uplift in our civilization's knowledge, or would that not be that impressive and useful?

This is a great question, and we don't have the data to fully answer it yet. Certainly, a lot of the work that human mathematicians do… When you take on a new problem, one of the first things we do is look at all the standard things that have worked on similar problems in the past, and we try them one by one. Sometimes that works, and that's still worth publishing because the question was important.
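The "mathematics at scale" workflow described here, sweeping many standard techniques across a large batch of problems and tallying what works rather than studying one problem deeply, can be sketched as follows. All problem IDs, technique names, and outcomes below are invented for illustration:

```python
from collections import Counter

# Hypothetical sweep records: (problem_id, technique, solved?).
records = [
    (1, "induction", True),
    (1, "generating functions", False),
    (2, "induction", False),
    (2, "probabilistic method", True),
    (3, "induction", True),
    (3, "probabilistic method", False),
]

# Tally attempts and wins per technique across the whole batch.
attempts = Counter(t for _, t, _ in records)
wins = Counter(t for _, t, ok in records if ok)

# The aggregate view: which techniques pay off at scale, not whether
# any single problem was solved.
for technique in attempts:
    print(f"{technique}: {wins[technique]}/{attempts[technique]} solved")
```

At real scale the records would come from automated proof attempts, but the aggregation step, counting successes per technique rather than per problem, is the same.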
Sometimes they almost work, and you have to add one more wrinkle, and that's also interesting. But the papers that go into the top journals are usually ones where the existing methods can solve maybe 80% of the problem, but then there is this 20% which is resistant, and a new technique has to be invented to fill in the gaps. It's very rare now that a problem gets solved with no reliance on past literature, where all the ideas come out of nowhere. That was more common in the past, but math is so mature now that it's just so much of a handicap to not use the literature first.

AI tools are getting really good at the first part of that, just trying all the standard techniques on a problem, often making fewer mistakes in applying them than humans. They still make mistakes, but I've tested these tools on little tasks that I can do myself, and sometimes they pick up errors that I make, and sometimes I pick up errors that they make. It's about a tie right now. But I haven't yet seen them take the next step. When there are holes in the argument where none of the standard things are working, then what do you do? They can suggest random things, but often I find that chasing them down to try to make them work, and finding they don't, wastes more time than it saves.

I think some fraction of problems that we currently think are hard will fall to this method, especially the ones that haven't received enough attention. With the Erdős problems, almost all of the 50 problems that were solved by AIs were ones for which there was basically no literature. Erdős posed the problem once or twice. Maybe some people tried it casually and couldn't do it, but they never wrote up anything. But it turned out that there was a solution, and it was just a matter of combining this one obscure technique that not many people know about with some other result in the literature. That's the median level of what AI can accomplish, and that's really great. It clears out 50 of these problems.
So I think you will see some isolated successes.
Mathematics at Scale
- AI is revolutionizing mathematics by enabling 'math at scale,' allowing researchers to test existing techniques across thousands of problems rather than handcrafting individual solutions.
- While AI excels at applying standard techniques to solve the first 80% of a problem, it still struggles with the final 20% that requires the invention of entirely new methods.
- The perceived success of AI in math is often skewed by survivorship bias, where isolated wins on obscure problems mask a low overall success rate of approximately 1% to 2%.
- AI tools are currently functioning as high-powered assistants that handle auxiliary tasks like coding, formatting, and literature searches, enriching papers without yet replacing the core human act of solving the hardest conceptual gaps.
The idea of doing mathematics at scale is in its infancy. But that's where AI is really going to revolutionize the subject.
But what we've found… Some people have done large-scale sweeps of these Erdős problems. If you only focus on the success stories, the ones that get broadcast on social media, it looks amazing. All these problems that hadn't been solved for decades, and now they're falling. But whenever we do a systematic study, on any given problem an AI tool has a success rate of maybe 1% or 2%. It's just that they can buy scale, and you just pick the winners, so it looks great. I think there'll be a similar thing happening with the hundreds of really prestigious, difficult math problems out there. Some AI may get lucky and actually solve one of them, where there was some backdoor to the problem that everyone else missed. That will get a lot of publicity. But then people will try these fancy tools on their own favorite problem, and they will again experience the 1% to 2% success rate. There'll be a lot of noise amongst the signal of when they're working and when they're not.

It will be increasingly important to collect really standardized datasets. There are efforts now to create a standard set of challenge problems for AIs to solve, and not just rely on the AI companies to only publish their wins and not disclose their negative results. That will maybe give more clarity as to where we're actually at.

Although I think it's worth emphasizing how much progress in AI it constitutes already, to have models that are capable of applying some technique that nobody had written down as applicable to this particular problem.

The progress is simultaneously amazing and disappointing. It is a very strange feeling to see these tools in action. But people also acclimatize really quickly. I remember when Google's web search came out 20 years ago. It just blew all the other search engines out of the water. You were getting relevant hits on the front page, exactly what you wanted. It was amazing, and then after a few years, you just took for granted that you could Google anything.
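The selection effect described here, a 1-2% per-problem success rate that still produces dozens of headline wins at scale, is easy to simulate. The problem count and success rate below are the rough figures mentioned in the conversation, not real measurements:

```python
import random

random.seed(0)  # deterministic toy run

N_PROBLEMS = 1100  # roughly the size of the Erdős problem list
P_SUCCESS = 0.015  # the hypothetical 1-2% per-problem success rate

# Sweep every problem once; keep only the wins, as the headlines do.
wins = [i for i in range(N_PROBLEMS) if random.random() < P_SUCCESS]

# The same sweep produces both an impressive headline and a sobering
# per-problem failure rate of roughly 98-99%.
print(f"headline: {len(wins)} long-open problems solved")
print(f"reality: {len(wins) / N_PROBLEMS:.1%} per-problem success rate")
```

With these numbers the expected haul is around sixteen problems, enough to dominate social media, even though any individual researcher trying the tool on their favorite problem will almost certainly see it fail.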
2026-level AI would be stunning in 2021. A lot of it (face recognition, natural speech, doing college-level math problems) we just take for granted now.

(00:46:43) – AI makes papers richer and broader, but not deeper

You said that by 2026 it would be like a colleague in mathematics?

A trustworthy co-author, if used correctly.

Which is looking pretty good in retrospect. So let's see if you can continue this streak. What year would you say that you personally are 2x more productive as a result of AI?

Productivity, I think, is not quite a one-dimensional quantity. I'm definitely noticing that the style in which I do mathematics is changing quite a bit, and the type of things I do. For example, my papers now have a lot more code, a lot more pictures, because it's so easy to generate these things now. Some plot which would have taken me hours to produce, I can now do in minutes. But in the past, I just wouldn't have put the plot in my paper in the first place; I would just talk about it in words. So it's hard to measure what 2x means. On the one hand, I think the type of papers that I write today, if I had to do them without AI assistance, would definitely take five times longer. But I would not write my papers that way.

Yeah, but these are auxiliary tasks. Things like doing a much deeper literature search or supplying a lot more numerics. They enrich the paper.

The core of what I do, actually solving the most difficult part of a math problem, hasn't changed too much. I still use pen and paper for that. But there are lots of silly things. I use an AI agent now to reformat. Sometimes if all my parentheses are not quite the right size, I used to manually change them by hand, and now I can get an AI agent to do all that quite nicely in the background. They've really sped up lots of secondary tasks. They haven't yet sped up the core thing that I do, but they've allowed me to add more things to my papers.
AI Success and Mathematical Depth
- Current AI success in mathematics often relies on scale and 'picking winners' from a low success rate of 1% to 2%, creating a public perception of mastery that masks frequent failures.
- AI tools are significantly increasing researcher productivity in auxiliary tasks like coding, formatting, and literature searches, making papers broader but not necessarily deeper.
- A critical gap remains in AI's inability to build cumulative, interactive progress; it relies on brute-force trial and error rather than the adaptive, step-by-step strategy used by human collaborators.
- There is a concern that AI might solve prestigious problems through uninsightful brute force or 'assembly code gobbledygook' rather than by creating the elegant new conceptual frameworks humans prize.
- The progress of AI is described as simultaneously amazing and disappointing, as users quickly acclimatize to revolutionary tools and begin to take their capabilities for granted.
The progress is simultaneously amazing and disappointing. It is a very strange feeling to see these tools in action.
By the same token, if I were to write a paper I wrote in 2020 again, and not add all these extra features but just have something of the same level of functionality, it actually hasn't saved that much time, to be honest. It's made the papers richer and broader, but not necessarily deeper.

You've drawn a distinction between artificial cleverness and artificial intelligence. I would like to better understand those concepts. What is an example of intelligence that is not just cleverness?

Intelligence is famously hard to define. It's one of these things that you know when you see it. But when I talk to someone and we're trying to collaboratively solve a math problem together, there's this conversation where neither of us knows how to solve the problem initially. One of us has some idea and it looks promising, so then we have some sort of prototype strategy. We test it, and it doesn't work, but then we modify it. There's adaptivity and continual improvement of the idea over time. Eventually, we've systematically mapped out what doesn't work and what does work, and we can see a path forward, but it's evolving with our discussion. This isn't quite what the AIs do. The AIs can mimic this a little bit. To go back to this analogy of these jumping robots, they can jump and fail, and jump and fail. But what they can't do is jump a little bit, reach some handhold, stay there, pull other people up, and then try to jump from there. There isn't this cumulative process which is built up interactively. It seems to be a lot more trial and error and just repetition: brute force. It scales, and it can work amazingly well in certain contexts. But this idea of building up cumulatively from partial progress is what's still not quite there yet.

Interesting. You're saying if Gemini 3 or Claude 4.5, whatever, solves a problem, it is not the case that its own understanding of math has progressed. Or even if it works on a problem without solving it, it's not that its own understanding of math has progressed.

Yeah.
You run a new session and it's forgotten what it just did. It has no new skills to build on related problems. Maybe what you just did is 0.001% of the training data for the next generation. So maybe eventually some of it gets absorbed.

00:53:00 – If AI solves a problem, can humans get understanding out of it?

One big question I have is how plausible is it that if we just keep training AIs, so they get better and better at solving problems in Lean, that they will continue to solve more and more impressive problems, and then we will be surprised at how little insight we got from some Lean solution to proving the Riemann hypothesis. Or do you think it is a necessary condition of solving the Riemann hypothesis, even by an AI that is doing it entirely in Lean, that the constructions and definitions created in the Lean program have to advance our understanding of mathematics? Or could it just be assembly code gobbledygook?

We don't know. Some problems have been basically solved by pure brute force. The four color theorem is a famous example. We have still not found a conceptually elegant proof of this theorem, and maybe we never will. Some problems may only be solvable by splitting into an enormous number of cases and doing brute force, uninsightful computer analysis on each case. Part of the reason we prize problems like the Riemann hypothesis is that we're pretty sure a new type of mathematics has to be created, or a new connection between two previously unconnected areas of mathematics has to be discovered to make this work. We don't even know what the shape of the solution is, but it doesn't feel like a problem that will be solved just by exhaustively checking cases. Or it could be false actually. Okay, there is an unlikely scenario that the hypothesis is false, and you can just compute a zero off the line, and a massive computer calculation verifies it. That would be very disappointing. I do feel that fully autonomous, one-shot approaches are not the right approach for these problems.
Intelligence, Collaboration, and Formal Proofs
- Tao distinguishes between artificial cleverness and true intelligence, noting that current AIs lack the cumulative, interactive adaptivity found in human collaborative problem-solving.
- AI models currently lack persistent understanding, as they do not retain skills or progress between sessions, treating each problem as a fresh brute-force task.
- There is a concern that AI might solve major mathematical mysteries like the Riemann hypothesis using 'gobbledygook' code that offers no conceptual insight to humans.
- Formalizing proofs in languages like Lean allows mathematicians to atomically study and 'ablate' complex arguments to identify the truly innovative steps within a sea of boilerplate.
- The future of mathematics may involve new professions dedicated to refactoring and post-processing AI-generated proofs to make them elegant and understandable for humans.
But what they can't do is jump a little bit, reach some handhold, stay there, pull other people up, and then try to jump from there.
You'll get a lot more mileage out of the interplay of humans collaborating with these tools. I can see one of these problems being solved by smart humans assisted by extremely powerful AI tools. But the exact dynamic may be very different from what we envision right now. It could be a collaboration of a type that just doesn't exist yet. There may be a way to generate a million variants of the Riemann hypothesis and do AI-assisted data analysis to discover some pattern connecting them that we didn't know about before. This lets you transform the problem into a different area of mathematics. There could be all kinds of scenarios.

Suppose the AI figures it out, and latent in the Lean is some brand-new construction which, if we realized its significance, we would be able to apply in all these different situations. How would we even recognize it? Again, a very naive question, but if you come up with the equivalent of Descartes' idea that you can have a coordinate system unifying algebra and geometry, in Lean code it would just look like R × R, and it wouldn't look that significant.

I'm sure there are other constructions which have this kind of property. The beauty of formalizing a proof in something like Lean is that you can take any piece of it and study it atomically. When I read a paper which solves some difficult problem, there's often a big sequence of lemmas and theorems. Ideally, the author will talk their way through what's important and what's not. But sometimes they don't reveal what steps were the important ones and which ones were just boilerplate, standard steps. You can study each lemma in isolation. Some of them I can see look fairly standard and resemble something I'm familiar with. I'm pretty sure there's nothing interesting going on there. But this other lemma, that's something I haven't seen before, and I can see why having this result would really help prove the main result. You can assess whether a step is really key to your argument or not, and Lean really facilitates that.
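The idea of studying each lemma atomically can be sketched in a toy Lean 4 fragment. This is my own illustration, not from the episode: the theorem names and the example are invented, but the structure shows how a formalized argument separates a routine step from the step that actually drives the result, so each can be inspected, or ablated, on its own.

```lean
-- A trivial "boilerplate" lemma: nothing novel, resembles standard facts.
theorem boilerplate_step (n : Nat) : n + 0 = n := Nat.add_zero n

-- The "key" lemma: in a real proof, this is the step worth studying.
theorem key_step (n : Nat) : n + 1 > n := Nat.lt_succ_self n

-- The main result cites both by name, so a reader (or another AI) can
-- see exactly which ingredient each step depends on.
theorem main_result (n : Nat) : n + 1 > n + 0 := by
  rw [boilerplate_step]   -- routine rewriting
  exact key_step n        -- the step that carries the argument
```

In a large machine-generated proof, the same naming discipline is what would let someone delete `boilerplate_step`-style lemmas and check whether the proof still goes through.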
The individual steps are identified really precisely. I think in the future, there will be entire professions of mathematicians who might take a giant Lean-generated proof and do some ablation on it, trying to remove parts of it and find more elegant ways. They might get other AIs to do some reinforcement learning to make the proof more elegant, and maybe other AIs will grade whether this proof looks better or not. One thing that will change quite a bit in the near future is how we write papers. Until recently, writing papers was the most time-consuming and expensive part of the job. So you did it very rarely. You only wrote up your results once all the other parts of your argument were checked out, because rewriting and refactoring was just a total pain. That's become a lot easier now with modern AI tools. You don't have to have just one version of your paper. Once you have one, people can generate hundreds more. One giant messy Lean proof may not be very meaningful or understandable on its own, but other people can refactor it and do all kinds of things with it. We've seen this with the Erdős problems. An AI will generate a proof, and here are 3,000 lines of code that verify the proof. Then people got other AIs to summarize the proof, and people write their own proofs. There's actually post-processing. Once you have one proof, we have a lot of tools now to deconstruct and interpret it. It's a very nascent area of mathematics, but I'm not as worried about it. Some people are concerned about what happens if the Riemann hypothesis is proven with a completely incomprehensible proof. I think once you have the artifact of a proof, we can do a lot of analysis on it.

00:59:20 – We need a semi-formal language for the way that scientists actually talk to each other

You've said that it would be helpful to have a formal or semi-formal language for mathematical strategies as opposed to just mathematical proofs, which is what Lean specializes in.
AI Collaboration and Mathematical Narratives
- The future of mathematics lies in the interplay between human intuition and AI tools, potentially creating entirely new forms of collaboration that do not yet exist.
- Formalization tools like Lean allow mathematicians to deconstruct complex proofs into atomic parts, making it easier to identify key logical steps versus standard boilerplate.
- AI-generated proofs that appear incomprehensible can be post-processed, refactored, and summarized by other AI agents to make them human-readable.
- There is a growing need for a semi-formal language that captures mathematical strategies and narratives, moving beyond the rigid deductive logic of current proof assistants.
- Historical examples like Gauss's Prime Number Theorem illustrate how data-driven conjectures can revolutionize fields even before a formal proof is possible.
Suppose the AI figures it out, and latent in the Lean is some brand-new construction which, if we realized its significance, we would be able to apply in all these different situations.
I would love to learn more about what that would involve or look like.

We don't really know. We've been very lucky in mathematics that we have worked out the laws of logic and mathematics, but this is a fairly recent accomplishment. It was started by Euclid two millennia ago, but only in the early 20th century did we finally list out the axioms of mathematics, the standard axioms of what we call ZFC set theory, the axioms of first-order logic, and what a proof is. This we've managed to automate and have a formal language for. There could be some way to assess plausibility. You have a conjecture that something is true, you test a few examples, and it works out. How does this increase your confidence that the conjecture is true? We have a few sort of mathematical ways to model this, like Bayesian probability, for example. But you often have to set certain base assumptions, and there's a lot of subjectivity still in these tasks. This is more of a wish than a plan to develop these languages, but just seeing how successful having a formal framework in place, like Lean, has made deductive proofs so much easier to automate and train AI on… The bottleneck for using AI to create strategies and make conjectures is we have to rely on human experts and the test of time to validate whether something is plausible or not. If there was some semi-formal framework where this could be done semi-automatically in a way that isn't easily hackable... It's really important with these formal proof assistants that there are no backdoors or exploits you can use to somehow get your certified proof without actually proving it, because reinforcement learning is just so good at finding these backdoors. If there's some framework that mimics how scientists talk to each other in a semi-formal way, using data and argument, but also constructing narratives... There's some subjective aspect of science that we don't know how to capture in a way that we can insert AI into it in any useful way. This is a future problem.
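The Bayesian point about base assumptions can be made concrete with a toy calculation. Everything here is my own illustration with made-up numbers, not anything from the episode: a conjecture survives k independent tests, and the posterior depends heavily on the subjective probability that a *false* conjecture would also pass a test.

```python
def update(prior: float, k: int, p_pass_if_false: float = 0.5) -> float:
    """Posterior P(conjecture true) after it survives k independent tests.

    Assumes a true conjecture passes every test (likelihood 1), while a
    false one passes each test with probability p_pass_if_false.
    """
    likelihood_true = 1.0
    likelihood_false = p_pass_if_false ** k
    return (prior * likelihood_true) / (
        prior * likelihood_true + (1 - prior) * likelihood_false
    )

# Starting from a skeptical prior, ten passed tests raise confidence a lot
# if false conjectures fail tests half the time...
print(round(update(0.1, 10), 4))  # -> 0.9913
# ...but barely at all if false conjectures almost always pass anyway.
print(round(update(0.1, 10, p_pass_if_false=0.99), 4))
```

The two outputs differ wildly for the same evidence, which is exactly the subjectivity problem: the answer is only as good as the base assumption `p_pass_if_false`.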
There are research efforts to try to create automated conjectures, and maybe there are ways to benchmark these and simulate this, but it's all very new science.

Can you help me get some intuition? I have two sub-questions. One, it would be very helpful to have a specific example of what something like this would look like, the way scientists communicate that we can't formalize yet. Two, it seems almost definitionally paradoxical to say you're building up some narrative or natural language explanation and then also having something which you could have formalized. I'm sure there's some intuition behind where that overlap is, and I'd love to understand that better.

Gauss created one of the first mathematical datasets. He just computed the first 100,000 prime numbers or so, hoping to find patterns. He did find a pattern, but maybe not the pattern he was expecting. He found a statistical pattern in the primes: if you count how many primes there are up to 100, 1,000, one million, and so forth, they get sparser and sparser, but the drop-off in the density was inversely proportional to the natural logarithm of the range of numbers. So he conjectured what we now call the prime number theorem: the number of primes up to X is approximately X divided by the natural log of X. He had no way to prove this. It was data-driven. This was a conjecture. It was revolutionary for its time because it was maybe the first really important conjecture of math that was statistical in nature. Normally you're talking about a pattern, like maybe the spacing between the primes has a certain regularity. But this didn't tell you exactly how many primes there were in any given range. It just gave you an approximation that got better and better as you went further and further out. It started the field of what we call analytic number theory.
It was the first of many conjectures like this, many of which got proved, which started consolidating the idea that the prime numbers didn't really have a pattern, that they behaved like random sets of numbers with a certain density.
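Gauss's data-driven observation is easy to re-run today. The sketch below (my own, not from the episode) counts primes up to x with a simple sieve and compares against x / ln(x); the ratio drifts toward 1 as x grows, which is the prime number theorem's claim that the approximation improves further out.

```python
import math

def prime_count(limit: int) -> int:
    """pi(limit): the number of primes <= limit, via a sieve of Eratosthenes."""
    sieve = [True] * (limit + 1)
    sieve[0:2] = [False, False]
    for i in range(2, int(limit ** 0.5) + 1):
        if sieve[i]:
            sieve[i * i::i] = [False] * len(sieve[i * i::i])
    return sum(sieve)

# Compare the true count against Gauss's approximation x / ln(x).
for x in (100, 1_000, 1_000_000):
    approx = x / math.log(x)
    print(x, prime_count(x), round(approx, 1), round(prime_count(x) / approx, 3))
```

For example, pi(100) = 25 against an estimate of about 21.7, while pi(1,000,000) = 78,498 against about 72,382: the relative error shrinks as the range grows.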
Formalizing Heuristics and Mathematical Intuition
- Tao explores the potential for a semi-formal framework that could automate the assessment of mathematical plausibility and scientific narratives.
- Current AI development in mathematics is bottlenecked by the need for human experts to validate conjectures and the risk of reinforcement learning finding 'backdoors' in proofs.
- The Prime Number Theorem serves as a primary example of how statistical models and data-driven conjectures can revolutionize a field even before formal proofs exist.
- The mathematical community relies on a 'random model' of primes that underpins modern cryptography and the belief in the Riemann hypothesis, despite being largely heuristic.
- To better understand scientific progress, Tao suggests simulating 'mini-universes' where small AIs evolve their own strategies for solving basic arithmetic problems.
It became more and more productive to think of the primes as if they were just generated by some god rolling dice all the time and creating this random set.
They had some patterns, like they're almost all odd. They're also not actually random; there's a deterministic sieving process involved in creating the prime numbers. But over time, it became more and more productive to think of the primes as if they were just generated by some god rolling dice all the time and creating this random set. This allowed us to make all these other predictions. There's a still-open conjecture in number theory called the twin prime conjecture, that there should be infinitely many pairs of primes that are twins just two apart, like 11 and 13. We can't prove that, and there are good reasons why we can't prove it. But because of this statistical random model of the primes, we are absolutely convinced it's true. We know that if the primes were generated by flipping coins, we would just, by random chance, like infinite monkeys at a typewriter, see twin primes appear over and over again. We have over time developed this very accurate conceptual model of what the primes should behave like based on statistics and probability. It's mostly heuristic and non-rigorous, but extremely accurate. The few times when we actually can prove things about the primes, it has matched up with the predictions of what we call the random model. We have this conjectural concept framework for understanding the primes that everyone believes in. It's the same reason why we believe the Riemann hypothesis is true, and why we believe that cryptography based on the primes is mathematically secure. It's all part of this belief. In fact, one reason why we care about the Riemann hypothesis is that if the Riemann hypothesis failed, if we knew it was false, it would be a serious blow to this model. It would mean there's a secret pattern to the primes that we were not aware of. I think we would very rapidly abandon any cryptography based on the primes, because if there was one pattern that we didn't know about, there are probably more, and these patterns can lead to exploits in crypto. It would be a big shock.
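The "god rolling dice" heuristic can be simulated directly. The sketch below is my own illustration of that random model (roughly the Cramér model; the parameter choices are assumptions, not from the episode): declare each n >= 3 "prime" independently with probability 1/ln(n), then count twin pairs, and compare with the count among the actual primes. Both counts keep growing as the limit grows, which is the heuristic reason to expect infinitely many twin primes.

```python
import math
import random

def random_twin_pairs(limit: int, seed: int = 0) -> int:
    """Twin pairs among 'pseudo-primes' drawn with density 1/ln(n)."""
    rng = random.Random(seed)
    pseudo = [n for n in range(3, limit) if rng.random() < 1 / math.log(n)]
    members = set(pseudo)
    return sum(1 for n in pseudo if n + 2 in members)

def true_twin_pairs(limit: int) -> int:
    """Twin pairs (p, p+2) among the actual primes below limit."""
    sieve = [True] * limit
    sieve[0:2] = [False, False]
    for i in range(2, int(limit ** 0.5) + 1):
        if sieve[i]:
            sieve[i * i::i] = [False] * len(sieve[i * i::i])
    return sum(1 for n in range(3, limit - 2) if sieve[n] and sieve[n + 2])

# The coin-flip universe produces twins at roughly the same order of
# magnitude as the real primes do.
print(true_twin_pairs(10_000), random_twin_pairs(10_000))
```

The real count and the simulated count differ by a constant factor (the actual heuristic includes a correction for patterns like "primes are almost all odd"), but the qualitative prediction, twins appearing over and over, is the same.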
So we really want to make sure that doesn't happen. We've been convinced of things like the Riemann hypothesis over time. Some of it is experimental evidence, and some is that the few times we've been able to make theoretical results, they've always aligned. It is possible that the consensus is wrong and we've all just missed something very basic. There have been paradigm shifts in the past in scientific history. But we don't really have a way of measuring this, partly because we don't have enough data on how math or science develops. We have one timeline of history, and we have maybe 100 stories of turning points in history. If we had access to a million alien civilizations, each with a different development of history and science in different orders, then maybe we'd actually have a decent shot at understanding how we measure what progress is and what is a good strategy. We could maybe start formalizing it and actually having a framework. Maybe what we need to do is start creating lots of mini-universes or simulations of AI solving very basic problems in arithmetic or whatever, but coming up with their own strategies for doing these things and having these little laboratories to test. There are people who investigate what's the smallest neural network that can do 10-digit multiplication and things like that. I think we could learn a lot just from evolving small AIs on simple problems.

01:09:48 – How Terry uses his time

You have to learn about new fields not only very rapidly, but deeply enough to contribute to the frontier. So in some sense, you're also one of the world's greatest autodidacts. What is your process of learning about a new subfield in math? What does that look like?

We talked about depth and breadth before. It's not a purely human-AI distinction. Humans differ in this way too; as the old saying goes, the hedgehog knows one thing very well, and a fox knows a little bit about everything. I definitely think of myself as a fox.
Primes, Serendipity, and Mathematical Learning
- The modern understanding of prime numbers relies on a statistical model that treats them as if they were generated randomly, a heuristic that underpins both the Riemann hypothesis and modern cryptography.
- If a hidden pattern were discovered in the primes that contradicted this random model, it would likely render current cryptographic systems insecure and force a paradigm shift in number theory.
- Terry Tao describes himself as a 'fox' who learns new fields through an obsessive, completionist drive to understand the 'magic' or 'tricks' used by other mathematicians.
- Tao emphasizes the importance of serendipity and unplanned interactions, noting that over-optimization and remote work can destroy the casual, productive encounters found in physical hallways.
- To better understand mathematical progress, Tao suggests simulating 'mini-universes' where small AIs evolve their own unique strategies for solving arithmetic problems.
Itās mostly heuristic and non-rigorous, but extremely accurate.
They had some patterns, like theyāre almost all odd. Theyāre also not actually random, theyāre whatās called involved in creating the prime numbers. But over time, it became more and more productive to think of the primes as if they were just generated by some god rolling dice all the time and creating this random set. This allowed us to make all these other predictions. Thereās a still-open conjecture in number theory called the , that there should be infinitely many pairs of primes that are twins just two apart, like 11 and 13. We canāt prove that, and there are good reasons why we canāt prove it. But because of this statistical random model of the primes, we are absolutely convinced itās true. We know that if the primes were generated by flipping coins, we would justāby random chance like infinite monkeys at a typewriterāsee twin primes appear over and over again. We have over time developed this very accurate conceptual model of what the primes should behave like based on statistics and probability. Itās mostly heuristic and non-rigorous, but extremely accurate. The few times when we actually can prove things about the primes, it has matched up with the predictions of what we call the . We have this conjectural concept framework for understanding the primes that everyone believes in. Itās the same reason why we believe the Riemann hypothesis is true, and why we believe that cryptography based on the primes is mathematically secure. Itās all part of this belief. In fact, one reason why we care about the Riemann hypothesis is that if the Riemann hypothesis failed, if we knew it was false, it would be a serious blow to this model. It would mean thereās a secret pattern to the primes that we were not aware of. I think we would very rapidly abandon any cryptography based on the primes, because if there was one pattern that we didnāt know about, there are probably more, and these patterns can lead to exploits in crypto. It would be a big shock. 
So we really want to make sure that doesn't happen. We've been convinced of things like the Riemann hypothesis over time. Some of it is experimental evidence, and some is that the few times we've been able to prove theoretical results, they've always aligned. It is possible that the consensus is wrong and we've all just missed something very basic. There have been paradigm shifts in the past in scientific history.

But we don't really have a way of measuring this, partly because we don't have enough data on how math or science develops. We have one timeline of history, and we have maybe 100 stories of turning points in it. If we had access to a million alien civilizations, each with a different development of history and science in different orders, then maybe we'd actually have a decent shot at understanding how to measure what progress is and what a good strategy is. We could maybe start formalizing it and actually having a framework. Maybe what we need to do is start creating lots of mini-universes, or simulations of AI solving very basic problems in arithmetic or whatever, but coming up with their own strategies for doing these things, and having these little laboratories to test. There are people who investigate what's the smallest neural network that can do 10-digit multiplication, and things like that. I think we could learn a lot just from evolving small AIs on simple problems.

(01:09:48) – How Terry uses his time

You have to learn about new fields not only very rapidly, but deeply enough to contribute to the frontier. So in some sense, you're also one of the world's greatest autodidacts. What is your process of learning about a new subfield in math? What does that look like?

We talked about depth and breadth before. It's not a purely human-AI distinction. Humans differ too; I think it was Isaiah Berlin's line: the hedgehog knows one thing very well, and the fox knows a little bit about everything. I definitely think of myself as a fox.
I work with hedgehogs a lot, and sometimes I can be a hedgehog if need be. I've always had a little bit of an obsessive streak. If there's something I read about which I feel I have the capability to understand, but I don't understand why it works and there's some magic in it... Someone was able to use a type of mathematics I'm not familiar with and get a result I would like to prove. I can't do it myself, but they could do it by their method, and I want to find out what their trick was. It bugs me that someone else can do something I think I should be able to do, but can't. I've always had that obsessive, completionist streak. I've had to wean myself off computer games, because if I start a game, I want to play it to completion, through all the levels. That's one way I learn new fields.

I collaborate with a lot of people who have taught me other types of mathematics. I just make friends with another mathematician working on another area of mathematics. I find their problems interesting, but they have to teach me some of the basic tricks, what's known, and what's not known. I learn a lot from that.

I've found that writing about what I've learned helps. I have a blog where I sometimes record things I've learned. In the past, when I was younger, I would learn something, do this cool trick, and say, "Okay, I'm going to remember this." Then six months later, I'd forgotten it. I remember remembering it, but I can't reconstruct my arguments. The first few times, it was so frustrating to have understood something and then lost it. I resolved that I should always write down anything cool that I've learned. That's part of how this blog came about.

How long does it take you to write a blog post?

It's something I often do when I don't want to do other work. There's some referee report or something that feels slightly unpleasant to do at the time. Writing a blog post feels creative and fun. It's something I do for myself. Depending on the topic, it could be a quick half an hour or several hours.
Because it's something I do voluntarily, time flies when I write these things down, as opposed to doing something I have to do for administrative reasons that is just drudgery. Those are tasks, by the way, that AI is really helping with nowadays.

If civilization could decide from first principles how to use Terry Tao's time as a limited resource, what would be the biggest difference? What if the world got to decide how to use Terry Tao's time versus what it does now?

This podcast wouldn't be happening. As much as I complain about certain tasks that I don't want to do, but have to do... As you get more senior in academia, you get more and more responsibilities, more committees, and whatever. But I have also found that a lot of events I reluctantly went to because I was obliged to for one reason or another... Because it's outside my comfort zone, it often results in interactions with people I wouldn't normally talk to, like you, for instance. I would learn interesting things and have interesting experiences. I would have opportunities to network with people I never would have met before. So I do believe a lot in serendipity. I do optimize portions of my day, where I schedule very carefully. But I am willing to leave some portions open, just to do something that is not my usual thing. Maybe it'll be a waste of my time, but maybe I will learn something. More often than not, I get a positive experience that I wouldn't have planned for.

Maybe there's a danger in modern societies, not just with AI, that we've become really good at optimizing everything. We're not optimizing our own optimization. With COVID, for example, we switched a lot to remote meetings, so everything was scheduled. We kept busy in academia. We met almost the same number of people we had met in person, but everything had to be planned in advance. What we lost out on was the casual knocking on a hallway door, just meeting someone while getting a coffee.
Serendipity and the AI Hybrid
- Tao describes his 'obsessive, completionist streak' as a primary driver for learning new mathematical fields and mastering complex techniques used by others.
- Writing for his blog serves as a creative outlet and a vital tool for retention, preventing the loss of complex arguments he might otherwise forget over time.
- The transition to highly optimized, digital workflows has inadvertently eliminated the 'serendipity' of physical research, such as browsing library shelves and finding unexpected insights.
- While AI will soon automate many tasks currently performed by math students, Tao believes human-AI hybrids will remain the dominant force in mathematical discovery for the foreseeable future.
- A certain level of distraction and 'high temperature' randomness is essential for long-term inspiration, as total isolation can eventually lead to stagnation.
Those serendipitous interactions may not seem optimal, but they are actually really important. When I was a grad student, I would go to the library to look for a journal article. You had to physically check out the journal and read the article. You could browse through, and sometimes the next article was also interesting. Sometimes it wasn't, but you could accidentally find interesting things. That has basically been lost now. If you want to access an article, you just type it into a search engine or an AI, and you get exactly what you want instantly. But you don't get the accidental things you might have found if you'd done it more inefficiently.

I sometimes visit a research institute, which is a great place with no distractions. You're there just to do research. The first few weeks you're there, it's great. You're getting all these papers written up that you've been wanting to write for a long time. You think about problems for blocks of hours at a time. But I find that if I stay there for more than several months, I run out of inspiration. I get bored. I surf the internet a lot more. You actually do need a certain level of distraction in your life. It adds enough randomness and high temperature. I don't know the optimal way to schedule my life. It just seems to work.

(01:17:05) – Human-AI hybrids will dominate math for a lot longer

I'm very curious when you expect AIs that can actually do frontier math at least as well as the best human mathematicians.

In some ways, they're already doing frontier math that humans can't do, but it's a different frontier from what we're used to. You could argue that calculators were doing frontier math that humans could not accomplish, but it was number crunching.

But replacing Terry Tao completely. I mean, what do you want me for? You'll just go on all the podcasts after. It might not be the right question to ask.
I think within a decade, a lot of things that math students currently do, what we spend the bulk of our time doing and a lot of the material we put in our papers today, can be done by AI. But we will find that that actually wasn't the most important part of what we do. A hundred years ago, a lot of mathematicians were just solving differential equations. Physicists needed some exact solution to some system, and they hired a mathematician to laboriously go through the calculus and work out the solution to this fluid equation, or whatever. A lot of what a 19th-century mathematician would do, you could now hand to Wolfram Alpha, a computer algebra package, or more recently an AI, and it would just solve the problem in a few minutes. But we moved on. We worked on different types of problems after that.

Once computers came along... Computers used to be human. People used to laboriously create log tables and work out primes, as Gauss did, and that has all been outsourced to machines. But we moved on. In genetics, sequencing the genome of a single organism was once an entire PhD for a geneticist, carefully separating all the chromosomes and so on. Now you can just spend $1,000, send it to a sequencer, and get it done. But genetics is not dead as a subject. You move to a different scale. Maybe you study whole ecosystems rather than individuals.

I take your point, but when does most mathematical progress, or almost all of it, start happening via AI? In what year, if you found out a Millennium Prize Problem had been solved, would you put 95% odds on an AI having done it autonomously? Surely there will be such a year.

I guess I do believe that hybrid human-plus-AI teams will dominate mathematics for a lot longer. It will depend. It will require some additional breakthroughs beyond what we already have, so it's going to be stochastic. I think AIs currently are very good at certain things, but really terrible at others.
Mathematics in the AI Era
- The loss of serendipity and 'inefficient' browsing in the digital age may inadvertently stifle the accidental discoveries that fuel scientific inspiration.
- Historical shifts in mathematics, such as the automation of log tables and calculus, suggest that AI will automate current tasks while humans move to higher levels of abstraction.
- Hybrid human-AI collaboration is expected to dominate the mathematical frontier for the foreseeable future rather than total AI replacement.
- Aspiring mathematicians must adopt an adaptable mindset, as AI tools like Lean may allow non-traditional contributors to reach the research frontier much earlier.
- While AI accelerates certain types of progress, the current unpredictability of the field makes it both a scary and exciting time for intellectual pursuits.
While you can add more and more frameworks on top to reduce the error rates and make them work with each other a bit better, it feels like we don't have all the ingredients for a truly satisfactory replacement for all intellectual tasks. It is complementary currently. It's not a replacement. Because current-level AIs will accelerate science in so many ways, hopefully new discoveries and new breakthroughs will happen more quickly. It's also possible that by destroying serendipity we actually inhibit certain types of progress. Anything is possible at this point. I think the world is very, very unpredictable at this point in time.

What is your advice to somebody who is considering a career in math, or is early in one, especially in light of AI progress? How should they be thinking about their career differently, if at all?

We live in a time of change. As I said, we live in a particularly unpredictable era. Things that we've taken for granted for centuries may not hold anymore. The way we do everything, not just mathematics, will change. In many ways, I would prefer a much more boring, quiet era where things are much the same as they were 10 or 20 years ago. But I think one just has to embrace that there's going to be a lot of change. Some of the things that you study may become obsolete or revolutionized, but some things will be retained. You always have to keep an eye out for opportunities to do things that you couldn't do before. In math, you previously had to go through years and years of education and get a math PhD before you could contribute to the frontier of research. But now it's quite possible that at the high school level, or whatever, you could get involved in a math project and actually make a real contribution, because of all these AI tools, Lean, and everything else. There will be a lot of non-traditional opportunities to learn, so you need a very adaptable mindset.
There will be room for pursuing things just out of curiosity and for playing around. You still need to get your credentials. For a while it will still be important to go through traditional education and learn math and science the old-fashioned way. But you should also be open to very different ways of doing science, some of which don't exist yet. It's a scary time, but also a very exciting one.

That's a great note to close on. Terence, thanks so much.