

I posted a top-level comment about this as well, but Anthropic has done some research on this. The section on reasoning models discusses math, I believe. The short version: the model has a lot of math in its training corpus, so it can approximate calculations (roughly the way you'd do a back-of-the-envelope estimate in your head to get the order of magnitude right), but it can't actually carry out exact calculations, which is why it often gets the specifics wrong.
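To make the "back of the envelope" analogy concrete, here's a minimal sketch (not how the model works internally, just an illustration of why the magnitude can come out right while the specific digits don't):

```python
# Illustrative only: a rough estimate rounds each operand before multiplying,
# which preserves the order of magnitude but loses the trailing digits.
a, b = 3847, 5219

approx = round(a, -3) * round(b, -3)   # 4000 * 5000 = 20,000,000
exact = a * b                          # 20,077,493

print(f"estimate: {approx:,}")
print(f"exact:    {exact:,}")
# Both land around 2e7, but the specifics differ -- much like a model that
# approximates the result rather than actually computing it.
```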