Flibbertigibbeting
That’s a real word your AI might show you while it’s chewing on your question. So is “cogitating.” And “noodling.” And “discombobulating.” Sometimes you get a plain old “Thinking…”. If you’re using Claude, you’ll often get something weirder, and if you’re in ChatGPT you’ll see the little “Thinking” badge with a timer counting up. Click to expand that indicator and you can literally watch the model talk to itself for a few seconds before it lands on an answer.
That’s not a UI gimmick, and the playful word choices aren’t either. They’re a hint that something new is happening under the hood, especially compared to the GPT‑3 and GPT‑4 era.
Here’s the simple version of what’s happening.
For most of modern AI’s history, making a model smarter was pretty straightforward: more data, more parameters, bigger computers. Bigger models, better answers. That was basically the playbook from GPT‑2 through GPT‑4.
In late 2024, OpenAI released a model called o1 that did something different. It still went through the same kind of training. But when you asked it a question, it would pause and generate a stream of internal reasoning, often called “thinking tokens,” before producing an answer. Think of it as the model showing its work before turning in the test. That pause turned out to matter a lot.
Show Your Work
Every “thinking” token the model generates is a step it can use to work through the problem. Without those steps, there’s a pretty hard ceiling on what these models can solve. With them, that ceiling lifts.
Researchers at Google DeepMind showed that, on some problems, a smaller model allowed to think longer at question time can outperform a model that’s 14 times larger but answers immediately. In other words, more thinking can substitute for more raw size, which is a wild result if you’ve spent the last few years assuming “bigger model = better model.”
A quick vocabulary moment, because the technical term will come up. The phase where a trained model actually answers your questions is called inference. Training is when you build the model. Inference is when you use it. So when researchers talk about “test-time compute” or “inference-time scaling,” they’re just saying the model is using more computing power while it answers you, instead of spending that computing power back when it was being built.
Where You Already See This
As of April 2026, every major AI lab has built reasoning into its flagship models. ChatGPT has GPT‑5.4 Thinking. Claude has what Anthropic calls Extended Thinking, and on the newest Claude models it’s on by default. Gemini has a Deep Think toggle on its 3.1 Pro tier. Even some open-source models like DeepSeek‑R1 have made reasoning standard. If you’ve ever seen a “Deep Think” or “Extended Thinking” option in your settings, this is what it’s doing behind the scenes.
The origin story: When DeepSeek’s researchers were training R1, they watched something weird happen. The model started writing things like “wait, let me reconsider” in the middle of its reasoning, then re‑deriving its answer from a different angle. Nobody trained that behavior in. It emerged on its own. The researchers called it the “aha moment.” It’s the closest thing AI has to a model spontaneously deciding to double‑check its own work. When you see your AI pause and take a second pass, you’re seeing a version of that behavior.
The Tradeoffs
Reasoning models aren’t free.
They’re slower. Sometimes 30 times slower than the same model in non‑thinking mode. The “Thinking…” indicator isn’t fake; the model is genuinely doing more work.
They cost more. On the API side, you can pay roughly six times more for a thinking response than a quick one.
And they aren’t always better. Recent research found something counterintuitive: more thinking helps a lot on math, coding, scientific reasoning, and multi‑step planning. But on simple knowledge questions, more thinking can actually make models worse. They sometimes talk themselves into confident wrong answers. Ask for the capital of a country, and a thinking model can occasionally overcomplicate it and hallucinate some obscure nuance that doesn’t exist.
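To put the speed and cost figures above in concrete terms, here is a minimal back-of-the-envelope sketch. The 30x and 6x multipliers are the rough upper-end numbers quoted in this section; the 2-second, 1-cent baseline and the function name are made up purely for illustration, and real numbers vary by model and task.

```python
# Multipliers quoted in the text; treat them as rough, not exact.
SLOWDOWN = 30      # "sometimes 30 times slower"
COST_FACTOR = 6    # "roughly six times more"


def thinking_overhead(base_seconds, base_cost,
                      slowdown=SLOWDOWN, cost_factor=COST_FACTOR):
    """Estimate (seconds, cost) for a thinking response from a quick-response baseline."""
    return base_seconds * slowdown, base_cost * cost_factor


# Hypothetical baseline: a quick answer that takes 2 seconds and costs 1 cent.
seconds, cost = thinking_overhead(base_seconds=2, base_cost=0.01)
print(f"Upper-end estimate: {seconds} s and ${cost:.2f} per answer")
```

For a single question the difference is small change. It starts to matter when you run thousands of requests through an API, or when you are waiting on every answer.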
When to Use the Thinking Mode
For everyday users, the rough rule is this:
If your question has one right answer that requires working through steps, turn thinking on. Math problems, coding bugs, planning decisions, analyzing a complicated document, working through a logic puzzle, anything where “show your work” would actually help a human get the right answer.
If your question is fast, casual, or pure recall, you can usually skip it. “What’s the capital of Belgium?” “Write me a quick birthday message.” “What’s a good restaurant in Mobile?” (P.S. the answer is Noble South). For questions like these, thinking mode adds latency without adding accuracy.
There’s a gray zone too. If you’re asking for lightweight brainstorming or a quick list of ideas, the non‑thinking modes are usually fine, and you can always re‑run with thinking on if the first answer feels shallow.
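If you like seeing a rule of thumb written down as logic, the guidance above could be sketched roughly like this. The category labels and the function name are hypothetical, chosen only to mirror the examples in this section; this is an illustration of the rule, not how any AI product actually decides.

```python
# Categories taken from the rule of thumb above; the labels are hypothetical.
THINKING_HELPS = {"math", "coding", "planning", "document_analysis", "logic_puzzle"}
SKIP_THINKING = {"recall", "casual_writing", "recommendation"}


def suggest_mode(task_kind):
    """Suggest 'thinking', 'quick', or 'quick_then_retry' for a task category."""
    if task_kind in THINKING_HELPS:
        return "thinking"
    if task_kind in SKIP_THINKING:
        return "quick"
    # Gray zone (for example, light brainstorming): start with the quick mode
    # and re-run with thinking on if the first answer feels shallow.
    return "quick_then_retry"


for kind in ("coding", "recall", "brainstorming"):
    print(kind, "->", suggest_mode(kind))
```

Anything the sketch doesn't recognize falls into the gray zone on purpose: starting with the quick mode is cheap, and you can always escalate.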
So next time you see “Flibbertigibbeting…” pop up on your screen, you’re not just watching a loading bar. You’re watching the model do something that, even two years ago in the GPT‑4 era, most AI systems couldn’t really do at all.
See You Friday
If you want to see humans thinking together in real time, we’re in Mobile this Friday, May 1st, for our meetup at the Innovation Portal Building (358 St. Louis St.) from 3:30 to 5:30 PM. Recent meetups have been standing‑room‑only, so come early if you want a seat. Bring a friend who’s curious about AI, and maybe show them what “Flibbertigibbeting…” looks like on your screen.