Welcome back.
Tokens come up constantly in AI conversations. Almost nobody can explain what one actually is.
A token is the basic unit of text an AI model sees. Not quite a word. Not quite a character. Not quite a byte, though sometimes it happens to line up with all three.
Think of it like Lego bricks. You and I read in full sentences, smooth and continuous. Before the model sees a sentence, a piece of software called a tokenizer breaks it into pieces, and the model works with it as a sequence of bricks. Those bricks are tokens.
And different models come with different Lego sets. The word "newsletter" might be one chunky brick in one model and two smaller pieces (news, letter) in another. Same word, different breakdown. That means "how many tokens is this?" isn't a universal question. It's "how many tokens is this for this specific model?"
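Here is a toy sketch of that idea in Python. It is not how any real model tokenizes: real tokenizers use learned schemes such as byte-pair encoding, and both vocabularies below are made up for illustration. The hypothetical `tokenize` function just grabs the longest brick it can find at each step, which is enough to show the same word splitting differently under two Lego sets.

```python
def tokenize(text, vocab):
    """Greedy longest-match split of text into pieces found in vocab (a toy stand-in)."""
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest possible piece first, then shrink.
        for j in range(len(text), i, -1):
            if text[i:j] in vocab:
                tokens.append(text[i:j])
                i = j
                break
        else:
            # Nothing in the vocabulary matched: fall back to a single character.
            tokens.append(text[i])
            i += 1
    return tokens


# Two made-up "Lego sets" (hypothetical vocabularies, not taken from any real model).
VOCAB_A = {"newsletter", "news", "letter"}
VOCAB_B = {"news", "letter"}

print(tokenize("newsletter", VOCAB_A))  # ['newsletter']
print(tokenize("newsletter", VOCAB_B))  # ['news', 'letter']
```

Same ten letters, one token in the first set and two in the second.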
If you're on a subscription like ChatGPT Plus or Claude Pro, you've probably never seen a token count in your life. That's by design. But behind the scenes, every prompt you send is still getting chopped into tokens, and someone is paying per token. Developers and businesses using AI APIs are billed directly by the token. Their bills can swing wildly because two prompts that look about the same length to a human can produce very different token counts. Emoji, non-English text, and unusual punctuation tend to chew through more tokens than plain English. If you build with AI, the tokenizer is quietly swiping your credit card.
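To make the billing math concrete, here is a rough sketch. The prices are invented for the example, and the split into separate input and output rates is an assumption about how a given provider charges, so check the real price list before trusting any number. The function name `estimate_cost` is hypothetical too.

```python
def estimate_cost(input_tokens, output_tokens,
                  input_price_per_million, output_price_per_million):
    """Estimate the cost of one API call in dollars from token counts and per-million prices."""
    total = (input_tokens * input_price_per_million
             + output_tokens * output_price_per_million)
    return total / 1_000_000


# Hypothetical prices in dollars per million tokens (not from any real provider).
INPUT_PRICE = 3.00
OUTPUT_PRICE = 15.00

# Two prompts that might look similar on screen but tokenize differently.
plain_prompt_tokens = 1_000
emoji_heavy_prompt_tokens = 1_600

print(estimate_cost(plain_prompt_tokens, 500, INPUT_PRICE, OUTPUT_PRICE))
print(estimate_cost(emoji_heavy_prompt_tokens, 500, INPUT_PRICE, OUTPUT_PRICE))
```

Fractions of a cent per call look harmless. Multiply by a few million calls a month and the difference between those two prompts becomes a line item.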
Tokens also define the model's working memory, called the context window. The bigger the conversation or document, the more tokens the model has to juggle at once. That's why long PDFs and chats with hundreds of messages get sluggish or quietly truncated. It's not about how the text looks to you. It's about how many bricks the model had to lay down.
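One way to picture the quiet truncation is the sketch below. It assumes a service that keeps only the most recent messages that fit in a token budget, which is one possible strategy and not a description of how any particular product behaves. The token counter is a crude stand-in that counts whitespace-separated words, and both function names are hypothetical.

```python
def count_tokens(text):
    # Crude stand-in for a real tokenizer: one token per whitespace-separated word.
    return len(text.split())


def fit_to_window(messages, max_tokens):
    """Keep the most recent messages whose combined token count fits the budget."""
    kept = []
    used = 0
    # Walk backwards from the newest message and stop at the first one that does not fit.
    for message in reversed(messages):
        cost = count_tokens(message)
        if used + cost > max_tokens:
            break
        kept.append(message)
        used += cost
    kept.reverse()  # Restore chronological order.
    return kept


chat = [
    "Hi, can you help me plan a trip?",
    "Sure, where would you like to go?",
    "Somewhere warm in early March.",
]

# With a 12-token budget, the oldest message silently falls out of the window.
print(fit_to_window(chat, max_tokens=12))
```

Nothing errors out and nobody gets a warning. The oldest brick just isn't on the table anymore.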
If you want a clean mental model, picture three layers. You think in sentences and paragraphs. The AI service thinks in tokens and token counts. The model itself thinks in numbers and vectors in some enormous mathematical space.
We only see the top layer. Most of the confusion about cost, latency, and quirky AI behavior comes from forgetting that the other two layers are where the real constraints live.
There's also a half-joking, half-serious conversation happening right now that tokens are the new oil. The new currency. The unit of compute that everything else gets priced against. AI is starting to do more of the world's cognitive work. AI runs on tokens. So tokens are starting to look like the actual thing you're buying when you buy intelligence. Most of us just can't see the meter running yet.
You don't need to become an expert on tokens. You just need to know what they're doing to your wallet and to your AI's brain.