“
[All] modern chatbots are actually trained simply to predict the next word in a sequence of words. They generate text by repeatedly producing one word at a time. For technical reasons, they generate a “token” at a time, tokens being chunks of words that are shorter than words but longer than individual letters. They string these tokens together to generate text.
When a chatbot begins to respond to you, it has no coherent picture of the overall response it’s about to produce. Instead, it performs an absurdly large number of calculations to determine what the first word in the response should be. Then it repeats the process for the second word, the third, and so on. After it has output, say, a hundred words, it decides which word would make the most sense given your prompt together with the hundred words it has generated so far.
This is, of course, a way of producing text that’s utterly unlike human speech. Even when we understand perfectly well how and why a chatbot works, it can remain mind-boggling that it works at all.
Again, we cannot stress enough how computationally expensive all this is. To generate a single token—part of a word—ChatGPT has to perform roughly a trillion arithmetic operations. If you asked it to generate a poem that ended up having about a thousand tokens (i.e., a few hundred words), it would have required about a quadrillion calculations—a million billion.
”
Arvind Narayanan and Sayash Kapoor (AI Snake Oil: What Artificial Intelligence Can Do, What It Can’t, and How to Tell the Difference)
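The token-by-token loop the passage describes can be sketched in a few lines of Python. The sketch below is an editorial illustration, not code from the book: the tiny vocabulary, the next_token_scores stand-in, and the greedy choice of the highest-scoring token are all simplifying assumptions, whereas a real model scores tens of thousands of possible tokens using roughly a trillion arithmetic operations at every step.

```python
# Editorial sketch of autoregressive, token-at-a-time generation.
# next_token_scores is a toy stand-in for the trillion-operation
# neural network the passage refers to.

import random

# Hypothetical toy vocabulary; real tokenizers have tens of thousands of tokens.
VOCAB = ["The", " cat", " sat", " on", " the", " mat", ".", "<eos>"]

def next_token_scores(context: list[str]) -> list[float]:
    """Stand-in for the model: score every vocabulary token, conditioned on
    the full context (the prompt plus everything generated so far)."""
    random.seed(len(context))  # deterministic toy behaviour for the sketch
    return [random.random() for _ in VOCAB]

def generate(prompt_tokens: list[str], max_new_tokens: int = 20) -> list[str]:
    """Greedy decoding: pick the highest-scoring token, append it, repeat."""
    context = list(prompt_tokens)
    for _ in range(max_new_tokens):
        scores = next_token_scores(context)      # one full pass of the model per token
        best = VOCAB[scores.index(max(scores))]  # choose the single next token
        if best == "<eos>":                      # the model signals it is finished
            break
        context.append(best)                     # the new token becomes part of the context
    return context

if __name__ == "__main__":
    print("".join(generate(["The"])))
```

The cost the authors describe falls out of this same loop: the model makes one full pass per generated token, so a thousand-token poem means a thousand passes of roughly a trillion operations each, about a quadrillion calculations in total.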