Apparently, chatbots suddenly hit a wall. Went from ‘wunderkinder’ to laughingstock in a few short weeks. Bard and ChatGPT (or the enhanced Bing which suddenly became ‘de-enhanced’) got undressed. No intelligence was found, just a lot of data – and a language model, a very big one. Which explains the output – and (pardon my French) the BS.
How can this be good news? Here’s the thing: The party is over. The euphoria around getting sensible responses from a machine fades as we start to ask more questions, and more importantly, start to reflect on the quality and correctness of the answers. Which – with a few notable exceptions – are utterly unimpressive. So the good news is that we – humans – seem to have started thinking again. We’re good at that – when we do it. And bad when we get lazy. ChatGPT tempted our lazy gene and won – for a while.
I have written about this before (Chatbots aren’t the problem, we are!) and it’s becoming more interesting as more ‘results’ are pouring in. Serious research into what the new machines are delivering and why. It’s inspiring in several ways. To some of us, myself included, it’s inspiring because we’re slowly getting a picture of how the machines – aka Large Language Models, LLMs – work. To others the inspiration lies in answering the ‘what are they good for’ and ‘what’s in it for me’ questions. Both equally important – not least to stay above the noise from the media. An avalanche of alluring or sensational headlines like ‘I Asked Artificial Intelligence: What Should We Know About The Future That We Do Not Know?’ (on Medium.com) and ‘I Asked an Algorithm to Optimize My Life. Here’s What Happened’ (on Wired.com) – possibly entertaining, but utterly useless and confusing. And presumptuous.
At the opposite end of the noise spectrum I recently found some real gems – and a diamond: ‘ChatGPT: Automatic expensive BS at scale’. The latter is an impressive (or scary) 40-minute read by Colin Fraser on Medium.com, more or less the definition of TL;DR with a notable caveat: It’s captivating. It sucks you in like a Tom Clancy novel and keeps you there ’til you’re done. At least that’s what happened to me. Admittedly, you have to be more than averagely interested in the subject and the underlying technologies, but still. Along the way the article delivered numerous ‘aha, so that’s how …’ experiences, and some ‘of course, why didn’t I think about that?’ reflections.
Like this one: Large language models like ChatGPT read and memorize how trillions of words are used, and can produce summaries, excerpts and more or less random combinations. Impressive? Not really, if you recall that you and I – humans – can read maybe 20 or 40 books and still move mankind ahead with the understanding we’ve gathered. Right here is the difference between ‘probabilistic language manipulation’ and intelligence.
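The gap between pattern-matching and understanding can be made concrete with a toy. Here is a minimal sketch (the corpus, names and numbers are all made up for illustration, and a real LLM is vastly more sophisticated): a bigram model that memorizes which word follows which in its training text and then generates by sampling from those counts – ‘probabilistic language manipulation’ in miniature, producing fluent-looking output with zero grasp of meaning.

```python
import random
from collections import defaultdict

# Toy training text (made up for illustration).
corpus = ("the model predicts the next word "
          "the model has no idea what the words mean").split()

# Memorize, for each word, which words followed it and how often.
followers = defaultdict(list)
for current, nxt in zip(corpus, corpus[1:]):
    followers[current].append(nxt)

def generate(start, length):
    """Produce text by repeatedly sampling a memorized follower."""
    words = [start]
    for _ in range(length - 1):
        options = followers.get(words[-1])
        if not options:          # dead end: word never seen mid-corpus
            break
        words.append(random.choice(options))
    return " ".join(words)

print(generate("the", 8))
```

Every word pair in the output was seen in the training text, so it reads plausibly – but nothing in the program knows what any word refers to. Scale the corpus to trillions of words and the counts to billions of parameters and you get fluency that is easy to mistake for understanding.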
About capabilities vs. expectations, Colin Fraser writes (in the introductory summary):
As fascinating and surprising as it is, from what I can tell, it seems a lot less useful than a lot of people seem to think it is. I do believe that there are some interesting potential use cases, but I also believe that both its current capabilities and its future prospects are being wildly overestimated.
After presenting a number of examples and discussing their validity, he spoils the party so to speak with the following observations:
My general thesis is as follows: large language models are very interesting and cool mathematical objects whose applications are potentially numerous but non-obvious, and they possess a certain intrinsic quality that will make it challenging to use them in the way that many people imagine. That quality is this: they are incurable constant shameless bullshitters. Every single one of them. It’s a feature, not a bug.
Then he goes on with examples, tests, analyses and discussions, sometimes quite mathematical – but here’s another thing about the article: You don’t have to understand the content of an example in order to understand the discussion. Like the example in which he asks ChatGPT about the root of a polynomial: The machine presents a convincing sequence of simplifications and arrives at the correct answer, but only on the fourth try. All the others were just wrong. Which objectively isn’t surprising: this is not a math machine, but a large language model. The point is – as I discussed in ChatGPT is lying – now what? recently – that the machine has no clue that it’s presenting garbage, so unless you know the answer beforehand, you may be hosed.
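Fraser’s actual polynomial isn’t reproduced here, but the defensive move this paragraph implies is easy to mechanize: never trust the explanation, check the answer. A minimal sketch in Python, with a made-up polynomial and two candidate ‘chatbot answers’ – substituting a claimed root back into the polynomial takes one line and exposes a confident-sounding wrong answer immediately:

```python
# Made-up example (not Fraser's polynomial): p(x) = x^3 - 6x^2 + 11x - 6,
# whose real roots are 1, 2 and 3.
def p(x):
    return x**3 - 6*x**2 + 11*x - 6

def is_root(candidate, tol=1e-9):
    """Check a claimed root by substitution, not by trusting the reasoning."""
    return abs(p(candidate)) < tol

print(is_root(3))  # True  – p(3) = 27 - 54 + 33 - 6 = 0
print(is_root(4))  # False – p(4) = 64 - 96 + 44 - 6 = 6, not a root
```

The convincing chain of simplifications is irrelevant; only the substitution counts. That is exactly the check an LLM never performs on its own output.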
By asking questions many of us can relate to and then diving in to find answers, Mr. Fraser keeps our attention all the way through the article. The mother of all questions in a setting like this is ‘what if’ – the very tool that differentiated man from animal 300,000 years ago and the root of all inventions since. A particularly interesting twist is ‘what if you tell ChatGPT that the answer is wrong?’. In several fascinating examples, ChatGPT accepts a correction with an apology even if its first answer was correct. Like the 2+2 example: ChatGPT correctly adds 2+2 and gets 4. Fraser responds that the answer should be 5, which causes the machine to apologize and restate the algebra as 2+2=5. Is this the brilliant brain that passes the bar exam? (EDIT: This particular problem has since been fixed in ChatGPT.)
If you’d like more insight, more understanding, read the article. If you’re a chatbot enthusiast you’ll hate it. If you’re a smart person you’ll love it for the understanding it conveys. Regardless of starting point, this 40-minute read will change your perspective on LLMs and their potential. Which is huge – but extremely narrow compared to what most people seem to think.
The practical takeaway is this: If you go ahead and test-drive ChatGPT, make sure you ask questions and present problems you know the answer to. What you get back is unreliable, sometimes pure BS.