It’s almost like running out of gas, except everyone’s surprised: Oops, this thing runs on gas? Where can we get more and who pays? Of course chatbots don’t run on gas, but they do run on data and the data-pipes are about to close. How can this happen and can this possibly be a good thing?
In the feeding frenzy around AI, LLMs and chatbots, most of us forgot or conveniently ignored that the machine behind the scenes, the ‘brain’ that delivers all these impressive results, must be fed. Data, lots of data. Energy, lots of energy. Expertise, lots of expertise. Technology – etc. etc. None of these come for free. In fact, all of the ingredients are expensive, except data. Because the companies behind the bots had these mountains of data already, right?
Wrong – a misconception with grave consequences that I’ll get back to in a minute (spoiler alert: it’s the most fascinating part). Let’s start off with a more practical angle – is the (free) party really over? Or the other way around: Why was there a free party to begin with, the one that so many of us have participated in for the last 6 months? The ‘that’s the Internet way’ explanation doesn’t work when the party ticket runs into the billions.
Seriously – there is no reason to be surprised. Just look at the numbers. Developing and running digital monsters like ChatGPT, Bard, Ernie and their likes is hugely expensive. OpenAI (parent of ChatGPT and GPT-3/4) lost USD 540 million last year alone, twice as much as the year before. Tens of thousands of servers, petabytes galore, huge datacenters, power bills, the best experts in the world etc. etc. The bill – if we add up just the big ones – is in the billions per month. Without big money from Microsoft, OpenAI would have been out of business last year.
Admittedly, the revenues from chatbot-based services are rising, but they are at pocket-change level and will likely stay that way for quite some time. Still, they – Google, Baidu, Apple, Amazon, Microsoft, etc. – seem more than happy to foot the bill, eager to demonstrate that they’re ahead of or in line with the competition. Competing for the (assumed) future – seemingly at any cost. Not unusual in itself, but the scale is unusual. The flow of money will not end any time soon, so cost isn’t the party killer. What is?
Data. Data is the fuel keeping the bots alive and kicking. Big data, huge data pipes. In order to be relevant and useful, they need updated, high-quality data – from news to science, from history to statistics, from medicine to literature and so on – continuously. This is the point at which you may raise an eyebrow: What’s the problem? These companies have all the data in the world. Search data, history data, user data, statistics, public data etc. etc. What are we missing?
Actually, they don’t. They do have lots of data, much of it useful, but they need much more. So far they have been feeding off all kinds of Internet resources – more or less by default. Like Wikipedia, Reddit, Twitter, StackOverflow and thousands of news sites, blog sites, encyclopaedias, government sites etc. For free – like most of us do (side note: As most of us have experienced, ‘free’ on the Internet is very different today compared to 10 or 20 years back. Even paid resources overflow with ads these days – it’s disturbing, even worrisome. Which is why some of us donate to Wikipedia every year. Do you?)
Here’s the thing: Many of these data sources are closing the pipes feeding the bots, blocking off huge libraries of valuable and important content. They’re as tired of the pilfering as the news media are of the (free) sharing in social media. A few pieces of content now and then is nothing, millions every day is a different story, and that’s what the bots need – and take. The message from the data owners is increasingly ‘pay up or go away, Bots’.
Rather natural now that we see the picture. It’s also a change with potentially dramatic consequences. Think about it. When gas prices were low, we didn’t think much about how and when we drove, about mileage etc. Prices went up and the picture – our thinking – became very different. Same thing with power. I live in a country where the price per kWh used to be in the tenths of a cent or less. Now we’re talking 15-25¢, occasionally 60¢ per kWh. You can imagine what that does to attitudes.
How can this running-out-of-gas situation be good news? It’s not – for the chatbot owners, but it is for the rest of the world, or at least for those of us somewhat concerned about where we’re heading with this fascinating technology: The explosive development of LLMs, chatbots and new AI services has sent politicians, bureaucrats, even experts running for shelter, looking for ways to rein in the monsters. It’s hard, maybe impossible because it’s a fast-moving target in an international market driven by – as we’ve seen – actors with huge resources and global reach. Controlling the ‘fuel pipes’ may be a far more effective and flexible way (maybe even the only way) to gain at least some control.
Is it possible? Absolutely. Like it or not, money talks far more effectively than policies, idealism and political power plays. Owning or controlling (and developing) the data is as important as owning (and developing) the smart technology. That said, it’s complicated. Many sources, many variables, many players, many motives (think OPEC).
An interesting and unpredictable scenario indeed, in which the most important issue right now is to understand the big picture and the role of data, and to make sure the technology (LLM) owners don’t control/own the most important data sources.
Which sources might those be? Some of them are obvious, but keep an eye on where the big ones put their attention – and money – over the next months and years, and you’ll get a strong hint. Exciting times indeed.