‘It’s not what you say, it’s what people hear.’ The subtitle of the 2007 book Words That Work by renowned author Frank Luntz applies in many contexts and explains the root of many, maybe most, misunderstandings. It is also the reason why you don’t talk to your kids the same way you talk to a colleague or a police officer. Communication is hard – and we fail at it all the time. So why are we trying to ‘teach’ our new AI ‘friends’ to talk and be more like us? No wonder they’re becoming more stupid by the month …
I’ve been playing with AI tools lately. Image generators, music analyzers, security tools. They’re incredible, scary at times. They are also immensely complex – and using them is not trivial. Obviously – how could it be? Complex, specialized tools are always hard to use unless you’re an expert and have the training. Trying to make them simple requires dumbing them down. That would be stupid, right?
If you’re like me, you’re nodding and holding back some frustrations. It’s not that simple. We’re in a hurry – always. We want results fast and we don’t have time to become experts. And we don’t know anyone who is – or they’re just too expensive. So honestly, we really want these tools to be simpler, more accessible.
Time and again I had exactly this frustration as I was playing with hitnmix.com and mage.space last month. A friend in Silicon Valley introduced me to these tools a while back and I finally decided to take a quick look. It didn’t work. I mean, ‘look’ was easy, ‘quick’ turned out to be impossible for several reasons: First, there is no way you can be even slightly more interested in music than average and not get totally sucked in by hitnmix.com. The quick look turned into an ongoing ‘project’ with a continuous stream of new discoveries and surprises.
hitnmix.com is actually a collection of tools, of which I chose RipX: In short, it ‘reads’ a music file, then lets you start working with it in ways you’d never even dreamed of. It ‘understands’ the content to the extent that it can single out individual instruments and voices so you can change volume, pan around, remove, apply tonal controls and effects etc. It’s like taking the piece back to the recording studio, having each instrument on an individual track and starting a remix – almost from scratch. If you’ve ever had fun with Apple’s GarageBand or tools like Logic Pro X and (open source) Audacity, you’re going to have a blast with RipX. Pull out any instrument, add your own instead. And this is just the beginning – which gets me to the point: The user interface is not easy – not even if you’ve been working in a studio or with tools like the ones I just mentioned. Mastering it takes time because it offers functionality never seen before. The UI may be intuitive, but only after you have understood the concepts and (part of) the capability.
No surprise, really – how could it be easy? It’s new, it’s different and what we’re doing with tools like this is sophisticated, creative work. Of course it takes time to understand the concepts, the mechanisms and the tool itself. Even professionals would take time to grasp the capabilities of a tool like this.
My initial frustrations rapidly gave way to increasing curiosity. My poking at the top layer of the tool turned into a fascinating journey which may last for years. Again and again the question came back as new doors were opened: How is this even possible? This is not just another fancy toy, it’s a professional game changer. If we don’t want to invest time and energy to understand at least the fundamentals of the tool and the trade, it’s a waste of time anyway.
That said, it’s not like you have to go back to school to have fun – even professional fun – with this new generation of sophisticated, digital tools. This is 2023, not 2005. We have YouTube (and many other sources) to help shortcut our path to basic proficiency (search for ‘RipX’, you’ll get plenty).
Similarly, the mage.space tool will completely blow your mind. You can make the most lifelike or fantasy-like pictures/drawings/paintings/etc. with a few commands – after you’ve mastered the language. The image above was created with this command:
anime female mechanical android head with flowers growing out background creek nature
You wouldn’t have guessed that, right? Me neither. It looks hard but this is simply a slight modification of the ‘teaser’ that meets you when opening the tool. From there you remove and/or add words, view the results and continue. ‘Iterations’ if you like. A couple of days, and you’re quite proficient.
A great example of so-called ‘prompt engineering’. Obviously different from one tool to the next, and always frustrating to begin with. Then after some trial and error (and maybe a couple of YouTube videos) you understand what the commands, sentences and context mean to the tool, and excitement comes, followed by growing creativity. ‘If I can do that, I should be able to do this too.’ Of course, and suddenly you’ve spent all night figuring out things you would never have dreamed of before you met this tool.
Back to the point – actually, there are two important points to be made here. The first is that dealing with these and many other tools is business as usual for all of us. We’ve been prompt engineers since we were kids (check out the post AI: And Now You’re a Prompt Engineer). We naturally learn and adapt our communication to the setting, the environment, the recipient(s). We switch languages, change vocabularies, intonation, intensity and a lot of other variables in order to achieve what Dr. Frank Luntz was talking about: Getting the recipient(s) to not only hear but also understand. Starting a new job always means adding new communication skills, adapting. New tools, same thing. Getting a new partner, new friends – again, same thing. This is literally business as usual.
The second point is to get rid of the veil of magic conveniently draped over our new ‘AI friends’ which creates confusion and misunderstandings. There is no magic, no sentience, no real intelligence. Our new ‘friends’ are tools, not semi-humans. Different and more powerful by orders of magnitude than our traditional tools, but tools nevertheless. Treating them as – or trying to turn them into – semi-humans, is unsmart at best.
Actually, even the idea that ‘if they look/sound/appear like humans, they’re easier to use’ is bad in all but very few cases, mostly related to health care. It imposes limitations on both the user and the tool by forcing them into the user’s existing frames of reference instead of inviting the user to learn, explore, create, move ahead. It also wrongly implies that we – humans – are good at communication. We aren’t. We create misunderstandings (sometimes even deliberately) just as often as we create understanding – by being vague, using the wrong words, misreading a situation, missing cultural context etc. Given this, the specificity in many human-to-tool interfaces is more often a blessing than a problem.
Bottom line: If simplicity and ‘available to all’ are the goals, generalized tools like Alexa and Siri are great – and continuously improving, becoming simplified front ends for more specialized tools. The (recently) dumbed-down ChatGPTs of the world may be OK for entertainment and training, but most of us want the previous versions back. The versions that didn’t try to second-guess what we mean all the time.
What a fascinating future. Expect wonders, expect unbelievable things, expect scary stuff and threats, but don’t expect the tools to be easy to use. And don’t try to make them human.