Archive Social Media? You Must Be Joking…

Image © stock.adobe.com

Historians and researchers are frantically trying to archive the Internet – including social media. Begging for funding to ‘preserve a rich data set of political discourse and communication trends’. Really? Save 500 million tweets and (say) 200 million Facebook posts per day? Comments, pictures, videos, likes, links, shares, the whole shebang? And that’s just two platforms and one data type. Insane.

Actually, an (in)famous Bill Gates quote fits: ‘That’s the stupidest thing I’ve ever heard’. It not only sounds impossible in a physical sense, but seriously – most of it is garbage. On the other hand, the Twitter-Musk-circus makes the issue relevant: What if Twitter goes dark and stays there? That would be the largest wipeout of social data to date. But is it a big loss – or a loss at all? Depends on who you ask. Some people actually started archiving their own accounts when the circus started (see the Wired reference below). Others, myself included, couldn’t care less.

Regardless of opinion, it’s an interesting question: Is archiving even possible in a digital world? If your first thought is ‘of course, it’s just a bunch of bits’, stay with me for a bit. It gets complicated – and interesting.

Before discussing the sanity of the idea, let’s refresh where we’re coming from: Archiving is a 500-year-old practice for a vastly different (and analog) world. Actually, it’s even older than that; see Can you Trust Your (Data) Broker?

Fast forward to 2023, and the objects we’d like to archive are all but gone. Digitized versions of physical objects? Sure, no problem, and early in the digital age those were the rule. Now they are the exception. Digital ‘objects’ such as a web page, a Twitter post, a Facebook rant, an e-book or even a relatively simple presentation all have dependencies: to other objects, to tools, to presentation platforms, to the device and more. And increasingly to licenses, intellectual property rights and privacy considerations, to name a few. A single broken dependency changes the object’s value, possibly to zero. This is a reminder that archiving implicitly assumes value, while most data objects have little or no value in themselves.

If that sounds complicated, it’s because it is. Many questions, few answers, and possibly the most important question is: Is there value? And since that’s a hard question to answer, should we just store everything and hope we’ll figure it out eventually? Just in case?

Let’s be realistic:

  • There is too much data: The rate at which we produce new data – social media, news, industrial processes, research, space exploration, entertainment, technology – keeps increasing beyond comprehension. Numbers with so many zeros you’d think someone has a stuck zero-key on their keyboard. Even with ChatGPT and quantum computing we’d have zero chance of making sense of it all.
  • Physically impossible: Archiving even a fraction of the data is physically impossible.
  • There is no obvious value: Even if it were possible, we’d have a hard time getting funding for such an archive, given that the dependencies required to make most data types valuable will be broken in a month, a year, maybe 10. Zero value. Or, approached from a different angle, the cost of recreating the links that make the data usable and valuable is prohibitive.

But wait – there are plenty of archiving services out there. What are they doing? Is it useless? No, it’s not, but they are oriented backwards, not forwards – and they are running out of time. 

Here’s the thing: They deal mostly with digitized physical objects – books, documents, letters, PDFs, individual and complete pieces of music, movies, video, software, blueprints and so on. Discrete objects that are easily archivable – by the National Archives (or whatever their local name is), enthusiasts and organizations such as Wikipedia, The Internet Archive, WinWorld and a variety of other archives.

With the exception of Wikipedia and other encyclopedia-style archives, these services archive individual objects with sufficient metadata to identify them and make them useful. That’s where traditional archiving stops. Even e-books are hard to archive in a meaningful way because, as discussed above, they have dependencies – they are not complete in themselves. Think about it – how do we archive that? Even dumping the whole shebang may not be sufficient, because internal links may have external or naming dependencies that evade the archiving process and break when restored in a different setting. This happens every day, all over, when companies, governments, institutions, hospitals etc. migrate from old to new solutions (see The world’s Greatest Challenge). More or less hidden dependencies that break things – big time.

It gets worse, but the point has already been made. The historians and researchers are chasing a pipe dream. Since the arrival of the web, in particular ‘dynamic web pages’, digital content has not been archivable in a classic sense – check out The Wayback Machine for a practical demo: Many, maybe most content types – sporadically collected – may be there, but the design and most of the user interface elements (‘the programs in the page’ if you like) are gone. The pages may display but not ‘work’.

If we want to archive such content in a meaningful way, we need a new approach: we need to rethink the concept of archiving. But first and foremost, we need to evaluate the purpose and meaningfulness of such archiving. Do the why – what – how – who (pays) exercise, then act. Think Snapchat: content, code, interfaces, designs, data, metadata, products – all ephemeral by design. Keep them around 10 years down the line? No thank you. Let them rest in peace.

Then there is the emotional issue: It makes most of us feel good to know that our work is being saved for future generations, but seriously – for what purpose? Rule 1: Let it go. Rule 2: With all due respect for my own work for 30+ years and that of millions of others: It had value then, not now. Most of it. And if there were glimpses of timelessness here and there, they will drown anyway. Rule 1 applies.

Which brings me back to where we started: Even the thought of saving all the garbage that has been poured into social media these days and in recent years is kind of sickening. Why would we want anyone in the future – near or distant – to discover and try to make sense of all this junk? Filter it? How? As the last few years have demonstrated again and again, we’re incapable of filtering (aka moderating) the garbage in real time. Filtering retrospectively is even harder, not least because agreeing on what’s junk and what’s not is impossible.

I know this sounds negative, but that’s how disconnected from reality the idea of broadly archiving digital content and social media is – an idea regularly brought up by researchers and pundits, as reported by Wired Magazine a few months ago. It’s not possible. And to me, that’s good news.

Bottom line: Classic archiving doesn’t make sense anymore – not the idea, not the data, not the value. So let’s leave the ‘archive forever, for future generations’ idea behind and spend our time making sense of the present world instead, by reorienting our collective and individual mindsets towards survival instead of voluntary annihilation.

Then we’re making sense. And boy, do we have work to do…
