Having been on the web since the beginning, I wonder how much of this data really needs to be saved. Humans have survived until now without the need to keep a permanent journal of every little detail of every person's life. We do have historical luminaries whose works we save, but the harsh fact is that most people just aren’t special enough where we need to keep everything they generate.
We are now seeing that having old stuff resurface is largely detrimental. Back when it was just some photos in a shoebox, or a paper journal, the worst that could happen to your silly teenage ramblings was some embarrassment that you could quickly move on from. Now you face potentially losing your whole career and family because of some stupid thing you said one time when you were in a bad mood.
One of the most important functions of the brain is forgetting. If you were forced to remember everything, especially all the dumb stuff you did, your mind would be perpetually cluttered and you could never progress and grow. You only remember the significant stuff, and let everything else go.
The slow degradation of old sites is the Internet’s form of cleanup and forgetting, and it’s a natural part of the lifecycle.
From the perspective of a historian or archaeologist, it is a shame that things are forgotten. If the data that is being saved is not tied to your personal name then I would prefer that it is archived forever. Not only for the purposes of analysis when you are long dead and forgotten, but even within our lifetime.
Sometimes when I am in a nostalgic mood, I will revisit forums that I once frequented almost 20 years ago (surprisingly, the owner has kept it alive all these years). I love looking at those old posts, reflecting back on how things were at that point in my life. Reflecting back on the community members and my long lost virtual friends, wondering where they are now. They could be dead for all I know. I am glad that it has been archived.
In contrast, I have a side project that I work on occasionally. It is a plugin for a legacy game. The hacking community of this game has long since moved on. All of the golden knowledge of the game, the technical details, have vanished. All the forum posts, all the contributions, are forever gone. The hours upon hours of reverse-engineering efforts are down the drain. Not to mention all the social interactions within the community; the drama, the humour, the conversations. That is sad. I sure wish it was archived.
You are absolutely right. My mistake, poor choice of words. I was speaking to the loss of public information that came from hard work from many individuals. It's like the loss of many academic papers and studies. Gotta start back from square one.
> We do have historical luminaries whose works we save, but the harsh fact is that most people just aren’t special enough where we need to keep everything they generate.
Historians want stuff from ordinary people, because that stuff, though sometimes lame and embarrassing, gives a glimpse into daily life that the papers of more prominent people might not reveal.
History has long since ceased to be merely the study of important kings, queens, writers, etc. Historians want documentation of what life was like for everyone else, and that is why they study old random finds like people’s diaries and correspondence. Now that humanity has moved into the digital era, obviously people’s old websites will become an object of study.
Brings to mind Samuel Pepys, who would be a somewhat obscure historical figure, except that he happened to keep a detailed diary for a long time. For this, he is almost a household name, simply because he wrote down the things that were happening around him.
Not only is it a natural part of the lifecycle, but in the mid-'90s the common wisdom was that the web succeeded where others had failed because previous hypertext systems had attempted persistence, while Sir (as he is now) TimBL bravely 404'd instead.
"... near the Tannhäuser Gateway. All those hrefs will be 404, like tears in rain"
There have been so many times I have used archive.org to find some old information that I needed, or checked to see how a page had changed (sometimes because of a legal issue). It's been invaluable. Sometimes archive.org doesn't have the page, and it's a real shame.
What if all the stored information occasionally proved a historical 'conservative' right on some issue? Maybe there really were some aspects of life that were better way back when. We might want to figure out what the tradeoffs are. Is everyone being obese a necessary consequence of modernity?
Things are more complicated than you think, and the conservatives are people too. It seems profoundly illiberal to discount their suffering (their lived experience) just because they seem wrong-headed.
We are lucky that we have the Internet Archive to capture and serve content for the public and give us access to some of these old sites that have gone dark. I wish that I had done the same with some of the private forums that I belonged to as a kid. Many important articles and conversations are gone now because I didn't think to save them at the time.
It is nice to see that old monikers of mine aren't indexed and easy to find, and yet it's also sad because I realize that much of that content I perceive as reflections of my younger self and thoughts which I know I cannot imagine again today. When I recall those memories I only recall my perception of events: words that were said; actions taken by others; the consequences of my actions. Yet, I can't repeat those exact thoughts because my identity has changed since then.
I identify with the Zillennial term. I also grew up on Linux and Windows due to my dad's interests. That is one of the things I am most grateful to my dad for, because it led me to where I am today.
For me, the web has been an important pillar of my life as it's given me so many sources of information that have granted me knowledge, given opportunities to read deep thoughts and niche content, and improved my principles and views on life. I'm sure many people on HN can relate to that statement, young and old alike.
I know exactly the sentiment. I think part of it was that growing up I truly believed the content I put on the internet would stick around. I didn't understand why things needed to be archived for posterity; the sites that were seemed to deserve such a fate anyway. I remember having so many interesting and free-thinking discussions with members of various forums (computer repair, graphics editing tutorials, shurtugal to name a few genres). I honestly thought Facebook would be the same, but with more familiar faces to names. Oh, how wrong I was. Alas, Facebook was new web.
> I think part of it was that growing up I truly believed the content I put on the internet would stick around. I didn't understand why things needed to be archived for posterity—the sites that were seemed to deserve such a fate anyway.
Maybe this is where a lot of us fall short. Free and easily available content is easy to take for granted since there is an abundance of it, like the discussions here on HN. So we ignore capturing it with an underlying assumption that it will be available again in the future.
> And a social network doesn’t have to vanish for you to lose access to what’s there. The owner of a piece of content may lock their profile for any number of reasons and you or your readers won’t be able to see it ever again.
I agree with the overall point of this article and on balance, I think it's a net loss for the web and our shared web culture when content disappears like this. However, I do think there is something to be said for a web that "forgets" my angsty Livejournal posts, my poorly written and frankly obscene fanfiction, and other things of that nature.
We need to make it easier for people to control a domain name, and host content with HTTPS. You should be able to migrate your backend from provider to provider or open source project without losing control over the URLs. All individual content should be behind a personal domain name.
Here's a product I would love to pay for: A service that sells domain names, but also makes it easy to use them. It should provide a tool like ngrok that I can run on my local machine like this:
not-ngrok host anderspitman.net 8080
That contacts the service, has you log in, then (on first run) uses Let's Encrypt to get a cert for anderspitman.net, and sets up a WireGuard tunnel so any server I run on localhost:8080 is now accessible via https://anderspitman.net.
This lets me host whatever I want from any machine I want, even my phone, without ever having to mess with or think about certs again.
There used to be a service exactly for this, but pre-HTTPS. It was free using their TLD and a named subdomain, but you could also use your own domain name, which was probably a charged service. Sadly, I can't remember the name.
Part of my motivation in commenting was hoping it already exists. There's not much to it. The key is that it has to be very vertically integrated. I need to be able to buy (or port), manage, and use the domain all from the same simple interface.
CloudFlare does a lot of this, but you need a public server for them to proxy. They seem well positioned to add a CLI tool for the last mile ngrok-like part though.
That's obviously a related idea, but I have concerns:
* Complex. What if I don't need a CDN or kv store?
* Why just static sites? Why not just forward to a local port, so you can run whatever you want, including a static server? One huge advantage of being able to run on your local machine is you have access to terabytes of storage at very low cost. You can host as many images, videos, music files, etc. as you want, especially if it's behind a simple login (like a NextCloud server).
DuckDNS looks like a nice API for the DNS part, but you still have to set up port forwarding and manage certbot.
Basically it's still an imperative process. What I'm talking about would be more declarative. You run a command that says, "I want https://anderspitman.net to forward to localhost:9001" and the system figures out everything necessary to make it work.
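To make the imperative vs. declarative distinction concrete, here is a rough Python sketch of what the tool's core could look like: you state the desired mapping once, and a planner compares it with what is already in place and lists only the missing steps. Everything here is hypothetical (the `Forward` and `CurrentState` types, `parse_forward`, `plan`, and the step descriptions); no such service exists as far as this thread knows, and the steps are just strings rather than real DNS, certificate, or tunnel calls.

```python
from dataclasses import dataclass
from typing import List, Optional
from urllib.parse import urlsplit


@dataclass(frozen=True)
class Forward:
    """Desired state: a public HTTPS hostname mapped to a local port."""
    hostname: str
    local_port: int


@dataclass
class CurrentState:
    """What the (hypothetical) service already has set up for a hostname."""
    has_dns_record: bool = False
    has_certificate: bool = False
    tunnel_port: Optional[int] = None


def parse_forward(public_url: str, target: str) -> Forward:
    """Turn ("https://example.net", "localhost:9001") into a Forward."""
    parts = urlsplit(public_url)
    if parts.scheme != "https" or not parts.hostname:
        raise ValueError(f"expected an https:// URL, got {public_url!r}")
    host, _, port = target.rpartition(":")
    if host not in ("localhost", "127.0.0.1") or not port.isdigit():
        raise ValueError(f"expected localhost:PORT, got {target!r}")
    return Forward(parts.hostname, int(port))


def plan(desired: Forward, current: CurrentState) -> List[str]:
    """List only the steps still needed to reach the desired state."""
    steps = []
    if not current.has_dns_record:
        steps.append(f"point DNS for {desired.hostname} at the tunnel endpoint")
    if not current.has_certificate:
        steps.append(f"request a certificate for {desired.hostname}")
    if current.tunnel_port != desired.local_port:
        steps.append(f"open a tunnel to localhost:{desired.local_port}")
    return steps


if __name__ == "__main__":
    wanted = parse_forward("https://anderspitman.net", "localhost:9001")
    for step in plan(wanted, CurrentState()):
        print(step)
```

Running the same command a second time, once everything is in place, would produce an empty plan. That is the property that makes it feel declarative: you never have to remember which steps you already did.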
The URL structure of my blog has changed so much over the years and I have subjected my subscribers to so many 404s that it made me put everything in a self-hosted pastebin for peace of mind. All I have to do is keep that pastebin alive and the domain renewed and my content will be preserved well into the future.
I've still yet to find a registrar that promises to keep a domain renewed past the 10-year limit though. The closest service I found is MarkMonitor, but that seems to cater for large companies / corporations who need to protect their intellectual property (and it's expensive!). Can't have someone registering apple.com when it expires!
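For the 404s caused by a changing URL structure, one common alternative is to keep a map from every old path to its current home and follow it on each request. A minimal Python sketch, assuming the map is a plain dict; the `resolve` function and the example paths are made up for illustration and are not from the blog in question:

```python
from typing import Dict


def resolve(path: str, redirects: Dict[str, str], max_hops: int = 10) -> str:
    """Follow old -> new mappings until reaching a path with no redirect."""
    seen = {path}
    for _ in range(max_hops):
        target = redirects.get(path)
        if target is None:
            return path  # no further redirect: this is the current URL
        if target in seen:
            raise ValueError(f"redirect loop at {target!r}")
        seen.add(target)
        path = target
    raise ValueError("too many redirects")


if __name__ == "__main__":
    # Two generations of (hypothetical) URL schemes for the same post.
    redirects = {
        "/2009/05/my-post.html": "/blog/my-post",
        "/blog/my-post": "/posts/my-post/",
    }
    print(resolve("/2009/05/my-post.html", redirects))
```

Each restructuring only adds entries to the map, so links from any era keep working as long as the domain does.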
I was wondering about an old WordPress site just this week, related to the post content. In the past I managed to migrate a very simple WP site to static pages with Pelican, but I still have a pretty complicated WP site running. It is a museum of lots of trips I did many years ago, and I decided to keep it up due to some rare info there (it's an RTW travel kind of site). I wish there was an easy way to keep the style, theme, etc. so I could get rid of WP for good and let the site get old while still being there in its essence, without damaging the visuals. I can't help but imagine the despair of people who would instead use a commercial product to create or host something like that, given that even WP is hard enough when things get old.
...as for the site migrated to static pages, some posts are 15 years old, most of them are 10 and the most recent one is 5 so because the web is getting old I had to add a big disclaimer with a red background in every post stating things change, people change, the content and its form may change too so "please be gentle when interpreting it, the web is getting old" :-)
Right to be forgotten should be the default but I wish we also had the opt-in right to be remembered.
Dear author, if you're reading this: please don't write such a superficial entry. It doesn't actually bring any resolution to the "I get things are bad, but what are we doing to fix it?" /Tomorrowland/ problem, and it could.
Link to static generators that can import WordPress. Link to `wget --mirror` tutorials to archive. Link to https://www.archiveteam.org/ so people could participate. Mention the Web Archive, point to IPFS, and to DAT.
SHOW SOLUTIONS. Lamenting won't fix it.
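As one example of the kind of pointer meant here: the Web Archive has a small public JSON endpoint for checking whether a page has a snapshot. Below is a rough Python sketch using only the standard library. The endpoint URL and the `archived_snapshots` / `closest` response shape are as I understand the Wayback availability API, so treat them as assumptions and check the official docs; the function names are my own.

```python
import json
from typing import Optional
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoint of the Wayback Machine availability API.
AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def availability_url(page_url: str, timestamp: Optional[str] = None) -> str:
    """Build the query URL; timestamp is an optional YYYYMMDD-style string."""
    params = {"url": page_url}
    if timestamp:
        params["timestamp"] = timestamp
    return f"{AVAILABILITY_ENDPOINT}?{urlencode(params)}"


def closest_snapshot(response: dict) -> Optional[str]:
    """Pull the closest snapshot URL out of a decoded response, if any."""
    closest = response.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest.get("url")
    return None


def lookup(page_url: str, timestamp: Optional[str] = None,
           timeout: float = 10.0) -> Optional[str]:
    """Ask the API over the network; returns None when nothing is archived."""
    with urlopen(availability_url(page_url, timestamp), timeout=timeout) as resp:
        return closest_snapshot(json.load(resp))


if __name__ == "__main__":
    print(lookup("example.com", "20060101"))
```

A blog post could link to something this small and already give non-developers a way to check whether their own pages are saved.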
Personal story: yes, dynamic sites suck. My 2002 archives are static HTML; no problem getting them. 2004 is a PHP4 CMS, I needed a Debian Sarge to run it. 2007 is FUBAR character encoding mismatch between server, HTML, any DB, plus PHP < 5.3 with MySQL > 5.6. 2010 and up is WordPress, but if you don't save it all - MySQL same version, WordPress same version, all plugins the same version, etc - it's hard work to get it back.
The site is only marginally aimed at developers; it is mostly for people in communication or marketing fields. Some of the ones I know are not even aware of the problem of content decay (let's call it that for now).
It is also about the mistakes the web community made while building the sites we use daily. I'm not assigning any blame; the web will always be uncharted territory.
Most of all, I wanted to write down that we saw good things being built. There are a lot of good reasons to keep moving forward.
Regarding tools, and as a side note, that blog is running on https://goHugo.io , a static site generator, and uses the self-hosted version of https://commento.io to manage comments. It’s my way of trying to make it future-proof.