Who's saving the internet's history?

Jul 02, 2026

Imagine trying to pull up a CDC page on gender-based violence and adolescent HIV rates. The page was there in January 2025, prior to Trump’s inauguration. Now it reads “Page Not Found.” Or perhaps you need to fact-check a U.S. Supreme Court opinion but the link no longer leads to the source material.

These aren’t imaginary scenarios. Even before the current administration began scrubbing federal web pages, studies found that roughly 38% of web pages that existed in 2013 were already gone by 2023. In each case, the fallback has been the Internet Archive.

The Internet Archive, a nonprofit, has been around since 1996. It currently holds more than 866 billion web snapshots, along with millions of books, films, software programs, and audio recordings, all freely accessible to the public.

Its best-known service, the Wayback Machine, is embedded in everyday infrastructure more than many of us may realize.

For example, when Google retired its cache service in 2024, it redirected users to the Wayback Machine. Wikipedia relies on it to maintain millions of links that would otherwise lead nowhere. And as of today, the Internet Archive holds the only publicly accessible copy of the interactive January 6th congressional timeline, which was removed from the investigating committee’s own website.

All that to say, the Internet Archive is crucial to digital accountability and preservation, which makes current events surrounding it concerning.

WHAT'S HAPPENING IN THE FIELD

The concept of a “Digital Dark Age” evolved alongside digital archiving in the 1990s. It describes a possible future where vast amounts of digital information are lost due to deletion, data corruption, and the decay of systems built to store it.

In his 2024 book, Averting the Digital Dark Age, historian Ian Milligan lays out how archivists, librarians, and technologists have already built a long-lasting memory for the web, and how that work can adapt to future shifts. The Internet Archive is central to that story, but its status hangs in a delicate balance.

How we got here

The pressure has compounded from multiple directions:

Between 2020 and 2025, the Internet Archive lost two major copyright lawsuits: one brought by publishing houses and one by the music industry. The result was millions in damages and the removal of thousands of books and audio recordings from public access.
In May 2024, a hacktivist group coordinated a cyberattack that took down the Internet Archive’s services for weeks, exposing how fragile its infrastructure is.
Starting in 2025, federal agencies began removing web pages and datasets at an unprecedented pace. The Internet Archive cataloged more than 73,000 federal web pages that vanished after inauguration day. The Wayback Machine became the only place the public could still find many of them.
As of May 2026, more than 340 local news outlets have blocked the Internet Archive's web crawlers over AI scraping concerns, putting decades of local journalism at risk of disappearing from public record.

Other countries, including the UK, France, and Australia, fund government-backed internet archiving through legal mandates. In the United States, that responsibility falls to a nonprofit running on roughly $28 million a year.

The Internet Archive's designation as a Federal Depository Library last year was a step in the right direction, but not a long-term structural solution.

What's at stake

As it stands today, the Internet Archive is a single point of failure for a dizzying portion of the digital record. Google and Wikipedia rely on the Wayback Machine to maintain links that would otherwise lead to 404 error pages. There is no other fallback.

If it disappears, the losses would be permanent and devastating. At risk:

Hundreds of billions of web snapshots and cultural artifacts
Federal datasets and public health records
Legal and historical evidence
Local journalism archives
Software that no longer exists elsewhere

Each of these is a piece of history. In a future without the internet as we know it, what happens to them?

Internet Archive founder Brewster Kahle and his team are doing what they can with limited resources: distributing copies across six independent data centers around the globe, building a peer-to-peer network that’s less vulnerable to censorship, and deepening ties to government archiving programs. In short, the Internet Archive’s plan to “live forever” relies on becoming too essential to fail.

And they’re doing all of this on a nonprofit budget while infrastructure costs driven by AI data consumption continue to climb. The Internet Archive needs outside advocacy, funding, and institutional support now more than ever.

WHAT WE CAN DO

Currently, a lot of that support comes from grassroots efforts and communities like r/DataHoarder that provide decentralized, volunteer-driven preservation. Anyone can take part in digital preservation, and it doesn't require a large commitment of money or time.

Here are a few ways you can join the effort:

Support the Internet Archive directly. Donate, submit content for archiving, and advocate for public funding. Write to your representatives. In California, for example, institutional advocacy and targeted infrastructure partnerships helped the Internet Archive secure state-level recognition as a public library.
Join or support the Data Rescue Project. This volunteer-driven collective coordinates rapid-response efforts to identify and preserve at-risk public data. Their FAQ outlines ways to contribute at any skill level, from metadata review to data downloads.
Build your own digital preservation strategy. Look at how you can preserve your own corner of the internet; every independent copy makes the digital record less dependent on a single point of failure. Download articles (browser extensions like GoFullPage make it easy to capture and download entire web pages), submit content to archiving platforms, and advocate for digital preservation at both the institutional and individual levels.
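If you're comfortable with a little scripting, checking whether a page already has a Wayback Machine snapshot can be automated. The sketch below is a minimal illustration, not an official client: it assumes the Internet Archive's public availability endpoint (archive.org/wayback/available) and its Save Page Now URL prefix (web.archive.org/save/) keep behaving as they are publicly documented. The function names are our own, made up for this example, and the services may rate-limit or change without notice.

```python
import json
import urllib.parse
import urllib.request
from typing import Optional

# Assumed endpoints: the Wayback Machine availability API and the
# Save Page Now URL prefix. Both could change or be rate-limited.
AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"
SAVE_PREFIX = "https://web.archive.org/save/"


def availability_query(page_url: str, timestamp: Optional[str] = None) -> str:
    """Build the lookup URL for a page, optionally near a given date."""
    params = {"url": page_url}
    if timestamp:
        # Digits only, e.g. "20250101" for a snapshot close to Jan 1, 2025.
        params["timestamp"] = timestamp
    return AVAILABILITY_ENDPOINT + "?" + urllib.parse.urlencode(params)


def closest_snapshot(payload: dict) -> Optional[str]:
    """Pull the closest snapshot URL out of a decoded API response.

    Returns None when the response lists no available snapshot.
    """
    closest = payload.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest.get("url")
    return None


def save_request_url(page_url: str) -> str:
    """Build the address that asks Save Page Now to capture a page."""
    return SAVE_PREFIX + page_url


def find_snapshot(page_url: str, timestamp: Optional[str] = None,
                  timeout: float = 30.0) -> Optional[str]:
    """Ask the availability API for the closest snapshot (needs network)."""
    query = availability_query(page_url, timestamp)
    with urllib.request.urlopen(query, timeout=timeout) as response:
        return closest_snapshot(json.load(response))
```

Calling find_snapshot with a page address returns the address of the closest archived copy, or None if the Wayback Machine has no capture. When it returns None, opening the address from save_request_url in a browser is one way to request a capture yourself.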

CLOSING REFLECTIONS

In a relatively short timespan, the internet has reshaped how we live, work, and communicate.

We’ve built entire systems based on the assumption that the internet will last forever. The Internet Archive exists because a small group of people knew that wasn’t the case. Much of our digital record exists because of them.

Digital preservation is an ongoing effort that belongs to institutions and individuals alike. We have an opportunity to shape the decisions that determine what future generations can access.

Let’s make the most of it.

This post originally appeared in our community newsletter, The Moment, where we respond quickly and thoughtfully to impactful events and decisions that challenge or disrupt our profession.
