A team of researchers and skilled web archivers is working diligently in the lead-up to President-elect Donald Trump’s inauguration to preserve web content from the Obama Administration.
A group of institutions, including the Library of Congress, the Internet Archive in California and the University of North Texas, has been working since June 2016 to harvest content that may disappear from government web pages for the End of Term Web Archive.
Researchers from each institution spend countless hours archiving internet sites and content that pertain to the federal government during each presidential term. Since the project started in 2008, the archive has collected more than 160 million documents harvested from 3,300 government websites, mainly ones that end in .gov and .mil.
“When there is a transition of power and someone new comes into office, we forget that government websites change as well,” UNT Digital Libraries associate dean Mark Phillips said. “The way the new administration will communicate to citizens through the web will be different than the previous administration.”
Scientists and researchers fear that, without their work, each administration's web content would disappear. Phillips has found that 83 percent of PDF files on the .gov domain in 2008 were missing from the web four years later.
“If we weren’t doing this, no one would,” Phillips said.
The Internet Archive says it has, on its own, preserved more than 3.5 billion web pages and more than 45 million PDFs from the .gov domain over its history. Pages can be removed from what the archive calls the "gov web" for any number of reasons; they may be taken off the web entirely or folded into other domains.
"These sites include significant amounts of publicly funded federal research, data, projects and reporting that may only exist or be published on the web. This is tremendously important historical information," Jefferson Bailey, the Internet Archive’s director of Web Archiving, said in a statement.
The team of institutions has grown to manage the increase in social media content online. George Washington University is handling the social media effort by archiving more than 8,000 Twitter accounts for government agencies, projects, programs and elected officials.
Online: End of Term Web Archive