April 16, 2013

UK libraries begin archiving websites and other digital content

A regulation that quietly came into force in the UK last week has long-term implications for the preservation of digital content.

It is essentially an extension of the principle called “legal deposit,” under which a number of designated libraries have the right to receive a copy of every print publication brought out in the UK, in the interests of preserving this material for future readers and researchers. In 2003, the Legal Deposit Libraries Act was introduced to address digital content specifically, covering e-books, CD-ROMs, and websites.

Over the past ten years, the Department for Culture, Media and Sport and the Joint Committee on Legal Deposit have been working out the practical issues involved in archiving and making available everything from a little girl’s blog about school meals to the verbatim transcripts of the proceedings of Parliament. The British Library, the National Library of Scotland, the National Library of Wales, the Bodleian Libraries, Cambridge University Library, and Trinity College Dublin will all participate.

Initially, content harvesting will be infrequent (some sites are due to be captured only once a year), but the intention is to ramp up the frequency of capture quickly, especially for sites that are updated often, or around important events or themes. This is also a national project, unlike broader attempts to capture the web in all its webbiness, such as Brewster Kahle’s Internet Archive: the libraries will preserve only websites with the .uk domain, though in time, the Associated Press reports, they also hope to gather information from “sites published in other countries with significant British content, as well as Twitter streams and other social media feeds from prominent Britons.”

This represents a significant commitment to preserving the UK’s digital activity, and will immediately enlarge the legal deposit collections many times over:

“We’ll be collecting in a single year what it took 300 years for us to collect in our newspaper archive,” which holds 750 million pages of newsprint, Lucie Burgess [the British Library’s head of content strategy] said.

It will also mean that questions about how to preserve digital content (in what formats, in how many copies and places, and how often to migrate it between formats) will be worked out and coordinated among the libraries in a way that could provide a model for other digital preservation schemes.

Exactly how they sorted out the practical and legal obstacles to archiving all these petabytes of material is not immediately clear. The British Library press release contains this somewhat mind-bogglingly uninformative sentence:

They [the Joint Committee on Legal Deposit] established an agreed approach for the libraries to develop an efficient system for archiving digital publications, while avoiding an unreasonable burden for publishers and protecting the interests of rights-holders.

For the moment, the answer to the many legal questions raised by the Legal Deposit Libraries Act is to restrict access: content will be viewable only at the legal deposit libraries, displayed on only one computer at a time, and it can’t be copied.

To start things off, curators from the participating libraries compiled a list of 100 websites that they consider essential reading, effectively creating a snapshot of British life and character in the early years of the 21st century. It is encouragingly broad and strange, including the aforementioned little girl’s food blog, NeverSeconds; a notoriously grumpy pub and grub guide, Beer in the Evening; a charity dedicated to preserving rambling; the digital newspaper of the Occupy movement; a website on trade union history, The Union Makes Us Strong; and The Dreamcast Junkyard, a blog for the community of gamers who still play on Sega’s discontinued Dreamcast console. Not to mention Amazon, Twitter, Facebook, the BBC, and other familiar sites.

The full list is available here.

Sal Robinson is a former Melville House editor. She's also the co-founder of the Bridge Series, a reading series focused on translation.