Harvard Releases its Libraries’ Huge Data

Harvard is making public the information on more than 12 million books, videos, audio recordings, images, manuscripts, maps, and other items held in its 73 libraries.

Intellectual property laws prevent Harvard from putting the actual content of much of this material online, but the so-called metadata, such as titles, publication or recording dates, book sizes, and descriptions of what is in videos, is also considered highly valuable.

Descriptors of things like audio recordings are frequently more valuable to search engines than the material itself.

Search engines frequently rely on metadata rather than content, particularly when the content cannot easily be scanned and understood.

Harvard hopes other libraries will also allow access to the metadata on their volumes, which could be the start of a large and unique repository of intellectual information.

“This is Big Data for books,” said David Weinberger, co-director of Harvard’s Library Lab. “There might be 100 different attributes for a single object.” At a one-day test run with 15 hackers working with information on 600,000 items, he said, people created things like visual timelines of when ideas became broadly published, maps showing the locations of different items, and a “virtual stack” of related volumes gathered from various locations, according to The New York Times.

Harvard also plans to eventually include circulation data on the items, said Stuart Shieber, director of Harvard’s Office for Scholarly Communication, who oversaw the project. “We have to be careful how we do that, to avoid releasing any personal information.”

Mr. Shieber said Harvard did not really know what would come of the release. “This data serves to link things together in ways that are difficult to predict,” he said. “The more information you release, the more you see people doing innovative things.”

The release follows Harvard’s decision, via the Office for Scholarly Communication, to release much of its faculty’s published research free of charge.

The metadata will be available for bulk download both from Harvard and from the Digital Public Library of America, which is an effort to create a national public library online.

 
