A Scanner and a Mission: An Interview with Paul Ford
Hewlett-Packard made a big fuss about putting 80 years of Time magazine online. It took HP Labs and a consulting team a year to do it, and the online archive is still limited: it provides the text of the articles but not the actual images of the pages. Other periodicals, including magazines like The New Yorker and newspapers like the New York Times, have put their content online but have stopped short of providing reproductions of the very pages as they were laid out in the original print editions. (The New Yorker archives are only available on DVD or portable hard drive.)
Magazines and newspapers are visual creatures that provide insights into their cultural moment. For designers and all those who love to study the visual histories of periodicals, reading text without seeing the page is like reading a screenplay without seeing the movie. Harper's Magazine, on the other hand, has gone all the way. It has created what associate editor Paul Ford calls "a massive, interlinked, searchable document that provides quick access" to 157 continuous years of Harper's—with illustrations and all. Working alone, without any consulting team and without fuss, Ford has been a man with a plan, a scanner and a lot of patience.
"Autumn Fashions" from Harper's first issue, June 1850.
Barringer: I understand scanning the Harper's archives was your crazy idea. Why did this occur to you?
Ford: Harper's maintains an extraordinary index of every item ever published in the magazine, from the first issue in June 1850 through today. It took many people many years to create the index, but hardly anyone was using it. When I came to Harper's in February 2005 and began to explore the index, I realized we were 80 percent of the way toward an online archive. We just needed to scan the issues and align the scans to the data.
There were other influences: Cornell University, as part of the "Making of America" project, had already scanned in everything in Harper's before 1900, and their website was a pleasure to use. National Geographic and The New Yorker had released their archives, proving that there is an audience. And copyright law has made it clear that Harper's has the right to publish images of its own pages in their entirety.
Barringer: At the time when you had the idea, what was your role at Harper's? I seem to recall that you designed their website, then you were writing the Harper's Weekly newsletter, and then you were suddenly updating their computer systems. How did your relationship evolve?
Ford: Roger Hodge, who is now the editor of Harper's Magazine, got in touch. He liked some of my work on the web. Under his guidance I created a new website at harpers.org that did some interesting things with content. It structured things in unusual ways, cut up articles and rearranged them. Roger maintained the website, but as his responsibilities increased at Harper's, he asked me to come in full-time.
On arrival I took over the Harper's Weekly Review, an email/web newsletter that Roger created in 2000. The Weekly summarizes the news in as cruel a manner as possible. I stopped writing it a year later in order to focus on building a new website for Harper's and on editing Washington Babylon, a weblog written by our Washington editor, Ken Silverstein.
I also manage IT in the office. I order computers and set up a server here and there. I let our outside support firm handle the difficult problems, and I do crisis control when computers die or hard drives crash. On a typical day, I write a Java class, hack some Perl or XSLT, review some scans, edit and post a blog piece, and order a computer.
From poetry to articles on "cracker cowboys" and house-boats in China, from Harper's June 1895 issue.
Barringer: How did you convince others that the archive would be a good idea?
Ford: I identified a partner who could resell the archive to institutions, mostly college libraries. That provided a projected revenue source that justified the investment in scanning. There are also other revenue sources that naturally follow from bringing the magazine fully to the web: increased advertising, more subscribers, and so forth.
But, look—Harper's is a great magazine. More people should read it. I believed that before I started here, and I believe it now. Getting people to agree to this project was not hard, because people want the magazine to be read and discussed.
Barringer: Who exactly at Harper's did you have to persuade?
Ford: Harper's is a small place where people work closely together, so I can't say that I had to persuade anyone. Over several months, I simply talked about what would be involved in scanning until things fell into place. John R. MacArthur, the publisher, and Lynn Carlson, the general manager, required me to justify the project before we spent any money. But it wasn't for lack of enthusiasm; they just wanted to make sure that this was something that we could do in a reasonable amount of time, for a reasonable amount of money.
Cover of Harper's March 1969 issue.
Barringer: How did you first get started on the project? I mean, what did you do, literally, first? Did you even have an office? Or a scanner?
Ford: I did a great deal of hacking in Perl and Java to create a working prototype website. I bought a cheap scanner so that I could figure out how to connect scans to the database. After a tremendous amount of research, we bought a Fujitsu 5750C scanner, which is a wonderful piece of equipment that delivers quality color scans at 600dpi, sheet-fed.
Barringer: Did you have to hunt down the archival issues?
Ford: I was able to make a deal with Bennington College, thanks to Library Director Oceana Wilson; they gave us their back issues in exchange for online access. An undergrad drove the volumes down one day—dozens of heavy boxes.
Having a full spare copy of the archive meant we could cut the spines of the Bennington volumes and feed sheets to the scanner instead of manually scanning each page. This saved thousands of hours.
Barringer: Did the hours of scanning that turned into weeks and months ever deter you? What did you tell yourself to keep at it?
Ford: I knew when I started that this was a hard project and that it would take a great deal of time. But the entire run of one of the great world periodicals will be available to anyone who wants a subscription... to dig around inside that archive, analyze it and use it to see how ideas have evolved over the last 157 years.
But I'll need to start again because, within a few years, all of the 200dpi PDFs and 1000-pixel-wide color-compressed GIFs that I've created will seem small and cramped. Higher screen resolutions and more bandwidth will require me to upgrade the entire archive. OCR [optical character recognition] will improve, making it possible to analyze pages with more accuracy and thus improving the quality of searches. Improvements in semantic web technology, [since] the site is built on a semantic web framework, will allow for the site to be better organized and for more complex queries to be made. I've built the system to allow for continuous upgrades of this sort, over the coming decades.
Cover of Harper's June 2007 issue.
Barringer: Did the editors at Harper's ever express wonder at the scope of what they were letting you do? Or were they unaware of all the computer stuff involved, and so gave you free rein?
Ford: It's obviously a lot for one person working alone to bring hundreds of thousands of pages online while writing, editing blog content, programming a complex, semantic web-driven site, and providing tech support for an office. Everyone recognizes that. I've been able to get some help with database programming and in quality assurance, and that's been terrific.
Creating this archive is certainly the hardest thing I've ever done—much harder than writing a novel, for instance. The trade for that work is that I have learned a great deal: about programming, about editing, about American history, about changing styles in prose and art, about typography, about the pagination of magazines in the 1920s.
Barringer: What was the originally imagined scope, and what is now the actual scope of this project—in terms of both your labor and the vision of the thing?
Ford: What I have built is remarkably close to my vision: a massive, interlinked, searchable document that provides quick access to 157 continuous years of Harper's Magazine—something that will help researchers, appeal to readers (and thus to advertisers), and that will, hopefully, provide relevance and context in a web that is filled with hour-old news.
It's very motivating to have such a vision, but ultimately, the archive will belong to the readers. My opinions will be much less important. I plan to listen to feedback and make alterations to the site until it is as useful as it can be to as many people as possible.