Programming the Day of DH 2015

Hello!  My name is Jessica Dussault and I am a programmer for the Center for Digital Research in the Humanities at the University of Nebraska-Lincoln.  I work with a designer / programmer, metadata expert, and faculty members at the university to create new projects and maintain existing ones.

Often, we focus on one large project at a time.  Currently, we are working on completing the Nebraska Newspapers project, which is a Nebraska-specific implementation of the Library of Congress’s Chronicling America website.  On this, the Day of DH, the newspapers have taken up most of my time, broken up by smaller tasks.

When I first got to work, I put on a grim face as I went to check the progress of my thumbnail generation for the newspapers.  Since we opted to use a free utility, GraphicsMagick, the on-the-fly conversion of jp2s to jpgs takes a bit longer than we would like, so we are pregenerating thumbnails that will show up in search results and individual newspaper overviews.  One of the newspaper batches had put up a fuss about reading some of its pages’ jp2s, but reuploading them and running my generation script overnight did the trick.  As of this morning, all the batches were in the database and the Solr index, and the thumbnails were ready to go.  That’s a nice feeling.
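For the curious, here is a rough sketch of what a pregeneration script along these lines could look like.  It is not my actual script: the directory layout, the 200-pixel size, and the function names are all assumptions made up for illustration, and it presumes that the `gm` command is installed with JPEG 2000 support.

```python
from pathlib import Path
import subprocess


def thumbnail_path(jp2_path: Path, source_root: Path, thumb_root: Path) -> Path:
    """Mirror the batch directory layout under thumb_root, swapping .jp2 for .jpg."""
    relative = jp2_path.relative_to(source_root)
    return (thumb_root / relative).with_suffix(".jpg")


def build_command(jp2_path: Path, jpg_path: Path, max_size: int = 200) -> list:
    """Build a GraphicsMagick call that fits the page inside a max_size square."""
    return [
        "gm", "convert", str(jp2_path),
        "-resize", f"{max_size}x{max_size}",
        str(jpg_path),
    ]


def generate_thumbnails(source_root: Path, thumb_root: Path, max_size: int = 200) -> list:
    """Convert every jp2 under source_root; return the files that failed."""
    failures = []
    for jp2_path in sorted(source_root.rglob("*.jp2")):
        jpg_path = thumbnail_path(jp2_path, source_root, thumb_root)
        if jpg_path.exists():
            continue  # already generated on an earlier run, so reruns are cheap
        jpg_path.parent.mkdir(parents=True, exist_ok=True)
        result = subprocess.run(
            build_command(jp2_path, jpg_path, max_size),
            capture_output=True,
            text=True,
        )
        if result.returncode != 0:
            failures.append(jp2_path)  # e.g. an unreadable jp2 that needs reuploading
    return failures
```

Calling `generate_thumbnails(Path("batches"), Path("thumbnails"))` (both paths hypothetical) would walk every batch and hand back a list of the jp2s that put up a fuss, which is handy when the script runs unattended overnight.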

Next I turned my attention to the TODO list that Karin, my boss, and I had drawn up with remaining items for the newspapers.  Most of the ones left for me to do were fairly minor: changing some dropdowns, adding a field to a search, making sure that jQuery behavior was as expected, etc.

I did run into an interesting problem while trying to fix the capitalization of the newspaper titles.  The Library of Congress has a naming scheme that involves capitalizing locations in a title, but not much else.  That means you might get “The daily Nebraskan” or “The Omaha daily bee,” which just really drives me bonkers.  I happily went through capitalizing everything until I got to the Czech language newspapers we have in our digital collection, at which point I hesitated because I didn’t want to capitalize the wrong thing.  I messaged a friend from the Czech Republic to ask her advice, and she not only gave me capitalization advice, but she also immediately caught a typo in the title of one of the papers.  I brought it to the attention of our metadata expert and she is going to let the LoC know about the typo so that it can hopefully one day be corrected.  It was also fun learning about the fairly artistic names of the Czech papers, which are not as straightforward as the “Such and Such Herald” but are more along the lines of “The Echo of the West.”
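If you wanted to automate this sort of fix, a minimal sketch might look like the following.  To be clear, this is an illustration and not what I actually ran: the function name, the list of small words, and the three-letter language codes are all assumptions.  The one idea it borrows from the story above is that English titles get capitalized while everything else is left alone for a human who knows the language.

```python
# Words that stay lowercase in an English title unless they come first.
# This list is an assumption; adjust it to your own style guide.
SMALL_WORDS = {"a", "an", "and", "at", "by", "for", "in", "of", "on", "the", "to"}


def capitalize_title(title: str, language: str = "eng") -> str:
    """Capitalize an English title; return any other language unchanged."""
    if language != "eng":
        return title  # leave it for someone who actually knows the language
    words = []
    for index, word in enumerate(title.split()):
        if index > 0 and word.lower() in SMALL_WORDS:
            words.append(word.lower())
        else:
            # Uppercase only the first letter so existing capitals survive.
            words.append(word[:1].upper() + word[1:])
    return " ".join(words)


print(capitalize_title("The daily Nebraskan"))   # The Daily Nebraskan
print(capitalize_title("The Omaha daily bee"))   # The Omaha Daily Bee
# An illustrative Czech string, not necessarily a real title in the collection:
print(capitalize_title("Ozvěna západu", language="cze"))  # Ozvěna západu
```

Skipping the non-English titles on purpose means the script cannot capitalize the wrong thing; it can only leave work for a person to finish.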

At some point, a faculty member called up to ask if we could help debug something that was going wrong on his webpage.  I had a nice mental vacation doing some testing and helping to figure out why the JavaScript files were not being found.  I was also approached by a student working for a faculty member, who asked if I had time soon to give him a quick run-down of how to use git in a group environment.  I am always excited to talk about git to people.  Branches and merges and pull requests are my bread and butter.  Version control forever and always!

I am not much of a sysadmin, but some of the responsibility for tending to servers falls to me.  Fortunately for everybody, the newspapers server is operated out of the library’s personal server stash.  I contacted the kind library sysadmins today to see if we could get backups running, now that I have uploaded around 11 TB of files that will likely not be changing too much.  If something should happen to them before they are backed up, a part of my soul will wither and die.  My past weeks have been filled with hunting through network drives for scattered pieces of batches, stitching them together, uploading them for several hours a batch, generating thumbnails for them overnight, loading them into MySQL and Solr, and improving my Tetris skills while waiting to see if the batch would fail within the first five or ten minutes (they seem to fail either nearly immediately or several hours in, of course).  Then again, who needs to worry about backups?  Nothing goes wrong with servers, right?

Cue a faculty member arriving to try to resuscitate a dead server.  A few weeks ago, the oldest server died and took with it some student projects.  It wasn’t backed up because nothing on there was exactly production data, but as we are going to retire the server rather than fix its broken parts, we figured we might as well retrieve what we can.  We hooked up a scavenged crash cart made of a discarded monitor, cables from the backs of our own machines, and a hard drive mostly full of newspaper tiffs, and started up the server in the conference room.  Nothing like the sound of a jet engine taking off to help you make friends in the quiet library, I reckon!  It has been running for a few hours but it is still thinking pretty hard about zipping up the data we want, so hopefully it will be done after running overnight.

All in all, it was a fairly standard day in the eclectic life of a DH programmer!