Like most academics, I spend part of my day in meetings or workshops. This morning I had a meeting with the Japanese and Korean faculty, and next up is a crowdsourcing workshop. I have some ideas for crowdsourcing to add to the OCOJ, and I’m here to get more ideas.

Today, for the international Day of Digital Humanities event, digital humanists have been asked to document what we do in a day. It’s a great idea, and I expect the comments will be as varied as our research.

My training is as a historical linguist. I am mainly interested in the development of the Japonic language family, which consists of Japanese and the Ryukyuan languages. I came to Oxford to work on the AHRC-funded Verb semantics and argument realization in pre-modern Japanese (VSARPJ) project, and we soon realized that we needed to build a corpus in order to do the research for the project. I attended the Digital Humanities Summer School in Oxford in the summer of 2009, and that’s where I learned many of the skills necessary for this kind of work, including how to use XML, XPath, XSLT, etc. (As it happens, I’ll be co-teaching the Text to Tech workshop in the Digital Humanities Summer School this year, and registration is still open.)

When I wrote my dissertation, a few years before coming here, I had to rely on dictionaries and the few available indices to get information about words attested in Old Japanese, the oldest stage of the Japanese language (8th century). Now that we have a corpus, the Oxford Corpus of Old Japanese (OCOJ), it’s so much easier, faster, and more accurate to get data about any given word.

In addition to the corpus, I am also working on the development of a bidirectional Old Japanese – English dictionary, making it possible to group words together by their meaning. It’s also possible to jump from the dictionary to examples in the corpus, and from the corpus to the dictionary. But more on that later.


Nothing like getting in to work and realising that they will be filming outside the office today. Every now and then something is filmed around here, usually Lewis these days. That’s Oxford for you.

At least they will be filming safely today.


I like the term “Digital Humanist”. Sure, it confuses people when you introduce yourself as a digital humanist if they’ve never heard the term before (and many people I talk to haven’t), and I usually get the question: “So, what is it that you do exactly?”

It’s a great question. Not one that’s easily answered, of course, and I certainly don’t want to put people into a deep slumber if I try.

The short answer is that I use technology to access a dead language.

A longer answer is that I’ve spent the past 6 1/2 years working on the design and development of the Oxford Corpus of Old Japanese (OCOJ), which is a syntactically parsed corpus of all extant Old Japanese texts. The corpus is tagged in XML following the guidelines of the Text Encoding Initiative (TEI). Having a corpus like this drastically improves the way data can be accessed and analysed. There have already been a few dissertations and several articles written using this resource. I’m also involved with a few research groups who want to incorporate data from the OCOJ in their diachronic projects.
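To give a rough idea of what "accessing data" in a TEI-tagged corpus can look like, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the tiny sample document, the use of TEI `w` elements with `lemma` and `pos` attributes, the `poem1`/`poem2` identifiers, and the helper names `lemma_counts` and `poems_with_lemma` are all invented here and are not the OCOJ's actual markup or tooling.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# TEI namespace, used as a prefix map for ElementTree's XPath subset.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}
# ElementTree exposes xml:id under the built-in XML namespace.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Toy sample, NOT taken from the OCOJ: two "poems" with lemmatised words.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text><body>
    <lg xml:id="poem1">
      <l><w lemma="yama" pos="N">yama</w> <w lemma="mi" pos="V">mi</w></l>
    </lg>
    <lg xml:id="poem2">
      <l><w lemma="yama" pos="N">yama</w> <w lemma="kapa" pos="N">kapa</w></l>
    </lg>
  </body></text>
</TEI>"""


def lemma_counts(xml_text):
    """Count how often each lemma occurs across all <w> elements."""
    root = ET.fromstring(xml_text)
    return Counter(w.get("lemma") for w in root.iterfind(".//tei:w", TEI_NS))


def poems_with_lemma(xml_text, lemma):
    """Return the xml:id of every <lg> containing a word with this lemma."""
    root = ET.fromstring(xml_text)
    hits = []
    for lg in root.iterfind(".//tei:lg", TEI_NS):
        # Attribute predicate: any descendant <w> whose @lemma matches.
        if lg.find(f".//tei:w[@lemma='{lemma}']", TEI_NS) is not None:
            hits.append(lg.get(XML_ID))
    return hits


if __name__ == "__main__":
    print(lemma_counts(SAMPLE))
    print(poems_with_lemma(SAMPLE, "kapa"))
```

The point of the sketch is only that, once every word carries structured annotation, questions like "how often is this word attested?" or "which texts contain it?" become one-line queries rather than a trawl through printed indices.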

I should also mention that being able to do research in this way is really fun.

I’ll post several examples of the kinds of things that can be quickly examined using a corpus during the Day of Digital Humanities.

