by Geoff Keelan
Our second day in the topic modeling workshop was a bit more grueling than our introduction, but much more enlightening. We spent the morning learning the math behind topic modeling from Julian Brooke. Math does not come naturally to us humanists (or at least to me), but its value was quickly apparent: I finally understood what topic modeling programs were doing for us. Yesterday, numbers and words seemed to appear magically after I ran my program. Today, I learned that they appear because the model estimates two sets of probabilities: how likely each word is to belong to a given topic, and how common each topic is within a given document.
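To make those two probabilities a little more concrete, here is a toy sketch of how they fit together in a model such as LDA. Everything in it (the topics, words, and numbers) is invented for illustration and is not output from the workshop or from my Hansard project. It shows how the chance of seeing a word in a document is the topic-in-document weights multiplied by the word-in-topic weights and summed over topics.

```python
# Toy illustration only: the topics, words, and numbers are invented.

# p(word | topic): how likely each word is within each topic
topic_word = {
    "war":     {"soldiers": 0.5, "munitions": 0.4, "tariff": 0.1},
    "economy": {"soldiers": 0.1, "munitions": 0.2, "tariff": 0.7},
}

# p(topic | document): how common each topic is in one hypothetical document
doc_topic = {"war": 0.8, "economy": 0.2}


def word_prob_in_doc(word, doc_topic, topic_word):
    """p(word | doc) = sum over topics of p(topic | doc) * p(word | topic)."""
    return sum(weight * topic_word[topic].get(word, 0.0)
               for topic, weight in doc_topic.items())


for w in ["soldiers", "munitions", "tariff"]:
    print(w, round(word_prob_in_doc(w, doc_topic, topic_word), 3))
# prints: soldiers 0.42, munitions 0.36, tariff 0.22 (one per line)
```

A real topic model works in the opposite direction: it starts from the documents and estimates both tables of probabilities, which is why the results are best read as likelihoods rather than certainties.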
Understanding how a digital tool works is as important as yesterday's lesson about why we should use it, especially given the somewhat suspicious and ambiguous nature of our results. If you don't understand what's happening behind the scenes, it's much less clear what your results mean for your research. After Julian's explanation, I better understood how I could use topic modeling for my project, which for this workshop looks specifically at the Hansard Parliamentary Debates. Namely, I saw the limitations of how my source was organized: I had a dozen files for each parliamentary session from 1914–1922, but I needed to divide them into much smaller ones. I am hoping to split them by day and create a much larger (but still relatively small for DH!) body of sources. Because the program models the probabilities of topics and words, having a large number of documents allows it to better track commonalities and outliers.
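For anyone facing the same problem, here is a minimal sketch of splitting one session file into one document per sitting day. It assumes, hypothetically, that each day begins on its own line with a heading like "Monday, February 8, 1915"; I have not confirmed that the Hansard files follow this layout, so the `DAY_HEADING` pattern, the function names, and the output file names are all illustrative and would need adjusting to the real source.

```python
import re
from pathlib import Path

# Hypothetical layout: each sitting day starts on its own line with a heading
# such as "Monday, February 8, 1915". Adjust this pattern to the real files.
DAY_HEADING = re.compile(
    r"^(Monday|Tuesday|Wednesday|Thursday|Friday|Saturday|Sunday), "
    r"[A-Z][a-z]+ \d{1,2}, \d{4}$",
    re.MULTILINE,
)


def split_by_day(text):
    """Return a list of (heading, body) pairs, one per sitting day.

    Any text before the first heading is ignored.
    """
    matches = list(DAY_HEADING.finditer(text))
    days = []
    for i, m in enumerate(matches):
        # Each day's body runs until the next heading (or the end of the file).
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        days.append((m.group(0), text[m.end():end].strip()))
    return days


def write_days(session_file, out_dir):
    """Write each day of one session file to its own small text file."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    text = Path(session_file).read_text(encoding="utf-8")
    stem = Path(session_file).stem
    for n, (heading, body) in enumerate(split_by_day(text), start=1):
        # Hypothetical naming scheme: <session file name>_day001.txt, ...
        (out / f"{stem}_day{n:03d}.txt").write_text(
            heading + "\n" + body, encoding="utf-8")
```

Running `write_days` over each session file would turn a dozen large files into many day-sized documents, which is the kind of larger corpus the model can compare more usefully.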
That, at least, is what I understood. It's a bit daunting even to repeat back what I have learned, as it is so far outside my traditional realm of knowledge. Still, this workshop has done a lot to demystify topic modeling and digital humanities as a whole, even if I don't quite have a handle on it yet.

This evening's Keynote Lecture from Jennifer Roberts-Smith, "Your Mother is Not a Computer: Phenomenologies of the Human for Digital Humanities," was also a bit outside my area, but one comment from the Q&A stuck with me as we headed to dinner. Jennifer described digital humanities as a shifting field, but one that is increasingly concerned with what it means to be a human being in a technologically mediated age. In other words, how does the digital age make us different from Shakespeare? This question is immensely interesting to historians, because understanding how differing contexts (spatial, temporal, of identity, etc.) influence individuals and communities is at the core of our discipline. While it's much easier to find answers in the past with the benefit of hindsight, we have a skillset that allows us at least to wonder about it in the present. Perhaps, like topic modeling, it comes down to guessing at probabilities, but as we are learning, that has value if we know what to do with it.