Use of Computational Analysis in the Digital Humanities

The field of digital humanities is unique in that its scholars use computational methods to shed light on humanistic research questions. Using digital tools and approaches to critically analyze data is relatively new in humanities scholarship. What sets the digital humanities apart from other quantitative disciplines, such as statistics or computer science, is the tendency to interrogate the uses of these methods and to complicate the insights revealed by computational analyses.

Digital humanists employ many modes of analysis, but for this reflection I want to focus on three: data visualizations, models and algorithms, and text analysis. Each transforms data in specific ways to help scholars make arguments, but these transformations cannot make meaning by themselves: they reveal the human side of technology and, in the absence of critical explanation, conceal context and the complexity of reality. I now turn to a discussion of each of the three methods.

Data Visualizations

Visualizations can illuminate trends in data in compelling ways, but they also serve a rhetorical function: the choice of visualization is a decision researchers make to support their arguments, one that nudges the viewer to think about the data in a certain way. For example, if a scholar presents data in a line graph where a variable is plotted against years, we expect a trend that changes over time. Likewise, if a scholar uses a map, we might assume the data has geographical significance. Understanding the assumptions viewers make when confronted with specific types of data visualizations can help scholars select the presentation mode that most effectively reinforces their conclusions.

However, as helpful as visualizations can be, if their rhetorical purpose does not match the point the scholar is trying to make, they can distract from or detract from the argument. Continuing the map example, Matthew Ericson warns against using maps superfluously: “the impulse is since the data CAN be mapped, the best way to present the data MUST be a map,” but when you want to highlight trends that are not geographically significant, such as the outcomes of an election in a single place over time, a map is not the most helpful visualization.1 A line graph would reinforce to the viewer that what you find interesting about the data is a temporal trend, not a geographical one.
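
To give a sense of what that alternative looks like in practice, here is a minimal sketch of Ericson’s suggested line graph, written in Python with matplotlib. The place and the vote shares are invented purely for illustration.

```python
# A minimal sketch of the line-graph alternative to a map: hypothetical
# vote shares for one place across several elections. All numbers are
# invented for illustration.
import matplotlib.pyplot as plt

years = [2008, 2012, 2016, 2020]
vote_share = [48.2, 51.5, 46.9, 52.3]  # invented percentages

plt.plot(years, vote_share, marker="o")
plt.xlabel("Election year")
plt.ylabel("Vote share (%)")
plt.title("One place over time: a temporal trend, not a geographic one")
plt.show()
```

The chart’s very axes tell the viewer what matters: time on one axis, the variable of interest on the other, with geography nowhere in sight.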

The disadvantages of using visualizations as a mode of analysis stem from the fact that they describe data rather than analyze it. Visualizations are descriptive, not necessarily analytical, and their significance is often ambiguous. Ericson emphasizes this point as well, noting that with maps, “Unless the pattern is super clearcut, trying to figure out how much of a relationship exists is a tricky task.”2 The same holds for other types of visualizations. Viewers can infer the meaning of a trend, but it is imperative that digital humanities scholars accompany data visualizations with analytical discussion, which requires knowledge and recognition of context. By themselves, visualizations are insufficient to make arguments.

Models and Algorithms

Unlike visualizations, models and algorithms are modes of analysis that move beyond description and toward prediction. To elaborate on what I mean by prediction, an example is helpful. Economists employ models and algorithms to predict and compute future outcomes depending on how variables, like unemployment rates or interest rates, might change. If unemployment were to increase, a model would show that X, Y, and Z would occur, and algorithms might compute the numerical values of those variables. Instead of merely describing what the unemployment rate is at a particular point in time, models and algorithms predict possible outcomes if that rate were to change.
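
To make the distinction concrete, here is a minimal sketch in Python of such a predictive model. The linear relationship and its coefficients are entirely hypothetical, chosen for illustration; real economic models are far more sophisticated. The point is that the prediction follows directly from assumptions baked into the code.

```python
# A minimal sketch of a predictive economic model. The relationship and
# coefficients are hypothetical, chosen purely for illustration.

def predict_spending_index(unemployment_rate: float) -> float:
    """Toy linear model: spending falls as unemployment rises.

    The baseline (100.0) and sensitivity (-2.5) encode the modeler's
    assumptions about how the economy *should* behave.
    """
    baseline = 100.0     # assumed index value at 0% unemployment
    sensitivity = -2.5   # assumed change per point of unemployment
    return baseline + sensitivity * unemployment_rate

for rate in (4.0, 6.0, 8.0):
    print(f"Unemployment {rate:.1f}% -> "
          f"predicted spending index {predict_spending_index(rate):.1f}")
```

Change the input and the model dutifully predicts a new outcome; whether the real economy would follow suit is another question entirely.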

Digital humanists come in when we recognize that models and algorithms reflect their human creators’ assumptions about how the data should behave. According to Benjamin M. Schmidt, “people design algorithms in order to automatically perform a given transformation.”3 We can apply his statement to modeling: people use and design algorithms that make data fit within a model, assuming the model is robust enough to make useful predictions. Those assumptions may be correct, but we must recognize that how something should behave does not guarantee that it will behave accordingly in real life.
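
A small sketch can show how this plays out. Below, invented data that actually follows a quadratic relationship is forced into a linear model: the fitted line behaves as assumed within the range it was fit on, but its predictions diverge badly outside it.

```python
# A minimal sketch of "making data fit within a model": forcing a
# linear fit onto data that is not linear. The data points are invented.
import numpy as np

x = np.array([0, 1, 2, 3, 4, 5], dtype=float)
y = x ** 2  # the real relationship is quadratic

slope, intercept = np.polyfit(x, y, deg=1)  # we *assume* a line anyway
print(f"fitted line: y = {slope:.2f}x + {intercept:.2f}")
print("prediction at x=10:", slope * 10 + intercept)  # roughly 46.7
print("actual value at x=10:", 10.0 ** 2)             # 100.0
```

The line fits tolerably within the training range, so the assumption looks correct right up until it fails.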

Safiya Umoja Noble brilliantly demonstrates an example of assumptions gone awry. She applies critical discourse analysis rooted in black feminist approaches to the algorithms at work in Google search, specifically how search portrays a commodified and noxious perception of black girls. Users may assume that the results of a search query will be accurate and relevant, but, as Noble explains, Google’s algorithm relies on evaluating networks of hyperlinks to determine whether a search result is relevant. A page is more “relevant” if it is more popular, which does not guarantee accuracy.4 If, to paraphrase Schmidt, we do not understand how an algorithm actually works and instead trust it to accomplish a stated goal,5 we might, in the case of a Google search, believe that highly offensive and exploitative results reflect reality rather than the assumptions of the humans who built the search algorithm.
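
To illustrate the mechanism Noble describes, here is a toy version of link-based ranking in the spirit of PageRank. The pages and link structure are invented, and Google’s actual system is vastly more complex; the point is simply that the score measures popularity within the link network, not accuracy.

```python
# A toy link-based ranker. Pages and links are invented for illustration;
# this is a bare-bones PageRank-style power iteration, nothing more.

links = {
    "accurate_page": ["popular_page"],
    "popular_page": [],
    "another_page": ["popular_page", "accurate_page"],
}

def pagerank(links, damping=0.85, iterations=50):
    """Iteratively redistribute rank along hyperlinks."""
    pages = list(links)
    rank = {p: 1.0 / len(pages) for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / len(pages) for p in pages}
        for page, outgoing in links.items():
            if not outgoing:  # dangling page: spread its rank evenly
                for p in pages:
                    new_rank[p] += damping * rank[page] / len(pages)
            else:
                for target in outgoing:
                    new_rank[target] += damping * rank[page] / len(outgoing)
        rank = new_rank
    return rank

# "popular_page" scores highest because more pages link to it; nothing
# in the computation asks whether its content is accurate.
print(pagerank(links))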

Similarly, digital humanists must recognize the limitations in how a computer is trained to follow a model. In the domain of machine learning, people may assume that technology can, to some extent, think for itself. But models are trained on corpora of data selected by humans, which in turn reveal the assumptions and biases of those humans. Computers do not think for themselves; they make predictions according to statistical measures calculated from the training data. More training data may make a computer more accurate within the narrow domain it knows, but if a user asks it to predict something it has not been trained on, it will return unreliable and perhaps undefined results. When the research question concerns identity production, such as religious classification, untrustworthy results can prove damaging. Computers cannot think creatively, consider context, or apply knowledge to new information in the way that humans can. Digital humanities scholars understand the limitation that computers and algorithms reflect only the data they have been fed, with no regard for meaning.
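
A minimal sketch makes this limitation visible. The tiny classifier below “knows” only the words that its hypothetical human curators put into the training corpus; asked about anything outside that narrow domain, it has nothing meaningful to say. The corpus and labels are invented for illustration.

```python
# A minimal sketch of a word-count classifier. Training texts and labels
# are invented; real classifiers are more elaborate but share the core
# limitation: they reflect only the data they were fed.
from collections import Counter

training_corpus = [
    ("the temple festival drew many pilgrims", "religious"),
    ("the congregation gathered for the sermon", "religious"),
    ("the stock market closed higher today", "secular"),
    ("the election results were announced", "secular"),
]

word_counts = {"religious": Counter(), "secular": Counter()}
for text, label in training_corpus:
    word_counts[label].update(text.split())

def classify(text):
    """Score labels by counting known training words; no meaning involved."""
    scores = {label: sum(counts[w] for w in text.split())
              for label, counts in word_counts.items()}
    if max(scores.values()) == 0:
        return "unknown"  # out-of-domain input: the model has nothing to say
    return max(scores, key=scores.get)

print(classify("the sermon at the temple"))           # "religious": in-domain
print(classify("meditation retreats attract visitors"))  # "unknown": unseen words
```

The classifier’s “religious” label is just arithmetic over curated words; the moment the vocabulary changes, the prediction collapses.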

Text Analysis

Text analysis is a more specific form of computational analysis than the general categories of visualizations, models, and algorithms, and my technical work with it reveals an interesting tension between distant and close reading.

I used the text analysis tool Voyant to discern trends in a corpus of stories about COVID-19 experiences. I wanted to know whether, among the 42 entries from members of a specific community, there was a dominant mood. Voyant revealed that words such as “people,” “community,” “spirit,” and “family” appeared most frequently. From this distant reading, I could infer that the writers felt community solidarity as residents faced a common enemy.
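
For readers unfamiliar with tools like Voyant, the core of what it surfaces is a word-frequency count, something like the following sketch. The sample entries here are invented; the real corpus was the 42 community submissions.

```python
# A minimal sketch of the raw word-frequency counting behind a tool like
# Voyant. The sample entries and stopword list are invented stand-ins.
import re
from collections import Counter

entries = [
    "Our community came together, and the spirit of the people amazed me.",
    "Family kept me going. People in this community looked out for each other.",
]

stopwords = {"the", "and", "of", "in", "for", "me", "our", "a", "to"}

words = []
for entry in entries:
    words.extend(w for w in re.findall(r"[a-z]+", entry.lower())
                 if w not in stopwords)

# The most frequent terms: the distant-reading signal Voyant surfaces.
print(Counter(words).most_common(5))
```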

However, just because these positive-sounding words showed up most frequently does not mean that the messages actually read in a positive manner. To validate the text analysis results, I had to read the individual entries closely, use my human brain to figure out the mood of each entry by analyzing context, and categorize the entries as positive-leaning or negative-leaning. It turned out that the entries did reflect the more positive mood suggested by Voyant, but it would be too simplistic to assume that the tool would automatically reflect the context and reality of the entries. Text analysis can reveal suggestive trends, but as with all modes of computational analysis, it should not be relied upon to convey reality or accuracy. Distant reading must be accompanied by close reading.
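
A contrived counterexample shows why that close reading was necessary: the same “positive” keywords can appear in an entry whose mood is anything but positive, and a frequency count alone cannot tell the difference. The entry below is invented for illustration.

```python
# An invented entry in which the "positive" keywords all appear, yet the
# mood is grief and isolation rather than solidarity.
import re
from collections import Counter

entry = ("I miss my community so much. The spirit of my family feels "
         "broken, and the people I love are far away.")

counts = Counter(re.findall(r"[a-z]+", entry.lower()))
for keyword in ("community", "spirit", "family", "people"):
    print(keyword, counts[keyword])
# Each keyword registers a hit, but only a contextual (close) reading
# recovers the entry's actual mood.
```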

Conclusion

The modes of computational analysis that scholars use are rhetorical tools that influence the effectiveness of their arguments. Digital methods cannot be cleanly separated from the human side of research, analysis, and presentation of data. Computation is a human endeavor: humans determine the research questions, write the code, select the data on which algorithms and models are trained, interpret results and find their significance, and present analytical conclusions in particular ways to make points. In religious studies, arguments and claims often deal with sensitive issues of identity and classification that require knowledge of context, so the ramifications of using technological methods can be much more complicated than computers might suggest. Computational analysis can be helpful when used thoughtfully, but it cannot be the endgame of research.

Notes

  1. Matthew Ericson, “When Maps Shouldn’t Be Maps,” Matthew Ericson (blog), October 14, 2011, http://www.ericson.net/content/2011/10/when-maps-shouldnt-be-maps/.
  2. Ericson, “When Maps Shouldn’t Be Maps,” sec. “2. When the geographic data is more effective for analysis.”
  3. Benjamin M. Schmidt, “Do Digital Humanists Need to Understand Algorithms?” in Debates in the Digital Humanities 2016, ed. Matthew K. Gold and Lauren F. Klein (Minneapolis: University of Minnesota Press, 2016), par. 5, https://dhdebates.gc.cuny.edu/read/untitled/section/557c453b-4abb-48ce-8c38-a77e24d3f0bd#ch48.
  4. Safiya Umoja Noble, “Google Search: Hyper-visibility as a Means of Rendering Black Women and Girls Invisible,” InVisible Culture: An Electronic Journal for Visual Culture 19 (2013): sec. “Introduction,” accessed November 23, 2020, http://ivc.lib.rochester.edu/google-search-hyper-visibility-as-a-means-of-rendering-black-women-and-girls-invisible/.
  5. Schmidt, “Do Digital Humanists Need to Understand Algorithms?” par. 5.
