Initial Foray into Topic Modeling for Rhetoric and Composition

I’m just getting into topic modeling as a research method, thanks to my husband, Jonathan Goodwin. This post represents my first attempt to make sense of it, because its value hasn’t been immediately comprehensible to me. You take a huge corpus of thousands of academic articles (or whatever) and run a program that extracts groups of words that tend to occur together in the articles. Those groups of words are the “topics.”

Jonathan took all of JSTOR’s archives of College English, CCC, Rhetoric Review, Rhetoric Society Quarterly, and JAC and generated 100 topics. Some were coherent, while others seemed more random – though also somewhat interesting; see postscript. Here’s an example of a coherent one:

lectures belles century lettres historians rise influential reform hugh british campbell scottish southern rhetorical blair late alexander england founded

The program will show you a long list of articles that are associated with the topic. So if you were determined to find absolutely everything having to do with Hugh Blair, this method might give you some articles you wouldn’t have found via a regular JSTOR search.

OK, but besides that, what do you DO with the topics? That has always been the confusing question for me, though I have read some work about topic modeling. I want to write a series of posts explaining what I’ve been doing with the topics, to sort it out for myself.

When Jonathan sent me the list of 100 topics, I went through all of them and selected 53 that I thought were interesting. Mostly these were the ones that were most coherent, like the Blair example. I then pasted them into a document, labeled each one, and grouped them together, like so:

History of Rhetoric

classical cicero ancient rhetoric greek roman oratory orator eloquence quintilian invention renaissance speaking aristotle orators rhetoricians vols modem history

plato sophists gorgias socrates sophistic greek phaedrus ancient platonic greece athenian greeks athens sophist protagoras logos dialogues carolina isocrates

lectures belles century lettres historians rise influential reform hugh british campbell scottish southern rhetorical blair late alexander england founded

philosophical philosophy truth logic philosopher rational doctrine philosophers theory aristotle essays thing writings mere science thinking human mind truths

rhetoric rhetorical persuasion rhetoricians kenneth communication speech burke aristotle classical audience argumentation philosophy discourse persuasive arguments quarterly speaker invention

Now, I haven’t yet looked at the lists of articles associated with these topics, but here are some questions that a close review of those lists might answer:

How has the interest in these topics changed over time? This seems to be a favored approach among nerds like my husband – visualizations: graphs that plot the trajectory of when people started becoming interested in the topic, when interest peaked, and when it waned. Below is an obligatory graph for the Hugh Blair topic:

[Figure: graph showing interest in Hugh Blair and George Campbell in rhetoric and composition journals, late 1930s to late 2000s]
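For anyone curious how a trajectory like this gets tabulated, here is a minimal sketch, assuming each article associated with a topic comes with a publication year. The function name is hypothetical, and the years are simply those of the College English articles listed later in this post (a different topic), borrowed so the example runs on real values rather than invented ones.

```python
from collections import Counter

# Publication years of the College English articles listed later in this post.
# They belong to a different topic and are used here only as sample input.
years = [1967, 1963, 1971, 1946, 1951, 1940, 1970, 1968]


def count_by_decade(years):
    """Return {decade_start: article_count}, ordered by decade."""
    counts = Counter((year // 10) * 10 for year in years)
    return dict(sorted(counts.items()))


by_decade = count_by_decade(years)

# Print a crude text "graph": one mark per article in each decade.
for decade, n in by_decade.items():
    print(f"{decade}s: {'#' * n} ({n})")
```

A fairer picture would probably divide each count by the total number of articles the journals published in that decade, since the journals themselves grew over time. That normalization is left out of the sketch.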

Another question I’ve never heard anyone ask, though, is this: which journals are publishing the most on these topics? Most of us in rhetoric and composition assume that if you have a manuscript about Cicero, you send it to Rhetoric Society Quarterly or Rhetoric Review, not to College English, because it doesn’t publish that kind of thing, generally speaking. But are we sure about that? To what extent? I can now do a frequency count of the journals that are represented in the “classical cicero ancient” topic and make a pie chart showing the breakdown. More on that later.
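A frequency count of that kind might look something like the sketch below. It assumes the list of articles for a topic can be reduced to (journal, title) pairs. The records here are invented placeholders so the code runs; they say nothing about what any of these journals actually published, and the function name is hypothetical.

```python
from collections import Counter

# PLACEHOLDER records, made up for illustration only.
# In practice these would come from the topic's list of associated articles.
topic_articles = [
    ("Rhetoric Society Quarterly", "Placeholder article A"),
    ("Rhetoric Review", "Placeholder article B"),
    ("Rhetoric Society Quarterly", "Placeholder article C"),
    ("College English", "Placeholder article D"),
]


def journal_shares(records):
    """Return (journal, count, percent) tuples, most frequent journal first."""
    counts = Counter(journal for journal, _title in records)
    total = sum(counts.values())
    return [(journal, n, 100 * n / total) for journal, n in counts.most_common()]


shares = journal_shares(topic_articles)
for journal, n, percent in shares:
    print(f"{journal}: {n} article(s), {percent:.0f}%")
```

The journal names and counts are exactly what a charting tool would need as labels and slice sizes for the pie chart.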

And Now for Expressivism

For six of the topics, I wasn’t sure what they meant, but I went ahead and labeled them “Expressivism.” Because expressivism doesn’t have an attendant set of terms, and is misunderstood so often and so profoundly as a theory of writing and teaching philosophy, I was interested in seeing what kinds of articles were listed. As it turns out, most of them were actually about literature, or were works of creative writing. But one of them yielded a few interesting bits:

obvious hard easily worth expect surely aware mere doubt simple easy leave idea supposed vague avoid bad respect clear

Most of those, of course, are commonly used words, and I’m not really convinced that this is an “expressivist” topic. Still, I did find these articles, among many others:

Richard K. Redfern, "A Brief Lexicon of Jargon: For Those Who Want to Speak and Write Verbosely and Vaguely", College English, 1967
Donald Murray, "Henry James in the Advanced Composition Course", College English, 1963
Peter Elbow, "Exploring My Teaching", College English, 1971
Joseph J. Firebaugh, "On Being Unacademic", College English, 1946
Geraldine Hammond, "How Gladly Do We Teach?", College English, 1951
Winfield H. Rogers, "Responsibilities of the English Teacher in the Urban University", College English, 1940
J. Mitchell Morse, "Why Write like a College Graduate?", College English, 1970
J. Mitchell Morse, "The Case for Irrelevance", College English, 1968

In reviewing the list of articles associated with this topic, I think I now have a better handle on the sources of the theory some of us call expressivism: certainly it arises from attitudes of respect, concern, and care for students (see Rogers, 1940; Hammond, 1951). But I now see the overlap between expressivist values and the study of literature, as well as the study and practice of creative writing, and descriptive approaches to linguistics*. Those connections should have been obvious, but they weren’t. I can see that as early as 1940, some of the ideas associated with expressivism were in circulation – though certainly some would say that the seeds were planted even earlier, with Fred Newton Scott’s work (see Linda Adler-Kassner’s excellent article “Ownership Revisited” in CCC, 1998). I also think there are many more expressivists than we’d realized, and that a lot of people are/were expressivists but don’t/didn’t know it. No one is old enough to have witnessed all of this, comprehensively, in real time, and topic modeling is almost like having such a person.

More to come. For now, I’ll say that the best way to grasp the value of topic modeling as a method is to focus on one topic and mine the articles.

* I’m prepared to argue that James Sledd was an expressivist; I think I have a good bit of evidence.



stop words

Interesting, especially with the more common words -- how does that array of words align with, say, Derek Mueller's list of "stop" words (themselves extended, I think, from the work he and Collin did with CCCOA)?


The only words the topic and Derek's stop words list have in common are "clear" and "sure" ("sure" is on the stop words list; "surely" is in the topic). The corpus hasn't been lemmatized.

We made a list of stop words too; Jonathan has it.
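The comparison described in this reply can be sketched roughly as follows. The topic words are the ones quoted in the post. The stop list is a small stand-in and is not Derek Mueller's actual list: only "clear" and "sure" are known from the reply to be on it, and the other entries are generic fillers. The `crude_stem` helper is hypothetical and only strips a trailing "ly", which is far weaker than real lemmatization.

```python
# Topic words quoted in the post.
topic = (
    "obvious hard easily worth expect surely aware mere doubt simple easy "
    "leave idea supposed vague avoid bad respect clear"
).split()

# STAND-IN stop list (an assumption, not the real list): "clear" and "sure"
# come from the reply above; the rest are generic filler words.
stop_words = {"clear", "sure", "the", "and", "of"}


def crude_stem(word):
    """Rough stand-in for lemmatization: drop a trailing 'ly' from longer words."""
    if len(word) > 4 and word.endswith("ly"):
        return word[:-2]
    return word


# Exact matches between the topic and the stop list.
exact = sorted(set(topic) & stop_words)

# Topic words that match once the crude stemming is applied.
stemmed = sorted(w for w in set(topic) if crude_stem(w) in stop_words)

print("exact matches:", exact)
print("matches after stemming:", stemmed)
```

Because the corpus has not been lemmatized, an exact comparison misses pairs such as "surely" and "sure"; the second comparison is one simple way to surface them.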
