Text Mining and Analysis

Rudimentary Text Mining Recipe

This system facilitates both "distant" and "scalable" reading through rudimentary natural language processing and text mining. The process is similar from project to project, instance to instance, and corpus to corpus. In a nutshell, you:

  1. Articulate a research question
  2. Amass a corpus, and convert it into a set of plain text files
  3. Count & tabulate individual words and phrases
  4. Count & tabulate parts-of-speech & named entities
  5. Use topic modeling to identify possible themes
  6. Analyze the output of Steps #3, #4, and #5 to look for patterns & anomalies
  7. Use the results of Step #6 to do "scalable" reading; search & browse the corpus
  8. Address the research question, and repeat
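
As an illustration of Step #3, counting and tabulating words and phrases might be sketched as follows. This is a minimal sketch and not the system's own implementation: it uses only the Python standard library, the function names `tokenize` and `count_ngrams` are hypothetical, and the tiny in-memory `corpus` stands in for the set of plain text files produced in Step #2.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def count_ngrams(tokens, n=1):
    """Count n-grams: n=1 for single words, n=2 for two-word phrases."""
    grams = zip(*(tokens[i:] for i in range(n)))
    return Counter(" ".join(gram) for gram in grams)


# Hypothetical corpus; in practice each value would be read from a plain text file.
corpus = {
    "doc1.txt": "The whale swam. The whale dove.",
    "doc2.txt": "The ship chased the whale.",
}

word_counts = Counter()
bigram_counts = Counter()
for text in corpus.values():
    tokens = tokenize(text)
    # Count per document so that phrases never span two files.
    # Note: this simple tokenizer ignores sentence boundaries within a file.
    word_counts.update(count_ngrams(tokens, 1))
    bigram_counts.update(count_ngrams(tokens, 2))

# Tabulate the most frequent words and phrases, one per line.
for label, counts in (("words", word_counts), ("phrases", bigram_counts)):
    print(label)
    for item, count in counts.most_common(3):
        print(f"  {count}\t{item}")
```

A real project would usually also remove very common "stop words" (such as "the") before looking for patterns, since they tend to dominate raw frequency tables.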

Through repeated use of the functions available here, readers can empower themselves to use and understand a corpus at scale.