Over the past few years, there has been a boom in the number of research papers that use text as data.

For example, a recent paper by Koijen, Philipson and Uhlig uses SEC filings to measure healthcare firms' exposure to regulatory risk.

I gave a guest lecture last month on using Python to analyze large text datasets, with a focus on SEC filings. The slides are posted below, and I hope they are useful for people just getting into the subject.

A copy of the presentation can be found here (PDF)
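For readers who want a feel for what "text as data" looks like in code, a common starting point is a dictionary-based measure: count how often a list of topic words appears in a document and scale by the document's length. The sketch below is a minimal, standard-library-only illustration of that idea. It assumes the filing text has already been downloaded and extracted to a plain string. The keyword list and the function names (`tokenize`, `keyword_share`) are hypothetical, chosen for illustration; this is not the method used in the Koijen, Philipson and Uhlig paper or in the lecture slides.

```python
import re
from collections import Counter

# Hypothetical keyword list, for illustration only; not taken from any paper.
REGULATION_TERMS = {"regulation", "regulatory", "fda", "medicare", "medicaid", "reimbursement"}


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into runs of letters (digits and punctuation are dropped)."""
    return re.findall(r"[a-z]+", text.lower())


def keyword_share(text: str, keywords: set[str]) -> float:
    """Fraction of tokens in `text` that appear in `keywords` (0.0 for text with no tokens)."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    counts = Counter(tokens)
    # Counter returns 0 for keywords that never occur, so missing terms add nothing.
    hits = sum(counts[k] for k in keywords)
    return hits / len(tokens)


if __name__ == "__main__":
    sample = "Changes in Medicare reimbursement and FDA regulation could affect our revenue."
    print(round(keyword_share(sample, REGULATION_TERMS), 3))  # 4 of 11 tokens match
```

Scaling by document length keeps long and short filings comparable; in practice you would loop this over every filing in a corpus and would likely want a more careful tokenizer and a validated word list.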