Chapter 6

Latent Dirichlet Allocation

Extracting Topics from Software Engineering Data

Joshua Charles Campbell, Abram Hindle, and Eleni Stroulia
Department of Computing Science, University of Alberta, Edmonton, AB, Canada

Abstract

Topic analysis is a powerful tool that extracts “topics” from document collections. Unlike manual tagging, which is effort-intensive and requires expertise in the documents’ subject matter, topic analysis (in its simplest form) is an automated process. It assumes that each document in a collection refers to a small number of topics, and it extracts bags of words attributable to those topics. These topics can be used to support document retrieval or to relate documents to each other through their associated ...
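To make the idea of "bags of words attributable to topics" concrete, here is a minimal sketch of LDA fitted with a collapsed Gibbs sampler in plain Python. It is an illustration under stated assumptions, not the chapter's own method or tooling: the function name `lda_gibbs`, the hyperparameter values, and the toy issue-tracker documents are all hypothetical, and practical work would normally rely on an established LDA implementation.

```python
import random
from collections import Counter


def lda_gibbs(docs, num_topics, iterations=200, alpha=0.1, beta=0.01, seed=0):
    """Toy collapsed Gibbs sampler for LDA (hypothetical helper, for illustration).

    docs is a list of token lists. Returns (doc_topic, top_words), where
    doc_topic[d][k] counts the tokens of document d assigned to topic k and
    top_words[k] lists up to three of the most frequent words of topic k.
    """
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    doc_topic = [[0] * num_topics for _ in docs]
    topic_word = [Counter() for _ in range(num_topics)]
    topic_total = [0] * num_topics

    # Start from a random topic assignment for every token.
    assignments = []
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            k = rng.randrange(num_topics)
            zs.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assignments.append(zs)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                k = assignments[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                # Weight each topic by how common it is in this document
                # and how strongly it is associated with this word.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1

    top_words = [
        [w for w, c in topic_word[k].most_common(3) if c > 0]
        for k in range(num_topics)
    ]
    return doc_topic, top_words


# Hypothetical issue-tracker snippets, already tokenized.
docs = [
    "crash null pointer exception crash stack".split(),
    "button layout color font button".split(),
    "exception stack trace crash".split(),
    "font color theme layout".split(),
]
doc_topic, top_words = lda_gibbs(docs, num_topics=2)
for k, words in enumerate(top_words):
    print("topic", k, words)
for d, counts in enumerate(doc_topic):
    print("document", d, counts)
```

Each row of `doc_topic` shows how the tokens of one document are spread over the topics, and each entry of `top_words` is the small bag of words that characterizes a topic. On a corpus this small the result depends heavily on the seed and hyperparameters, so the output should be read as a demonstration of the mechanics rather than a meaningful analysis.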
