Chapter 12. Afterword: The Future of Annotation

In this book we have tried to give you a taste of what it’s like to go through the entire process of annotating data for training machine learning (ML) algorithms. The MATTER development cycle provides a tested and well-understood methodology for every step of that process, but it doesn’t tell you everything there is to know about annotation. In this chapter we look toward the future of annotation projects and ML algorithms, and show you some ways that the field of Natural Language Processing (NLP) is changing, as well as how those changes can help (or hurt) your own annotation and ML projects.

Crowdsourcing Annotation

As you have learned from working your way through the MATTER cycle, annotation is an expensive and time-consuming task. Therefore, you want to maximize the utility of your corpus to make the most of the time and energy you put into your task.

One way that people have tried to reduce the cost of large annotation projects is crowdsourcing. Making the task available to a large group of (usually untrained) people makes annotated data both cheaper and faster to obtain, because the annotation is no longer done by a handful of selected annotators.

If the concept of crowdsourcing seems strange, think about asking your friends on Facebook to recommend a restaurant, or consider what happens when a famous person uses Twitter to ask her followers for ...
