One of the most valuable services we provide is teaching our customers how to create gold-standard data, also known as training data. Nearly every successful NLP-driven project we have done has involved a good deal of customer-driven annotation. The quality of the NLP system is entirely dependent on the quality of its training data. Creating training data is a fairly straightforward process, but it requires attention to detail and significant resources. From a budget perspective, you can expect to spend as much on annotation as you do on the development team, if not more.
We will use sentiment analysis of tweets as our example, and we will assume a business context, but even academic efforts will have similar dimensions.