Whether you’re a programmer with little to no knowledge of Python or an experienced data scientist or engineer, this Learning Path will walk you through natural language processing in both Python and Scala, and show you how to use a range of popular tools, including Spark, scikit-learn, spaCy, NLTK, and gensim, for text mining.
You’ll learn the most common techniques for processing text, how to use machine learning to generate annotators and apply them within a data pipeline, and how NLP pipelines differ from other approaches to semantic text mining. You’ll learn about standard UIMA annotators, custom annotators, and machine-learned annotators, and see how text processing pipeline architectures can incorporate popular big data tools such as Kafka, Spark, Spark SQL, Cassandra, and Elasticsearch.
By the end of the Learning Path, you will be able to build a natural language processing and entity extraction pipeline, and will have a clear understanding of the capabilities and limitations of natural language text processing.
Materials or downloads needed in advance: Example files