PySpark

Let's go back to the earlier discussion of the two options: building a machine learning/NLP model on Hadoop, and scoring an existing ML model on Hadoop. We covered the second option, scoring, in depth in the last section. Instead of sampling a smaller data set and scoring it, let's now use a larger data set and build a large-scale machine learning model step by step using PySpark. I am again using the same running data with the same schema:

ID      Comment                                                   Class
UA0001  I tried calling you, The service was not up to the mark  1
UA0002  Can you please update my phone no                         0
UA0003  Really bad experience                                     1
UA0004  I am looking for an iPhone                                0
UA0005  Can somebody help me with my password                     1
UA0006  Thanks for considering my request for                     0

Consider the ...
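Below is a minimal sketch of the kind of large-scale text-classification pipeline described above: tokenize the comments, turn them into TF-IDF features, and fit a logistic regression on a Spark cluster. The input path "comments.tsv", the tab-separated layout, and the choice of TF-IDF plus logistic regression are illustrative assumptions, not the book's exact listing.

from pyspark.sql import SparkSession
from pyspark.ml import Pipeline
from pyspark.ml.feature import Tokenizer, HashingTF, IDF
from pyspark.ml.classification import LogisticRegression
from pyspark.ml.evaluation import BinaryClassificationEvaluator

spark = SparkSession.builder.appName("comment-classifier").getOrCreate()

# Load the comment data; the path and tab-separated layout are assumptions.
df = (spark.read
      .option("sep", "\t")
      .option("header", True)
      .csv("comments.tsv")
      .withColumnRenamed("Comment", "comment")
      .withColumnRenamed("Class", "label"))
df = df.withColumn("label", df["label"].cast("double"))

# Hold out a test split so we can check how well the model scores unseen comments.
train, test = df.randomSplit([0.8, 0.2], seed=42)

# Tokenize the comment text, build TF-IDF features, and fit a logistic regression.
tokenizer = Tokenizer(inputCol="comment", outputCol="words")
tf = HashingTF(inputCol="words", outputCol="rawFeatures", numFeatures=1 << 16)
idf = IDF(inputCol="rawFeatures", outputCol="features")
lr = LogisticRegression(featuresCol="features", labelCol="label", maxIter=20)

pipeline = Pipeline(stages=[tokenizer, tf, idf, lr])
model = pipeline.fit(train)

# Score the held-out comments and report area under the ROC curve.
predictions = model.transform(test)
auc = BinaryClassificationEvaluator(labelCol="label").evaluate(predictions)
print(f"Test AUC: {auc:.3f}")

Because every stage lives in a single Pipeline, the fitted model can be saved with model.write().save(...) and reused later for scoring, which mirrors the build-then-score split discussed in the previous section.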
