PySpark
Let's go back to the two options we discussed earlier: building a machine learning/NLP model on Hadoop, and scoring an ML model on Hadoop. We covered the second option, scoring, in depth in the last section. Now, instead of sampling a smaller data set and scoring it, let's take a larger data set and build a large-scale machine learning model step by step using PySpark. I am again using the same running data with the same schema:
ID | Comment | Class
---|---|---
UA0001 | I tried calling you, The service was not up to the mark | 1
UA0002 | Can you please update my phone no | 0
UA0003 | Really bad experience | 1
UA0004 | I am looking for an iPhone | 0
UA0005 | Can somebody help me with my password | 1
UA0006 | Thanks for considering my request for | 0
Consider the ...