but no one tool can wrangle arbitrary data.
- Chapter 4. Data wrangling: from capture to domestication
- from Think Like a Data Scientist: Tackle the data science process step-by-step
- Publisher: Manning Publications
- Released: March 2017
However, search engines and associated Apache Software Foundation projects such as Nutch and Tika are purpose-built to ingest a very wide range of formats (Tika alone advertises over a thousand file types) for search consumption. An interesting engineering test would be to use Solr, Tika, Nutch, and Akka to land data in JSON or another format that data science tools can consume.
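A rough sketch of the "land it as JSON" step might look like the following. It assumes a Tika Server is already running locally on its default port (9998) and uses its `PUT /tika` endpoint to get plain text back; the names `tika_extract` and `land_as_jsonl` and the record layout (`source`, `text`) are my own illustrative choices, not anything from the book or from Tika. Nutch, Solr, and Akka are left out entirely, so this only covers the single-file extraction piece.

```python
import json
import urllib.request
from pathlib import Path
from typing import Callable, Iterable, Union

# Assumption: a Tika Server instance is listening on its default local port.
TIKA_URL = "http://localhost:9998/tika"


def tika_extract(path: Path, url: str = TIKA_URL) -> str:
    """Send one file to Tika Server and return the extracted plain text."""
    request = urllib.request.Request(
        url,
        data=path.read_bytes(),
        method="PUT",
        headers={"Accept": "text/plain"},
    )
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")


def land_as_jsonl(
    paths: Iterable[Union[str, Path]],
    out_path: Union[str, Path],
    extract: Callable[[Path], str] = tika_extract,
) -> int:
    """Write one JSON record per input file (JSON Lines); return the count."""
    count = 0
    with open(out_path, "w", encoding="utf-8") as out:
        for raw_path in paths:
            path = Path(raw_path)
            # Hypothetical record layout: the source path plus extracted text.
            record = {"source": str(path), "text": extract(path)}
            out.write(json.dumps(record, ensure_ascii=False) + "\n")
            count += 1
    return count
```

The extractor is passed in as a parameter so the landing step can be tried without a running server, for example with `extract=lambda p: p.read_text()`. The resulting file should be loadable with something like `pandas.read_json(out_path, lines=True)`.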