This recipe shows how Spark supports a wide range of input and output sources. Spark makes it simple to load and save data in a large number of file formats, ranging from unstructured, such as text, to semi-structured, such as JSON, to structured, such as SequenceFiles.
To step through this recipe, you will need a running Spark cluster, either in pseudo-distributed mode or in one of the distributed modes, that is, standalone, YARN, or Mesos. You are also expected to have a basic understanding of text files, JSON, CSV, SequenceFiles, and object files.
// textFile reads each line of the file as a separate RDD element
val input = sc.textFile("hdfs://namenodeHostName:8020/repos/spark/README.md")
// wholeTextFiles reads a directory and returns (fileName, fileContent) pairs;
// note the three slashes in the local-file URI: file:///path
val wholeInput = sc.wholeTextFiles("file:///home/padma/salesFiles")
...
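To round out the loading examples above, the saving side can be sketched as follows. This is a minimal sketch assuming an existing SparkContext `sc`; the `/tmp` output paths and the sample data are hypothetical, and the SequenceFile and object-file calls rely on Spark's standard RDD API (`saveAsSequenceFile`, `sequenceFile`, `saveAsObjectFile`, `objectFile`).

```scala
// Save an RDD as plain text, one element per line per partition file.
val words = sc.parallelize(Seq("spark", "loads", "many", "formats"))
words.saveAsTextFile("file:///tmp/words-out")

// SequenceFiles store key/value pairs of Hadoop Writable types;
// Spark converts common Scala types (String, Int, ...) implicitly.
val pairs = sc.parallelize(Seq(("spark", 1), ("hadoop", 2)))
pairs.saveAsSequenceFile("file:///tmp/pairs-seq")
val loadedPairs = sc.sequenceFile[String, Int]("file:///tmp/pairs-seq")

// Object files serialize whole elements with Java serialization,
// which is convenient but ties the data to the writing application.
pairs.saveAsObjectFile("file:///tmp/pairs-obj")
val loadedObjs = sc.objectFile[(String, Int)]("file:///tmp/pairs-obj")
```

SequenceFiles are generally preferable to object files for data shared across applications, since Writable types are readable by any Hadoop tool, while object files depend on the Java classes of the program that wrote them.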