Batch processing

A batch processing system has the following characteristics:

  • Very efficient at processing a high volume of data.
  • All data processing steps (that is, data collection, data ingestion, data processing, and results presentation) run as a single batch job.
  • Throughput matters more than latency. Latency is typically more than a minute.
  • Throughput directly depends on the size of the data and available computational system resources.
  • Available tools include Apache Sqoop, MapReduce jobs, Spark jobs, Hadoop DistCp utility, and so on.
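
The points above can be illustrated with a minimal sketch in plain Python. It is not how Sqoop, MapReduce, or Spark are implemented; it only shows the four steps running as one batch job, with throughput measured as records processed per second. The function names (collect, ingest, process, present, run_batch_job) and the sample records are hypothetical and chosen for illustration.

```python
import time
from collections import Counter


def collect():
    # Hypothetical input; a real job might read files landed in HDFS.
    return ["alice,click", "bob,view", "alice,view", "carol,click", "alice,click"]


def ingest(raw_lines):
    # Parse each raw line into a (user, event) record.
    return [tuple(line.split(",")) for line in raw_lines]


def process(records):
    # Aggregate over the whole data set at once: count events per user.
    return Counter(user for user, _event in records)


def present(counts):
    # Format the results, sorted by user name.
    return [f"{user}: {n}" for user, n in sorted(counts.items())]


def run_batch_job():
    # All four steps run back to back as one job; results appear only at the end.
    start = time.perf_counter()
    records = ingest(collect())
    report = present(process(records))
    elapsed = time.perf_counter() - start
    # Throughput: records processed per second of wall-clock time.
    throughput = len(records) / elapsed if elapsed > 0 else float("inf")
    return report, throughput


report, throughput = run_batch_job()
print("\n".join(report))
print(f"throughput: {throughput:.0f} records/s")
```

Because nothing is reported until the whole job finishes, the time a single record waits for its result (latency) is bounded below by the job's total run time, which is why batch systems are tuned for throughput instead.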
