Machine Learning

What are the differences between batch processing and stream processing systems?

Answer by Ramaninder Singh:

Sean Owen's answer is quite accurate and to the point. I would still like to answer this question from the point of view of a beginner.
 
As has been pointed out, the example in the question is somewhat wrong. Hadoop is a complete ecosystem (see http://hadoop.apache.org/#What+I…), and MapReduce is the batch processing system of the Hadoop ecosystem. Spark was also originally a batch processing system, but one of its libraries, Spark Streaming, is designed for stream processing.

Batch processing is very efficient at processing high volumes of data. Data is collected, entered into the system, and processed, and the results are then produced in batches. Here, the time taken for processing is not a primary concern. Batch jobs are configured to run without manual intervention, operating on the entire dataset at scale to produce output in the form of computational analyses and data files. Depending on the size of the data being processed and the computational power of the system, output can be delayed significantly.
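To make the batch model concrete, here is a minimal sketch in plain Python of a MapReduce-style word count, the classic batch job. It only imitates the map, shuffle, and reduce phases inside a single process; it is not Hadoop's API, and the function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `run_batch_job`) are hypothetical. The point it illustrates is that the whole dataset is available before the job starts, and no result exists until every record has been processed.

```python
from collections import defaultdict
from typing import Iterable


def map_phase(line: str) -> list[tuple[str, int]]:
    # Emit a (word, 1) pair for every word in one input record.
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    # Group intermediate values by key, as the framework does between map and reduce.
    grouped: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped: dict[str, list[int]]) -> dict[str, int]:
    # Collapse each key's list of values into a single total.
    return {key: sum(values) for key, values in grouped.items()}


def run_batch_job(dataset: list[str]) -> dict[str, int]:
    # The entire dataset is read up front; output is produced only once
    # every record has gone through map, shuffle, and reduce.
    intermediate = [pair for line in dataset for pair in map_phase(line)]
    return reduce_phase(shuffle_phase(intermediate))


if __name__ == "__main__":
    logs = ["error disk full", "warning disk slow", "error network down"]
    print(run_batch_job(logs))
```

In a real cluster the map and reduce phases run in parallel across many machines, which is what lets batch systems handle large volumes; the trade-off is that the job's output is only as fresh as its last complete run.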

In contrast, stream processing involves the continual input and output of data. It emphasizes the Velocity of the data. Data must be processed within a small time window, or in near real time (keep in mind that real-time systems and stream processing systems are different concepts that are sometimes used interchangeably; for details, see the question "What's the difference between real-time processing and stream processing?"). Stream processing gives decision makers the ability to adjust to contingencies based on events and trends as they develop in real time.
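For contrast with the batch model, here is a minimal sketch of a streaming computation: counting events per key in fixed, non-overlapping ("tumbling") time windows. Events are consumed one at a time from an iterator, so the source may be unbounded, and each window's result is emitted as soon as the window closes rather than after all data has arrived. The `Event` type and `tumbling_window_counts` function are hypothetical, and the sketch assumes integer timestamps that arrive in order.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass
class Event:
    timestamp: int  # seconds; assumed to arrive in non-decreasing order
    key: str


def tumbling_window_counts(
    events: Iterable[Event], window_seconds: int
) -> Iterator[tuple[int, dict[str, int]]]:
    """Yield (window_start, counts) each time a window closes.

    Events are processed one at a time, so this also works on an endless
    source. Late or out-of-order events are not handled.
    """
    window_start = None
    counts: dict[str, int] = {}
    for event in events:
        start = event.timestamp - (event.timestamp % window_seconds)
        if window_start is None:
            window_start = start
        elif start != window_start:
            # The event belongs to a later window, so the current window is
            # complete and its result is emitted immediately.
            yield window_start, counts
            window_start, counts = start, {}
        counts[event.key] = counts.get(event.key, 0) + 1
    if window_start is not None:
        # Flush the last open window when a finite source ends.
        yield window_start, counts


if __name__ == "__main__":
    clicks = [Event(1, "click"), Event(3, "view"), Event(12, "click")]
    for start, window in tumbling_window_counts(clicks, 10):
        print(start, window)
```

A production stream processing engine also has to deal with late and out-of-order data, fault-tolerant state, and distributing the work across machines, none of which this sketch attempts.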

There are three Vs of Big Data: Volume, Velocity, and Variety. As I understand it, if your only concern is the Volume of the data, then batch processing is the way to go; if you also have to take the Velocity of the data (continuous data) into consideration, and you need results within a specific time limit (on the order of seconds), then stream processing engines are there to help you.
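The rule of thumb above largely comes down to how long you can wait for a result. As a rough illustration under deliberately simplified, hypothetical assumptions (a batch job that starts every `interval` seconds and takes `runtime` seconds to finish, versus a stream processor that handles each event in a fixed `per_event_latency`), the following sketch estimates how long an event takes to show up in the output of each. The function names and numbers are illustrative only, not measurements of any real system.

```python
import math


def batch_delay(arrival: float, interval: float, runtime: float) -> float:
    """Seconds until an event is reflected in batch output.

    Assumes a job starts at 0, interval, 2*interval, ..., includes every
    event that has arrived by its start time, and takes `runtime` seconds.
    """
    next_run = math.ceil(arrival / interval) * interval
    return next_run + runtime - arrival


def stream_delay(per_event_latency: float) -> float:
    """Seconds until an event is reflected in streaming output.

    Assumes each event is processed on arrival in a fixed time.
    """
    return per_event_latency


if __name__ == "__main__":
    # Hypothetical hourly batch job that takes 10 minutes to run.
    for arrival in [10.0, 1800.0, 3599.0]:
        print(arrival, batch_delay(arrival, interval=3600.0, runtime=600.0))
    print("stream:", stream_delay(0.5))
```

Under these assumptions the batch delay ranges from `runtime` up to almost `interval + runtime` depending on when an event arrives, so a deadline measured in seconds points toward a stream processor, while a workload that only cares about Volume can tolerate the wait.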
