Comparing MPI and Map-Reduce

The following paper gives some general guidelines for choosing between MPI and Map-Reduce in your applications:

Chen, W.-Y.; Song, Y.; Bai, H.; Lin, C.-J. & Chang, E. Y.
Parallel Spectral Clustering in Distributed Systems. IEEE Trans. Pattern Anal. Mach. Intell., 2011, 33, 568-586

“… In general, MapReduce is suitable for noniterative algorithms where nodes require little data exchange to proceed (noniterative and independent); MPI is appropriate for iterative algorithms where nodes require data exchange to proceed (iterative and dependent).”
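
To make the "iterative and dependent" case in the quote concrete, here is a minimal single-process sketch in Python. It only simulates the communication pattern and is not real MPI code; the function names and the four-node layout are made up for illustration. Each simulated node holds one value and needs its neighbours' current values before it can compute its next one, so every iteration requires a round of data exchange.

```python
def exchange_and_update(values):
    """One iteration: every node receives its neighbours' values, then updates.

    Hypothetical helper for illustration only. The two end nodes are held
    fixed (boundary values); interior nodes take the average of their neighbours.
    """
    new_values = list(values)
    for rank in range(1, len(values) - 1):
        left = values[rank - 1]    # stands in for a message received from rank - 1
        right = values[rank + 1]   # stands in for a message received from rank + 1
        new_values[rank] = (left + right) / 2.0
    return new_values


def run(values, iterations):
    """Repeat the exchange-and-update step; each step depends on the previous one."""
    for _ in range(iterations):
        values = exchange_and_update(values)
    return values


if __name__ == "__main__":
    print(run([0.0, 0.0, 0.0, 3.0], iterations=50))
```

No node can run ahead of the others here, because iteration k+1 on one node needs the results of iteration k from its neighbours. That dependency is what the quote means by nodes requiring data exchange to proceed.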

My unverified understanding: Map-Reduce is better suited to data-intensive tasks, while MPI is more appropriate for computation-intensive tasks. In Map-Reduce, the pieces of data are less correlated with one another, which makes them easier to distribute across map tasks.
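
As a rough illustration of why weakly correlated data is easy to hand out to map tasks, here is a small word-count sketch in plain Python. It runs in one process and does not use any Map-Reduce framework; the names `map_chunk`, `shuffle`, `reduce_counts`, and `word_count` are hypothetical and chosen only for this example.

```python
from collections import defaultdict


def map_chunk(chunk):
    """Map task: emit (word, 1) pairs using only its own chunk of text."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(pairs):
    """Group emitted values by key, as a framework would between the two phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_counts(key, values):
    """Reduce task: combine all values collected for one key."""
    return key, sum(values)


def word_count(chunks):
    # Each chunk is mapped independently; no map task reads another task's data.
    mapped = [pair for chunk in chunks for pair in map_chunk(chunk)]
    return dict(reduce_counts(k, v) for k, v in shuffle(mapped).items())


if __name__ == "__main__":
    print(word_count(["MPI and MapReduce", "MapReduce and GFS"]))
```

Because `map_chunk` looks at nothing but its own input, the chunks could be processed on different machines in any order and the final counts would be the same.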

Some key differences are listed below:

  • Map-Reduce is easier to learn, while MPI is noticeably more complex and exposes many more functions. In return, MPI lets you control the parallel processes at a finer granularity.
  • Map-Reduce passes data between nodes through disk I/O (files on a distributed file system such as GFS), while MPI nodes communicate directly by message passing.
  • Map-Reduce provides a fault-tolerance mechanism: when one node fails, Map-Reduce restarts the same task on another node. By default, an MPI job aborts entirely if one of its processes fails.
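
The fault-tolerance difference in the last point can be sketched in a few lines of Python. This is a toy single-process simulation under stated assumptions, not how either system is implemented: workers are plain functions, a crash is a raised exception, and all names (`NodeFailure`, `run_with_reexecution`, `run_all_or_nothing`, `healthy`, `crashed`) are invented for illustration.

```python
class NodeFailure(Exception):
    """Raised by a simulated worker that has crashed."""


def run_with_reexecution(task, workers):
    """Map-Reduce style: if a worker fails, rerun the same task on the next one."""
    for worker in workers:
        try:
            return worker(task)
        except NodeFailure:
            continue  # reschedule the same task on another node
    raise RuntimeError("no healthy worker left for task %r" % (task,))


def run_all_or_nothing(tasks, workers):
    """Default MPI-style behaviour: one failed process ends the whole job."""
    # Any NodeFailure propagates to the caller; there is no retry.
    return [worker(task) for task, worker in zip(tasks, workers)]


def healthy(task):
    """Simulated working node: squares its input."""
    return task * task


def crashed(task):
    """Simulated failed node: always raises."""
    raise NodeFailure("simulated crash")


if __name__ == "__main__":
    print(run_with_reexecution(3, [crashed, healthy]))
    try:
        run_all_or_nothing([1, 2], [healthy, crashed])
    except NodeFailure:
        print("job aborted")
```

Re-execution is only safe because a map or reduce task depends solely on its input, so running it again elsewhere gives the same result. Tightly coupled MPI processes share state through messages, which makes restarting a single process much harder.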