Replication and Rack Awareness

Earlier, we said that the Hadoop architecture is fault tolerant. To recover from failing data nodes, Hadoop replicates data. When it replicates, it follows the principles listed in the picture.

The default number of replicas is three. The first copy is stored on the local node. The other two copies are stored on a remote rack, on two different nodes of that rack. When there are more than three copies, the additional copies are stored randomly on any rack. You can follow the color-coded blocks to see that Block A is copied first on rack 1 and the other two copies are stored on rack 2. Block B and Block C are stored by following the same principle. You can also choose more than three copies for replication. One more important point is that replication in Hadoop happens at the block level: each block of a file is replicated on its own.
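To make the placement rule concrete, here is a minimal Python sketch that picks nodes for one block. It is a simplified illustration, not Hadoop's actual implementation. The function name `place_replicas`, the `topology` dictionary and the node names are all hypothetical, and the sketch assumes at least two racks, with at least two nodes on the remote rack.

```python
import random


def place_replicas(topology, writer_node, replication=3, rng=None):
    """Pick nodes for one block's replicas (simplified sketch, not Hadoop code).

    topology: dict mapping a rack name to the list of nodes on that rack.
    writer_node: the local node, where the first copy is stored.
    """
    rng = rng or random.Random()
    # Map each node to its rack for quick lookups.
    rack_of = {node: rack for rack, nodes in topology.items() for node in nodes}
    if replication < 1 or replication > len(rack_of):
        raise ValueError("replication must be between 1 and the number of nodes")

    # Copy 1: the local node.
    chosen = [writer_node]
    if replication == 1:
        return chosen

    # Copy 2: a random node on a remote rack.
    local_rack = rack_of[writer_node]
    remote_rack = rng.choice([r for r in topology if r != local_rack])
    chosen.append(rng.choice(topology[remote_rack]))

    # Copy 3: a different node on the same remote rack.
    if replication >= 3:
        candidates = [n for n in topology[remote_rack] if n not in chosen]
        chosen.append(rng.choice(candidates))

    # Copies 4 and up: random nodes on any rack that do not hold the block yet.
    while len(chosen) < replication:
        candidates = [n for n in rack_of if n not in chosen]
        chosen.append(rng.choice(candidates))
    return chosen


# Hypothetical two-rack cluster, similar in spirit to the picture.
topology = {
    "rack1": ["node1", "node2", "node3"],
    "rack2": ["node4", "node5", "node6"],
}
# Replication is per block, so each block gets its own placement.
for block in ("A", "B", "C"):
    print(block, place_replicas(topology, "node1"))
```

Each run prints three nodes per block: the first is always the local node on rack 1, and the other two are different nodes on rack 2. Because the remote nodes are picked at random, the exact nodes vary from run to run.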

Replication and rack awareness

Next, we will look at how Hadoop reads and writes data.