Hadoop Job Tracker

Here we will walk through the steps and sequence of Hadoop job processing. Every step is numbered so you can follow the sequence easily, and by following it you can also understand the relationships among these components.

1. As the first step, the user copies the input files to DFS and then submits the job to the client.
2. To start the process, the client reads the input file information, such as names and locations. At this point the client also splits a big job into multiple smaller splits.
3. Once the splits are computed, the job information is uploaded to DFS.
4. The client submits the job to the Job Tracker.
5. The Job Tracker initializes the job and places it in the job queue.
6. The Job Tracker reads the job files from DFS and starts creating the map and reduce tasks.
7. Each Task Tracker sends a heartbeat to the Job Tracker (just as each Data Node sends one to the Name Node) to say, "I am alive, and you can send some work to me."
8. As we have discussed before, data processing in Hadoop is local, so after receiving a heartbeat, the Job Tracker picks up the job from the job queue and assigns tasks to that Task Tracker.

I know this is a lot of sequence to remember. I encourage you to take a few moments to look at the picture and follow the steps. This will help you internalize the concept.
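To make the client-side part of this sequence concrete, here is a minimal sketch using the classic MRv1 `org.apache.hadoop.mapred` API, which is the API the Job Tracker serves. The paths, class names, and job name are placeholders for this example, not part of the original walkthrough; the split computation and upload of job files to DFS happen inside `JobClient.runJob` rather than in code you write yourself.

```java
import java.io.IOException;
import java.util.Iterator;

import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;

public class WordCountJob {

    // Map task: emit (word, 1) for every word in the input split it is given.
    public static class WordMapper extends MapReduceBase
            implements Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        public void map(LongWritable key, Text value,
                        OutputCollector<Text, IntWritable> output, Reporter reporter)
                throws IOException {
            for (String token : value.toString().split("\\s+")) {
                if (!token.isEmpty()) {
                    word.set(token);
                    output.collect(word, ONE);
                }
            }
        }
    }

    // Reduce task: add up the counts emitted for each word.
    public static class SumReducer extends MapReduceBase
            implements Reducer<Text, IntWritable, Text, IntWritable> {
        public void reduce(Text key, Iterator<IntWritable> values,
                           OutputCollector<Text, IntWritable> output, Reporter reporter)
                throws IOException {
            int sum = 0;
            while (values.hasNext()) {
                sum += values.next().get();
            }
            output.collect(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(WordCountJob.class);
        conf.setJobName("word-count");

        // Step 1: the user copies the input files to DFS (placeholder paths).
        FileSystem fs = FileSystem.get(conf);
        fs.copyFromLocalFile(new Path("/tmp/local-input.txt"),
                             new Path("/user/demo/input/input.txt"));

        conf.setMapperClass(WordMapper.class);
        conf.setReducerClass(SumReducer.class);
        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(IntWritable.class);

        // Steps 2-3: the client reads the input locations, computes the splits,
        // and uploads the job files (jar, configuration, split info) to DFS.
        FileInputFormat.setInputPaths(conf, new Path("/user/demo/input"));
        FileOutputFormat.setOutputPath(conf, new Path("/user/demo/output"));

        // Steps 4-8: the job is handed to the Job Tracker, which queues it,
        // creates the map and reduce tasks, and assigns them to Task Trackers
        // as their heartbeats report free capacity.
        JobClient.runJob(conf);
    }
}
```

Notice that your code only covers the first few steps; everything after the submission, from queuing to task assignment, is the Job Tracker's side of the picture described above.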

Job Tracker