Let us explore HDFS a little further. HDFS is designed to run on commodity hardware, to be fault tolerant, and to handle large data sets. It also aims for high throughput and streaming access to the file system. Streaming access means reading data sequentially, a large group of data blocks per read, rather than through many small random reads. This helps Hadoop read data faster. For convenience, here are those attributes as a list:
- Runs on commodity hardware
- Fault tolerant
- Handles large data sets
- High throughput
- Streaming access to the file system
HDFS has two main components: the name node and the data node.
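The name node is the master of HDFS and the data nodes are the slaves. The name node keeps the file system metadata, that is, which blocks make up each file and which data nodes hold them, so it is more important to the architecture than any single data node. When designing your system, pay attention to safeguarding the name node. A failed name node will cause you serious problems. A failed data node, on the other hand, can be recovered from, because HDFS by default keeps copies of each data block on more than one data node.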
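To make the master and slave roles concrete, here is a minimal sketch in Python. It is a toy model only: the classes `NameNode` and `DataNode`, the functions `write_file` and `read_file`, the tiny block size and the round-robin placement are all made up for illustration and are not the Hadoop API. It only shows the idea that the name node holds metadata, the data nodes hold blocks, and a replicated block survives the loss of one data node.

```python
# Toy model only: these classes are hypothetical and are NOT the Hadoop API.
from dataclasses import dataclass, field


@dataclass
class DataNode:
    name: str
    blocks: dict = field(default_factory=dict)  # block_id -> bytes
    alive: bool = True


@dataclass
class NameNode:
    # Metadata only: which blocks make up a file, and where each block lives.
    file_blocks: dict = field(default_factory=dict)      # path -> [block_id]
    block_locations: dict = field(default_factory=dict)  # block_id -> [DataNode]


def write_file(name_node, data_nodes, path, data, block_size=4, replication=2):
    """Split data into blocks and store each block on `replication` data nodes."""
    block_ids = []
    for i, offset in enumerate(range(0, len(data), block_size)):
        block_id = f"{path}#{i}"
        chunk = data[offset:offset + block_size]
        # Simplified round-robin placement (an assumption of this sketch).
        targets = [data_nodes[(i + r) % len(data_nodes)] for r in range(replication)]
        for node in targets:
            node.blocks[block_id] = chunk
        name_node.block_locations[block_id] = targets
        block_ids.append(block_id)
    name_node.file_blocks[path] = block_ids


def read_file(name_node, path):
    """Ask the name node where the blocks are, then read from any live replica."""
    out = b""
    for block_id in name_node.file_blocks[path]:
        live = [n for n in name_node.block_locations[block_id] if n.alive]
        if not live:
            raise IOError(f"all replicas of {block_id} are lost")
        out += live[0].blocks[block_id]
    return out


nodes = [DataNode("dn1"), DataNode("dn2"), DataNode("dn3")]
nn = NameNode()
write_file(nn, nodes, "/logs/a.txt", b"hello hdfs!")
nodes[0].alive = False  # one data node fails
recovered = read_file(nn, "/logs/a.txt")  # still readable from the other replicas
```

In this sketch, losing `dn1` does not lose the file because every block has a second copy elsewhere. Losing the `nn` object, however, would leave the blocks on the data nodes with nothing to say which file they belong to, which is why the name node deserves the extra protection.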
All of these Hadoop components expose an API, and you need to write a program that calls this API to interact with them. The program you write is the Hadoop client. The other component to learn about is the secondary name node. You can think of the secondary name node as a helper to the primary name node. It is not a backup of the primary name node. If the primary name node fails, the secondary does not take over for it. It just takes some housekeeping work off the primary name node, such as periodic checkpointing of the file system metadata, to reduce its load. As you can see in the picture below, the client is the one that ties together the work between the storage component (HDFS) and the compute component (MapReduce).
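One task commonly handed to the secondary name node is checkpointing: merging the log of recent changes into the saved snapshot of the file system metadata, so that the log does not grow without bound. The Python sketch below illustrates only that idea. The names `apply_edits`, `fsimage` and `edit_log` and the `("create", path)` record format are assumptions made for this example, not Hadoop's actual file formats or API.

```python
# Toy model only: hypothetical names and record format, not the Hadoop API.

def apply_edits(fsimage, edit_log):
    """Return a new namespace snapshot with every logged edit applied in order."""
    image = set(fsimage)
    for op, path in edit_log:
        if op == "create":
            image.add(path)
        elif op == "delete":
            image.discard(path)
    return image


# State kept for the primary name node: the last snapshot plus the edits since then.
fsimage = {"/data/a.txt"}
edit_log = [
    ("create", "/data/b.txt"),
    ("delete", "/data/a.txt"),
    ("create", "/data/c.txt"),
]

# Checkpoint: merge the edits into a fresh snapshot, then start a new, empty log.
new_fsimage = apply_edits(fsimage, edit_log)
edit_log = []
```

Note that the helper only produces a newer snapshot for the primary to use. Nothing in it can serve client requests, which matches the point above that the secondary name node is a helper and not a standby.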
In the next post, we will talk about the Hadoop job tracker.