I do not know about you, but when I look at any building or technology, I always wonder what was going through the minds of its designers and architects. Why did they decide to pick one route over another? With this in mind, over the next few posts I will be exploring the Hadoop architecture.
As we all know, Hadoop is a framework for large-scale data processing. It helps businesses answer relevant business questions by looking into data they could not process earlier. When developers write a framework, the first order of business is to write down guiding principles, which help the team stay focused and resolve conflicts as they arise during development. Below is a list of principles I put together to advance our conversation. By no means is this an all-inclusive list; feel free to add your own. If you look at the list, you can see that the principles are all related to each other.
If you look for a theme, you will find that all of these principles flow from two requirements: process large amounts of data quickly, and use commodity hardware. Since commodity hardware will fail, plan for recovery. To support recovery, you need replication. For fast processing, you need to read data quickly and do the processing near the data, to avoid moving large amounts of data across the network. The combination of these principles has made Hadoop one of the most successful frameworks in recent history.
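To make the "process near the data" principle concrete, here is a toy back-of-the-envelope sketch (my own illustration, not Hadoop code; the block size, dataset size, and job size are assumed numbers) comparing the network traffic of shipping data to the computation versus shipping the computation to the data:

```python
# Toy illustration of the data-locality principle (not Hadoop code).
# Assumed numbers: 128 MB blocks (a typical HDFS block size),
# a ~128 GB dataset, and a 50 MB job (code plus dependencies).

BLOCK_SIZE = 128 * 1024 * 1024   # bytes per block
NUM_BLOCKS = 1000                # ~128 GB dataset
JOB_SIZE = 50 * 1024 * 1024      # bytes of application code/jars

# Approach 1: pull every block across the network to one central
# node for processing.
data_to_compute = BLOCK_SIZE * NUM_BLOCKS

# Approach 2: ship the job to each of, say, 100 worker nodes that
# already hold the blocks on local disk.
NUM_WORKERS = 100
compute_to_data = JOB_SIZE * NUM_WORKERS

print(f"Move data to compute: {data_to_compute / 2**30:.1f} GiB over the network")
print(f"Move compute to data: {compute_to_data / 2**30:.1f} GiB over the network")
# Moving the computation transfers roughly 25x less data in this setup.
```

The exact numbers do not matter; the point is that the job is tiny compared to the data, so moving the job wins by orders of magnitude as the dataset grows.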
In the next post, we will look at the Hadoop core components. All comments and questions are welcome.