Why is data divided into multiple blocks in Hadoop?
- If we store the whole data on one system, it takes a lot of time to process, and we miss the opportunity to use parallel processing. That is why the data is divided into multiple blocks, and those blocks are stored across all the systems in the cluster.
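The block splitting described above can be sketched in plain Python (this is an illustration of the idea, not Hadoop code; the block size is scaled down from the HDFS default of 128 MB so the example runs instantly):

```python
# Illustrative sketch: split a file's bytes into fixed-size blocks,
# the way HDFS splits a file before distributing it across nodes.
BLOCK_SIZE = 16  # bytes here; the HDFS default is 128 MB

def split_into_blocks(data: bytes, block_size: int = BLOCK_SIZE):
    """Return the list of blocks for a file's contents."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]

file_contents = b"A" * 40  # a tiny "file" of 40 bytes
blocks = split_into_blocks(file_contents)
print(len(blocks))             # 3 blocks
print([len(b) for b in blocks])  # [16, 16, 8] -- the last block is smaller
```

Note that, as in HDFS, only the last block can be smaller than the block size; each full block can then be processed on a different node in parallel.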
- If we store all the data on one system, there is also a chance of losing it: if that system goes down, we lose all the data stored on it.
- To avoid this loss, Hadoop has a replication factor. The default replication factor in Hadoop is 3. That is, if we have a block 'b0', this block is replicated 3 times: with a replication factor of 3, copies of the block are placed on 3 different slave nodes.
- Hadoop always maintains the replication of data based on the replication factor. We can set the replication factor manually if we want.
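A minimal sketch of the replica-placement idea, assuming a made-up list of slave node names and a simple round-robin policy (the real NameNode uses rack-aware placement, which is more involved than this):

```python
# Illustrative sketch: place each block's replicas on distinct slave
# nodes for a replication factor of 3. Node names and block ids are
# invented for the example; this is NOT the actual NameNode logic.
from itertools import cycle

def place_replicas(blocks, nodes, replication_factor=3):
    """Map each block id to the nodes holding one of its copies."""
    placement = {}
    start = cycle(range(len(nodes)))
    for block in blocks:
        i = next(start)
        # pick `replication_factor` distinct nodes, wrapping around
        placement[block] = [nodes[(i + k) % len(nodes)]
                            for k in range(replication_factor)]
    return placement

nodes = ["slave1", "slave2", "slave3", "slave4"]
print(place_replicas(["b0", "b1"], nodes))
```

On a real cluster, the replication factor of an existing file can be changed with `hdfs dfs -setrep`, and the cluster-wide default comes from the `dfs.replication` property in hdfs-site.xml.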
How is the data processed after it is stored in HDFS?
The data is processed using a MapReduce program.
The MapReduce program goes to every node and processes the data present on that node, moving the computation to the data instead of moving the data to the computation.
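The map-then-reduce flow can be sketched in plain Python with the classic word-count example (an illustration of the programming model, not the Hadoop API; the two "chunks" stand in for data stored on two different nodes):

```python
# Illustrative sketch of MapReduce: each node maps over its local
# chunk, the results are shuffled (grouped) by key, then reduced.
from collections import defaultdict

def map_phase(chunk):
    """Mapper: emit a (word, 1) pair for every word in this chunk."""
    return [(word, 1) for word in chunk.split()]

def shuffle(pairs):
    """Shuffle step: group all mapper output by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reducer: sum the counts emitted for each word."""
    return {word: sum(counts) for word, counts in groups.items()}

# Two chunks, as if stored on two different data nodes.
chunks = ["big data big", "data data tools"]
pairs = [pair for chunk in chunks for pair in map_phase(chunk)]
print(reduce_phase(shuffle(pairs)))  # {'big': 2, 'data': 3, 'tools': 1}
```

In real Hadoop the mappers run in parallel on the nodes that hold each block, and the framework performs the shuffle over the network before the reducers run.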
Published by: asvitha chunduru