Inter DataNode Block Transfer
DataNodes transfer blocks directly between one another using DataTransfer and BlockSender objects. A DataNode opens a TCP connection to its peer via the org.apache.hadoop.net.NetUtils utilities and streams block data over that connection.
The DataNode's methods (including transferBlock) create DataTransfer objects to initiate and perform block transfer. DataTransfer implements the Runnable interface, so each transfer can run on its own thread, and each DataTransfer transfers a single ExtendedBlock of data.
Within the run method, DataTransfer performs a series of steps to set up the connection and initiate the transfer. First, the DataNode's atomic count of asynchronous transmits in progress is incremented. Then, the DataTransfer resolves the address of the target and initializes a socket. An access token is obtained for the ExtendedBlock in question and a write timeout is set.
Then the DataTransfer obtains an OutputStream and an InputStream for the socket from org.apache.hadoop.net.NetUtils. A buffered DataOutputStream and a DataInputStream are set up on the connection. A Sender object wraps the output stream; its writeBlock method is called with the ExtendedBlock, the storage type, the access token, the client name, the target DataNodes, the target storage types, the DatanodeInfo for this DataNode, the BlockSender's checksum, and the caching strategy.
Then, the BlockSender object is used to actually send the data over the output stream. At the end of this process, the DataTransfer handles checksum errors and other transfer errors, the count of asynchronous transmits in progress is atomically decremented, and the streams and socket are closed.
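The flow of the run method described above can be condensed into a sketch. This is a simplified, self-contained illustration, not Hadoop's actual code: the opcode string, the header layout, and the method name transfer are assumptions, and the real wire format is the protobuf-based DataTransferProtocol spoken by Sender and BlockSender.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.atomic.AtomicInteger;

public class DataTransferSketch {
    // Count of asynchronous transmits in progress (mirrors the description).
    static final AtomicInteger xmitsInProgress = new AtomicInteger(0);

    // Simplified stand-in for Sender#writeBlock followed by BlockSender:
    // write a header naming the block, then stream the block data.
    static void transfer(String blockId, InputStream blockData,
                         DataOutputStream out) throws IOException {
        xmitsInProgress.incrementAndGet();   // transfer begins
        try {
            out.writeUTF("WRITE_BLOCK");     // opcode (illustrative only)
            out.writeUTF(blockId);           // which ExtendedBlock
            byte[] buf = new byte[4096];
            int n;
            while ((n = blockData.read(buf)) != -1) {
                out.write(buf, 0, n);        // BlockSender streams the bytes
            }
            out.flush();
        } finally {
            // Always decrement the counter and close the stream, even on error.
            xmitsInProgress.decrementAndGet();
            out.close();
        }
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        byte[] payload = "block-bytes".getBytes(StandardCharsets.UTF_8);
        transfer("blk_1073741825", new ByteArrayInputStream(payload),
                 new DataOutputStream(sink));
        // prints: sent 40 bytes, in progress: 0
        System.out.println("sent " + sink.size() + " bytes, in progress: "
                + xmitsInProgress.get());
    }
}
```

The try/finally shape is the important part: the in-progress counter must be decremented and the streams released whether or not the transfer succeeded.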
In the event of an InvalidChecksumException being thrown, the DataNode calls the BlockScanner's markSuspectBlock method. The BlockScanner does some minimal processing and then forwards the invocation to the equivalently named method on the VolumeScanner for that storage volume.
The VolumeScanner then adds the block to its suspectBlocks and recentSuspectBlocks collections. In its continuously running loop, the VolumeScanner pops a suspect block off the list, prioritizing marked blocks in this way (otherwise it picks the next regular block to scan). The scanBlock method is then called and the block is obtained. If there are no errors, a BlockSender is created and used to send the block to a null stream, verifying its checksums along the way.
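The prioritization of suspect blocks can be sketched as a small queue wrapper. The names and structure here are hypothetical simplifications; the real VolumeScanner loop also handles scan-rate throttling, I/O, and deduplication via recentSuspectBlocks.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Iterator;

public class SuspectQueueSketch {
    // Blocks flagged by markSuspectBlock, scanned ahead of everything else.
    private final Deque<String> suspectBlocks = new ArrayDeque<>();
    private final Iterator<String> allBlocks;

    SuspectQueueSketch(Iterator<String> allBlocks) {
        this.allBlocks = allBlocks;
    }

    void markSuspectBlock(String blockId) {
        // The real VolumeScanner also records the block in
        // recentSuspectBlocks so the same block is not re-queued repeatedly.
        suspectBlocks.addLast(blockId);
    }

    // One iteration of the scan loop: prefer a suspect block, otherwise
    // fall back to the next block in the regular scan order.
    String pickNextBlock() {
        if (!suspectBlocks.isEmpty()) {
            return suspectBlocks.pollFirst();
        }
        return allBlocks.hasNext() ? allBlocks.next() : null;
    }

    public static void main(String[] args) {
        SuspectQueueSketch s = new SuspectQueueSketch(
                java.util.Arrays.asList("blk_a", "blk_b").iterator());
        s.markSuspectBlock("blk_suspect");
        System.out.println(s.pickNextBlock()); // blk_suspect
        System.out.println(s.pickNextBlock()); // blk_a
    }
}
```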
When the scan finishes, the resultHandler (a ScanResultHandler object) is invoked with the block and either an error or null. If there is no error, the ScanResultHandler logs the success and returns. Otherwise, the error is processed and logged: if the block was not found in the dataset, no error is raised; if the block file was not found, this is most likely due to a race condition with a concurrent write, and if it was a genuine error the directory scanner will handle it. In all other cases, the bad block is reported using the DataNode's reportBadBlocks method, which forwards the report to the NameNode through the BPOfferService.
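That decision logic can be condensed into a hedged sketch. The boolean flags are stand-ins for checks the real ScanResultHandler makes against the dataset and the local filesystem; the method name and outcomes are illustrative, not Hadoop's API.

```java
public class ScanResultSketch {
    enum Outcome { OK, IGNORED_NOT_IN_DATASET, IGNORED_FILE_MISSING, REPORT_BAD_BLOCK }

    // Hypothetical condensation of the result handler: decide what to do
    // with a finished scan.
    static Outcome handle(boolean scanFailed, boolean inDataset, boolean fileExists) {
        if (!scanFailed) {
            return Outcome.OK;                      // log success and move on
        }
        if (!inDataset) {
            return Outcome.IGNORED_NOT_IN_DATASET;  // block deleted meanwhile: no error
        }
        if (!fileExists) {
            // Most likely a race with an in-progress write; if it is a real
            // error, the directory scanner will catch it later.
            return Outcome.IGNORED_FILE_MISSING;
        }
        // Genuine corruption: DataNode#reportBadBlocks tells the NameNode
        // via the BPOfferService.
        return Outcome.REPORT_BAD_BLOCK;
    }

    public static void main(String[] args) {
        System.out.println(handle(true, true, true)); // REPORT_BAD_BLOCK
    }
}
```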
During a general IOException, the checkDiskErrorAsync method is called, which starts the checkDiskErrorThread. That thread runs in a continuous loop, eventually calling the checkDiskError method, which tries to remove any unhealthy volumes from the DataNode.
At that point, the handleDiskError method is called, which sends the error report to the NameNodes. If the DataNode still has enough healthy volumes, it simply schedules a block report and returns. Otherwise, the disk failure is a fatal error.
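A minimal sketch of that final decision, assuming a hypothetical handleDiskError signature (the real method also builds and sends the error report, and the tolerance threshold is configurable via dfs.datanode.failed.volumes.tolerated):

```java
public class DiskErrorSketch {
    // Hypothetical condensation of handleDiskError: after unhealthy volumes
    // have been removed, either keep running (and schedule a block report)
    // or treat the disk failure as fatal.
    static String handleDiskError(int healthyVolumes, int failedVolumes,
                                  int toleratedFailures) {
        // An error report goes to the NameNodes in either case (not shown).
        if (failedVolumes <= toleratedFailures && healthyVolumes > 0) {
            return "SCHEDULE_BLOCK_REPORT"; // still enough volumes: carry on
        }
        return "FATAL_SHUTDOWN";            // too many failures: fatal error
    }

    public static void main(String[] args) {
        System.out.println(handleDiskError(3, 1, 1)); // SCHEDULE_BLOCK_REPORT
        System.out.println(handleDiskError(0, 4, 1)); // FATAL_SHUTDOWN
    }
}
```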