DataNode Overview

Lauren Khoo edited this page Sep 6, 2016 · 2 revisions

HDFS allows users to store data in files in a filesystem namespace. Each file on HDFS is divided into one or more blocks. These blocks contain the actual data for the file and are stored on and managed by DataNodes. HDFS employs a master/slave architecture in which the NameNode acts as the master that manages multiple DataNodes. At the direction of the NameNode, DataNodes create, delete, and replicate blocks. DataNodes serve client read and write requests directly, so the actual file data never flows through the NameNode.
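To make the file-to-block division concrete, here is a minimal sketch of how a file of a given size could be cut into fixed-size blocks. The function name `split_into_blocks` and the tiny sizes in the usage lines are illustrative assumptions, not HDFS code; the real block size is a cluster configuration value and is passed in here as a parameter.

```python
def split_into_blocks(file_size, block_size):
    """Return a list of (offset, length) pairs covering a file.

    Hypothetical helper for illustration only: every block is
    `block_size` bytes long except possibly the last one, which
    holds whatever remains of the file.
    """
    if file_size < 0 or block_size <= 0:
        raise ValueError("file_size must be >= 0 and block_size must be > 0")

    blocks = []
    offset = 0
    while offset < file_size:
        length = min(block_size, file_size - offset)
        blocks.append((offset, length))
        offset += length
    # Note: in this sketch a zero-length file yields an empty list.
    return blocks


# Usage: a 300-byte file with a 128-byte block size (toy numbers).
layout = split_into_blocks(300, 128)
print(layout)
```

Each `(offset, length)` pair stands for one block that would be stored on, and served by, one or more DataNodes.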

To perform their functions, DataNodes must constantly remain in contact with several entities: the NameNode, the clients, and other DataNodes. To create, delete, and replicate blocks, DataNodes must respond to instructions from the NameNode. To fulfill read and write requests, DataNodes must interface with file system clients. Lastly, DataNodes must communicate with each other to coordinate replication.
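The three kinds of interaction described above can be pictured with a toy in-memory model. This is only a sketch under stated assumptions: the class `ToyDataNode`, its methods, and the command strings `"delete"` and `"replicate"` are hypothetical names chosen for illustration and do not correspond to the actual HDFS classes or wire protocol.

```python
class ToyDataNode:
    """Hypothetical in-memory stand-in for a DataNode (not HDFS code)."""

    def __init__(self, name):
        self.name = name
        self.blocks = {}  # block id -> block contents (bytes)

    # Client-facing interface: data moves directly between the
    # client and the DataNode, never through the NameNode.
    def write_block(self, block_id, data):
        self.blocks[block_id] = bytes(data)

    def read_block(self, block_id):
        return self.blocks[block_id]

    # NameNode-facing interface: act on an instruction.
    def handle_command(self, command, block_id, target=None):
        if command == "delete":
            self.blocks.pop(block_id, None)
        elif command == "replicate":
            if target is None:
                raise ValueError("replicate needs a target DataNode")
            # DataNode-to-DataNode interface: copy the block to a peer.
            target.write_block(block_id, self.blocks[block_id])
        else:
            raise ValueError("unknown command: " + command)


# Usage: a client writes a block, then the (simulated) NameNode
# asks dn1 to replicate it to dn2 and afterwards delete its own copy.
dn1, dn2 = ToyDataNode("dn1"), ToyDataNode("dn2")
dn1.write_block("blk_1", b"hello")
dn1.handle_command("replicate", "blk_1", target=dn2)
dn1.handle_command("delete", "blk_1")
print(dn2.read_block("blk_1"))
```

The model deliberately leaves out everything a real cluster needs, such as networking, persistence, and failure handling; it only shows which party initiates each kind of request.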

The following sections of the wiki give detailed descriptions (including use cases, data structures, and related components) of the DataNodes' interactions with the NameNode, clients, and other DataNodes.
