NameNode Overview
The NameNode is the central hub of an HDFS cluster. It keeps track of the directory tree, the locations of chunks, and the mapping from each file to its chunks, but it does not store the chunks themselves. A cluster usually has one or two NameNodes and many DataNodes.
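The three pieces of metadata described above can be pictured as plain in-memory maps. The sketch below is a minimal illustration under assumed names (`NameNodeMetadata`, `mkdir`, `add_file`, `record_replica`, and `locate` are all hypothetical, not the project's API). It shows that the NameNode holds only chunk IDs and DataNode IDs, never chunk contents.

```python
from dataclasses import dataclass, field


@dataclass
class NameNodeMetadata:
    """Hypothetical in-memory metadata held by a NameNode (no chunk data)."""

    # Directory tree: directory path -> names of its direct children.
    tree: dict[str, set[str]] = field(default_factory=lambda: {"/": set()})
    # File path -> ordered list of chunk IDs that make up the file.
    file_chunks: dict[str, list[int]] = field(default_factory=dict)
    # Chunk ID -> IDs of the DataNodes currently holding a replica.
    chunk_locations: dict[int, set[str]] = field(default_factory=dict)

    def mkdir(self, path: str) -> None:
        # Register the new directory under its parent and give it an empty entry.
        parent, _, name = path.rstrip("/").rpartition("/")
        self.tree[parent or "/"].add(name)
        self.tree.setdefault(path, set())

    def add_file(self, path: str, chunk_ids: list[int]) -> None:
        # Add the file to its parent directory and record its chunk list.
        parent, _, name = path.rpartition("/")
        self.tree[parent or "/"].add(name)
        self.file_chunks[path] = list(chunk_ids)
        for cid in chunk_ids:
            self.chunk_locations.setdefault(cid, set())

    def record_replica(self, chunk_id: int, datanode: str) -> None:
        # Called when a DataNode reports that it stores a replica of chunk_id.
        self.chunk_locations[chunk_id].add(datanode)

    def locate(self, path: str) -> list[tuple[int, set[str]]]:
        # Answer a client lookup: each chunk of the file and where it lives.
        return [(cid, set(self.chunk_locations[cid])) for cid in self.file_chunks[path]]
```

For example, after `mkdir("/data")`, `add_file("/data/a.txt", [1, 2])`, and a few `record_replica` calls, `locate("/data/a.txt")` returns each chunk ID paired with the DataNodes that hold it.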
Client applications talk to the NameNode whenever they want to read, write, copy, delete, or move a file. For operations that involve file data, the NameNode responds with a list of DataNodes, and the client then talks directly to those DataNodes. Because file data never passes through the NameNode, its traffic stays low; however, every operation still depends on the NameNode's metadata, which makes the NameNode a single point of failure in HDFS.
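The read path can be sketched as two steps: one metadata query to the NameNode, then direct chunk reads from DataNodes. This is a minimal sketch assuming hypothetical `NameNode`, `DataNode`, and `read_file` names (not the project's actual interfaces). A DataNode that is missing from `datanodes` stands in for an unreachable one, so the client falls back to the next replica.

```python
class NameNode:
    """Hypothetical NameNode stub: answers metadata queries, never serves data."""

    def __init__(self, file_chunks: dict[str, list[int]],
                 chunk_locations: dict[int, set[str]]) -> None:
        self.file_chunks = file_chunks
        self.chunk_locations = chunk_locations

    def get_chunk_locations(self, path: str) -> list[tuple[int, list[str]]]:
        # Return each chunk of the file with the DataNodes that hold it.
        return [(cid, sorted(self.chunk_locations[cid])) for cid in self.file_chunks[path]]


class DataNode:
    """Hypothetical DataNode stub that stores chunk bytes by chunk ID."""

    def __init__(self, chunks: dict[int, bytes]) -> None:
        self.chunks = chunks

    def read_chunk(self, chunk_id: int) -> bytes:
        return self.chunks[chunk_id]


def read_file(namenode: NameNode, datanodes: dict[str, DataNode], path: str) -> bytes:
    data = bytearray()
    # Step 1: a single metadata request to the NameNode.
    for chunk_id, locations in namenode.get_chunk_locations(path):
        # Step 2: fetch the chunk directly from the first reachable replica.
        for dn_id in locations:
            dn = datanodes.get(dn_id)
            if dn is not None and chunk_id in dn.chunks:
                data += dn.read_chunk(chunk_id)
                break
        else:
            raise OSError(f"no live replica for chunk {chunk_id}")
    return bytes(data)
```

Only the list of chunk locations comes back from the NameNode; the file's bytes flow from the DataNodes to the client.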
The NameNode has several features that protect against failures in the HDFS cluster.
- DataNodes periodically send Heartbeats to the NameNode to signal that they are still alive. If a DataNode dies and its Heartbeats stop, the NameNode stops referring clients to it and re-replicates the chunks it held, copying them from their surviving replicas onto other DataNodes in the cluster.
- The NameNode records every metadata change in the operation log. It periodically merges these changes into fsImage, a file that holds a snapshot of all the metadata. If the system crashes, the NameNode can reload fsImage and replay the changes recorded in the operation log, restoring the state it had before the crash.
- The NameNode can have a backup NameNode, which performs checkpointing and also maintains an up-to-date image of the namespace.
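The operation log and fsImage recovery scheme can be illustrated with a small write-ahead-log sketch. It assumes a namespace modeled as a map from file path to chunk IDs, and the operation names (`create`, `delete`, `rename`) and the `NameNodeState` class are hypothetical. Both `fsimage` and `op_log` stand in for state that would be persisted to disk; only `namespace` is lost in a crash.

```python
import copy


def apply_op(namespace: dict[str, list[int]], op: tuple) -> None:
    # Apply one logged metadata change to a namespace map.
    kind = op[0]
    if kind == "create":
        _, path, chunks = op
        namespace[path] = list(chunks)
    elif kind == "delete":
        namespace.pop(op[1], None)
    elif kind == "rename":
        _, src, dst = op
        namespace[dst] = namespace.pop(src)
    else:
        raise ValueError(f"unknown op {kind!r}")


class NameNodeState:
    def __init__(self) -> None:
        self.fsimage: dict[str, list[int]] = {}    # durable snapshot (assumed on disk)
        self.op_log: list[tuple] = []              # durable log (assumed on disk)
        self.namespace: dict[str, list[int]] = {}  # in-memory, lost on crash

    def execute(self, op: tuple) -> None:
        # Write-ahead: record the change in the log before applying it in memory.
        self.op_log.append(op)
        apply_op(self.namespace, op)

    def checkpoint(self) -> None:
        # Merge logged changes into fsImage, then start a fresh log.
        for op in self.op_log:
            apply_op(self.fsimage, op)
        self.op_log.clear()

    def crash_and_recover(self) -> None:
        # Discard in-memory state, reload fsImage, and replay the remaining log.
        self.namespace = copy.deepcopy(self.fsimage)
        for op in self.op_log:
            apply_op(self.namespace, op)
```

After a checkpoint the log only needs to hold changes made since that checkpoint, which keeps replay short on restart; this is the reason periodic merging into fsImage matters.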
This wiki gives a detailed description (including use cases, data structures, and related components) of the major components in the NameNode. There is also a glossary for common terms.