Skip to content

Data Flow Overview

Eddie Dugan edited this page Sep 26, 2017 · 6 revisions

Namenode commands from Client

  • When the NameNode's main.cc runs, it starts up an RPCServer that can receive and parse messages from RDFS clients.
  • This RPCServer has a handler registered for each available client command. So, the RPC messages recognizable by the server define the client-namenode interface.
  • Each registered handler has a corresponding method in ClientNamenodeProtocolImpl.
    • These methods take the proto-serialized message and parse it's arguments. Here is an example.
    • Messages are received as strings and parsed into specific proto types. The proto type determines the argument format, which ZooKeeper and the Namenode understand because they use the shared proto definitions.
    • The NameNodeImpl calls a corresponding ZooKeeper method (in the Zookeeper NameNode client). Zookeeper has access to the actual data and metadata. It takes the arguments and command type, performs the command, and returns response to the NameNode.
    • The NameNode feeds the result to RPCServer, which responds to the client with another proto-serialized message.

Datanode commands from Client

Datanode <--> Namenode communication

  • Occurs through zookeeper (wiki page).
  • There is a Zookeeper Client running on each type of node, for communication with the zookeeper system.

Protobuf

  • Google’s protobuffer is the serialization method that HDFS uses for RPC calls.
  • RDFS replicates these RPC calls. The format of supported proto messages defines the RPC interface for a client to connect to a Datanode or Namenode.
  • Protos are defined here.

Clone this wiki locally