An implementation of the SWIM (Scalable Weakly-consistent Infection-style Process Group Membership) protocol in Go.
This project has been developed in several phases:
-
Phase 1: Core SWIM Protocol Implementation (Completed)
- Implemented basic node membership (joining and leaving the cluster).
- Implemented the gossip mechanism for disseminating membership updates.
- Implemented failure detection and node removal. (To be implemented soon)
-
Phase 2: Advanced Features (Completed)
- Added support for disseminating custom data (payloads) along with membership information.
- Implemented anti-entropy mechanisms to ensure eventual consistency.
- Added encryption for secure communication between nodes.
-
Phase 3: Testing and Documentation (In Progress)
- Wrote comprehensive unit and integration tests. (Unit tests for core components completed)
- Create detailed documentation, including High-Level Design (HLD), Low-Level Design (LLD), and architecture diagrams.
The go-gossip system aims to provide a robust and scalable membership management solution for distributed services using the SWIM protocol. It operates on a peer-to-peer model where each node in the cluster maintains a local view of the entire cluster's membership. This membership list includes information about other nodes, such as their network address, current state (alive, suspected, or dead), and optional custom payloads.
Key aspects of the HLD:
- Decentralized: No central authority or leader node; all nodes are peers. This enhances fault tolerance and scalability.
- Weakly Consistent: Membership information propagates through a gossip-style mechanism, leading to eventual consistency across the cluster.
- Failure Detection: Nodes periodically "ping" a subset of other nodes to detect failures. Suspected nodes are then confirmed dead if they remain unresponsive.
- Anti-Entropy: Mechanisms are in place to actively exchange membership lists between nodes, ensuring that discrepancies are eventually resolved and all nodes converge on a consistent view of the cluster.
- Secure Communication: All inter-node communication is encrypted to prevent eavesdropping and tampering.
The go-gossip library is structured around the following core components:
-
Node: (pkg/gossip/node.go)
- Represents a single member within the gossip cluster.
- Contains its network
Addr(e.g., IP:Port), its currentState(Alive, Suspected, Dead),LastUpdatedtimestamp for consistency checks, and a genericPayloadfield for custom application-specific data. - Implements
json.Marshalerandjson.Unmarshalerfornet.Addrserialization.
-
MembershipList: (pkg/gossip/membership.go)
- A thread-safe data structure (
sync.RWMutex) that stores theNodeobjects for all known members of the cluster. - Provides methods for
Adding new nodes,Getting a node by address,Allfor retrieving all known nodes, and crucially,Mergefor incorporating membership updates from other nodes. - The
addOrUpdateinternal method handles the core merging logic, prioritizing newer information based onLastUpdatedtimestamps and preserving non-empty payloads.
- A thread-safe data structure (
-
Gossiper: (pkg/gossip/gossiper.go)
- The central orchestrator of the SWIM protocol.
- Manages the
MembershipListfor its local view of the cluster. - Utilizes a
Transportinterface for all network communication. - Runs two primary goroutines:
pingLoop: Periodically sends lightweight "Ping" messages to random nodes to actively detect failures.syncLoop: Periodically sends comprehensive "Sync" messages (containing its entireMembershipList) to random nodes to resolve inconsistencies (anti-entropy).
- Includes
SetPayloadto dynamically update the local node's custom data andMembersto expose the current cluster view. - Uses a
sync.WaitGroupfor graceful shutdown of its internal goroutines.
-
Transport Interface: (pkg/gossip/transport.go)
- Defines the contract for network communication, abstracting away the underlying transport mechanism.
- Specifies
Write(send data to address),Read(receive data channel), andStop(terminate transport) methods.
-
UDPTransport: (pkg/gossip/udp_transport.go)
- A concrete implementation of the
Transportinterface using UDP datagrams. - Handles UDP socket creation, listening for incoming messages, and sending outgoing messages.
- Includes a
readLoopgoroutine that continuously reads from the UDP socket and dispatches messages to a channel, ensuring thread-safe buffer handling.
- A concrete implementation of the
-
SecureTransport: (pkg/gossip/secure_transport.go)
- A decorator (wrapper) for any
Transportimplementation, providing authenticated encryption. - Uses AES-GCM for secure communication, ensuring confidentiality and integrity.
- Encrypts data before
Writeing it to the underlying transport and decrypts data received from the underlying transport in itsReadmethod. - Requires a symmetric
keyfor encryption/decryption.
- A decorator (wrapper) for any
-
Message: (pkg/gossip/message.go)
- Defines the structure for inter-node communication, including
MessageType(Ping, Sync) and aPayload(the marshaled membership list forSyncmessages, or empty forPing). - Provides
EncodeandDecodemethods for JSON serialization/deserialization.
- Defines the structure for inter-node communication, including
The go-gossip architecture is entirely peer-to-peer. Each node running the Gossiper service is an independent entity that communicates directly with other nodes in the cluster.
+----------------+ +----------------+ +----------------+
| Node A | | Node B | | Node C |
| (Gossiper A) | | (Gossiper B) | | (Gossiper C) |
+-------+--------+ +-------+--------+ +-------+--------+
| | |
| [SecureTransport] | [SecureTransport] | [SecureTransport]
| | |
| (Encrypted UDP) | (Encrypted UDP) | (Encrypted UDP)
| | |
+-------v--------+ +-------v--------+ +-------v--------+
| Membership |<------>| Membership |<------>| Membership |
| List A | | List B | | List C |
+-------+--------+ +-------+--------+ +-------+--------+
^ ^ ^
| | |
| (Ping/Sync Messages via Gossip) |
+-------------------------------------------------+
Key Architectural Principles:
- Symmetry: All nodes are identical in functionality; there are no special roles like leaders or primary nodes.
- Scalability: The gossip protocol's probabilistic nature allows it to scale effectively to large clusters without centralized bottlenecks.
- Resilience: The decentralized nature means the system can tolerate the failure of multiple nodes without impacting the overall cluster membership management.
- Extensibility: The use of a generic
Payloadin theNodestructure andMessageallows users to disseminate arbitrary application-specific data across the cluster. TheTransportinterface also provides flexibility to swap out underlying communication mechanisms (e.g., TCP, custom protocols) if needed. - Security: The
SecureTransportlayer ensures that all communication within the gossip network is confidential and tamper-proof.
Message Flow:
- Initialization: A new
Gossiperis created with its local address, a list of initial peer addresses, and a configuredTransport(e.g.,SecureTransportwrappingUDPTransport). - Self-Reporting: The
Gossiperimmediately adds itself to itsMembershipList. - Peer Seeding: The initial peer addresses are added to the
MembershipListwithAlivestatus and current timestamps. - Gossip Loops:
- Ping Loop (Failure Detection): Periodically, a node randomly selects another node from its
MembershipList(excluding itself) and sends a lightweight "Ping" message. The lack of a response within a timeout period would lead to a suspicion of failure (failure detection not yet fully implemented, but framework is there). - Sync Loop (Anti-Entropy): Periodically, a node randomly selects another node from its
MembershipList(excluding itself) and sends a "Sync" message. This message contains the sender's entire currentMembershipList.
- Ping Loop (Failure Detection): Periodically, a node randomly selects another node from its
- Message Handling:
- When a
Gossiperreceives a message, it decodes it. - If it's a "Ping" message, it currently does nothing (could be extended to send an "Ack" or trigger a "Sync Request").
- If it's a "Sync" message, it deserializes the incoming
MembershipListandMerges it with its localMembershipList.
- When a
- Merging Logic: The
MembershipList.Mergemethod iterates through the incoming nodes. For each node, it compares itsLastUpdatedtimestamp with the locally stored version. If the incoming node is newer, the local entry is updated (state, timestamp). If the incoming node also has a non-empty payload, the local node's payload is updated; otherwise, the local payload is preserved. - Payload Dissemination: Custom data (payloads) attached to nodes are propagated through the
Syncmessages and updated inMembershipListduring the merging process, ensuring all nodes eventually reflect the latest state of each member's payload.
This robust system ensures that all healthy nodes in the cluster eventually converge on a consistent and up-to-date view of the cluster membership, including any custom metadata.