Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
187 changes: 120 additions & 67 deletions pages/clustering/replication.mdx
Original file line number Diff line number Diff line change
Expand Up @@ -10,30 +10,32 @@ import { Callout } from 'nextra/components'
<Callout>

Instances need to remember their role and configuration details in a replication
cluster upon restart, and the `--replication-restore-state-on-startup` needs to be
set to `true` when first initializing the instances and remain `true` throughout
the instances' lifetime for replication to work correctly. If the flag is set to `false`,
MAIN can't communicate with instance, because each REPLICA has a UUID of MAIN which can communicate
with it, and it is set up only on instance registration. In case the flag is set to `false`,
the way to go forward is first to unregister the instance on MAIN and register it again.
cluster upon restart, and the `--replication-restore-state-on-startup` needs to
be set to `true` when first initializing the instances and remain `true`
throughout the instances' lifetime for replication to work correctly. If the
flag is set to `false`, MAIN can't communicate with instance, because each
REPLICA has a UUID of MAIN which can communicate with it, and it is set up only
on instance registration. In case the flag is set to `false`, the way to go
forward is first to unregister the instance on MAIN and register it again.

When reinstating a cluster, it is advised first to initialize the MAIN
instance, then the REPLICA instances.
When reinstating a cluster, it is advised first to initialize the MAIN instance,
then the REPLICA instances.

Data replication currently **works only in the in-memory transactional [storage
mode](/fundamentals/storage-memory-usage)**.

</Callout>

When distributing data across several instances, Memgraph uses replication to
provide a satisfying ratio of the following properties, known from the CAP theorem:
provide a satisfying ratio of the following properties, known from the CAP
theorem:

1. **Consistency** (C) - every node has the same view of data at a given point in
time
2. **Availability** (A) - all clients can find a replica of the data, even in the
case of a partial node failure
3. **Partition tolerance** (P) - the system continues to work as expected despite a
partial network failure
1. **Consistency** (C) - every node has the same view of data at a given point
in time
2. **Availability** (A) - all clients can find a replica of the data, even in
the case of a partial node failure
3. **Partition tolerance** (P) - the system continues to work as expected
despite a partial network failure

In the replication process, the data is replicated from one storage (MAIN
instance) to another (REPLICA instances).
Expand Down Expand Up @@ -71,43 +73,44 @@ cluster.

Once demoted to REPLICA instances, they will no longer accept write queries. In
order to start the replication, each REPLICA instance needs to be registered
from the MAIN instance by setting a replication mode (SYNC, ASYNC or STRICT_SYNC) and
specifying the REPLICA instance's socket address.
from the MAIN instance by setting a replication mode (SYNC, ASYNC or
STRICT_SYNC) and specifying the REPLICA instance's socket address.

The replication mode defines the terms by which the MAIN instance can commit the
changes to the database, thus modifying the system to prioritize either
consistency or availability:


- **STRICT_SYNC** - After committing a transaction, the MAIN instance will communicate
the changes to all REPLICA instances and wait until it
receives a response or information that a timeout is reached. The STRICT_SYNC mode ensures
- **STRICT_SYNC** - After committing a transaction, the MAIN instance will
communicate the changes to all REPLICA instances and wait until it receives a
response or information that a timeout is reached. The STRICT_SYNC mode ensures
consistency and partition tolerance (CP), but not availability for writes. If
the primary database has multiple replicas, the system is highly available for
reads. But, when a replica fails, the MAIN instance can't process the write due
to the nature of synchronous replication. It is implemented as two-phase commit protocol.
to the nature of synchronous replication. It is implemented as two-phase commit
protocol.


- **SYNC** - After committing a transaction, the MAIN instance will communicate
the changes to all REPLICA instances and wait until it
receives a response or information that a timeout is reached. It is different from
**STRICT_SYNC** mode because it the MAIN can continue committing even in situations
when **SYNC** replica is down.
the changes to all REPLICA instances and wait until it receives a response or
information that a timeout is reached. It is different from **STRICT_SYNC** mode
because it the MAIN can continue committing even in situations when **SYNC**
replica is down.


- **ASYNC** - The MAIN instance will commit a transaction without receiving
confirmation from REPLICA instances that they have received the same
transaction. ASYNC mode ensures system availability and partition tolerance (AP),
while data can only be eventually consistent.
transaction. ASYNC mode ensures system availability and partition tolerance
(AP), while data can only be eventually consistent.


<Callout type="info">

Users are advised to use the same value for configuration
flag `--storage-wal-file-flush-every-n-txn` on MAIN and SYNC REPLICAs. Otherwise,
the situation could occur in which there is a data which is fsynced on REPLICA and not on MAIN.
In the case MAIN crashes, this could leave to conflicts in system that would need to be
manually resolved by users.
Users are advised to use the same value for configuration flag
`--storage-wal-file-flush-every-n-txn` on MAIN and SYNC REPLICAs. Otherwise, the
situation could occur in which there is a data which is fsynced on REPLICA and
not on MAIN. In the case MAIN crashes, this could leave to conflicts in system
that would need to be manually resolved by users.

</Callout>

Expand All @@ -127,32 +130,78 @@ If the REPLICA is so far behind the MAIN instance that the synchronization using
WAL files and deltas within it is impossible, Memgraph will use snapshots to
synchronize the REPLICA to the state of the MAIN instance.

From Memgraph version 2.15, a REPLICA instance has integrated support to only listen to one MAIN.
This part is introduced to support the high availability but also reflects on the replication.
The mechanism that is used is a unique identifier that which MAIN instance sends to all REPLICAs
when REPLICA is first registered on a MAIN. A REPLICA stores the UUID of the MAIN instance it listens to.
The MAIN's UUID is also stored on a disk, in case of restart of an instance to continue listening to the correct
MAIN instance. When REPLICA restarts, `--replication-restore-state-on-startup` must be set to
`true` to continue getting updates from the MAIN.
From Memgraph version 2.15, a REPLICA instance has integrated support to only
listen to one MAIN. This part is introduced to support the high availability but
also reflects on the replication. The mechanism that is used is a unique
identifier that which MAIN instance sends to all REPLICAs when REPLICA is first
registered on a MAIN. A REPLICA stores the UUID of the MAIN instance it listens
to. The MAIN's UUID is also stored on a disk, in case of restart of an instance
to continue listening to the correct MAIN instance. When REPLICA restarts,
`--replication-restore-state-on-startup` must be set to `true` to continue
getting updates from the MAIN.

## Auth data replication (Enterprise)

If you are using a Memgraph Enterprise license, all authentication/authorization data, including users, roles,
and associated permissions, will be replicated.
If you are using a Memgraph Enterprise license, all authentication/authorization
data, including users, roles, and associated permissions, will be replicated.

## Auth modules replication (Enterprise)

Authentication modules are not replicated and must be configured manually by the administrator.
Authentication modules are not replicated and must be configured manually by the
administrator.

## Multi-tenant data replication (Enterprise)

When you are using a Memgraph Enterprise license, multi-tenant commands are replicated as any other data command.
Database manipulation is allowed only on MAIN. However, REPLICAs have the
ability to use databases and read data contained in them.
When you are using a Memgraph Enterprise license, multi-tenant commands are
replicated as any other data command. Database manipulation is allowed only on
MAIN. However, REPLICAs have the ability to use databases and read data
contained in them.

When dropping a database used on a REPLICA, the REPLICA will receive the command
and will partially drop the database. It will hide the database and prevent any new
usage. Once all clients have released the database, it will be deleted entirely.
and will partially drop the database. It will hide the database and prevent any
new usage. Once all clients have released the database, it will be deleted
entirely.

<Callout type="info">

As of Memgraph v3.5 replication queries (such as `REGISTER REPLICA`, `SHOW
REPLICAS`, `DROP REPLICA`, etc.) target the default "memgraph" database and
require access to it. The recommendation is to use the default "memgraph"
database as an admin/system database and store graphs under other databases.

</Callout>

### Requirements for replication queries

To execute replication queries, users must have:
1. The `REPLICATION` privilege
2. **AND** access to the default "memgraph" database

### Impact on multi-tenant environments

In multi-tenant environments where users might not have access to the "memgraph"
database, replication management operations will fail. This reinforces the
recommendation to treat the "memgraph" database as an administrative/system
database.

{<h4 className="custom-header">Example: Admin user with replication privileges</h4>}

```cypher
-- Create admin role with replication privileges
CREATE ROLE replication_admin;
GRANT REPLICATION TO replication_admin;
GRANT DATABASE memgraph TO replication_admin;

-- Create user with replication admin role
CREATE USER repl_admin IDENTIFIED BY 'admin_password';
SET ROLE FOR repl_admin TO replication_admin;
```

In this setup, `repl_admin` can:
- Execute all replication queries (`REGISTER REPLICA`, `SHOW REPLICAS`, etc.)
- Access the "memgraph" database for administrative operations
- Manage the replication cluster configuration


## Running multiple instances

Expand All @@ -171,9 +220,10 @@ cluster](#set-up-a-replication-cluster).
Each Memgraph instance has the role of the MAIN instance when it is first
started.

Also, by default, each crashed instance restarts with its previous role (MAIN as MAIN, REPLICA as REPLICA).
To change this behavior, set the `--replication-restore-state-on-startup` to `false` when
first initializing the instance. In this way, all instances will get restarted as MAIN.
Also, by default, each crashed instance restarts with its previous role (MAIN as
MAIN, REPLICA as REPLICA). To change this behavior, set the
`--replication-restore-state-on-startup` to `false` when first initializing the
instance. In this way, all instances will get restarted as MAIN.

### Assigning the REPLICA role

Expand All @@ -186,8 +236,8 @@ SET REPLICATION ROLE TO REPLICA WITH PORT <port_number>;
```

If you set the port of each REPLICA instance to `10000`, it will be easier to
register replicas later on because the query for registering replicas uses a port
10000 as the default one.
register replicas later on because the query for registering replicas uses a
port 10000 as the default one.

Otherwise, you can use any unassigned port between 1000 and 10000.

Expand All @@ -210,8 +260,8 @@ retrieve its original function. You need to [drop
it](#dropping-a-replica-instance) from the MAIN and register it again.

If the crashed MAIN instance goes back online once a new MAIN is already
assigned, it cannot reclaim its previous role. It can be cleaned and
demoted to become a REPLICA instance of the new MAIN instance.
assigned, it cannot reclaim its previous role. It can be cleaned and demoted to
become a REPLICA instance of the new MAIN instance.

### Checking the assigned role

Expand All @@ -225,12 +275,12 @@ SHOW REPLICATION ROLE;

Once all the nodes in the cluster are assigned with appropriate roles, you can
enable replication in the MAIN instance by registering REPLICA instances,
setting a replication mode (SYNC and ASYNC), and specifying
the REPLICA instance's socket address. Memgraph doesn't support chaining REPLICA
instances, that is, a REPLICA instance cannot be replicated on another REPLICA
instance.
setting a replication mode (SYNC and ASYNC), and specifying the REPLICA
instance's socket address. Memgraph doesn't support chaining REPLICA instances,
that is, a REPLICA instance cannot be replicated on another REPLICA instance.

If you want to register a REPLICA instance with a SYNC replication mode, run the following query:
If you want to register a REPLICA instance with a SYNC replication mode, run the
following query:

```plaintext
REGISTER REPLICA name SYNC TO <socket_address>;
Expand All @@ -244,8 +294,8 @@ REGISTER REPLICA name ASYNC TO <socket_address>;
```


If you want to register a REPLICA instance with an STRICT_SYNC replication mode, run
the following query:
If you want to register a REPLICA instance with an STRICT_SYNC replication mode,
run the following query:

```plaintext
REGISTER REPLICA name STRICT_SYNC TO <socket_address>;
Expand Down Expand Up @@ -278,7 +328,8 @@ Example of a `<socket_address>` using only `IP_ADDRESS`:
"172.17.0.5"
```

Also, you can register REPLICA instances using DNS names. In that case, the socket address must be a string value as follows:
Also, you can register REPLICA instances using DNS names. In that case, the
socket address must be a string value as follows:

```plaintext
"DOMAIN_NAME:PORT_NUMBER"
Expand All @@ -291,15 +342,17 @@ number, for example:
"memgraph-replica.memgraph.net:10050"
```

If you set REPLICA roles using port `10000`, you can define the socket address specifying only the valid domain name, for example:
If you set REPLICA roles using port `10000`, you can define the socket address
specifying only the valid domain name, for example:

```plaintext
"memgraph-replica.memgraph.net"
```

When a REPLICA instance is registered, it will start replication in ASYNC mode
until it synchronizes to the current state of the database. Upon
synchronization, REPLICA instances will either continue working in the ASYNC, STRICT_SYNC or SYNC mode.
synchronization, REPLICA instances will either continue working in the ASYNC,
STRICT_SYNC or SYNC mode.

### Listing all registered REPLICA instances

Expand Down Expand Up @@ -328,9 +381,9 @@ with the MAIN instance#synchronizing-instances.
The missing data changes can be sent as snapshots or WAL files. Snapshot files
represent an image of the current state of the database and are much larger than
the WAL files, which only contain the changes, deltas. Because of the difference
in file size, Memgraph favors the WAL files. It is important to note that replicas
receive only changes which are made durable on the MAIN instance, in other words changes which
are already fsynced.
in file size, Memgraph favors the WAL files. It is important to note that
replicas receive only changes which are made durable on the MAIN instance, in
other words changes which are already fsynced.

While the REPLICA instance is in the RECOVERY state, the MAIN instance
calculates the optimal synchronization path based on the REPLICA instance's
Expand Down
Loading