@@ -924,6 +924,165 @@ This is an example of an error. In reality, there can be any other error
that leads to the crash of the Tarantool instance. Fix the bug in the
application and update the application to the new version.
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+ Recreating replicas
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+ You may need to recreate the replicas: delete existing replicas,
+ create new ones and join them back to the replicaset.
+ Recreating replicas may be necessary when, for example, replication breaks down.
+
+ Let's see how to do this. For example, you have a ``storage`` role:
+
+ .. code-block:: yaml
+
+     RoleConfig:
+       ...
+
+       - RoleName: storage
+         ReplicaCount: 3
+         ReplicaSetCount: 2
+         DiskSize: 1Gi
+         CPUallocation: 0.1
+         MemtxMemoryMB: 512
+         RolesToAssign:
+           - vshard-storage
+
+ Based on this description, after installation you will have the following pods:
+
+ .. code-block:: console
+
+     $ kubectl -n tarantool get pods
+     ---
+     NAME                                  READY   STATUS    RESTARTS   AGE
+     ...
+     storage-0-0                           1/1     Running   0          2m42s
+     storage-0-1                           1/1     Running   0          106s
+     storage-0-2                           1/1     Running   0          80s
+     storage-1-0                           1/1     Running   0          2m42s
+     storage-1-1                           1/1     Running   0          111s
+     storage-1-2                           1/1     Running   0          83s
+     tarantool-operator-7879d99ccb-6vrmg   1/1     Running   0          13m
+
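+ You can also cross-check which values the release is currently running with.
+ ``helm get values`` is a standard Helm command; the release name ``test-app``
+ and the namespace ``tarantool`` are the ones used in the commands below, and
+ the output here is abbreviated:
+
+ .. code-block:: console
+
+     $ helm get values test-app -n tarantool
+     ---
+     USER-SUPPLIED VALUES:
+     ...
+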
+ Let's try to reduce the number of replicas in the storage replicaset. To do
+ so, change the ``ReplicaCount`` number for the ``storage`` role from ``3`` to ``2``
+ and run ``upgrade``:
+
+ .. code-block:: console
+
+     $ helm upgrade -f values.yaml test-app tarantool/cartridge --namespace tarantool --version 0.0.8
+     ---
+     Release "test-app" has been upgraded. Happy Helming!
+     NAME: test-app
+     LAST DEPLOYED: Tue Mar 2 11:45:29 2021
+     NAMESPACE: tarantool
+     STATUS: deployed
+     REVISION: 2
+
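+ For reference, here is a minimal sketch of what the ``storage`` role description
+ in ``values.yaml`` looks like at this point; only ``ReplicaCount`` differs from
+ the configuration shown earlier, and the rest of the file stays unchanged:
+
+ .. code-block:: yaml
+
+     - RoleName: storage
+       ReplicaCount: 2
+       ReplicaSetCount: 2
+       DiskSize: 1Gi
+       CPUallocation: 0.1
+       MemtxMemoryMB: 512
+       RolesToAssign:
+         - vshard-storage
+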
+ You will see that ``storage-0-2`` and ``storage-1-2`` become "Terminating"
+ and then disappear from the pods list:
+
+ .. code-block:: console
+
+     $ kubectl -n tarantool get pods
+     ---
+     NAME                                  READY   STATUS        RESTARTS   AGE
+     ...
+     storage-0-0                           1/1     Running       0          12m
+     storage-0-1                           1/1     Running       0          11m
+     storage-0-2                           0/1     Terminating   0          11m
+     storage-1-0                           1/1     Running       0          12m
+     storage-1-1                           1/1     Running       0          11m
+     storage-1-2                           0/1     Terminating   0          11m
+     tarantool-operator-xxx-yyy            1/1     Running       0          17m
+
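+ If you would rather follow the rollout as it happens instead of polling,
+ the standard ``--watch`` flag of ``kubectl get`` works here as well;
+ this is an optional convenience, not a required step:
+
+ .. code-block:: console
+
+     $ kubectl -n tarantool get pods --watch
+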
+ Let's check what the cluster looks like on the web UI:
+
+ .. code-block:: console
+
+     $ kubectl -n tarantool port-forward storage-0-0 8081:8081
+     ---
+     Forwarding from 127.0.0.1:8081 -> 8081
+     Forwarding from [::1]:8081 -> 8081
+
+ .. image:: images/kubernetes-recreating-replicas-5px.png
+     :align: left
+     :scale: 70%
+     :alt: Replicas storage-0-2 and storage-1-2 have a note "Server status is 'dead'" next to them.
+
+ Here we have shut down the third replica in each replicaset of the ``storage`` role.
+ Note that we did not expel these replicas from the cluster. If we want to
+ return them and not lose data, we can set the number of replicas
+ of the ``storage`` role back to the required value and run ``upgrade`` again.
+
+ However, if you need to delete some replicas' data, you can delete
+ the corresponding :abbr:`PVC (persistent volume claim)` before upgrading.
+
+ .. code-block:: console
+
+     $ kubectl -n tarantool get pvc
+     ---
+     NAME              STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
+     ...
+     www-storage-0-0   Bound    pvc-729c4827-e10e-4ede-b546-c72642935441   1Gi        RWO            standard       157m
+     www-storage-0-1   Bound    pvc-6b2cfed2-171f-4b56-b290-3013b8472039   1Gi        RWO            standard       156m
+     www-storage-0-2   Bound    pvc-147b0505-5380-4419-8d86-97db6a74775c   1Gi        RWO            standard       156m
+     www-storage-1-0   Bound    pvc-788ad781-343b-43fe-867d-44432b1eabee   1Gi        RWO            standard       157m
+     www-storage-1-1   Bound    pvc-4c8b334e-cf49-411b-8c4f-1c97e9baa93e   1Gi        RWO            standard       156m
+     www-storage-1-2   Bound    pvc-c67d32c0-7d7b-4803-908e-065150f31189   1Gi        RWO            standard       156m
+
+ It can be seen that the PVCs of the pods we deleted still exist. Let's remove the data of ``storage-1-2``:
+
+ .. code-block:: console
+
+     $ kubectl -n tarantool delete pvc www-storage-1-2
+     ---
+     persistentvolumeclaim "www-storage-1-2" deleted
+
+ Now set the ``ReplicaCount`` field of the ``storage`` role back to ``3`` and run ``upgrade`` again:
+
+ .. code-block:: console
+
+     $ helm upgrade -f values.yaml test-app tarantool/cartridge --namespace tarantool --version 0.0.8
+     ---
+     Release "test-app" has been upgraded. Happy Helming!
+     NAME: test-app
+     LAST DEPLOYED: Tue Mar 2 14:42:06 2021
+     NAMESPACE: tarantool
+     STATUS: deployed
+     REVISION: 3
+
+ After a while, new pods will be up and configured.
+ The pod whose data was deleted may get stuck in the ``unconfigured``
+ state. If this happens, try to restart it:
+
+ .. code-block:: console
+
+     $ kubectl -n tarantool delete pod storage-1-2
+     ---
+     pod "storage-1-2" deleted
+
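+ To make sure the recreated pod has come back, you can check the pod list once
+ more; the exact timings in the output below are illustrative:
+
+ .. code-block:: console
+
+     $ kubectl -n tarantool get pods storage-1-2
+     ---
+     NAME          READY   STATUS    RESTARTS   AGE
+     storage-1-2   1/1     Running   0          74s
+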
+ Why does it work? The Tarantool operator does not expel nodes from the cluster,
+ but only "shuts them down". Therefore, it is impossible to reduce the
+ number of replicas in this way. But you can recreate them, since the UID
+ of each instance is generated based on its name, for example ``storage-1-2``.
+ This ensures that a new instance with the given name replaces the old one.
+
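+ Under the hood, each replicaset of a role is backed by a Kubernetes StatefulSet,
+ which is why a deleted pod is recreated under the same name. A quick way to see
+ this is to list the StatefulSets; the names below assume the ``storage`` role
+ from this example, and the output is illustrative:
+
+ .. code-block:: console
+
+     $ kubectl -n tarantool get statefulsets
+     ---
+     NAME        READY   AGE
+     storage-0   3/3     25m
+     storage-1   3/3     25m
+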
+ This method is recommended only when there is no other way.
+ It has its own limitations:
+
+ - Restarting nodes is possible only in descending order of the number in the replicaset.
+   If you have a replicaset with ``node-0-0``, ``node-0-1``, ``node-0-2``, and ``node-0-3``,
+   and you want to recreate only ``node-0-1``, then ``node-0-2``
+   and ``node-0-3`` will restart along with it.
+ - All nodes that belong to the selected role will be restarted.
+   It isn't possible to select a specific replicaset and only restart its instances.
+ - If the number of the replicaset leader is greater than the number of the restarted replica,
+   the restart can stop the leader as well.
+   This would make the replicaset unable to receive new write requests,
+   so be very careful when reconnecting replicas.
+
+
.. _cartridge_kubernetes_customization:
--------------------------------------------------------------------------------