Decommission a data center
Steps to properly remove a data center so that no information is lost.
Decommission an HCD data center
-
Verify that no clients are writing data to any nodes in the data center. The following JMX MBeans provide details on client connections and pending requests:
-
Active connections:
org.apache.cassandra.metrics/Client/connectedNativeClients and org.apache.cassandra.metrics/Client/connectedThriftClients
-
Pending requests:
org.apache.cassandra.metrics/ClientRequests/viewPendingMutations, or use nodetool tpstats.
-
-
Run a full repair with 'nodetool repair --full' to ensure that all data is propagated from the data center being decommissioned.
-
Shut down the Mission Control Repair Service, if in use.
-
Change all keyspaces so they no longer reference the data center being decommissioned.
-
Shut down all nodes in the data center.
-
From a running node in a data center that is not being decommissioned, run 'nodetool assassinate' for each node in the data center that is being decommissioned:
nodetool assassinate <remote_IP_address>
If the replication factor (RF) on any keyspace has not been properly updated, then:
-
Note the name of the keyspace that needs to be updated.
-
Remove the data center from the keyspace RF (using 'ALTER KEYSPACE').
-
For a keyspace that uses the SimpleStrategy replication class, run a full repair on the keyspace:
nodetool repair --full <keyspace_name>
-
-
Run nodetool status to ensure that the nodes in the data center are removed.
-
If you shut down the Mission Control Repair Service earlier in this procedure, re-enable it now.
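The pending-request check in the first step can be scripted. The following is a minimal sketch, not part of HCD or nodetool: `pools_with_pending` is a hypothetical helper name, and the sketch assumes the thread pool table in the `nodetool tpstats` output starts with a `Pool Name` header row, has Pending as its third column, and ends at the first blank line.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not a nodetool feature): reads `nodetool tpstats`
# output on stdin and prints the name of every thread pool whose Pending
# value is neither 0 nor N/A. Exits 0 when nothing is pending, 1 otherwise.
pools_with_pending() {
  awk '
    # The pool table starts after the "Pool Name" header row.
    /^Pool Name/ { in_pools = 1; next }
    # Assumption: a blank line ends the pool table.
    in_pools && NF == 0 { in_pools = 0 }
    # Columns: pool name, Active, Pending, ... so Pending is field 3.
    in_pools && NF >= 3 && $3 != "0" && $3 != "N/A" { print $1; found = 1 }
    END { exit (found ? 1 : 0) }
  '
}

# Usage on each node in the data center being removed
# (assumes nodetool is on the PATH):
#   nodetool tpstats | pools_with_pending && echo "no pending requests"
```

Run the check on every node in the data center being decommissioned. Any pool name it prints still has queued work, so wait and check again before continuing.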
Example
Removing DC3 from the cluster:
-
Check the status of the cluster:
nodetool status
Status shows that there are three data centers with one node in each:
Datacenter: DC1
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address         Load        Owns  Host ID                               Token                 Rack
UN  10.200.175.11   474.23 KiB  ?     7297d21e-a04e-4bb1-91d9-8149b03fb60a  -9223372036854775808  rack1
Datacenter: DC2
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address         Load        Owns  Host ID                               Token                 Rack
UN  10.200.175.113  518.36 KiB  ?     2ff7d46c-f084-477e-aa53-0f4791c71dbc  -9223372036854775798  rack1
Datacenter: DC3
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address         Load        Owns  Host ID                               Token                 Rack
UN  10.200.175.111  461.56 KiB  ?     ac43e602-ef09-4d0d-a455-3311f444198c  -9223372036854775788  rack1
-
Run a full repair:
nodetool repair --full
-
Using JConsole, check the following JMX MBeans to make sure there are no active connections:
-
org.apache.cassandra.metrics/Client/connectedNativeClients
-
org.apache.cassandra.metrics/Client/connectedThriftClients
-
-
Verify that there are no pending write requests on each node that is being removed (the Pending column should read 0 or N/A):
nodetool tpstats
Pool Name            Active  Pending (w/Backpressure)  Delayed  Completed...
BackgroundIoStage    0       0 (N/A)                   N/A      640...
CompactionExecutor   0       0 (N/A)                   N/A      1039...
GossipStage          0       0 (N/A)                   N/A      4580...
HintsDispatcher      0       0 (N/A)                   N/A      2...
-
Start cqlsh and remove DC3 from all keyspace configurations. Repeat for each keyspace that has an RF set for DC3:
ALTER KEYSPACE cycling WITH replication = {'class': 'NetworkTopologyStrategy', 'DC1': 1, 'DC2': 2};
-
Shut down the Mission Control Repair Service, if in use.
-
Shut down all nodes in the data center.
-
From a running node in a data center that is not being decommissioned, run 'nodetool assassinate' for each node in the data center being decommissioned (in this case, DC3):
nodetool assassinate <remote_IP_address>
-
In a remaining data center, verify that the DC3 data center has been removed:
nodetool status
Datacenter: DC1
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address         Load        Owns  Host ID                               Token                 Rack
UN  10.200.175.11   503.54 KiB  ?     7297d21e-a04e-4bb1-91d9-8149b03fb60a  -9223372036854775808  rack1
Datacenter: DC2
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address         Load        Owns  Host ID                               Token                 Rack
UN  10.200.175.113  522.47 KiB  ?     2ff7d46c-f084-477e-aa53-0f4791c71dbc  -9223372036854775798  rack1
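Both the list of addresses for the assassinate step and the final verification can be read from `nodetool status` output. The sketch below assumes the output layout shown in this example: a `Datacenter: <name>` line opens each section, and node rows begin with a two-letter status and state code such as UN or DN. The helper names `dc_node_addresses` and `dc_removed` are hypothetical and are not part of nodetool.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads `nodetool status` output on stdin and prints
# the address of every node listed under the data center named in $1.
dc_node_addresses() {
  awk -v dc="$1" '
    # Each section opens with "Datacenter: <name>".
    $1 == "Datacenter:" { in_dc = ($2 == dc); next }
    # Node rows start with status (U/D) plus state (N/L/J/M), e.g. UN or DN.
    in_dc && $1 ~ /^[UD][NLJM]$/ { print $2 }
  '
}

# Hypothetical helper: reads `nodetool status` output on stdin and exits 0
# only if the data center named in $1 no longer appears in it.
dc_removed() {
  awk -v dc="$1" '
    $1 == "Datacenter:" && $2 == dc { found = 1 }
    END { exit (found ? 1 : 0) }
  '
}

# Usage from a node in a remaining data center
# (assumes nodetool is on the PATH):
#   nodetool status | dc_node_addresses DC3   # addresses to pass to assassinate
#   nodetool status | dc_removed DC3 && echo "DC3 has been removed"
```

Review the printed addresses before passing any of them to `nodetool assassinate`, because that command forcibly removes a node from the cluster.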