Manage your ZDM Proxy instances
After you deploy ZDM Proxy instances, you might need to perform various management operations, such as rolling restarts, configuration changes, log inspection, version upgrades, and infrastructure changes.
If you are using ZDM Proxy Automation, you can use Ansible playbooks for all of these operations.
Perform a rolling restart of the proxies
Rolling restarts of the ZDM Proxy instances are useful to apply configuration changes or to upgrade the ZDM Proxy version without impacting the availability of the deployment.
A rolling restart is a destructive action because it stops the previous containers, and then starts new containers. Collect the logs before you apply the configuration change if you want to keep them.
-
With ZDM Proxy Automation
-
Without ZDM Proxy Automation
If you use ZDM Proxy Automation to manage your ZDM Proxy deployment, you can use a dedicated playbook to perform rolling restarts of all ZDM Proxy instances in a deployment:
-
Connect to your Ansible Control Host container.
For example, ssh into the jumphost:
ssh -F ~/.ssh/zdm_ssh_config jumphost
Then, connect to the Ansible Control Host container:
docker exec -it zdm-ansible-container bash
Result:
ubuntu@52772568517c:~$
-
Run the rolling restart playbook:
ansible-playbook rolling_update_zdm_proxy.yml -i zdm_ansible_inventory
The rolling restart playbook recreates each ZDM Proxy container, one by one. The ZDM Proxy deployment remains available at all times, and you can safely use it throughout this operation. If you modified mutable configuration variables, the new containers use the updated configuration files.
The playbook performs the following actions automatically:
-
ZDM Proxy Automation stops one container gracefully, and then waits for it to shut down.
-
ZDM Proxy Automation recreates the container, and then starts it.
-
ZDM Proxy Automation calls the readiness endpoint to check the container’s status:
-
If the status check fails, ZDM Proxy Automation repeats the check up to six times at 5-second intervals. If all six attempts fail, ZDM Proxy Automation interrupts the entire rolling restart process.
-
If the check succeeds, ZDM Proxy Automation waits a fixed amount of time, and then moves on to the next container. The default pause between containers is 10 seconds. You can change the pause duration in
zdm-proxy-automation/ansible/vars/zdm_playbook_internal_config.yml.
-
If you don’t use ZDM Proxy Automation, you must manually restart each instance.
To avoid downtime, wait for each instance to fully restart and begin receiving traffic before restarting the next instance.
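If you manage the containers directly with Docker, a manual rolling restart of one instance might look like the following sketch. The container name (zdm-proxy-container), health port (14001), and readiness path are assumptions based on common ZDM Proxy defaults; adjust them to match how your instances were started.
# Restart one instance at a time.
docker restart zdm-proxy-container
# Wait until the instance reports ready before moving on to the next one.
until curl -sf http://localhost:14001/health/readiness > /dev/null; do
  sleep 5
done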
Inspect ZDM Proxy logs
ZDM Proxy logs can help you verify that your ZDM Proxy instances are operating normally, investigate how processes are executed, and troubleshoot issues. For information about configuring, retrieving, and interpreting ZDM Proxy logs, see Viewing and interpreting ZDM Proxy logs.
Change mutable configuration variables
Some, but not all, configuration variables can be changed after you deploy a ZDM Proxy instance.
This section lists the mutable configuration variables that you can change on an existing ZDM Proxy deployment.
After you edit mutable variables in their corresponding configuration files (vars/zdm_proxy_core_config.yml, vars/zdm_proxy_cluster_config.yml, or vars/zdm_proxy_advanced_config.yml), you must perform a rolling restart to apply the configuration changes to your ZDM Proxy instances.
Mutable variables in vars/zdm_proxy_core_config.yml
-
primary_cluster: Determines which cluster is currently considered the primary cluster, either ORIGIN or TARGET.
At the start of the migration, the primary cluster is the origin cluster because it contains all of the data. After all the existing data has been transferred and validated/reconciled on the target cluster, you can switch the primary cluster to the target cluster.
-
read_mode: Determines how reads are handled by ZDM Proxy:
-
PRIMARY_ONLY (default): Reads are sent synchronously to the primary cluster only.
-
DUAL_ASYNC_ON_SECONDARY: Reads are sent synchronously to the primary cluster, and also asynchronously to the secondary cluster. See Phase 3: Enable asynchronous dual reads.
Typically, you only set read_mode to DUAL_ASYNC_ON_SECONDARY if the primary_cluster variable is set to ORIGIN. This is because asynchronous dual reads are primarily intended to help test production workloads against the target cluster near the end of the migration. When you are ready to switch primary_cluster to TARGET, revert read_mode to PRIMARY_ONLY because there is no need to send reads to both clusters at that point in the migration.
-
log_level: Set the ZDM Proxy log level as INFO (default) or DEBUG.
Only use DEBUG while temporarily troubleshooting an issue. Revert to INFO as soon as possible because the extra logging can impact performance slightly.
For more information, see Check ZDM Proxy logs.
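For example, a core configuration that enables asynchronous dual reads while the origin cluster is still primary might look like the following illustrative excerpt of vars/zdm_proxy_core_config.yml (your file may contain additional settings):
# Mid-migration: origin is still primary, dual reads exercise the target.
primary_cluster: ORIGIN
read_mode: DUAL_ASYNC_ON_SECONDARY
log_level: INFO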
Mutable variables in vars/zdm_proxy_cluster_config.yml
-
Origin username and password
-
Target username and password
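For example, updated credentials might look like the following hypothetical excerpt of vars/zdm_proxy_cluster_config.yml. The key names shown here are assumptions; match them to the keys that already exist in your file.
# Hypothetical key names; use the keys already present in your cluster config file.
origin_username: origin_app_user
origin_password: origin_app_password
target_username: target_app_user
target_password: target_app_password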
Mutable variables in vars/zdm_proxy_advanced_config.yml
-
zdm_proxy_max_clients_connections: The maximum number of client connections that ZDM Proxy can accept. Each client connection results in additional cluster connections and causes the allocation of several in-memory structures. A high number of client connections per proxy instance can cause performance degradation, especially at high throughput. Adjust this variable to limit the total number of connections on each instance.
Default: 1000
-
replace_cql_functions: Whether ZDM Proxy replaces standard now() CQL function calls in write requests with an explicit timeUUID value computed at proxy level.
If false (default), replacement of now() is disabled. If true, ZDM Proxy replaces instances of now() in write requests with an explicit timeUUID value before sending the write to each cluster.
Enabling replace_cql_functions has a noticeable performance impact because the proxy must do more extensive parsing and manipulation of the statements before sending the modified statement to each cluster. Only enable this variable if required, and implement proper performance testing to quantify and prepare for the performance impact.
If you use now() to populate a regular (non-primary key) column, consider whether you can pragmatically accept a slight discrepancy in the values between the origin and target clusters for these writes. This depends on your application, and whether it can tolerate a potential difference of a few milliseconds.
However, if you use now() to populate a primary key column, differences between the origin and target values result in different primary keys. This means that the same row on the origin and target clusters is technically considered a different record, which causes problems with duplicate entries that aren't caught by validation (because the primary keys are different). If now() is used in any of your primary key columns, DataStax recommends enabling replace_cql_functions, regardless of the performance impact.
For more information, see Server-side non-deterministic functions in the primary key.
-
zdm_proxy_request_timeout_ms: Global timeout in milliseconds of a request at proxy level. Determines how long ZDM Proxy waits for one cluster (for reads) or both clusters (for writes) to reply to a request. Upon reaching the timeout limit, ZDM Proxy abandons the request and no longer considers it pending, which frees up internal resources to process other requests.
When a request is abandoned due to a timeout, ZDM Proxy doesn't return any result or error. A timeout warning or error is only returned when the client application's own timeout is reached and the request is expired on the driver side.
Make sure zdm_proxy_request_timeout_ms is always greater than your client application's client-side timeout. If the client has an especially high timeout because it routinely generates long-running requests, you must increase zdm_proxy_request_timeout_ms accordingly so that ZDM Proxy doesn't abandon requests prematurely.
Default: 10000
-
origin_connection_timeout_ms and target_connection_timeout_ms: Timeout in milliseconds for establishing a connection from the proxy to the origin or target cluster, respectively.
Default: 30000
-
async_handshake_timeout_ms: Timeout in milliseconds for the initialization (handshake) of the connection that is used solely for asynchronous dual reads between the proxy and the secondary cluster.
Upon reaching the timeout limit, the asynchronous reads aren't sent because the connection failed to be established. This has no impact on the handling of synchronous requests: ZDM Proxy continues to handle all synchronous reads and writes as normal against the primary cluster.
Default: 4000
-
heartbeat_interval_ms: The interval in milliseconds at which heartbeats are sent to keep idle cluster connections alive. This includes all control and request connections to the origin and the target clusters.
Default: 30000
-
metrics_enabled: Whether to enable metrics collection.
If false, ZDM Proxy metrics collection is completely disabled. This isn't recommended.
Default: true (enabled)
-
zdm_proxy_max_stream_ids: Set the maximum pool size of available stream IDs managed by ZDM Proxy per client connection. Use the same value as your driver's maximum stream IDs configuration.
In the CQL protocol, every request has a unique stream ID. However, if there are a lot of requests in a given amount of time, errors can occur due to stream ID exhaustion.
In the client application, the stream IDs are managed internally by the driver, and, in most drivers, the maximum number is 2048, which is the same default value used by ZDM Proxy. If you have a custom driver configuration with a higher value, make sure zdm_proxy_max_stream_ids matches your driver's maximum stream IDs.
Default: 2048
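As a quick reference, the following sketch shows these variables with their documented defaults, written as they might appear in vars/zdm_proxy_advanced_config.yml. Your file may ship with some of these entries commented out or include additional settings.
# Documented defaults for the mutable advanced settings; change only what you need,
# then perform a rolling restart.
zdm_proxy_max_clients_connections: 1000
replace_cql_functions: false
zdm_proxy_request_timeout_ms: 10000
origin_connection_timeout_ms: 30000
target_connection_timeout_ms: 30000
async_handshake_timeout_ms: 4000
heartbeat_interval_ms: 30000
metrics_enabled: true
zdm_proxy_max_stream_ids: 2048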
Deprecated mutable variables
Deprecated variables will be removed in a future ZDM Proxy release. Replace them with their recommended alternatives as soon as possible.
-
forward_client_credentials_to_origin: Whether to use the credentials provided by the client application to connect to the origin cluster. If false (default), the credentials from the client application were used to connect to the target cluster. If true, the credentials from the client application were used to connect to the origin cluster.
This deprecated variable is no longer functional. Instead, the expected credentials are based on the authentication requirements of the origin and target clusters. For more information, see Client application credentials.
Change immutable configuration variables
All configuration variables not listed in Change mutable configuration variables are immutable and can only be changed by recreating the deployment with the initial deployment playbook (deploy_zdm_proxy.yml):
ansible-playbook deploy_zdm_proxy.yml -i zdm_ansible_inventory
You can re-run the deployment playbook as many times as necessary. However, this playbook decommissions and recreates all ZDM Proxy instances simultaneously. This results in a brief period of time where the entire ZDM Proxy deployment is offline because no instances are available.
For more information, see Configuration changes aren’t applied by ZDM Proxy Automation.
Upgrade the proxy version
The same playbook that you use for configuration changes can also be used to upgrade the ZDM Proxy version in a rolling fashion. All containers are recreated with the given image version.
A version change is a destructive action because the rolling restart playbook removes the previous containers and their logs, replacing them with new containers using the new image. Collect the logs before you run the playbook if you want to keep them.
To check your current ZDM Proxy version, see Check your ZDM Proxy version.
-
In vars/zdm_proxy_container.yml, set zdm_proxy_image to the desired tag. For available tags, see the ZDM Proxy Docker Hub repository.
zdm_proxy_image: datastax/zdm-proxy:TAG
For example:
zdm_proxy_image: datastax/zdm-proxy:2.3.4
-
Perform a rolling restart to update all ZDM Proxy instances to the new version.
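For example, the whole upgrade might look like the following sketch, run from the ansible directory of zdm-proxy-automation inside the Ansible Control Host container. The sed command assumes the zdm_proxy_image line already exists uncommented in the file; you can also edit the file by hand. The 2.3.4 tag is only an example.
# Point the deployment at the new image tag, then roll it out.
sed -i 's|^zdm_proxy_image:.*|zdm_proxy_image: datastax/zdm-proxy:2.3.4|' vars/zdm_proxy_container.yml
ansible-playbook rolling_update_zdm_proxy.yml -i zdm_ansible_inventory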
Scale ZDM Proxy instances
-
Scale with ZDM Proxy Automation
-
Scale without ZDM Proxy Automation
ZDM Proxy Automation doesn’t provide a way to scale a deployment up or down in a rolling fashion. If you are using ZDM Proxy Automation and you need a larger ZDM Proxy deployment, you can create a new deployment, or you can add instances to an existing deployment.
-
Create a new deployment (recommended)
-
Add instances to an existing deployment
This option is the recommended way to scale your ZDM Proxy deployment because it requires no downtime.
Create a new ZDM Proxy deployment, and then reconfigure your client application to use the new deployment:
-
Create a new ZDM Proxy deployment with the desired topology on a new set of machines.
-
Change the contact points in the application configuration so that the application instances point to the new ZDM Proxy deployment.
-
Perform a rolling restart of the application instances to apply the new contact point configuration.
The rolling restart ensures there is no interruption of service. The application instances switch seamlessly from the old deployment to the new one, and they are able to serve requests immediately.
-
After restarting all application instances, you can safely remove the old ZDM Proxy deployment.
This option requires manual configuration and a small amount of downtime.
Change the topology of your existing ZDM Proxy deployment, and then restart the entire deployment to apply the change:
-
Amend the inventory file so that it contains one line for each machine where you want to deploy a ZDM Proxy instance.
For example, if you want to add three nodes to a deployment with six nodes, then the amended inventory file must contain nine total IPs, including the six existing IPs and the three new IPs, as shown in the sketch after these steps.
-
Run the deploy_zdm_proxy.yml playbook to apply the change and start the new instances:
ansible-playbook deploy_zdm_proxy.yml -i zdm_ansible_inventory
Rerunning the playbook stops the existing instances, destroys them, and then creates and starts a new deployment with new instances based on the amended inventory. This results in a brief interruption of service for your entire ZDM Proxy deployment.
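For reference, an amended inventory for scaling from six to nine instances might look like the following hypothetical sketch. The group name and host format are assumptions; mirror the structure of your existing zdm_ansible_inventory file.
[proxies]
172.18.10.1
172.18.10.2
172.18.10.3
172.18.10.4
172.18.10.5
172.18.10.6
# New instances added below.
172.18.10.7
172.18.10.8
172.18.10.9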
If you aren’t using ZDM Proxy Automation, use these steps to add, change, or remove ZDM Proxy instances.
-
Add an instance
-
Vertically scale existing instances
-
Remove an instance
-
Prepare and configure the new ZDM Proxy instance appropriately, based on your other instances.
Make sure the new instance’s configuration references all planned ZDM Proxy instances.
-
On all ZDM Proxy instances, add the new instance’s address to the ZDM_PROXY_TOPOLOGY_ADDRESSES environment variable.
Make sure to include all new nodes, as shown in the sketch after these steps.
-
On the new ZDM Proxy instance, set the ZDM_PROXY_TOPOLOGY_INDEX to the next sequential integer after the greatest one in your existing deployment.
-
Perform a rolling restart of all ZDM Proxy instances, one at a time.
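For example, adding a fourth instance to a three-instance deployment might involve environment values like the following hypothetical sketch. The IPs are placeholders, and the sketch assumes the existing instances use indexes 0 through 2 and that the address list is comma separated; verify the format against your existing configuration.
# On every instance, including the new one, list all four instances:
ZDM_PROXY_TOPOLOGY_ADDRESSES=172.18.10.1,172.18.10.2,172.18.10.3,172.18.10.4
# Only on the new instance, use the next index after the existing 0, 1, and 2:
ZDM_PROXY_TOPOLOGY_INDEX=3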
Use these steps to increase or decrease resources for existing ZDM Proxy instances, such as CPU or memory. To avoid downtime, perform the following steps on one instance at a time:
-
Stop the first ZDM Proxy instance that you want to modify.
-
Modify the instance’s resources as required.
Make sure the instance’s IP address remains the same. If the IP address changes, you must treat it as a new instance; follow the steps on the Add an instance tab.
-
Restart the modified ZDM Proxy instance.
-
Wait until the instance starts, and then confirm that it is receiving traffic.
-
Repeat these steps to modify each additional instance, one at a time.
-
On all ZDM Proxy instances, remove the unused instance’s address from the ZDM_PROXY_TOPOLOGY_ADDRESSES environment variable.
-
Perform a rolling restart of all remaining ZDM Proxy instances.
-
Clean up resources used by the removed instance, such as the container or VM.
Proxy topology addresses enable failover and high availability
When you configure a ZDM Proxy deployment, either through ZDM Proxy Automation or with manually managed ZDM Proxy instances, you specify the addresses of your instances.
These are populated in the ZDM_PROXY_TOPOLOGY_ADDRESSES variable, either manually or automatically depending on how you manage your instances.
Cassandra drivers look up nodes on a cluster by querying the system.peers table.
ZDM Proxy uses the topology addresses to populate its response to that query, so the driver discovers every proxy instance as a node it can connect to.
If there are no topology addresses specified, ZDM Proxy defaults to a single-instance configuration.
This means that driver connections use only that one ZDM Proxy instance rather than all instances in your ZDM Proxy deployment.
If that one instance goes down, the driver won’t know that there are other instances available, and your application can experience an outage. Additionally, if you need to restart ZDM Proxy instances, and there is only one instance specified in the topology addresses, your migration will have downtime while that one instance restarts.