Delphix allows you to replicate data objects between Delphix Engines. Prior to 5.3.3, these engines had to be running identical Delphix versions, but otherwise they could be asymmetric in terms of engine configuration. With 5.3.3, engines can be on different versions; refer to Forward Compatible Replication (FCR) below for more details. In the event of a failure that destroys the source engine, you can bring up the target engine in a state matching that of the source. In addition, you can provision VDBs from replicated objects, allowing for the geographical distribution of data and remote provisioning.
You can run replication ad hoc, but it is typically run according to a predefined schedule. After the initial update, each subsequent update sends only the changes incurred since the previous update. Replication does not provide synchronous semantics, which would guarantee that all data is preserved on the target engine. When you fail over to a replication target, any changes made since the last replication update are lost.
Replication is generally not suited for high-availability configurations where rapid failover (and failback) is a requirement. Failing over a replication target requires a non-trivial amount of time and is a one-way operation; failing back requires replicating all data back to the original source. For cases where high availability is necessary, it is best to leverage features of the underlying hypervisor or storage platform. For more information on how to evaluate the use of Delphix Engine replication for your data recovery requirements, see the topics under Backup and Recovery Strategies for the Delphix Engine.
Forward Compatible Replication (FCR)
With the 5.3.3 release, Delphix virtualization supports replication to a target Delphix Engine running a higher version. To perform Forward Compatible Replication, there are some requirements to consider:
- FCR is supported for replication jobs from a source engine running 5.3.0.0 or later.
- The target engine must be running 5.3.3.0 or later.
- The target engine should be the same version as, or up to two major versions higher than, the source.
Examples of supported and unsupported FCR configurations (a sketch encoding these rules follows the lists below):
Supported
- 5.3.0.0 to 5.3.0.0 (same version)
- 5.3.3.0 to 5.3.3.0 (same version)
- 5.3.0.0 to 5.3.3.0
Not Supported
- 5.2.5.0 to 5.3.3.0 (source version not compatible with FCR)
- 5.3.0.0 to 5.3.2.0 (target version not compatible with FCR)
- 5.3.4.0 to 5.3.3.0 (source version higher than the target)
Exceptions
Some newer 5.3.x versions are not compatible with the early 6.0.x versions (6.0.0.0, 6.0.1.0, and 6.0.1.1):
- 5.3.7.0 and 5.3.7.1 are not compatible with 6.0.0.0
- 5.3.8.0 and 5.3.8.1 are not compatible with 6.0.0.0
- 5.3.9.0 is not compatible with 6.0.0.0
- 5.3.9.0 is not compatible with 6.0.1.0 and 6.0.1.1
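To make the rules above concrete, here is a minimal sketch that encodes them in Python. The helper names and version tuples are illustrative only and are not part of any Delphix API; the "up to two major versions higher" bound is omitted for brevity.

```python
# Minimal sketch encoding the FCR version rules above. The helper and
# version tuples are illustrative only and are not part of any Delphix API.

FCR_MIN_SOURCE = (5, 3, 0, 0)   # source must run 5.3.0.0 or later
FCR_MIN_TARGET = (5, 3, 3, 0)   # cross-version target must run 5.3.3.0 or later

# Known-incompatible (source, target) pairs from the Exceptions list.
EXCEPTIONS = {
    ((5, 3, 7, 0), (6, 0, 0, 0)), ((5, 3, 7, 1), (6, 0, 0, 0)),
    ((5, 3, 8, 0), (6, 0, 0, 0)), ((5, 3, 8, 1), (6, 0, 0, 0)),
    ((5, 3, 9, 0), (6, 0, 0, 0)),
    ((5, 3, 9, 0), (6, 0, 1, 0)), ((5, 3, 9, 0), (6, 0, 1, 1)),
}

def parse(version: str) -> tuple:
    """Turn a version string like '5.3.3.0' into a comparable tuple."""
    return tuple(int(part) for part in version.split("."))

def fcr_compatible(source: str, target: str) -> bool:
    src, tgt = parse(source), parse(target)
    if src < FCR_MIN_SOURCE or src > tgt:    # source too old, or newer than target
        return False
    if src != tgt and tgt < FCR_MIN_TARGET:  # cross-version needs a 5.3.3.0+ target
        return False
    return (src, tgt) not in EXCEPTIONS

# Examples from the lists above:
assert fcr_compatible("5.3.0.0", "5.3.3.0")      # supported
assert not fcr_compatible("5.2.5.0", "5.3.3.0")  # source not FCR-capable
assert not fcr_compatible("5.3.0.0", "5.3.2.0")  # target not FCR-capable
assert not fcr_compatible("5.3.4.0", "5.3.3.0")  # source higher than target
assert not fcr_compatible("5.3.9.0", "6.0.1.1")  # listed exception
```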
Replication Features
Because Delphix Engines are virtual appliances, it is possible to back up, restore, replicate, and migrate data objects between them using features of VMware and the underlying storage infrastructure. Data objects include groups, dSources, VDBs, Self-Service (Jet Stream) data templates and data containers, and associated dependencies. In addition to the replication capabilities provided by this infrastructure, native Delphix Engine replication provides further capabilities, such as the ability to replicate a subset of objects, replicate multiple sources to a single target, and provision VDBs from replicated dSources and VDBs without affecting ongoing updates. The topics under Backup and Recovery Strategies for the Delphix Engine provide more information on how to evaluate features of the Delphix Engine in relation to your backup and recovery requirements.
Replication is configured on the source Delphix Engine and copies a subset of dSources and VDBs to a target Delphix Engine. Incremental updates can then be sent manually or according to a schedule. For more information on configuring replication, see Configuring Replication.
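Replication is normally configured and triggered through the UI or CLI. As a rough illustration of driving an ad hoc update programmatically, the sketch below uses the engine's JSON web API; the hostname, credentials, API version, and the assumption that the first listed spec is the one you want are all placeholders, so verify every endpoint and payload against the API documentation for your engine version.

```python
# Hedged sketch: trigger an ad hoc replication update via the engine's
# JSON web API. The hostname, credentials, API version, and the choice of
# the first listed spec are placeholders; verify every endpoint and
# payload against the API documentation for your engine version.
import requests

ENGINE = "https://source-engine.example.com"  # placeholder engine URL
s = requests.Session()
s.verify = False  # example only; configure proper CA trust in practice

# 1. Open an API session (the version must match what the engine supports).
s.post(f"{ENGINE}/resources/json/delphix/session", json={
    "type": "APISession",
    "version": {"type": "APIVersion", "major": 1, "minor": 10, "micro": 0},
})

# 2. Authenticate.
s.post(f"{ENGINE}/resources/json/delphix/login", json={
    "type": "LoginRequest", "username": "admin", "password": "password",
})

# 3. List replication specs and kick off an update for the first one.
specs = s.get(f"{ENGINE}/resources/json/delphix/replication/spec").json()
ref = specs["result"][0]["reference"]  # e.g. 'REPLICATION_SPEC-1'
s.post(f"{ENGINE}/resources/json/delphix/replication/spec/{ref}/execute")
```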
You can use replicated dSources and VDBs to provision new VDBs on the target side. You can refresh these VDBs to data sent as part of an incremental replication update, as long as you do not destroy the parent object on the replication source. For more information, see Provisioning from Replicated Data Sources or VDBs.
During replication, replicated dSources and VDBs are maintained in an alternate replica and are not active on the target side. In the event of a disaster, a failover operation can break the replication relationship. For more information on how to activate replicated objects, see Replicas and Failover.
Replication Details
When you select objects for replication, the engine automatically includes any dependencies: parent objects such as groups, and data dependencies such as VDB sources. This means that replicating a VDB automatically includes its group, the parent dSource, and the group of the dSource, as well as any environments associated with those databases. When replicating an entire engine, all environments are included. When replicating a database or group, only the environments associated with the replicated databases are included.
During replication, the Delphix Engine negotiates an SSL connection with its server peer, using TLS_RSA_WITH_AES_256_CBC_SHA256 as the cipher suite and TLSv1.2 as the protocol.
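If you want to confirm what a peer negotiates from the network side, Python's standard ssl module can report the protocol and cipher. This is a generic diagnostic sketch; the hostname is a placeholder, and the port is an assumption, since the replication data connection may not use the same port as the web UI.

```python
# Generic diagnostic: report the TLS protocol and cipher a peer negotiates.
# The hostname is a placeholder and the port is an assumption; certificate
# checks are disabled only because appliances often use self-signed certs.
import socket
import ssl

HOST = "replication-target.example.com"

context = ssl.create_default_context()
context.check_hostname = False
context.verify_mode = ssl.CERT_NONE  # diagnostic only; never in production

with socket.create_connection((HOST, 443)) as sock:
    with context.wrap_socket(sock, server_hostname=HOST) as tls:
        print(tls.version())  # expect 'TLSv1.2'
        print(tls.cipher())   # OpenSSL names TLS_RSA_WITH_AES_256_CBC_SHA256
                              # 'AES256-SHA256'
```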
Only database objects and their dependencies are copied as part of a backup or replication operation, including:
- dSources
- VDBs
- Groups
- Self-service (Jet Stream) Data Templates and Data Containers
- Environments
- Environment configuration (users, database instances, and installations)
The following objects are NOT copied as part of a backup or replication operation:
- Users and Roles
- Policies
- VDB (init.ora) configuration templates
- Events and faults
- Job history
- System services settings, such as SMTP
- Hook templates
- Alert profiles
After failover, you must recreate these settings on the target.
The retention policy is not replicated, and objects that are failed over inherit the default retention policy (one week). If the retention policy on the replication source was of a longer duration, this could cause the inadvertent removal of snapshots. It is recommended to check the default retention policy on the target and adjust it as necessary until the retention policies from the replication source can be added.
On-Premises Replication to Azure/OCI/GCP/Hyper-V
Replication from on-premises engines with an underlying storage block size of 512 B to target engines with a different underlying block size will experience disk usage inflation. Azure, GCP, Hyper-V, and OCI are known to use 4 K block sizes and therefore require extra disk capacity when receiving replication from an on-premises engine. This behavior occurs because the underlying storage block size differs (512 B vs. 4 K) between the two Delphix Engines (one on-premises, one on Azure/OCI/GCP/Hyper-V), resulting in a lower compression rate on the replication target. Expect objects to take 1.5-1.6x the space they occupy on premises in these cases.
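For rough capacity planning, the inflation factor can be applied directly to the source footprint. The 1.5-1.6x range comes from the paragraph above; the 10 TiB source figure is only an example.

```python
# Rough capacity estimate for a 512 B -> 4 K replication target. The
# 1.5x-1.6x inflation range comes from the text above; the 10 TiB source
# footprint is only an example.
source_tib = 10.0
low, high = source_tib * 1.5, source_tib * 1.6
print(f"Expect roughly {low:.1f}-{high:.1f} TiB on the 4K-block target.")
```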
Resumable Replication
Resumable replication enhances the replication feature by allowing you to restart large, time-consuming initial replications or incremental updates from an intermediate point. A single replication instance can fail for a number of environmental and internal reasons. Previously, when you restarted a failed replication instance, replication required a full resend of all data transmitted prior to the failure. With resumable replication, no data is retransmitted. Replication is resumable across machine reboot, stack restart, and network partitions. The resumable replication feature is fully automated and does not require or allow any user intervention.
For example, suppose a replication profile has already been configured from a source to a target. A large, full send begins between the two that is expected to take weeks to complete. Halfway through, a power outage at the data center that houses the source causes the source machine to go down and only come back up after a few hours. On startup, the source will detect a replication was ongoing, automatically re-contact the target, and resume the replication where it left off. In the user interface (UI) on the source, the same replication send job will appear as active and continue to update its progress. However, in the UI of the target, a new replication receive job will appear but will track its progress as a percentage of the entire replication.
In 4.1 and earlier releases, the replication component would always clean up after failed jobs to ensure that the Delphix Engine was kept in a consistent state and that no storage was wasted on unused data. With the addition of resumability, the target and source can choose to retain partial replication state following a failure to allow future replications to complete from that intermediate point. In the current release, the target and source will only choose to retain partial replication state following failures that leave them out of network contact with each other – for example, source restart, target restart, or network partition. Once network contact is re-established, the ongoing replication will be automatically detected and resumed.
Replication will not resume after failures that leave the source and target connected. For example, if a storage failure on the target, such as an out-of-space error, causes a replication to fail, the source and target remain connected; as a result, the engine discards the state associated with the failed replication operation. Resumable replication applies only to failures that break contact between the two engines, such as a source reboot, a target reboot, or a network partition.
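The contrast above can be pictured with a toy checkpointing loop. This is a conceptual model of resume-from-checkpoint behavior, not Delphix's implementation; all names here are invented for illustration.

```python
# Toy model of resume-from-checkpoint behavior. This is a conceptual
# illustration only, NOT Delphix's implementation; all names are invented.

class Target:
    """Receiver that persists how much of the stream it has safely stored."""
    def __init__(self):
        self.stored = []
        self.checkpoint = 0   # in the real system this survives reboots

    def receive(self, block):
        self.stored.append(block)
        self.checkpoint += 1

def send_stream(blocks, target):
    """Resume from the target's checkpoint instead of resending everything."""
    for block in blocks[target.checkpoint:]:
        target.receive(block)

blocks = list(range(100))     # stand-in for a large replication stream
target = Target()

# First attempt is interrupted halfway (source reboot, network partition...):
for block in blocks[:50]:
    target.receive(block)

# On reconnect, the source picks up at the saved checkpoint, so only the
# second half of the stream crosses the wire:
send_stream(blocks, target)
assert target.stored == blocks
# By contrast, a failure that leaves both sides connected (e.g. target out
# of space) discards this partial state, and the next run starts from zero.
```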
Replicating Delphix Self-Service Templates
Templates can now be replicated and accessed on the target engine via Delphix Self-Service (Jet Stream). Templates can be replicated into the target space with or without their containers. On the target engine, the newly created replicated template can be used to create new containers that are assigned to users. You cannot change the replicated template's name or the names of the containers from which it was replicated.
Any containers that were replicated over with the template cannot be used for operations such as start and stop until they are disconnected from their parent containers on the source engine during the failover operation.