Architecture Best Practices Overview FAQ
These questions relate to the standard processes our Solutions Architecture team executes when engaged for Architecture review of a customer's environment.
Why go through an architecture or sizing process?
- To measure the workload in your environment (when possible) and calculate the number of engines needed
- To ensure best practices are applied so your engine(s) perform to the level expected
- To find and avoid or work around potential challenges in an environment
- To maximize your license value
Why does Delphix collect IO data?
Collecting IO data allows us to size the Delphix Engine more accurately, rather than making recommendations based on assumptions or synthetic data. The data you provide for analysis comes from the SOURCE and TARGET DB servers.
Why is the Customer Database and Inventory (CDI) important?
The database information is typically the fundamental building block upon which we build our understanding of your future Delphix environment. We ask questions related to the database names, locations, platforms, and versions, as well as estimates of size, throughput, and changes.
The answers you provide to our infrastructure questions give us insight into potential challenges that may arise as we seek to integrate our application and best practices into your environment. Processes and software you use normally may not integrate optimally into our engine, and it’s helpful to know this as soon as possible.
Why does Delphix use a Virtual Machine? Shouldn’t all high-performance applications use physical infrastructure?
Using virtualization not only allows our customers to use their choice of commodity hardware, but it also allows us to focus on core features rather than hardware support.
Architecture Best Practices for Hypervisor Host and VM Guest - ESX FAQ
Why does Delphix recommend 8 vCPUs and 128GB of memory per 8 vCPUs?
8 vCPUs are not only a standard licensing block, but they are also key to meeting our 10Gbps single-engine throughput potential and help to sustain low latency for VDBs.
As with CPU, cache memory is required to drive peak loads on the Delphix Engine. More memory allows for more blocks to be read from the cache rather than going to less performant disks. Delphix stores cached data in a compressed format and only keeps a single copy of unique blocks in memory. These features give read performance across multiple VDBs provisioned from a single source dramatic improvements in speed, scalability, and memory utilization.
Why does Delphix request reservations for CPU and memory?
Delphix performance can be greatly impacted when there is contention over CPU or RAM. Reservations allow the engine to explicitly control those resources and avoid the possibility of contention with other VMs, even when resources are overcommitted.
Why does Delphix request Hyper-Threading (HT) be disabled?
Hyper-Threading can have a positive impact on many, but not all applications. Delphix has a different execution profile that does not benefit from Hyper-Threading. Factors such as significant memory bandwidth requirements and a high level of parallelism which requires a high number of shared locks mean that HT and Delphix do not generally work well together.
Why does Delphix request 4 controllers, and why must the storage be identical between them?
To provide optimal storage performance, you must spread data equally over the maximum (4) virtual SCSI controllers. To provide consistent performance between each of the four controllers, you need to ensure storage is identical between them.
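The layout rule above can be sketched as a simple round-robin assignment. This is a minimal illustration, not a Delphix tool: the function name `assign_disks` and the choice to reject uneven layouts are assumptions made for the example.

```python
from collections import defaultdict

MAX_CONTROLLERS = 4  # maximum number of virtual SCSI controllers cited above


def assign_disks(disk_sizes_gb: list[int]) -> dict[int, list[int]]:
    """Spread identical disks round-robin over the four controllers.

    Hypothetical helper: raises if the disks differ in size or cannot be
    divided evenly, since either case leaves the controllers unequal.
    """
    if len(set(disk_sizes_gb)) > 1:
        raise ValueError("all data disks should be the same size")
    if len(disk_sizes_gb) % MAX_CONTROLLERS != 0:
        raise ValueError("disk count should be a multiple of 4")
    layout: dict[int, list[int]] = defaultdict(list)
    for index, size in enumerate(disk_sizes_gb):
        layout[index % MAX_CONTROLLERS].append(size)
    return dict(layout)


# Example: eight 500GB disks give each controller two disks (1000GB).
layout = assign_disks([500] * 8)
capacity_per_controller = {c: sum(sizes) for c, sizes in layout.items()}
```

With this layout every controller carries the same number of disks and the same capacity, which is the condition the answer above asks for.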
Why must virtual disks (VMDKs) be thick provisioned and eager-zeroed?
Thick provisioning and eager zeroing allocate and zero all blocks when the disk is created, so performance is consistent from the start, with no hiccups from expanding virtual resources or zeroing blocks on first write.
Why is 20% free space required?
While the ZFS file system has a lot of features leveraged by Delphix, it loses efficiency as space decreases. 20% is the minimum that must be available for best performance.
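As a rough illustration of the 20% rule, the check below compares the free fraction of a storage pool against that threshold. The function name and the sample capacities are hypothetical; only the 20% figure comes from the answer above.

```python
MIN_FREE_FRACTION = 0.20  # minimum free space cited above


def free_space_ok(total_gb: float, used_gb: float) -> bool:
    """Return True when at least 20% of the pool is still free."""
    if total_gb <= 0 or used_gb < 0 or used_gb > total_gb:
        raise ValueError("expected 0 <= used_gb <= total_gb and total_gb > 0")
    free_fraction = (total_gb - used_gb) / total_gb
    return free_fraction >= MIN_FREE_FRACTION


# Hypothetical 1000GB pool at two fill levels.
healthy = free_space_ok(total_gb=1000, used_gb=750)    # 25% free
too_full = free_space_ok(total_gb=1000, used_gb=850)   # 15% free
```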
Why does Delphix request you reserve CPU and RAM for Hypervisor overhead?
This recommendation is based on VMware's resource management guide and our own experience with high IO throughput. Note that there is no specific mechanism to assign resources to the hypervisor; the only way to preserve overhead is to leave those resources unallocated to guests.
Why does Delphix generally want VMware HA enabled, but DRS disabled?
VMware HA (High Availability) addresses outages that occur when a physical host goes down or is completely offline, by moving the guest(s) to another physical host and restarting them. There is no real downside; it simply brings unavailable servers back online.
VMware DRS (Distributed Resource Scheduler) is for load balancing host resources in a cluster. Because of high IO and best practices configuration for optimal performance, our engine is typically not a good candidate for relocation.
Why does Delphix request you set power management to High-Performance Mode?
This ensures that power management never impacts performance by putting the CPU into a lower power state (also known as a C-state).
Architecture Best Practices for Storage FAQ
Why does Delphix require 127GB of storage for the OS?
The system partition requires space to store and operate the OS, as well as application logs, upgrade and rollback images, and enough free space to store a kernel or application core dump should it be required.
Why does Delphix require our LUNs to be uniform and contain an equal quantity and capacity of VMDKs, yet thin provisioning is OK?
Because our engine leverages parallel reads, the capacity of each LUN and the number of virtual disks it holds must be consistent. This allows reads and writes to be evenly distributed and minimizes the impact of utilization imbalances, which would create a “long tail” of higher latency on a single controller and impact the entire engine.
Data storage LUNs are generally formatted with a VMFS file system and hold a virtual disk (VMDK) that is thick provisioned and eager zeroed, so thick provisioning the LUN as well would be redundant.
Why does Delphix require < 10ms latency (95th percentile) storage?
Storage latency is especially important in database environments. Average latency doesn’t give a complete picture of responsiveness, especially because Delphix leverages parallel reads; so inconsistent performance (e.g. good average latency but a “long tail”) can impact multiple operations. This is why Delphix has a focus on 95th percentile latency, and why we validate storage performance as the first step when a new engine is deployed.
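To make the difference between average and 95th percentile latency concrete, here is a small sketch using made-up samples. The nearest-rank percentile method and the sample values are assumptions chosen for illustration; they are not how Delphix measures storage.

```python
import statistics

LATENCY_TARGET_MS = 10.0  # the < 10ms requirement discussed above


def p95(samples_ms: list[float]) -> float:
    """95th percentile using the nearest-rank method (an assumed choice)."""
    if not samples_ms:
        raise ValueError("at least one sample is required")
    ordered = sorted(samples_ms)
    rank = (95 * len(ordered) + 99) // 100  # integer ceiling of 0.95 * n
    return ordered[rank - 1]


# Hypothetical workload: 90 fast reads and a "long tail" of 10 slow ones.
samples = [2.0] * 90 + [40.0] * 10

mean_ms = statistics.mean(samples)  # looks healthy on its own
p95_ms = p95(samples)               # exposes the long tail

mean_passes = mean_ms < LATENCY_TARGET_MS
p95_passes = p95_ms < LATENCY_TARGET_MS
```

In this sample the mean sits comfortably under the target while the 95th percentile does not, which is the situation the answer above warns about.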
For more information, you may find the following article helpful: The Cost of Latency.
Why does Delphix prefer to extend existing storage rather than simply add more while maintaining equal distribution?
While it is possible to add more storage and maintain the practice that "storage should be equal across controllers", extending LUNs (then virtual disks) ensures that:
- We do not continue to fill disks which may be full
- Existing disks do not suffer a write performance penalty from low capacity
- Storage performance is consistent
Architecture Best Practices for Network FAQ
Why does Delphix request 10Gb Ethernet?
As a matter of physics and standards - 10 gigabit (Gb) Ethernet can sustain approximately 1 gigabyte (GB) per second of throughput. With all our best practices applied, a Delphix Engine can achieve very close to that line speed, allowing for optimal load, engine, and license utilization. Lower network speeds may be acceptable for low loads, while in some environments NIC teaming (e.g. LACP) may be required for top speeds.
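The bit-to-byte arithmetic behind that figure can be sketched as follows. The function names are hypothetical, and the calculation gives only the raw ceiling: framing and protocol overhead, which are not modeled here, account for part of the gap between the raw number and the approximately 1 GB per second quoted above.

```python
BITS_PER_BYTE = 8


def raw_gigabytes_per_second(link_gigabits_per_second: float) -> float:
    """Convert a link speed in Gb/s to its raw ceiling in GB/s."""
    return link_gigabits_per_second / BITS_PER_BYTE


def seconds_to_transfer(data_gb: float, link_gigabits_per_second: float) -> float:
    """Best-case time to move data_gb over the link, ignoring all overhead."""
    return data_gb / raw_gigabytes_per_second(link_gigabits_per_second)


ten_gb_ceiling = raw_gigabytes_per_second(10)  # 1.25 GB/s
one_gb_ceiling = raw_gigabytes_per_second(1)   # 0.125 GB/s

# Best case for moving a hypothetical 1000GB data set.
fast_link_seconds = seconds_to_transfer(1000, 10)  # 800 seconds
slow_link_seconds = seconds_to_transfer(1000, 1)   # 8000 seconds
```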
Why does Delphix require < 1ms latency to TARGET servers and < 50ms to SOURCE servers?
Delphix leverages NFS and iSCSI (depending on platform) for live TARGET DB mounting over the network, so it’s imperative that latency is as low as possible. Data coming from SOURCE servers is not generally as time-sensitive, so latency below 50ms is sufficient to ensure operational integrity.
Why does Delphix request Jumbo Frames?
Jumbo frames increase the Ethernet maximum transmission unit (MTU) from the default 1500 bytes to 9000 bytes. This has several effects such as decreasing CPU cycles by transferring fewer packets and increasing the engine throughput. You will find jumbo frames have a 10-20% real-world impact and are required (along with all other best practices) to handle peak loads of 800 - 1000MB/s on an 8 vCPU engine with a 10Gb network.
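The packet-count effect can be estimated with a short calculation. This sketch assumes IPv4 and TCP headers with no options (40 bytes) and standard per-frame Ethernet overhead of 38 bytes (header, frame check sequence, preamble, and inter-frame gap); real traffic may differ. It models framing only, so it does not capture the per-packet CPU savings that contribute to the 10-20% real-world figure above.

```python
import math

IP_TCP_HEADER_BYTES = 40   # assumed: IPv4 (20) + TCP (20), no options
ETHERNET_OVERHEAD_BYTES = 38  # assumed: 14 header + 4 FCS + 8 preamble + 12 gap


def packets_needed(payload_bytes: int, mtu: int) -> int:
    """Number of packets required to carry payload_bytes at the given MTU."""
    return math.ceil(payload_bytes / (mtu - IP_TCP_HEADER_BYTES))


def wire_efficiency(mtu: int) -> float:
    """Fraction of bytes on the wire that are application payload."""
    return (mtu - IP_TCP_HEADER_BYTES) / (mtu + ETHERNET_OVERHEAD_BYTES)


ONE_MIB = 1024 * 1024

standard_packets = packets_needed(ONE_MIB, 1500)  # 719 packets
jumbo_packets = packets_needed(ONE_MIB, 9000)     # 118 packets

standard_efficiency = wire_efficiency(1500)  # about 0.949
jumbo_efficiency = wire_efficiency(9000)     # about 0.991
```

Under these assumptions, jumbo frames cut the packet count for the same payload by roughly a factor of six, which is where the reduction in CPU cycles comes from.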
How does Delphix avoid communication impact with non-jumbo frame hosts when Jumbo Frames are enabled on the Delphix Host?
Path MTU Discovery is the mechanism by which two hosts agree on the MTU leveraged for communication between them. This mechanism will ensure communication between both standard and Jumbo Frame enabled hosts works as expected.
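The outcome of Path MTU Discovery can be modeled very simply: the usable MTU between two hosts is the smallest MTU of any hop on the path. The sketch below shows only that outcome with hypothetical hop values. The real mechanism relies on network feedback rather than a known list of hops.

```python
def path_mtu(hop_mtus: list[int]) -> int:
    """Largest packet that fits every hop: the smallest MTU on the path."""
    if not hop_mtus:
        raise ValueError("at least one hop is required")
    return min(hop_mtus)


# Hypothetical paths from an engine with jumbo frames enabled (MTU 9000).
jumbo_path = path_mtu([9000, 9000, 9000])  # every hop supports jumbo frames
mixed_path = path_mtu([9000, 9000, 1500])  # far end uses the default MTU
```

In the mixed case, traffic falls back to the 1500-byte MTU, so the standard host continues to communicate as expected.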
When does Delphix recommend NIC teaming?
The Delphix Engine is capable of high throughput, but not every enterprise has sufficient network bandwidth to support it. Teaming is a less expensive way of increasing the bandwidth when compared to new hardware.
Why does Delphix recommend logical and physical co-location?
The Delphix Engine leverages network connections extensively, so optimizing the latency whenever possible is very important - sometimes critical.
Architecture Best Practices for Delphix Engine Data Protection FAQ
Why does Delphix recommend SAN snapshots or Delphix replication for backup?
There are a few possible methods for data protection of the Delphix Engine. Those methods are SAN snapshots, Delphix replication, and virtual machine snapshots (for very small engines only). Because the Delphix Engine is itself a backup of source environments, many customers simply plan to rebuild in the event of a disaster.
What is the DXToolkit, and how can it help?
The professional services team has created a Perl-based “DXToolkit” which can help export and import certain configuration data over web services. This toolkit can be leveraged to assist with what would normally be a manual re-install outside of the above methods.
For further detail around data protection, please speak with your Delphix contact.
Can I use a VMware-based backup solution such as Veeam to back up my Delphix Engine?
Yes, VMware backup solutions are useful for backing up guest VMs. However, Delphix suggests that you only use this approach for Delphix Engines which have a smaller storage footprint (perhaps < 2 TB) and are less active.
Running this type of backup puts a load on the environment, which might adversely impact Delphix VM performance.
Can I use a VMware snapshot for backing up Delphix for a small window – for example, during an engine upgrade?
Yes. However, even though snapshots are instantaneous, they track changes separately from the base disks and can grow to consume as much space as the original.
Upgrades, in particular, can change substantial amounts of data.
If you lose physical disks, snapshots are useless because they need the base disks to reconstruct the current state of the VM.
A Delphix Engine is often allocated multiple terabytes of storage and is often very busy due to load aggregation from virtual databases on multiple target servers, so this approach may be challenging.
Snapshots cannot detect storage corruption.
Can I use a Storage snapshot solution to protect Delphix against Storage and Delphix corruption?
Yes. However, please note that the caveats which apply to VMware snapshots also apply here.
A specific concern related to storage layer snapshots is that you must create a consistency group that contains both the OS and Data disks.
Can I use RMAN to back up my VDBs just like a physical database to provide extra protection?
You can back up Delphix VDBs using Oracle RMAN tools, but recovery would first require re-hydrating that VDB, which might take up storage space equivalent to the production database.
Furthermore, that re-hydrated database needs to be brought into the Delphix framework as a dSource, after which you can provision a VDB to complete recovery. The whole process might take hours or days.
The best approach is to use the VDB Snapshot capability to back up the VDB frequently and then leverage the Delphix Replication capability to protect the underlying Delphix storage, which holds that VDB snapshot.