Best Practices
As of release 5.0, the virtualization and masking functions are combined into a single OVA, which requires additional consideration during installation and configuration. Remote Masking Engine calls are supported in 5.0.4 and above.
- Masking Engines should continue to be deployed to hosts dedicated to that function.
- Possible exceptions are cases where the virtualization workload is extremely low and unlikely to be heavily impacted by masking requirements.
- The Masking Engine is disabled by default.
- The Delphix platform will continue to remain running (but unused) on a Masking-only VM.
- Standard configuration for a dedicated Masking Engine:
- 8 vCPUs
- 16 GB RAM minimum; 32 GB RAM or more recommended.
- 300 GB storage for the OS / system root disk (5.1.4 and greater).
- 50 GB storage for the data disk, which must be added during initial configuration via the Engine Setup wizard. (The engine will not complete its setup without a separate data disk.)
- If bulk operation is used, allocate extra space equivalent to the size of all datasets (tables) that will be masked concurrently.
- As a rule of thumb: Disk Space Required for Bulk = ((Total Database Size * 0.66) * 0.10), where Raw Data = Total Database Size * 0.66 and 0.10 is the estimated change rate.
- The 10% change rate is an estimate based on our experience with the data we mask. It is often lower, but there are exceptions, such as masking a data warehouse with one large fact table and many much smaller tables.
- The VMDK for the engine OS is often stored on the same VMFS volume as the VM definition file (aka VMX). In that case, the VMFS volume must have sufficient space to hold the VMX configuration, the VMDK for the system disk, and any VMware logging.
- Additional VMFS space for swap/paging is required if RAM reservations are not enabled. (The VM will not start if reservations are lacking and disk space is not available for swap.)
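The rule of thumb for bulk operation above is simple arithmetic, and a short sketch makes the two factors explicit. The function and parameter names below are illustrative only and are not part of any Delphix tooling; the 0.66 and 0.10 defaults come from the rule of thumb and should be adjusted if your own change rate is known to differ.

```python
def bulk_disk_space_gb(total_db_size_gb: float,
                       raw_data_fraction: float = 0.66,
                       change_fraction: float = 0.10) -> float:
    """Estimate extra disk space (in GB) to allocate for bulk operation.

    Hypothetical helper: applies the rule of thumb
    (Total Database Size * 0.66) * 0.10.
    """
    # Raw data is estimated as a fraction of the total database size.
    raw_data_gb = total_db_size_gb * raw_data_fraction
    # Only the portion of raw data expected to change needs extra space.
    return raw_data_gb * change_fraction


if __name__ == "__main__":
    # Example: a 1 TB (1000 GB) database.
    print(f"{bulk_disk_space_gb(1000):.1f} GB")
```

For a 1000 GB database this works out to roughly 66 GB of additional space, on top of the 50 GB data disk listed in the standard configuration.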
CPU Utilization
- One vCPU per concurrent masking job is considered a best practice.
- CPU demand depends on the algorithms used: some are computational, such as those using AES encryption, while others are lookups and tend to do more I/O.
Memory Utilization
- The Delphix Masking Engine uses its memory to cache data. More memory will provide better performance. 1 GB per masking job is considered a best practice.
- Memory demand depends on the memory settings in the Masking Engine and JVMs. An increase in parallel workloads requires more memory. Data is cached either directly or through Kettle, so the larger the lookups used by algorithms, the more memory is required. Memory is the first thing to check when investigating performance issues.
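As a rough aid, the CPU and memory guidelines above can be turned into a simple pre-flight check for a planned configuration. This is a minimal sketch: the function name and message wording are hypothetical, and it only tests the figures stated in this document (one vCPU per concurrent job, 1 GB per job, 16 GB minimum). It does not model algorithm-specific or JVM-specific memory needs.

```python
from typing import List

MIN_RAM_GB = 16          # minimum RAM from the standard configuration
RAM_GB_PER_JOB = 1       # best-practice memory per masking job
VCPUS_PER_JOB = 1        # best-practice vCPUs per concurrent masking job


def check_sizing(vcpus: int, ram_gb: int, concurrent_jobs: int) -> List[str]:
    """Return a list of warnings for a planned engine configuration.

    Hypothetical helper; an empty list means no guideline was violated.
    """
    warnings = []
    if vcpus < concurrent_jobs * VCPUS_PER_JOB:
        warnings.append(
            f"{vcpus} vCPUs is fewer than one per job for {concurrent_jobs} jobs")
    if ram_gb < concurrent_jobs * RAM_GB_PER_JOB:
        warnings.append(
            f"{ram_gb} GB RAM is less than 1 GB per job for {concurrent_jobs} jobs")
    if ram_gb < MIN_RAM_GB:
        warnings.append(f"{ram_gb} GB RAM is below the {MIN_RAM_GB} GB minimum")
    return warnings


if __name__ == "__main__":
    for message in check_sizing(vcpus=8, ram_gb=16, concurrent_jobs=12):
        print("WARNING:", message)
```

Passing the check does not guarantee good performance; large algorithm lookups can require considerably more memory than the per-job figure.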
Network and I/O
Delphix Masking leverages the Target DB server and VDB for most of the workload. This means the Masking Engine can be I/O bound while waiting for the DB server. As long as the Masking Engine can read the data faster than it can process it, this is not an issue. Slow networks with numerous hops between the DB server and the Masking Engine can cause performance problems. Co-locating the Masking Engine with the DB server is recommended in these cases.
Masking VDB Tuning
- Start with the tuning recommendations for Target servers and VDBs. If the VDB is not performing well, masking performance will suffer.
- For Oracle, it is critical to select noarchivelog mode and tune online redo log size at provision time.
- For SQL Server, the VDB should be in SIMPLE recovery with appropriate log file and TempDB sizes.
Backup of a Masking Engine
- Virtual machine backups are recommended for versions of software in which masking runs in its own VM – in other words, the masking VM is separate from the VM(s) where virtualization takes place.
If an engine is supporting both masking and virtualization, review data protection best practices.
- Although XML exports of inventories and environments do exist, they are incomplete. Do not rely on them.
- In-Place (not On-the-Fly) Masking is the primary use case.
The following recommendations apply to Masking Engine versions 5.0.2 and earlier:
- Jobs vs. Streams:
- If there are multiple tables to be masked concurrently, use multiple, separate jobs – one per table.
- Avoid multiple streams because of an internal limitation.
- The default setting for streams is 20; set it to 1. This forces serialization (one table at a time) if the job contains multiple tables.
- Use one update thread per job; this avoids block collisions and contention during the UPDATE phase.
- The default setting for update threads is 4; set it to 1.
- Identify ALL indexes, constraints and triggers on columns being masked (and only on columns being masked).
- Evaluate whether it is better to drop/mask/recreate indexes, or disable/mask/re-enable triggers and constraints, than to leave them in place during masking. The best choice depends on the situation.
- For Oracle VDBs, use ROWID for SQL UPDATE of masked row value(s).
- Edit the rule set, select Edit All Logical Keys, and enter ROWID as the logical key value; see Managing Rule Sets.
- When a single, large, non-partitioned table must be masked by concurrent jobs (each masking a subset of the table) to shorten masking elapsed time, segregate jobs by database block/page to avoid contention and locking conflicts. In other words, each job masks a unique set of blocks; each block is masked exclusively by one job.
- If a single table is being masked by multiple, concurrent jobs, and indexes/constraints/triggers must be dropped/recreated or disabled/enabled, these steps must be performed OUTSIDE of the masking jobs.
- Pre-masking and post-masking steps must be created manually.
- Scheduling of pre-script and post-script jobs must be devised. Plan to schedule and execute them externally.
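The recommendation above to segregate concurrent jobs by database block can be sketched as a partitioning step: divide the table's block range into contiguous, non-overlapping sub-ranges and assign one to each job. The function below is a hypothetical illustration, not a Delphix API. How the block numbers are obtained, and how each masking job is restricted to its range, is database-specific and not shown here.

```python
from typing import List, Tuple


def split_block_ranges(first_block: int, last_block: int,
                       jobs: int) -> List[Tuple[int, int]]:
    """Split [first_block, last_block] into `jobs` disjoint, contiguous ranges.

    Hypothetical helper: each returned (start, end) pair is inclusive and
    is intended to be masked by exactly one job.
    """
    total = last_block - first_block + 1
    if jobs < 1 or total < jobs:
        raise ValueError("need at least one block per job")
    # Distribute blocks as evenly as possible; the first `extra` jobs
    # each receive one additional block.
    base, extra = divmod(total, jobs)
    ranges = []
    start = first_block
    for i in range(jobs):
        size = base + (1 if i < extra else 0)
        end = start + size - 1
        ranges.append((start, end))
        start = end + 1
    return ranges


if __name__ == "__main__":
    for job_number, (start, end) in enumerate(split_block_ranges(0, 9, 3), 1):
        print(f"job {job_number}: blocks {start}..{end}")
```

Because the ranges never overlap, no two jobs update rows in the same block, which is the property that avoids contention and locking conflicts.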