An integral part of the data masking process is using algorithms to mask each data element. You specify which algorithm to use for each individual data element (domain) on the Domains settings tab. There, you define a unique domain for each element and then associate it with the classification and algorithm you want to use. Use the Algorithm settings tab to create or delete algorithms.
Algorithm Settings Tab
- All algorithm values are stored encrypted. These values are only decrypted during the masking process.
Adding New Delphix Masking Engine Algorithms
The Delphix Masking Engine algorithm frameworks let you define the algorithms you need directly on the Settings page and propagate them immediately, so that anyone in your organization who uses the Delphix Masking Engine can access them.
Administrators can update system-defined algorithms. User-defined algorithms can be accessed by all users and updated by the owner/user who created the algorithm.
To add an algorithm:
1. In the upper right-hand corner of the Algorithm settings tab, click Add Algorithm.
2. Select an algorithm type.
3. Complete the form to the right to name and describe your new algorithm.
4. Click Save.
Choosing an Algorithm Type
The Delphix Masking Engine offers 35 individual algorithms from which to choose, so you can mask data according to your specific needs. Each algorithm is built using one of eight frameworks, or algorithm types. The descriptions below will help you select which algorithm type is appropriate for the way that you want to mask data. They appear in order of their popularity.
Secure Lookup Algorithm
To add a secure lookup algorithm:
- In the upper right-hand corner of the Algorithm tab, click Add Algorithm.
- Choose Secure Lookup Algorithm. The Create SL Algorithm pane appears.
- Enter an Algorithm Name.
This name MUST be unique.
- Enter a Description.
- Specify a Lookup File.
This file is a single list of values. It does not require a header. Make sure there are no spaces or returns at the end of the last line in the file. The following is sample file content:
Example Lookup File
Smallville
Clarkville
Farmville
Townville
Cityname
Citytown
Towneaster
- When you are finished, click Save.
- Before you can use the algorithm (specify it in a profiling or masking job), you must add it to a domain.
Note
The masking engine supports lookup files saved in ASCII or UTF-8 format only. If the lookup file contains non-ASCII characters (for example, letters from other alphabets), save it as UTF-8 without a BOM (byte order mark) so that the Masking Engine reads the Unicode text correctly. Some applications, such as Notepad on Windows, write a BOM at the beginning of Unicode files. The masking engine does not handle the BOM, which leads to SQL update or insert errors when you run a masking job that applies a Secure Lookup algorithm created from a UTF-8 file that included a BOM.
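Because a stray BOM or a trailing line break only shows up later as a failed job, it can help to check lookup files before uploading them. The following is a minimal sketch, not a Delphix tool: `check_lookup_file` and `clean_lookup_bytes` are hypothetical helpers that apply only the rules stated above (ASCII or UTF-8, no BOM, no spaces or returns after the last value).

```python
import codecs

# Hypothetical pre-upload check for lookup files, based only on the rules above:
# UTF-8 (ASCII is a subset), no BOM, and no spaces or line breaks after the last value.


def check_lookup_file(data: bytes) -> list:
    """Return a list of problems found in the raw bytes of a lookup file."""
    problems = []
    if data.startswith(codecs.BOM_UTF8):
        problems.append("file starts with a byte order mark (BOM)")
        data = data[len(codecs.BOM_UTF8):]
    try:
        text = data.decode("utf-8")
    except UnicodeDecodeError as exc:
        problems.append(f"file is not valid UTF-8: {exc}")
        return problems
    if text != text.rstrip(" \t\r\n"):
        problems.append("spaces or line breaks after the last value")
    return problems


def clean_lookup_bytes(data: bytes) -> bytes:
    """Drop a leading UTF-8 BOM and any trailing spaces or line breaks."""
    if data.startswith(codecs.BOM_UTF8):
        data = data[len(codecs.BOM_UTF8):]
    return data.rstrip(b" \t\r\n")


if __name__ == "__main__":
    sample = codecs.BOM_UTF8 + "Smallville\nClarkville\nFarmville\n".encode("utf-8")
    print(check_lookup_file(sample))
    print(check_lookup_file(clean_lookup_bytes(sample)))
```

The check cannot tell whether the first line is a header, so that rule still has to be verified by eye.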
Segment Mapping Algorithm
You can mask up to a maximum of 36 values using segment mapping. You might use this method if you need columns with unique values, such as Social Security Numbers, primary key columns, or foreign key columns. When using segment mapping algorithms for primary and foreign keys, in order to make sure they match, you must use the same segment mapping algorithm for each. You can set the algorithm to produce alphanumeric results (letters and numbers) or only numbers.
With segment mapping, you can set the algorithm to ignore specific characters. For example, you can choose to ignore dashes [-] so that the same Social Security Number will be identified no matter how it is formatted. You can also preserve certain values. For example, to increase the randomness of masked values, you can preserve a single number such as 5 wherever it occurs. Or if you want to leave some information unmasked, such as the last four digits of Social Security numbers, you can preserve that information.
Segment Mapping Example
NM831026-04
Where:
- NM is a plan code number that you want to preserve, always a two-character alphanumeric code.
- 831026 is the uniquely identifiable account number. To ensure that you do not inadvertently create actual account numbers, you can replace the first two digits with a sequence that never appears in your account numbers in that location. (For example, you can replace the first two digits with 98 because 98 is never used as the first two digits of an account number.) To do that, you want to split these six digits into two segments.
- -04 is a location code. You want to preserve the hyphen and you can replace the two digits with a number within a range (in this case, a range of 1 to 77).
Procedure for Defining Segments
- Choose 3 for No. of Segment. Remember, you do NOT count the segment(s) you want to preserve.
- Preserve the first two characters ("NM" in the sample value). Under Preserve Original Values:
  - For Starting position, enter 1.
  - For Length, enter 2.
- Define the next two-digit segment ("83" in the sample value) to always be 98 or 99:
  - For Segment 1, select Type > Numeric.
  - For Length, select 2.
  - For Mask Values Range#, specify 98,99.
- Define the next four-digit segment ("1026" in the sample value):
  - For Segment 2, select Type > Numeric.
  - For Length, select 4.
  - Leave the range fields empty.
- Click Add to the right of Preserve Original Values, then preserve the hyphen:
  - For Starting position, enter 9.
  - For Length, enter 1.
- Define the last two-digit segment ("04" in the sample value):
  - For Segment 3, select Type > Numeric.
  - For Length, select 2.
  - For Mask Values Min#, enter 1.
  - For Mask Values Max#, enter 77.
The sample value NM831026-04 might be masked to NM981291-77.
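To make the segment layout concrete, here is a sketch of the NM831026-04 configuration. It is an illustration under stated assumptions, not the engine's internal algorithm: the names `PRESERVE`, `SEGMENTS`, `layout`, `mask_segment`, and `mask_value` are hypothetical, only numeric segments are handled, and a keyed random shuffle of each segment's full range stands in for whatever one-to-one mapping the Masking Engine actually uses.

```python
import random

# Hypothetical configuration mirroring the NM831026-04 procedure above.
PRESERVE = [(1, 2), (9, 1)]  # (Starting position, Length), 1-based like the UI
SEGMENTS = [
    {"length": 2, "mask_values": [98, 99]},            # Segment 1: Range# 98,99
    {"length": 4, "mask_values": None},                # Segment 2: full range 0000-9999
    {"length": 2, "mask_values": list(range(1, 78))},  # Segment 3: Min# 1, Max# 77
]


def layout(total_len, preserve, segments):
    """Split a value into preserved runs and numeric segments, left to right."""
    kept = {start - 1: length for start, length in preserve}  # 0-based starts
    slices, pos, seg = [], 0, 0
    while pos < total_len:
        if pos in kept:
            slices.append(("keep", pos, pos + kept[pos], None))
            pos += kept[pos]
        elif seg < len(segments):
            end = pos + segments[seg]["length"]
            slices.append(("mask", pos, end, seg))
            pos, seg = end, seg + 1
        else:
            raise ValueError("value is longer than the configured segments")
    return slices


def mask_segment(text, seg_index, spec, key):
    """Map one numeric segment through a keyed permutation of its full range."""
    perm = list(range(10 ** spec["length"]))
    random.Random(f"{key}:{seg_index}").shuffle(perm)  # same key, same mapping
    n = perm[int(text)]
    if spec["mask_values"]:
        # Folding into a restricted range is many-to-one, so duplicates can appear.
        n = spec["mask_values"][n % len(spec["mask_values"])]
    return f"{n:0{spec['length']}d}"


def mask_value(value, key, preserve=PRESERVE, segments=SEGMENTS):
    """Mask every segment and copy the preserved characters through unchanged."""
    out = []
    for kind, start, end, seg in layout(len(value), preserve, segments):
        piece = value[start:end]
        out.append(piece if kind == "keep" else mask_segment(piece, seg, segments[seg], key))
    return "".join(out)


print(mask_value("NM831026-04", key="demo-key"))
```

Segment 2 has no range, so its mapping is a permutation and distinct inputs stay distinct. Segments 1 and 3 fold 100 possible inputs into 2 and 77 outputs respectively, which is the kind of narrow range that can produce duplicate values.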
Segment Mapping Procedure
- In the upper right-hand region of the Algorithm tab, click Add Algorithm.
- Select Segment Mapping Algorithm. The Create Segment Mapping Algorithm pane appears.
- Enter a Rule Name.
- Enter a Description.
- From the No. of Segment drop-down menu, select how many segments you want to mask.
This number does NOT include the values you want to preserve.
The minimum number of segments is 2; the maximum is 9.
A box appears for each segment.
- For each segment, choose the Type of segment from the drop-down menu: Numeric or Alphanumeric.
Numeric segments are masked as whole segments. Alphanumeric segments are masked by individual character.
- For each segment, select its Length (number of characters) from the drop-down menu. The maximum is 4.
- Optionally, specify range values for each segment, for example to satisfy particular application requirements. See Specifying Range Values below.
- Preserve Original Values by entering Starting position and length values. (Position starts at 1.) For example, to preserve the second, third, and fourth values, enter Starting position 2 and length 3.
- If you need additional value fields, click Add.
- When you are finished, click Save.
- Before you can use the algorithm (specify it in a profiling or masking job), you must add it to a domain. However, if you are not using the Masking Engine Profiler to create your inventory, you do not need to associate the algorithm with a domain.
Specifying Range Values
You can specify ranges for Real Values and Mask Values. With Real Values ranges, you can specify all the possible real values to map to the ranges of masked values. Any values NOT listed in the Real Values ranges would then mask to themselves.
Specifying range values is optional. If you need unique values (for example, masking a unique key column), you MUST leave the range values blank. If you plan to certify your data, you must specify range values.
When determining a numeric or alphanumeric range, remember that a narrow range will likely generate duplicate values, which will cause your job to fail.
- To ignore specific characters, enter one or more characters in the Ignore Character List box. Separate values with a comma.
- To ignore the comma character (,), select the Ignore comma (,) check box.
- To ignore control characters, select Add Control Characters.
The Add Control Characters window appears.
- Select the individual control characters that you would like to ignore, or choose Select All or Select None.
- When you are finished, click Save.
You are returned to the Segment Mapping pane.
Numeric segment type
- Min# — A number; the first value in the range. Value can be 1 digit or up to the length of the segment. For example, for a 3-digit segment, you can specify 1, 2, or 3 digits. Acceptable characters: 0-9.
- Max# — A number; the last value in the range. Value should be the same length as the segment. For example, for a 3-digit segment, you should specify 3 digits. Acceptable characters: 0-9.
- Range# — A range of numbers; separate values in this field with a comma (,). Value should be the same length as the segment. For example, for a 3-digit segment, you should specify 3 digits. Acceptable characters: 0-9.
If you do not specify a range, the Masking Engine uses the full range. For example, for a 4-digit segment, the Masking Engine uses 0-9999.
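The numeric range rules above also show why a narrow range causes duplicates: a segment can only produce as many distinct values as its range allows. The sketch below is a hypothetical pre-flight check, not an engine feature. `numeric_mask_values`, `max_distinct_outputs`, and `duplicates_guaranteed` are illustrative names, and treating a missing Min# as 0 and a missing Max# as the segment maximum is an assumption.

```python
from math import prod


def numeric_mask_values(length, min_value=None, max_value=None, range_values=None):
    """Candidate masked values for one numeric segment (hypothetical helper)."""
    upper = 10 ** length - 1                      # e.g. 9999 for a 4-digit segment
    if range_values:                              # Range# gives an explicit list
        values = sorted(set(range_values))
    else:                                         # Min#/Max#, defaulting to the full range
        lo = 0 if min_value is None else min_value
        hi = upper if max_value is None else max_value
        values = list(range(lo, hi + 1))
    if not values or values[0] < 0 or values[-1] > upper:
        raise ValueError("range does not fit the segment length")
    return values


def max_distinct_outputs(segments):
    """Upper bound on distinct masked values: product of per-segment range sizes."""
    return prod(len(numeric_mask_values(**spec)) for spec in segments)


def duplicates_guaranteed(unique_inputs, segments):
    """Pigeonhole check: more unique inputs than possible outputs forces duplicates."""
    return unique_inputs > max_distinct_outputs(segments)


example = [
    {"length": 2, "range_values": [98, 99]},
    {"length": 4},
    {"length": 2, "min_value": 1, "max_value": 77},
]
print(max_distinct_outputs(example))
```

This is a necessary condition only. Passing it does not guarantee unique output, because each segment is masked on its own and a narrow range in one segment can still collide two values that differ only in that segment.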
Alphanumeric segment type
- Min# — A number from 0 to 9; the first value in the range.
- Max# — A number from 0 to 9; the last value in the range.
- MinChar — A letter from A to Z; the first value in the range.
- MaxChar — A letter from A to Z; the last value in the range.
- Range# — A range of alphanumeric characters; separate values in this field with a comma (,). Individual values can be a number from 0 to 9 or an uppercase letter from A to Z. (For example, B,C,J,K,Y,Z or AB,DE.)
If you do not specify a range, the Masking Engine uses the full range (A-Z, 0-9). If you do not know the format of the input, leave the range fields empty. If you know the format of the input (for example, always alphanumeric followed by numeric), you can enter range values such as A2 and S9.
Mapping Algorithm
You can use a mapping algorithm on any set of values, of any length, but you must know how many values you plan to mask. You must supply AT MINIMUM the same number of values as the number of unique values you are masking; more is acceptable. For example, if there are 10,000 unique values in the column you are masking you must give the mapping algorithm AT LEAST 10,000 values.
When you use a mapping algorithm, you cannot mask more than one table at a time. You must mask tables serially.
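The "at least as many values as unique inputs" rule follows from how a mapping works: every distinct input needs its own replacement, and a repeated input must get the same replacement again. This is a minimal sketch of that idea. `MappingSketch` is a hypothetical class, and assigning values in file order is an assumption; how the engine actually picks values is not described here.

```python
class MappingSketch:
    """Hypothetical mapping: each new unique input takes the next unused value."""

    def __init__(self, replacement_values):
        self._pool = list(replacement_values)  # values from the mapping file
        self._next = 0                          # index of the next unused value
        self._assigned = {}                     # original -> replacement

    def mask(self, value):
        if value not in self._assigned:
            if self._next >= len(self._pool):
                raise RuntimeError(
                    "mapping file has fewer values than unique inputs to mask")
            self._assigned[value] = self._pool[self._next]
            self._next += 1
        return self._assigned[value]


cities = MappingSketch(["Smallville", "Clarkville", "Farmville"])
print([cities.mask(v) for v in ["Boston", "Denver", "Boston"]])
# ['Smallville', 'Clarkville', 'Smallville']
```

With three replacement values, a fourth distinct input raises an error. This is the situation the sizing rule above is meant to prevent.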
To add a mapping algorithm:
- In the upper right-hand corner of the Algorithm tab, click Add Algorithm.
- Select Mapping Algorithm.
The Create Mapping Algorithm pane appears.
- Enter a Rule Name. This name MUST be unique.
- Enter a Description.
- Specify a Lookup File (.txt).
The value file must have NO header. Make sure there are no spaces or returns at the end of the last line in the file. The following is sample file content. Notice that there is no header and only a list of values.
Smallville
Clarkville
Farmville
Townville
Cityname
Citytown
Towneaster
- To ignore specific characters, enter one or more characters in the Ignore Character List box. Separate values with a comma.
- To ignore the comma character (,), select the Ignore comma (,) check box.
- When you are finished, click Save.
Before you can use the algorithm by specifying it in a profiling or masking job, you must add it to a domain. However, if you are not using the Masking Engine Profiler to create your inventory, you do not need to associate the algorithm with a domain.
See Adding New Domains.
Binary Lookup Algorithm
To add a binary lookup algorithm:
- At the top right of the Algorithm tab, click Add Algorithm.
- Select Binary Lookup Algorithm.
The Binary SL Rule pane appears.
- Enter a Rule Name.
- Enter a Description.
- Select a Binary Lookup File on your filesystem.
- Click Save.
Tokenization Algorithm
Like mapping, a tokenization algorithm creates a unique token for each input, such as “David” or “Melissa.” The actual data (for example, names and addresses) is converted into tokens that have similar properties to the original data, such as text and length, but no longer convey any meaning. The Delphix Masking Engine stores both the token and the original value so that you can reverse the masking later.
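The key difference from one-way masking is that tokenization keeps a two-way record, so the original can be recovered. Here is a minimal sketch of that idea. `TokenVault`, `tokenize`, and `reidentify` are hypothetical names, an in-memory dictionary stands in for the engine's storage, and generating random tokens that keep each character's class (digit, uppercase, lowercase) is an assumption about what "similar properties" means.

```python
import secrets
import string


class TokenVault:
    """Hypothetical in-memory vault: one token per input, reversible lookup."""

    def __init__(self):
        self._by_original = {}  # original -> token
        self._by_token = {}     # token -> original

    @staticmethod
    def _random_like(value):
        """Random string with the same length and character classes as value."""
        out = []
        for ch in value:
            if ch.isdigit():
                out.append(secrets.choice(string.digits))
            elif ch.isupper():
                out.append(secrets.choice(string.ascii_uppercase))
            elif ch.islower():
                out.append(secrets.choice(string.ascii_lowercase))
            else:
                out.append(ch)  # keep spaces and punctuation as-is
        return "".join(out)

    def tokenize(self, value):
        if value in self._by_original:
            return self._by_original[value]  # same input, same token
        for _ in range(100):
            token = self._random_like(value)
            if token != value and token not in self._by_token:
                break
        else:
            raise ValueError(f"could not generate a unique token for {value!r}")
        self._by_original[value] = token
        self._by_token[token] = value
        return token

    def reidentify(self, token):
        return self._by_token[token]


vault = TokenVault()
token = vault.tokenize("David")
print(token, vault.reidentify(token))
```

Because the vault holds the mapping in both directions, anyone with access to it can reverse the tokens. This is why re-identification runs in a separate Tokenize/Re-Identify environment.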
To add a Tokenization algorithm:
- Enter an Algorithm Name.
- Enter a Description.
- Click Save.
Once you have created an algorithm, you will need to associate it with a domain.
- Navigate to the Home>Settings>Domains page and click Add Domain. A popup appears.
- Enter a domain name.
- From the Tokenization Algorithm Name drop-down menu, select your algorithm.
Create a Tokenization Environment
- On the home page, click Environments.
- Click Add Environment.
- For Purpose, select Tokenize/Re-Identify.
Click Save.
This environment will be used to re-identify your data when required.
- Set up a Tokenize job that uses the tokenization method, then run the job.
Here is a snapshot of the data before and after tokenization to give you an idea of what it will look like.
Before Tokenization
After Tokenization
Min Max Algorithm
If the Out of range Replacement Values checkbox is selected, a default value is used when the input cannot be evaluated.
- Enter the Algorithm Name.
- Enter a Description.
- Enter Min Value and Max Value.
- Optionally, select the Out of range Replacement Values check box.
- Click Save.
Example: For ages less than 18 years, enter Min Value 0 and Max Value 18.
Data Cleansing Algorithm
- Enter Algorithm Name.
- Enter a Description.
- Select Lookup File location.
- Enter the Delimiter. The default separator between key and value is =; change it to match your lookup file.
- Click Save.
Below is an example of a lookup input file. It does not require a header. Make sure there are no spaces or returns at the end of the last line in the file. The following is sample file content:
Example Lookup File
NYC=NY
NY City=NY
New York=NY
Manhattan=NY
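The lookup file is a list of key/value pairs that map variant spellings to one standard value. This sketch shows how such a file can be parsed and applied. `load_cleansing_map` and `cleanse` are hypothetical helpers. Splitting on the first delimiter only, and passing unmatched values through unchanged, are assumptions rather than documented engine behavior.

```python
def load_cleansing_map(text, delimiter="="):
    """Parse 'key<delimiter>value' lines into a dict (hypothetical helper)."""
    mapping = {}
    for line_no, line in enumerate(text.splitlines(), start=1):
        if not line:
            continue                                  # tolerate blank lines
        key, sep, value = line.partition(delimiter)   # split on first delimiter only
        if not sep:
            raise ValueError(f"line {line_no}: no {delimiter!r} delimiter")
        mapping[key] = value
    return mapping


def cleanse(value, mapping):
    """Return the standardized value; unmatched input passes through (assumption)."""
    return mapping.get(value, value)


lookup = load_cleansing_map("NYC=NY\nNY City=NY\nNew York=NY\nManhattan=NY")
print([cleanse(v, lookup) for v in ["Manhattan", "NYC", "Boston"]])  # ['NY', 'NY', 'Boston']
```

Passing a different `delimiter` corresponds to changing the Delimiter field to match a file that uses another separator.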
Free Text Algorithm
One challenge is that individual words might not be sensitive on their own, but together they can be. The algorithm uses profiler sets to determine what information it needs to mask. You can decide which expressions the algorithm uses to search for material such as addresses. For example, you can set the algorithm to look for “St,” “Cir,” “Blvd,” and other words that suggest an address. You can also use pattern matching to identify potentially sensitive information. For example, a number that takes the form 123-45-6789 is likely to be a Social Security Number.
You can use a free text redaction algorithm to show or hide information by using either a blacklist or a whitelist.
Blacklist – Designated material will be redacted (removed). For example, you can set a black list to hide patient names and addresses. The blacklist feature will match the data in the lookup file to the input file.
Whitelist – ONLY designated material will be visible. For example, if a drug company wants to assess how often a particular drug is being prescribed, you can use a white list so that only the name of the drug will appear in the notes. The whitelist feature enables you to mask data using both the lookup file and a profile set.
For either option, a list of words can be imported from an external text file or alternatively, you can use Profiler Sets to match words based on regular expressions, defined within Profiler Expressions. You can also specify the redaction value that will replace the masked words. Regular expressions defined using Profiler Sets will match individual words within the input text, rather than phrases.
- Enter Algorithm Name.
- Enter a Description.
- Select the Black List or White List radio button.
- Select a Lookup File, select Profiler Sets from the drop-down menu, or both, and enter a Redaction Value.
- Click Save.
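Blacklist and whitelist are mirror images: a blacklist replaces the words that match, and a whitelist replaces every word that does not. Here is a minimal sketch of that behavior. `redact` and `WORD` are hypothetical names. Treating a word as a run of letters, digits, underscores, or hyphens is an assumption. Case-insensitive matching is also an assumption, chosen to agree with the example below, where the lookup entry "Agreement" redacts "agreement". Regular expressions must match a whole word, reflecting the statement that Profiler Sets match individual words rather than phrases.

```python
import re

WORD = re.compile(r"[\w-]+")  # a "word": letters, digits, underscores, hyphens


def redact(text, mode, words=(), patterns=(), redaction="XXXX"):
    """Blacklist: replace listed words. Whitelist: replace every unlisted word."""
    if mode not in ("blacklist", "whitelist"):
        raise ValueError("mode must be 'blacklist' or 'whitelist'")
    lookup = {w.lower() for w in words}           # case-insensitive (assumption)
    compiled = [re.compile(p) for p in patterns]  # each must match a whole word

    def listed(word):
        return word.lower() in lookup or any(p.fullmatch(word) for p in compiled)

    def replace(match):
        word = match.group(0)
        keep = not listed(word) if mode == "blacklist" else listed(word)
        return word if keep else redaction

    return WORD.sub(replace, text)


note = ("The customer Bob Jones is satisfied with the terms of the sales "
        "agreement. Please call to confirm at 718-223-7896.")
print(redact(note, "blacklist", words=["Bob", "Jones", "Agreement"]))
```

With the example's lookup words, "Bob," "Jones," and "agreement" are replaced and the phone number is left alone. Adding a pattern such as `\d{3}-\d{3}-\d{4}` would redact the phone number as well.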
Free Text Redaction Example
- Create an input file.
  - Use Notepad to create the input file and enter the following text:
"The customer Bob Jones is satisfied with the terms of the sales agreement. Please call to confirm at 718-223-7896."
  - Save the file as a TXT file.
- Create a lookup file.
  - Use Notepad to create a TXT file. Press Enter between values so that each value is on its own line. The lookup flat file contains the following data:
Bob
Jones
Agreement
Create an Algorithm
You will be prompted for the following information:
- For Algorithm Name, enter Blacklist_Test1.
- For Description, enter Blacklist Test.
- Select the Black List radio button.
- Select the lookup file you created.
- For Redaction Value, enter XXXX.
- Click Save.
Create Rule Set
- From the job page, go to Rule Set and click Create Rule Set.
- For Rule Set Name, enter Free_Text_RS.
- From the Connector drop-down menu, select Free Text.
- Select the input file by clicking the box next to it.
- Click Save.
Create Masking Job
- Use the Free_Text_RS rule set.
- Run the masking job.
The results of the masking job will show the following:
Redacted Input File: The customer XXXX XXXX is satisfied with the terms of the sales XXXX. Please call to confirm at 718-223-7896.
"Bob," "Jones," and "agreement" are redacted. The phone number is not redacted because it does not appear in the lookup file.