The Delphix Masking Engine profiler uses two different methods to identify the location of sensitive data:

  • At the metadata level: queries the database catalog of the target database and looks for specific words in the column names (for example, column names that contain "name").
  • At the data level: examines the data itself, using a sampling algorithm, to see whether it contains sensitive values.

The Delphix Masking Engine then uses that profile information to generate the jobs that mask the target database. You define the connections to the databases you want to profile and then use the Delphix Masking Engine software to run the profiling. When profiling is complete, the results are stored as profile metadata in the Delphix Masking Engine database, which can be hosted locally or on the network.
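The two identification methods can be pictured with a short Python sketch. It is an illustration only, under stated assumptions: the `profile_table` helper, the table layout, and both patterns are hypothetical, and Python's `re` module stands in for whatever regular expression engine the product uses.

```python
import re

# Hypothetical patterns, for illustration only.
COLUMN_PATTERN = re.compile(r"name", re.IGNORECASE)  # metadata level
DATA_PATTERN = re.compile(r"^\d{3}-\d{2}-\d{4}$")    # data level


def profile_table(columns):
    """Return {column_name: reason} for columns flagged as sensitive.

    `columns` maps each column name to a list of sampled values.
    """
    flagged = {}
    for name, sample in columns.items():
        if COLUMN_PATTERN.search(name):
            # Metadata level: the column name itself looks sensitive.
            flagged[name] = "column name"
        elif any(DATA_PATTERN.search(str(value)) for value in sample):
            # Data level: at least one sampled value looks sensitive.
            flagged[name] = "data values"
    return flagged


table = {
    "first_name": ["Ann", "Raj"],
    "col_17": ["123-45-6789", "987-65-4321"],
    "order_total": ["19.99", "5.00"],
}
print(profile_table(table))
```

In this sketch `first_name` is caught by its name alone, while `col_17` is caught only because its sampled values look like sensitive data.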

Profiler Settings Tab

You can add regular expressions and profiler sets to the Profiler Settings. In addition to using the Masking settings to determine your inventory of what to mask, a Profiling job uses expressions to identify the data you are seeking. For more information about profiling, see Managing Jobs in the Delphix Masking Engine User Guide.

The Profiler displays Domains along with their Expression Text, Expression Name, and Expression Level.

Adding New Expressions

Expressions define the criteria that the profiler uses to decide which data to flag. For example, you can define an expression that looks for a full or partial column name, so that only columns whose names match are profiled. The following table shows some sample expressions.

Sample Expressions

  • Expression: (i:ad(d|dress)_line1|ad(d|dress)1|city_ad(d|dress)|ad(d|dress)_city|address.p[^o].*)
    Column Description: Looks for addresses by searching through patterns in the column name

  • Expression: (.[\s]+b(ou)?l(e)?v(ar)?d[\s].*)|(.[\s]+st[.]?(reet)?[\s].*)|(.[\s]+ave[.]?(nue)?[\s].*)|(.[\s]+r(oa)?d[\s].*)|(.[\s]+l(a)?n(e)?[\s].*)|(.[\s]+cir(cle)?[\s].*)
    Data Description: Looks for address line information in data

  • Expression: (?i)(.[\s]*ap(ar)?t(ment)?[\s]+.)|(.[\s]*s(ui)?te[\s]+.)|(c(are)?[\s][\\\\\]?[/]?o(f)?[\s]+.)
    Data Description: Looks for address line 2 information in the data
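Before entering a sample expression, it can help to try it against a few representative values. The sketch below does this for the apartment and suite alternatives of the address line 2 expression, joined with "|". Python's `re` module is an assumption here and may behave differently from the engine the product uses; the test values are invented.

```python
import re

# Apartment and suite alternatives of the "address line 2" sample expression.
ADDRESS_LINE2 = re.compile(
    r"(?i)(.[\s]*ap(ar)?t(ment)?[\s]+.)|(.[\s]*s(ui)?te[\s]+.)"
)

# Invented test values.
samples = ["100 Main St Apt 4B", "Building 5, Ste 12", "PO Box 7"]

for value in samples:
    found = ADDRESS_LINE2.search(value) is not None
    print(f"{value!r}: {'match' if found else 'no match'}")
```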


For sample expressions and tools, see http://www.regular-expressions.info/ or perform an Internet search for "regular expressions".

This resource is only a suggestion, not an endorsement of the site.

To add an expression

  1. Click Add Expression at the top of the Profiler tab.
    • A new expression will be created in-line.
  2. Select a domain from the Domain dropdown.

    Only the default Delphix Masking Engine domains and the domains you have defined appear in this drop-down. If you need to add a domain, see _Masking Add New Domain_.

  3. Enter the following information for that domain:
    • Expression Name—The field name used to select this expression as part of a profiler set.
    • Expression Text—The regular expression used to identify the location of the sensitive data.
  4. Select an Expression Level for the domain:
    • Column Level—To identify sensitive data based on column names.
    • Data Level—To identify sensitive data based on data values, not column names.
  5. When you are finished, click Save.
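The fields entered in these steps can be modeled as a small record, which also gives a convenient place to check that the Expression Text is a well-formed regular expression before you save it. This is a hypothetical sketch: `ProfilerExpression` is not a product class, and Python's `re` module is assumed as a stand-in for the product's regex engine.

```python
import re
from dataclasses import dataclass

LEVELS = ("Column Level", "Data Level")


@dataclass(frozen=True)
class ProfilerExpression:
    """Hypothetical record mirroring the fields on the Profiler tab."""

    domain: str
    name: str   # Expression Name
    text: str   # Expression Text (a regular expression)
    level: str  # Expression Level

    def __post_init__(self):
        if self.level not in LEVELS:
            raise ValueError(f"unknown expression level: {self.level!r}")
        # Raises re.error if the expression text is malformed.
        re.compile(self.text)


ok = ProfilerExpression("First Name", "fname_col", "[Nn][Aa][Mm][Ee]", "Column Level")

try:
    # Missing closing parenthesis, so validation fails.
    ProfilerExpression("First Name", "broken", "([Nn][Aa][Mm][Ee]", "Column Level")
except re.error as exc:
    print("rejected:", exc)
```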

To delete an expression

Click the Delete icon to the far right of the name.

Adding or Editing a Profiler Set

You can define Profiler Sets in Delphix Masking Engine. A profiler set is a grouping of expressions for a particular purpose. For instance, First Name, Last Name, Address, Credit Card, SSN, and Bank Account Number could constitute a Financial Profiler Set. For information about creating a profiling job, see Creating a New Profiling Job in the Delphix Masking Engine User Guide.

The Delphix Masking Engine comes with two predefined profiler sets, one for the Financial vertical and one for the Healthcare vertical. A Delphix Masking Engine administrator (a user with the appropriate role privileges) can add, update, and delete profiler sets.

If you do not choose a profiler set as part of the Profiler job, the Delphix Masking Engine profiles data based on all the expressions defined on the Profiler Settings page.

If you want to edit or add a profiler set, click Profiler Set at the top of the Profiler tab. The Profiler Set screen appears, listing the profiler sets along with their Purpose and Date Created.
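The effect of choosing, or not choosing, a profiler set can be sketched as a simple filter over the defined expressions. All names below are hypothetical, including the expression names and the "Diagnosis" domain; the sketch assumes that a profiler set is stored as a list of domains and that each expression belongs to one domain.

```python
# Hypothetical inventory: expression name -> domain.
EXPRESSION_DOMAINS = {
    "first_name_col": "First Name",
    "ssn_data": "SSN",
    "diagnosis_col": "Diagnosis",
}

# Hypothetical profiler sets: set name -> domains included in the set.
PROFILER_SETS = {
    "Financial": {
        "First Name", "Last Name", "Address",
        "Credit Card", "SSN", "Bank Account Number",
    },
}


def expressions_for_job(profiler_set=None):
    """Return the expression names a profiling job would apply."""
    if profiler_set is None:
        # No set chosen: every defined expression is used.
        return sorted(EXPRESSION_DOMAINS)
    domains = PROFILER_SETS[profiler_set]
    return sorted(
        name for name, domain in EXPRESSION_DOMAINS.items() if domain in domains
    )


print(expressions_for_job())
print(expressions_for_job("Financial"))
```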

To edit a Profiler Set

Click the Edit icon to the right of the Profiler Set name.

To delete a Profiler Set

Click the Delete icon to the right of the Profiler Set name.

To add a Profiler Set

  1. Click Add Set.
    The Create Profile Set window appears.
  2. Enter a profile Set Name.
  3. Optionally, enter a Purpose for this profile set.
  4. Enter or select the Domains to include in this set.
  5. When you are finished, click Submit.

Practical Profiling Example

This section provides an example of how you might define the data you want to profile.

Starting on the Profiler Settings page, suppose you want to look for First Name. You enter a regular expression that describes how to find it. If the expression is at the column level, the Delphix Masking Engine identifies the column names that match the pattern and tags each matching column as sensitive. If an expression matches multiple columns in a table, the Delphix Masking Engine tags every column for which it finds a match, not just the first column in the table. However, if multiple expressions match one column, the Delphix Masking Engine tags that column with the first match only.
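The tagging rule can be sketched as follows. The expression list, its ordering, and the `tag_columns` helper are hypothetical; in particular, the sketch assumes that "first match" means the first expression in a fixed list, which the text above does not spell out.

```python
import re

# Hypothetical ordered list of (domain, column-level expression).
EXPRESSIONS = [
    ("First Name", r"(?i)first_?name"),
    ("Name", r"[Nn][Aa][Mm][Ee]"),
]


def tag_columns(column_names):
    """Tag every matching column, keeping one tag per column."""
    tags = {}
    for column in column_names:
        for domain, pattern in EXPRESSIONS:
            if re.search(pattern, column):
                tags[column] = domain
                break  # first matching expression wins for this column
    return tags


print(tag_columns(["first_name", "nickname", "CUST_NAME", "order_id"]))
```

Here `first_name` matches both expressions but receives only the first tag, while `nickname` and `CUST_NAME` are both tagged by the same expression.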

Data-level profiling works on a sample of each column. (Data sampling does not apply to mainframe processing.) The Delphix Masking Engine does not look at all rows, only the first n, where n might be 10,000 rows, 100,000 rows, and so on. The value of n is set by the NO_OF_ROWS property in the kettle-profiling.properties file.
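One consequence of sampling the first n rows is that a sensitive value appearing only after row n is never seen. A minimal sketch, assuming Python's `re` module and a hypothetical `column_matches` helper, with `NO_OF_ROWS` used here as an ordinary variable rather than the real property:

```python
import re
from itertools import islice

NO_OF_ROWS = 10_000  # stand-in for the property value; illustrative only


def column_matches(values, pattern, limit=NO_OF_ROWS):
    """Return True if any of the first `limit` values matches `pattern`."""
    regex = re.compile(pattern)
    return any(regex.search(str(value)) for value in islice(values, limit))


# Five unremarkable rows followed by one sensitive-looking row.
rows = ["x"] * 5 + ["John"]

print(column_matches(rows, "John", limit=5))  # sample stops before the match
print(column_matches(rows, "John", limit=6))  # sample includes the match
```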

So, if you want to look for First Name columns across all of your databases, specify the following column-level expression on the Profiler Settings page:

[Nn][Aa][Mm][Ee]

If the expression is at the data level, you can look for common names such as John and Mary:

(([Jj][Oo][Hh][Nn])|([Mm][Aa][Rr][Yy]))

This expression looks for the names John and Mary in the data. If the Masking Engine finds either name, it identifies the column as a First Name column.

You can also search based on format. For instance, you can look for a social security number by looking for nine digits of data with two hyphens (one at position 4 and one at position 7):

^\d{3}-\d{2}-\d{4}$
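The example expressions in this section can be tried out quickly in Python. This is a sketch under the assumption that Python's `re` module behaves like the product's regex engine for these simple patterns; the social security number pattern is written with the two hyphens that the description calls for, and the test values are invented.

```python
import re

COLUMN_NAME = re.compile(r"[Nn][Aa][Mm][Ee]")                              # column level
FIRST_NAME_DATA = re.compile(r"(([Jj][Oo][Hh][Nn])|([Mm][Aa][Rr][Yy]))")  # data level
SSN_FORMAT = re.compile(r"^\d{3}-\d{2}-\d{4}$")                            # data level

# Column names: any name containing "name" in any letter case matches.
for column in ["FIRST_NAME", "username", "cust_nm"]:
    print(column, bool(COLUMN_NAME.search(column)))

# Data values: John or Mary in any letter case.
for value in ["mary", "JOHN", "Alice"]:
    print(value, bool(FIRST_NAME_DATA.search(value)))

# Format check: NNN-NN-NNNN only.
for value in ["123-45-6789", "123456789"]:
    print(value, bool(SSN_FORMAT.search(value)))
```

Note that the column-level expression also matches names such as `username`, so a broad pattern like this may tag more columns than intended.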

Related Topics