The Masking Engine profiler uses two different methods to identify the location of sensitive data:
- At the metadata level—searches through the column names in the target database, by querying the database catalog, looking for specific words in column names (for example, column names with "name" in them).
- At the data level—looks at the data itself using a sampling algorithm, to see whether there is any sensitive data.
Masking Engine then uses this profile information to generate the jobs that mask the target database. The user defines connections to the databases to be profiled and then runs the profiling from the Masking Engine software. When profiling is complete, the results are stored as profile metadata in the Masking Engine database, which may be hosted locally or on the network.
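The two methods can be pictured with a short Python sketch. It is an illustration only, not the Masking Engine implementation: the function names and the sample pattern are hypothetical, and Python's re module stands in for whatever regular expression dialect the product uses.

```python
import re

# Hypothetical pattern, used only for this illustration.
NAME_PATTERN = re.compile(r"name", re.IGNORECASE)


def matches_column_name(column_name: str) -> bool:
    """Metadata level: test the column name taken from the database catalog."""
    return NAME_PATTERN.search(column_name) is not None


def matches_sampled_data(values, pattern: re.Pattern) -> bool:
    """Data level: test values sampled from the column, ignoring NULLs."""
    return any(pattern.search(str(v)) for v in values if v is not None)
```

In this sketch, matches_column_name("FIRST_NAME") succeeds without reading any rows, while matches_sampled_data needs actual values from the column.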
Profiler Settings Tab
You can add regular expressions and profiler sets to the Profiler Settings. In addition to using the Masking settings to determine your inventory of what to mask, a Profiling job uses expressions to identify the data you are seeking. For more information about profiling, see "About Profiling Data" in Masking Engine User's Guide.
The Profiler displays Domains along with their Expression Text, Expression Name, and Expression Level.
Adding New Expressions
Expressions define the criteria that determine which data is profiled. For example, you can define an expression that looks for a full or partial column name and profiles only the columns that match it. The following sample expressions illustrate this.
Sample Expressions
- Expression: (i:ad(d|dress)_line1|ad(d|dress)1|city_ad(d|dress)|ad(d|dress)_city|address.p[^o].*)
  Column Description: Looks for addresses by searching through patterns in the column name
- Expression: (.[\s]+b(ou)?l(e)?v(ar)?d[\s].*)|(.[\s]+st[.]?(reet)?[\s].*)|(.[\s]+ave[.]?(nue)?[\s].*)|(.[\s]+r(oa)?d[\s].*)|(.[\s]+l(a)?n(e)?[\s].*)|(.[\s]+cir(cle)?[\s].*)
  Data Description: Looks for address line information in data
- Expression: (?i)(.[\s]*ap(ar)?t(ment)?[\s]+.)|(.[\s]*s(ui)?te[\s]+.)|(c(are)?[\s][\\\\\]?[/]?o(f)?[\s]+.)
  Data Description: Looks for address line 2 information in the data
For sample expressions and tools, see http://www.regular-expressions.info/ or perform an Internet search for "regular expressions". (Disclaimer: We have provided this resource as a suggestion. Axis Technology does not endorse this or any other related site.)
To add an expression
- Click Add Expression at the top of the Profiler tab. A new expression is created in-line.
- Select a domain from the Domain dropdown. Only the default Masking Engine domains and the domains you have defined appear in this dropdown. If you need to add a domain, see "Masking Add New Domain".
- Enter the following information for that domain:
  - Expression Name: The field name used to select this expression as part of a profiler set.
  - Expression Text: The regular expression used to identify the location of the sensitive data.
- Select an Expression Level for the domain:
  - Column Level: To identify sensitive data based on column names.
  - Data Level: To identify sensitive data based on data values, not column names.
- When you are finished, click Save.
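The fields entered in these steps can be modeled as a small record. The following Python sketch is an assumption about shape only: the class and field names are hypothetical and do not reflect how Masking Engine stores expressions. It compiles the Expression Text with Python's re module as a basic syntax check, which may differ from the regular expression dialect the product accepts.

```python
import re
from dataclasses import dataclass
from enum import Enum


class ExpressionLevel(Enum):
    COLUMN = "Column Level"  # match against column names
    DATA = "Data Level"      # match against data values


@dataclass(frozen=True)
class ProfilerExpression:
    """Hypothetical record mirroring the fields on the Profiler tab."""
    domain: str
    name: str
    text: str
    level: ExpressionLevel

    def __post_init__(self):
        # Raises re.error if the text is not a valid Python regular expression.
        re.compile(self.text)


first_name = ProfilerExpression(
    domain="FIRST_NAME",
    name="First Name (column)",
    text="[Nn][Aa][Mm][Ee]",
    level=ExpressionLevel.COLUMN,
)
```

Validating the text before saving catches mistakes such as unbalanced parentheses early, rather than when a profiling job runs.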
To delete an expression
- Click the Delete icon to the far right of the name.
Adding or Editing a Profiler Set
You can define Profiler Sets in Masking Engine. A profiler set is a grouping of expressions for a particular purpose. For instance, First Name, Last Name, Address, Credit Card, SSN, and Bank Account Number could constitute a Financial Profiler Set. For information about creating a profiling job, see "Creating a New Profiling Job" in Masking Engine User's Guide.
Masking Engine comes with two predefined profiler sets, one for the Financial vertical and one for the Healthcare vertical. A Masking Engine administrator (a user with the appropriate role privileges) can create, update, and delete profiler sets.
If you don't choose a profiler set as part of the Profiler job, Masking Engine profiles data based on all the expressions defined on the Profiler Settings page.
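The fallback described here can be sketched in Python. This is a minimal illustration under stated assumptions: the function name, the dictionary layout (domain name mapped to a list of expression texts), and the sample domains are all hypothetical, and a profiler set is represented simply as a list of domain names.

```python
def expressions_for_job(all_expressions, profiler_set=None):
    """Return the expression texts a profiling job would use.

    all_expressions: dict mapping a domain name to its expression texts.
    profiler_set: iterable of domain names, or None if no set was chosen.
    """
    if profiler_set is None:
        # No profiler set chosen: fall back to every defined expression.
        return [text for texts in all_expressions.values() for text in texts]
    # Profiler set chosen: use only the expressions of its domains.
    return [text for domain in profiler_set
            for text in all_expressions.get(domain, [])]


catalog = {
    "FIRST_NAME": ["[Nn][Aa][Mm][Ee]"],
    "SSN": [r"^\d{3}-\d{2}-\d{4}$"],
}
```

With this catalog, expressions_for_job(catalog) returns both expressions, while expressions_for_job(catalog, ["SSN"]) returns only the SSN expression.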
If you want to edit or add a profiler set, click Profiler Set at the top of the Profiler tab. The Profiler Set screen appears, listing the profiler sets along with their Purpose and Date Created.
To edit a profiler set
- Click the Edit icon to the right of the Profiler Set name.
To delete a profiler set
- Click the Delete icon to the right of the Profiler Set name.
To add a profiler set
- Click Add Set. The Create Profile Set window appears.
- Enter a profile Set Name.
- Optionally, enter a Purpose for this profile set.
- Select the Domains to include in this set.
- When you are finished, click Submit.
Practical Profiling Example
This section provides an example of how you might define the data you want to profile.
Starting on the Profiler Settings page, suppose you want to look for First Name. Enter a regular expression that defines how to find it. If the expression is at the column level, Masking Engine identifies the column names that match the pattern and tags each matching column as sensitive. If an expression matches multiple columns in a table, Masking Engine tags every matching column, not just the first one. However, if multiple expressions match one column, Masking Engine tags the first match in that column.
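The tagging rules in this paragraph can be sketched as follows. This is not the Masking Engine code: the function name and domain labels are hypothetical, and the sketch assumes that "first match" means the first expression in list order, which the text does not spell out.

```python
import re


def tag_columns(column_names, expressions):
    """Tag each column with the domain of the first expression that matches it.

    expressions: ordered list of (domain, pattern_text) pairs.
    Returns a dict mapping each tagged column name to a domain.
    """
    compiled = [(domain, re.compile(text)) for domain, text in expressions]
    tags = {}
    for column in column_names:
        for domain, pattern in compiled:
            if pattern.search(column):
                tags[column] = domain
                break  # assumed rule: the first matching expression wins
    return tags


tags = tag_columns(
    ["first_name", "last_name", "order_id"],
    [("FIRST_NAME", "[Nn][Aa][Mm][Ee]"), ("LAST_NAME", "[Ll][Aa][Ss][Tt]")],
)
```

Here both first_name and last_name are tagged, because one expression can tag many columns. The last_name column matches both expressions but receives only the FIRST_NAME tag, because that expression comes first. The order_id column matches nothing and stays untagged.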
Data-level profiling runs against a sample of each column. (Data sampling does not apply to mainframe processing.) Masking Engine does not examine every row, only the first n rows (for example, 10,000 or 100,000). The value of n is set by the NO_OF_ROWS property in the kettle-profiling.properties file.
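A first-n sample of this kind can be sketched in Python with itertools.islice. The function name is hypothetical, and the NO_OF_ROWS constant below only stands in for the property of the same name; the sketch does not read kettle-profiling.properties.

```python
import re
from itertools import islice

NO_OF_ROWS = 10_000  # stand-in for the NO_OF_ROWS property; value is illustrative


def sample_matches(rows, pattern_text, limit=NO_OF_ROWS):
    """Return True if any of the first `limit` values matches the pattern."""
    pattern = re.compile(pattern_text)
    return any(
        pattern.search(str(value))
        for value in islice(rows, limit)  # rows beyond the limit are never read
        if value is not None
    )


values = ["x"] * 5 + ["John"]
found_in_first_five = sample_matches(values, "John", limit=5)
found_in_first_six = sample_matches(values, "John", limit=6)
```

The example shows the trade-off of sampling: the match in the sixth row is missed with a limit of 5 and found with a limit of 6, so sensitive values that appear only late in a table can go undetected when n is small.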
So, if you want to look for First Names across all of your databases, specify the following expression on the Profiler Settings page:
[Nn][Aa][Mm][Ee]
If the expression is at a data level, you can look for common names such as John and Mary:
(([Jj][Oo][Hh][Nn])|([Mm][Aa][Rr][Yy]))
This expression looks for the names John and Mary in the database. If Masking Engine finds any, it identifies that as a First Name column.
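The bracketed character classes make the expression case-insensitive one letter at a time. As a hedged illustration in Python's re module, the same names can also be matched with the inline (?i) flag that appears in the sample expressions. Whether both forms behave identically in Masking Engine is an assumption that should be checked against the product.

```python
import re

# Explicit per-letter character classes.
EXPLICIT = re.compile(r"(([Jj][Oo][Hh][Nn])|([Mm][Aa][Rr][Yy]))")
# Shorter form using the inline case-insensitive flag.
FLAGGED = re.compile(r"(?i)(john|mary)")

samples = ["John", "MARY", "mary ann", "Joan", "Peter"]
explicit_hits = [s for s in samples if EXPLICIT.search(s)]
flagged_hits = [s for s in samples if FLAGGED.search(s)]
```

Both lists contain "John", "MARY", and "mary ann". Note that a search of this kind also matches longer values that merely contain one of the names.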
You can also search based on format. For instance, you can look for a social security number by looking for nine digits with two hyphens, one at position 4 and one at position 7: ^\d{3}-\d{2}-\d{4}$
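A format check for hyphenated values such as 123-45-6789 can be tried out in Python before the expression is entered in the Profiler tab. The function name is hypothetical, and Python's re module may differ in detail from the regular expression dialect used by Masking Engine.

```python
import re

# Three digits, hyphen, two digits, hyphen, four digits, and nothing else.
SSN_FORMAT = re.compile(r"^\d{3}-\d{2}-\d{4}$")


def looks_like_ssn(value: str) -> bool:
    """Return True if the value has the NNN-NN-NNNN layout."""
    return SSN_FORMAT.match(value) is not None


checks = {
    "123-45-6789": looks_like_ssn("123-45-6789"),   # correct layout
    "123456789": looks_like_ssn("123456789"),       # digits only, no hyphens
    "123-45-678": looks_like_ssn("123-45-678"),     # too short
}
```

Only the first value passes. The anchors ^ and $ keep the pattern from matching a longer string that merely contains a run of digits in this layout.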