Basic Principles of Dynamic Masking

Basic Principles of Dynamic Masking

1. Introduction

This document describes how DataSunrise masks sensitive data on-the-fly and how masking is performed for SQL and NoSQL database types.

2. Masking

2.1 Ways of Masking

Let’s start from the beginning and get familiar with two basic principles of data masking in the database:

  1. Masking method based on query result sets modification
  2. Masking method based on SQL query modification

Let’s take a look at the advantages and disadvantages of both masking methods. To do this we have prepared the demo table:

Figure 1. Demo Table

Using this table, we will consider the process of data masking.

2.2 Masking Method Based on Query Result Sets Modification

Using this approach, the database receives the original query and returns the data as it usually happens:

Everything looks simple and logical enough. Now let’s take a look at the details of this approach:

 
  • How to process data returned by the function? E.g. case:
    select anyFunction(email) from customers 
    Various string modification methods such as concatenation or casting a string to lower/upper case, getting a substring, etc. are also related to this case.
  • There is no way to obfuscate data that was not returned to the client application. For example:
    create table demotest.democust as select * from
    or
    insert into demotest.democust select * from demotest.customers
    in these queries, source data will be saved to the created tables which causes a data leak.

Widely spread client applications rarely use isolated queries. Usually, queries are executed one after another and the result set of the first query is used in a second query and so on. If we consider such a case when the key field is masked, then for the application to keep working correctly, a query modification becomes necessary. For example, an application receives a list of masked email addresses which is used as a condition to fetch data:

  • select * from demotest.customers where
  • select * from demotest.customers where email=? — ? stands for

Since the table does not contain such email value a client application logic will be broken. Because a client application will be waiting for the information by the key-value which a client application received previously.

As a conclusion, you can see from the described situations, that it is a complicated (and sometimes even impossible) objective to provide the data masking method that will not affect a working process of a client application.

2.3 Masking Method Based on SQL Query Modification

In this approach, the text of the SQL query is modified. Special constructs are built into the query that prevent the extraction of sensitive data from the database: In the simple case, the conversion looks like this:
select email from customers -> select ‘masked’ from customers
Therefore, sensitive data that should be masked does not leave the database. It also solves all the main problems of data obfuscation:
  • Already obfuscated data is provided to functions:
    select anyFunction(‘masked’) from customers
    select anyFunction(maskFunction(email)) from customers
    The same way of data obfuscation is possible for operations that are performed internally in the database:
create table customers2 as select maskFunction(email) from
  • There is also a possibility to mask the columns that were specified not only in the select part of the query but also in the conditional part of the query e.g. where/join. Thus, the client application working logic will not be affected.

3. DataSunrise Dynamic Data Masking Principles

3.1 Features of Dynamic Data Masking

DataSunrise uses both approaches for different types of DBMS. DataSunrise modifies an SQL query itself where it is possible, and in those DBMSs that have quite a few language features (for example, NoSQL databases), result set modification is used.

DataSunrise allows to configure next parameters for Dynamic Masking:

  1. Columns that should be masked in a specific table.

  2. Masking algorithm<

  3. Filter Session

  4. Keep Row Count option

  5. Mask Select only option

  6. Block a Query When the Masked Column Related to a Data Modification option

Let’s go through the list and take a look at the configurable parameters and options one by one that can be specified when configuring a Dynamic Masking rule:

  1. Columns that should be masked in a specific table.

It’s simple, this parameter is responsible for columns that will be masked. 

  1. Masking algorithm.

This is the algorithm that will be used on column masking. The chosen algorithm directly affects the value will be returned on trying to fetch masked columns values.

Here is the example list of masking algorithms might be used with DataSunrise (to get informed about other masking algorithms please refer to our User Guide p. 148 where you can find detailed clarification on those):

  • Credit Card Number – algorithm is aimed to mask credit card numbers. It displays the last four digits of a credit card number, other characters are replaced with “X”. For example: XXXXXXXX-XXXX-1234.

  • E-mail Masking – the user name and domain section of email addresses are replaced with “*”, except the first one and the last one in a row. For example: a***@**.**m. 

  • Fixed String – STRING-type values are replaced with a user-predefined string.

  • etc. 

DataSunrise Security Suite also supports custom functions (functions specified in the database) to be used as a masking method and LUA scripts that are created right in DataSunrise WebUI environment.

  1. Filter Session.

This parameter can be specified to configure a Dynamic Masking rule behavior according to different conditions. Here is the condition parameters list:

  • Application – name of the client application

  • Application User – name of the application user

  • DB User – name of the DB user

  • Host – IP address or alias from where the user or application is connecting to the database

  • OS User – name of the Operating System user

  • Proxy – DataSunrise proxy where the rule should work in case many proxies were configured

For example, you want to mask data for the DB Guest User, and you do not want to mask data for DB Admin User which is understandable. In this case you just specify DB User Equals Guest and the masking rule will be triggered only when Guest User is performing queries to masked columns of the table.

  1. Keep Row Count option.

This is the checkbox that can be activated when configuring Dynamic Masking Rule. This checkbox is responsible for masking of values used in where, join etc. clauses of the queries. If this checkbox is checked, the masking of values used in the mentioned clauses will be disabled. In case this checkbox is not checked, values of the fields that should be masked will be masked even if they are used in the WHERE/JOIN clause.

  1. Mask SELECT only option.

This is also a checkbox. This parameter is responsible for rule to mask values of specified columns ONLY if the data is fetched by SELECT query type.

  1. Block a Query When the Masked Column Related to a Data Modification option.

The goal of this parameter is explained by its name. It is also a checkbox that is checked by default when creating a Dynamic Masking rule. In case there is a query goes to the DataSunrise proxy and a query is aimed to change the value in the masked column such query will be blocked.

3.2 Dynamic Masking options description

In this section we will provide screenshot examples of how these parameters affect the working process with the database when the Dynamic Masking rule is active.

Let’s imagine the situation where we want someone who is working with the database and customers table not to see the credit card data contained in the table.

To perform such a scenario, we will be using our test table that contains different information about customers:

As you can see on the screenshot, we have 8 columns. According to our scenario “card” column should be masked. To do so, Dynamic Masking rule will be created. In the dynamic masking rule configuration, it will be set to mask Card column using built-in credit card masking method. Let’s take a quick look at the rule’s settings. There are sections of the rule:

  1. Main section. Name of the rule, database type and database itself are specified here.
  2. Action section. This section contains options that were mentioned in the list. We will be changing these options along with going through the scenario and trace how changes will affect the results and performance of the queries to customers table. In the beginning, all options will be left as default.

  3. Flter session section and masking settings. If there are any filters specified, the rule will be triggered only if the condition met the requirements. In any other case rule will not be triggered. Since we have not specified any filters the rule will always work, on any query targeted to the customers table.

    Also, Credit card masking method is chosen for the column of the customers table.

    Now, when the rule is configured, let’s try to perform the same query to select all columns from the customers table:

    As you can see on the screenshot, the Credit card number column was masked according to the rule’s configuration.

3.3 Dynamic Masking options overview and usage examples

Data Sunrise does mask not only the column that is specified in select statement, but all occurrences of this column in the query e.g. WHERE, JOIN, GROUP BY etc. It also affects filtering by data.

To understand the situation deeper, let’s assume that I know the part of a card number and it is 4024. We have two customers whose credit card number contains this number:

  1. Kathy Abrams 4024-0071-8423-6700 

  2. Reggy Mackie  4024-0071-2955-5933

Now since I know the source data I will try to cheat and select those whose credit card number contains 4024 number:

As you can see, none of the rows were selected since this data is masked. However, we are still able to use unmasked part of the credit number in WHERE clause part of the query:

As you can see, we still can operate with unmasked part of the masked column.

These are the basics to know while working with Dynamic Masking rules but let’s take a look at the options that were described:

1. KEEP ROW COUNT

As it was previously described, this option is responsible for returning the same count of result set rows as it would be returned if no masking rule was applied. This checkbox is used when a client application query returns different from original data row count on different queries. This usually happens due to masked data usage in where/join clauses of the query when the client application automatically prepares different queries. At the basics part of the rule demonstration we could see that on this execution:

SELECT order_id, first_name, last_name, address, state, zip, email, card

FROM demo.customers

where card like ‘%4024%’;

The query will not return any results since results are masked and the application cannot find such fields.

But let’s see what happens if the KEEP ROW COUNT checkbox is checked:

As you can see here, even though results are masked, we can still use 4024 to find cards that contain such part, but we cannot see it in the result set. This option helps to receive the same row count even if data is masked because its real values can be used in WHERE/JOIN clauses of the query.

2. Mask SELECT only.

This option is responsible for masking only those columns that are specified after SELECT statement of the query. Let’s take a look at how it works:

As you can see here, card values were fetched using SELECT statement and according to the rule, the Card number column is masked.

Now let’s try to perform the next query and see what happens:

update demotest.democust set state = card;

As you can see, ‘state’ column now has real values of the masked ‘card’ column, because it was not SELECT query type performed. However, if a SELECT query type will be executed for the masked column it will still be masked.

3.  Block a Query When the Masked Column Related to a Data Modification option.

Not much to explain here, let’s try to perform a query which will be aimed to modify a card number using the next query:

update customers set card = ‘1234’ where card

According to the query text, the first row card value should be replaced with ‘1234’ value:

As the result it’s been returned ‘The query is blocked’.

4. Conclusion

Dynamic Masking feature is one of the most usable features of our product. This feature helps to mask sensitive data of the protected database according to the rule configuration the user has set.

DataSunrise uses the most appropriate approaches of how to mask sensitive data by choosing the best solution as for SQL based databases as for NoSQL databases.

Dynamic Masking rules have many of options can be configured by a user. The most significant of those are:

  • Keep Row Count
  • Mask SELECT only
  • Block a Query When the Masked Column Related to a Data Modification option

Keep Row Count option is responsible for returning the same count of rows returned by a query as it would be returned if there was no masking rule configured. If this option is active, conditional part of a query which is used for filtering data by using where/join/etc. clauses will not be masked. This option is important for those who use client applications and don’t want to see different count of rows returned by a query when masking rule is active.

Mask SELECT only option is responsible for masking only select parts of a query. If this option is active, only those columns that stand after SELECT statement and are specified in the rule will be masked. This option is also important for those who use client applications that perform automatically generated queries for e.g. UPDATE TABLE statements. Because all operations with the masked table data will not be affected in any way but will be seen masked if data will be selected.

Block a Query When the Masked Column Related to a Data Modification option is responsible for blocking any operation that affects the masked column in any way. If a masked column is mentioned in any part of the query itself, the query will be blocked. This option is important for those who don’t want to lose sensitive data by accident or by intent.

Download free 30 days Trial