Basic Principles of Dynamic Masking
Data Masking Principles
Let’s start from the beginning and get familiar with two basic principles of data masking in the database:
- Masking method based on query result sets modification
- Masking method based on SQL query modification
DataSunrise uses both approaches for different types of DBMS. DataSunrise modifies an SQL query itself where it is possible, and in those DBMSs that have quite a few language features (for example, NoSQL databases), result set modification is used.
Let’s take a look at the advantages and disadvantages of both masking methods.
Masking Method Based on Query Result Sets Modification
Using this approach, the database receives the original query and returns the data as it usually happens:
Everything looks simple and logical enough. Now let’s take a look at the details of this approach:
- How to process data returned by the function? E.g. case:
SELECT anyFunction(email) FROM customers;Various string modification methods such as concatenation or casting a string to lower/upper case, getting a substring, etc. are also related to this case.
- There is no way to obfuscate data that was not returned to the client application. For example:
CREATE TABLE demotest.democust AS SELECT * FROM demotest.customers;or
INSERT INTO demotest.democust SELECT * FROM demotest.customers;in these queries, source data will be saved to the created tables which causes a data leak.
Widely spread client applications rarely use isolated queries. Usually, queries are executed one after another and the result set of the first query is used in a second query and so on. If we consider such a case when the key field is masked, then for the application to keep working correctly, a query modification becomes necessary. For example, an application receives a list of masked email addresses which is used as a condition to fetch data:
SELECT * FROM demotest.customers WHERE email=’firstname.lastname@example.org’;
SELECT * FROM demotest.customers WHERE email=?; -- ? stands for bound variable
Since the table does not contain such email value a client application logic will be broken. Because a client application will be waiting for the information by the key-value which a client application received previously.
As a conclusion, you can see from the described situations, that it is a complicated (and sometimes even impossible) objective to provide the data masking method that will not affect a working process of a client application.
Masking Method Based on SQL Query ModificationIn this approach, the text of the SQL query is modified. Special constructs are built into the query that prevent the extraction of sensitive data from the database: In the simple case, the conversion looks like this:
SELECT email FROM customers -> SELECT ‘masked’ FROM customers;Therefore, sensitive data that should be masked does not leave the database. It also solves all the main problems of data obfuscation:
- Already obfuscated data is provided to functions:
SELECT anyFunction(‘masked’) FROM customers;
SELECT anyFunction(maskFunction(email)) FROM customers;The same way of data obfuscation is possible for operations that are performed internally in the database:
CREATE TABLE customers2 AS SELECT maskFunction(email) FROM customers;
- There is also a possibility to mask the columns that were specified not only in the select part of the query but also in the conditional part of the query e.g. where/join. Thus, the client application working logic will not be affected.
Main Sections of Dynamic Data Masking Rule
In this section we will provide screenshot examples of how these parameters affect the working process with the database when the dynamic masking rule is active.
Let’s imagine the situation where we want someone who is working with the database and customers table not to see the credit card data contained in the table.
To perform such a scenario, we will be using our test table that contains different information about customers:
SQL> SELECT order_id, first_name, last_name, address, state, zip, email, card FROM demotest.democust; ORDER_ID FIRST_NAME LAST_NAME ADDRESS STATE ZIP EMAIL CARD --------- ---------- --------- --------------------- ----- -------- ------------------- ------------------- 1,121 Kathy Abrams High Street, Wheeling MO 132068 email@example.com 4024-0071-8423-6700 4,667 Alma Wade Green Lane, Newport NE 21771 firstname.lastname@example.org 6011-0551-9875-8094 3,427 Mary Evans 13 The Limes, Leighton OR 10950 email@example.com 4024-4392-7160-9980 3 rows selected.
As you can see, we have 8 columns. According to our scenario “card” column should be masked. To do so, Dynamic Masking rule will be created. In the dynamic masking rule configuration, it will be set to mask Card column using built-in credit card masking method. Let’s take a quick look at the rule’s settings. There are sections of the rule:
- Main section. Name of the rule, database type and database itself are specified here.
Action section. This section contains options that were mentioned in the list. We will be changing these options along with going through the scenario and trace how changes will affect the results and performance of the queries to customers table. In the beginning, all options will be left as default.
- Filter session section and masking settings. If there are any filters specified, the rule will be triggered only if the condition met the requirements. In any other case rule will not be triggered. Since we have not specified any filters the rule will always work, on any query targeted to the customers table.
Also, Credit card masking method is chosen for the column of the customers table.
Now, when the rule is configured, let’s try to perform the same query to select all columns from the customers table:
SQL> SELECT order_id, first_name, last_name, address, state, zip, email, card FROM demotest.democust; ORDER_ID FIRST_NAME LAST_NAME ADDRESS STATE ZIP EMAIL CARD --------- ---------- --------- --------------------- ----- -------- ------------------- ------------------- 1,121 Kathy Abrams High Street, Wheeling MO 132068 firstname.lastname@example.org XXXX-XXXX-XXXX-6700 4,667 Alma Wade Green Lane, Newport NE 21771 email@example.com XXXX-XXXX-XXXX-8094 3,427 Mary Evans 13 The Limes, Leighton OR 10950 firstname.lastname@example.org XXXX-XXXX-XXXX-9980 3 rows selected.
As you can see on the screenshot, the Card column was masked according to the rule’s configuration.
Main Options of Dynamic Data Masking Rule
Columns for Masking
It is simple, this parameter is responsible for columns that will be masked.
This is the algorithm that will be used on column masking. The chosen algorithm directly affects the value will be returned on trying to fetch masked columns values.
Here is the example list of masking algorithms might be used with DataSunrise (to get informed about other masking algorithms please refer to our User Guide p. 148 where you can find detailed clarification on those):
Credit Card Number – algorithm is aimed to mask credit card numbers. It displays the last four digits of a credit card number, other characters are replaced with “X”. For example: XXXXXXXX-XXXX-1234.
E-mail Masking – the user name and domain section of email addresses are replaced with “*”, except the first one and the last one in a row. For example: a***@**.**m.
Fixed String – STRING-type values are replaced with a user-predefined string.
DataSunrise Security Suite also supports custom functions (functions specified in the database) to be used as a masking method and LUA scripts that are created right in DataSunrise WebUI environment.
This parameter can be specified to configure a Dynamic Masking rule behavior according to different conditions. Here is the condition parameters list:
Application – name of the client application
Application User – name of the application user
DB User – name of the DB user
Host – IP address or alias from where the user or application is connecting to the database
OS User – name of the Operating System user
Proxy – DataSunrise proxy where the rule should work in case many proxies were configured
For example, you want to mask data for the DB Guest User, and you do not want to mask data for DB Admin User which is understandable. In this case you just specify DB User Equals Guest and the masking rule will be triggered only when Guest User is performing queries to masked columns of the table.
Keep Row Count
This option is responsible for masking of values used in where, join etc. clauses of the queries:
when disabled, values of the fields that should be masked will be masked even if they are used in the WHERE/JOIN clause.
when enabled, the masking of values used in the mentioned clauses will be disabled. Returned result set will contain the same count of rows as it would be returned if no masking rule was applied. This option helps to receive the same row count even if data is masked because its real values can be used in WHERE/JOIN clauses of the query.
To understand the situation deeper, let’s assume that I know the part of a card number and it is 4024. We have two customers whose credit card number contains this number:
Kathy Abrams 4024-0071-8423-6700
Mary Evans 4024-0071-2955-9980
Let’s see what happens if the option is disabled:
SQL> SELECT order_id, first_name, last_name, address, state, zip, email, card
WHERE card LIKE '%4024%';
no rows selected.
The query will not return any results since results are masked and the application cannot find such fields.
Let’s see what happens if the option is enabled:
SQL> SELECT order_id, first_name, last_name, address, state, zip, email, card FROM demotest.democust WHERE card LIKE '%4024%'; ORDER_ID FIRST_NAME LAST_NAME ADDRESS STATE ZIP EMAIL CARD --------- ---------- --------- --------------------- ----- -------- ------------------- ------------------- 1,121 Kathy Abrams High Street, Wheeling MO 132068 email@example.com XXXX-XXXX-XXXX-6700 3,427 Mary Evans 13 The Limes, Leighton OR 10950 firstname.lastname@example.org XXXX-XXXX-XXXX-9980 2 rows selected.
As you can see here, even though results are masked, we can still use 4024 to find cards that contain such part, but we cannot see it in the result set.
Mask SELECTs only
This parameter is responsible for rule to mask values of specified columns ONLY if the data is fetched by SELECT query type. Let’s take a look at how it works:
SQL> SELECT card FROM demotest.democust; CARD ------------------- XXXX-XXXX-XXXX-6700 XXXX-XXXX-XXXX-8094 XXXX-XXXX-XXXX-9980 3 rows selected.
As you can see here, card values were fetched using SELECT statement and according to the rule, the Card number column is masked.
Now let’s try to perform the next query and see what happens:
SQL> UPDATE demotest.democust SET state = card; 3 rows updated. SQL> SELECT order_id, first_name, last_name, state, zip, card FROM demotest.democust; ORDER_ID FIRST_NAME LAST_NAME STATE ZIP CARD --------- ---------- --------- ------------------- -------- ------------------- 1,121 Kathy Abrams 4024-0071-8423-6700 132068 XXXX-XXXX-XXXX-6700 4,667 Alma Wade 6011-0551-9875-8094 21771 XXXX-XXXX-XXXX-8094 3,427 Mary Evans 4485-4392-7160-9980 10950 XXXX-XXXX-XXXX-9980 3 rows selected.
As you can see, “state” column now has real values of the masked “card” column, because it was not SELECT query type performed. However, if a SELECT query type will be executed for the masked column it will still be masked.
Block a Query When the Masked Column Related to a Data Modification
The goal of this parameter is explained by its name. It is also a checkbox that is checked by default when creating a Dynamic Masking rule. In case there is a query goes to the DataSunrise proxy and a query is aimed to change the value in the masked column such query will be blocked. Let’s try to perform a query which will be aimed to modify a card number using the next query:
SQL> UPDATE demotest.democust SET card = '1234' WHERE card LIKE '%6700%'; ERROR at line 1: SQL Error : The query is blocked
As the result it is been returned “The query is blocked”.
Dynamic Masking feature is one of the most usable features of our product. This feature helps to mask sensitive data of the protected database according to the rule configuration the user has set.
DataSunrise uses the most appropriate approaches of how to mask sensitive data by choosing the best solution as for SQL based databases as for NoSQL databases.
Dynamic Masking rules have many of options can be configured by a user. The most significant of those are:
- Keep Row Count option is responsible for returning the same count of rows returned by a query as it would be returned if there was no masking rule configured. If this option is active, conditional part of a query which is used for filtering data by using where/join/etc. clauses will not be masked. This option is important for those who use client applications and don’t want to see different count of rows returned by a query when masking rule is active.
- Mask SELECTs only option is responsible for masking only select parts of a query. If this option is active, only those columns that stand after SELECT statement and are specified in the rule will be masked. This option is also important for those who use client applications that perform automatically generated queries for e.g. UPDATE TABLE statements. Because all operations with the masked table data will not be affected in any way but will be seen masked if data will be selected.
- Block a Query When the Masked Column Related to a Data Modification option is responsible for blocking any operation that affects the masked column in any way. If a masked column is mentioned in any part of the query itself, the query will be blocked. This option is important for those who don’t want to lose sensitive data by accident or by intent.