What is Data Masking?
Data masking, sometimes called data obfuscation, is the process of hiding original data behind modified content. Its main purpose is to protect sensitive data, such as personal data, stored in proprietary databases. At the same time, masked data has to remain usable for other corporate activities, for example testing and further application development.
Data masking is a very useful tool when:
- a company needs to give access to its database(s) to outsourced and third-party IT companies. When masking data, it’s very important that the result looks consistent and realistic, so that hackers and other malicious actors think they’re dealing with genuine data.
- a company needs to mitigate operator errors. Companies usually trust their employees to make good and secure decisions; however, many breaches are the result of operator error. If data is masked, the consequences of such errors are less catastrophic. It’s also worth mentioning that not all database operations require entirely real, accurate data.
Data masking can be useful for all companies dealing with the following types of data:
- Personally identifiable information (PII)
- Protected health information (PHI)
- Payment card information (PCI-DSS)
- Intellectual property (ITAR)
All this data has to be protected in compliance with national and international regulations on sensitive data protection.
Examples of Masked Data
In the example below you can see how the Card column looked before masking:
SQL> select * from scott.emp;

    EMPNO ENAME     JOB            MGR HIREDATE  CARD
--------- --------- ---------- ------- --------- -------------------
        1 SMITH     CLERK            0 17-DEC-80 4024-0071-8423-6700
        2 SCOTT     SALESMAN         0 20-FEB-01 4485-4392-7160-9980
        3 JONES     ANALYST          0 08-JUN-95 6011-0551-9875-8094
        4 ADAMS     MANAGER          1 23-MAY-87 5340-8760-4225-7182

4 rows selected.
And after masking:
SQL> select * from scott.emp;

    EMPNO ENAME     JOB            MGR HIREDATE  CARD
--------- --------- ---------- ------- --------- -------------------
        1 SMITH     CLERK            0 17-DEC-80 XXXX-XXXX-XXXX-6700
        2 SCOTT     SALESMAN         0 20-FEB-01 XXXX-XXXX-XXXX-9980
        3 JONES     ANALYST          0 08-JUN-95 XXXX-XXXX-XXXX-8094
        4 ADAMS     MANAGER          1 23-MAY-87 XXXX-XXXX-XXXX-7182

4 rows selected.
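The transformation shown in the Card column can be sketched in a few lines of Python. This is only an illustration of the idea, not how DataSunrise implements it; the function name `mask_card` and its parameters are assumptions made for this example.

```python
def mask_card(card_number: str, visible: int = 4, mask_char: str = "X") -> str:
    """Replace every digit except the last `visible` ones; keep separators."""
    total_digits = sum(ch.isdigit() for ch in card_number)
    digits_to_mask = max(total_digits - visible, 0)
    masked = []
    for ch in card_number:
        if ch.isdigit() and digits_to_mask > 0:
            masked.append(mask_char)  # hide this digit
            digits_to_mask -= 1
        else:
            masked.append(ch)         # separators and the trailing digits stay
    return "".join(masked)


print(mask_card("4024-0071-8423-6700"))  # XXXX-XXXX-XXXX-6700
```

Keeping the separators and the last four digits means the masked value still looks like a card number to anyone reading a report or a test screen.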
You can mask different fields using different methods. With DataSunrise you can use, for example, the following methods:
- Credit card masking
- Email masking
- URL masking
- Phone number masking
- Masking by empty value
- Masking by fixed and random values
- Masking using a custom function
- Masking the first and last characters of strings
- Masking any sensitive data in plain text
- Masking by values from predefined dictionaries, such as Cities, Job Positions, American and Turkish First Names, US Company Names, US State Names, etc.
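To give a feel for what a few of these methods do, here is a rough Python sketch of email masking, masking the first and last characters of a string, and masking by dictionary values. The function names, the exact masking rules, and the small `CITIES` list are all assumptions made for illustration; they are not taken from DataSunrise.

```python
import random


def mask_email(email: str) -> str:
    """Keep the first character of the local part and the whole domain."""
    local, sep, domain = email.partition("@")
    if not sep or not local:
        return "*" * len(email)  # not email-shaped: hide everything
    return local[0] + "*" * (len(local) - 1) + "@" + domain


def mask_edges(value: str, n: int = 2, mask_char: str = "*") -> str:
    """Mask the first and last `n` characters of a string."""
    if len(value) <= 2 * n:
        return mask_char * len(value)
    return mask_char * n + value[n:-n] + mask_char * n


CITIES = ["Springfield", "Riverton", "Lakewood"]  # hypothetical dictionary


def mask_from_dictionary(rng: random.Random, dictionary=CITIES) -> str:
    """Replace a value with a random entry from a predefined dictionary."""
    return rng.choice(dictionary)


print(mask_email("john.smith@example.com"))  # j*********@example.com
print(mask_edges("SALESMAN"))                # **LESM**
print(mask_from_dictionary(random.Random(0)))
```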
Steps of Data Masking
When it comes to practical data masking, you need a strategy that works within your organization. Below are the steps you need to take to make data masking effective:
- Find your sensitive data. The first step is to discover and identify data that may be sensitive and require protection. It’s better to use a dedicated automated tool for that, such as DataSunrise sensitive data discovery, which uses table relations.
- Analyze the situation. At this stage the data security team should understand where the sensitive data is, who needs access to it and who doesn’t. You can use role-based access, so that a user’s role determines whether they see the original or the masked sensitive data.
- Apply masking. One should bear in mind that in very large organizations, it isn’t feasible to assume that just a single masking tool can be used across the entire company. Instead, you might need different data masking types.
- Test data masking results. This is the final step in the data masking process. Quality assurance and testing are required to ensure that the data masking configurations give the required results.
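The first step, finding sensitive data, is the easiest one to picture in code. Below is a deliberately naive sketch that flags a column as sensitive when most of its values match a known pattern. The patterns, the 0.8 threshold and the function name are assumptions for this example; real discovery tools use far richer rules.

```python
import re

# Hypothetical patterns; a real tool would ship many more.
PATTERNS = {
    "card": re.compile(r"^\d{4}-\d{4}-\d{4}-\d{4}$"),
    "email": re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$"),
}


def find_sensitive_columns(rows, threshold=0.8):
    """Return {column: label} for columns whose values mostly match a pattern."""
    found = {}
    columns = rows[0].keys() if rows else []
    for col in columns:
        values = [str(r[col]) for r in rows if r[col] not in (None, "")]
        if not values:
            continue
        for label, pattern in PATTERNS.items():
            hits = sum(bool(pattern.match(v)) for v in values)
            if hits / len(values) >= threshold:
                found[col] = label
                break
    return found


sample = [
    {"ENAME": "SMITH", "CARD": "4024-0071-8423-6700"},
    {"ENAME": "SCOTT", "CARD": "4485-4392-7160-9980"},
]
print(find_sensitive_columns(sample))  # {'CARD': 'card'}
```

The same function can be reused in the last step: run it against the masked output and check that nothing is flagged that shouldn’t be.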
Data Masking Types
Dynamic Data Masking
Dynamic data masking is the process of masking data at the moment a query is made to a database holding real private data. It works by modifying either the query or the response. The data is masked on the fly, that is, without being saved to any intermediate data storage.
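As a minimal sketch of the response-modifying variant, the Python snippet below runs a query against an in-memory SQLite database and masks the Card column in the result set only, depending on the caller’s role. The role names, the `MASKED_COLUMNS` rule set and `run_query` are hypothetical; a real product would do this in a proxy in front of the database rather than in application code.

```python
import sqlite3


def mask_card(card):
    return "XXXX-XXXX-XXXX-" + card[-4:]


MASKED_COLUMNS = {"card": mask_card}  # hypothetical masking rules
PRIVILEGED_ROLES = {"dba"}            # hypothetical roles that see real data


def run_query(conn, sql, role):
    """Execute the query and mask sensitive columns in the response only."""
    cur = conn.execute(sql)
    names = [d[0].lower() for d in cur.description]
    rows = cur.fetchall()
    if role in PRIVILEGED_ROLES:
        return rows
    return [
        tuple(
            MASKED_COLUMNS[n](v) if n in MASKED_COLUMNS and v is not None else v
            for n, v in zip(names, row)
        )
        for row in rows
    ]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (empno INTEGER, ename TEXT, card TEXT)")
conn.execute("INSERT INTO emp VALUES (1, 'SMITH', '4024-0071-8423-6700')")

print(run_query(conn, "SELECT * FROM emp", role="tester"))
print(run_query(conn, "SELECT * FROM emp", role="dba"))
```

Note that the stored row is never changed: the same query returns masked or real values depending only on who is asking.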
Static Data Masking
As the name suggests, static data masking produces a permanently masked copy of the data: database administrators copy the original data, keep the original somewhere safe, and replace the sensitive values in the copy with fake data. That is, the content of a database is duplicated into a test environment, where it can be shared with third-party contractors and others. As a result, the original sensitive data stays in the production database and only a masked copy is moved into the test environment. However well static data masking may suit work with third-party contractors, statically masked data can be a big problem for applications that need real data from production databases.
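A bare-bones way to picture static masking is a copy job that masks values while they travel from production to test, so the real values never reach the test database. The sketch below uses two in-memory SQLite databases; the table layout and the names `prod_db`, `test_db` and `static_mask_copy` are assumptions for the example.

```python
import sqlite3


def mask_card(card):
    return "XXXX-XXXX-XXXX-" + card[-4:]


def static_mask_copy(prod_db, test_db):
    """Copy emp from production to test, masking the card column on the way."""
    test_db.execute("CREATE TABLE emp (empno INTEGER, ename TEXT, card TEXT)")
    rows = prod_db.execute("SELECT empno, ename, card FROM emp")
    test_db.executemany(
        "INSERT INTO emp VALUES (?, ?, ?)",
        ((empno, ename, mask_card(card)) for empno, ename, card in rows),
    )
    test_db.commit()


prod_db = sqlite3.connect(":memory:")
prod_db.execute("CREATE TABLE emp (empno INTEGER, ename TEXT, card TEXT)")
prod_db.executemany(
    "INSERT INTO emp VALUES (?, ?, ?)",
    [(1, "SMITH", "4024-0071-8423-6700"), (2, "SCOTT", "4485-4392-7160-9980")],
)
prod_db.commit()

test_db = sqlite3.connect(":memory:")
static_mask_copy(prod_db, test_db)
print(test_db.execute("SELECT * FROM emp ORDER BY empno").fetchall())
```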
In-Place Data Masking
In-place data masking, like static masking, is intended for creating test data based on real production data. The process usually consists of three main steps:
- Copying production data as is to a test database.
- Removing redundant test data to decrease data storage volume and speed up testing processes.
- Replacing all PII data in a test database with masked values – this step is called in-place masking.
How the production data is copied is outside the scope of in-place data masking itself. For example, it can be an ETL procedure, a backup and recovery of the production database, or something else. The most important point is that in-place masking is applied to a copy of a production database to mask the PII it contains.
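The three steps can be sketched with SQLite as follows. Here the copy is made with Python’s `sqlite3` backup API, which is just one possible stand-in for an ETL or backup-recovery procedure; the table, the row filter and the `mask_card` function are assumptions for the example.

```python
import sqlite3


def mask_card(card):
    return None if card is None else "XXXX-XXXX-XXXX-" + card[-4:]


prod_db = sqlite3.connect(":memory:")
prod_db.execute("CREATE TABLE emp (empno INTEGER, ename TEXT, card TEXT)")
prod_db.executemany(
    "INSERT INTO emp VALUES (?, ?, ?)",
    [
        (1, "SMITH", "4024-0071-8423-6700"),
        (2, "SCOTT", "4485-4392-7160-9980"),
        (3, "JONES", "6011-0551-9875-8094"),
    ],
)
prod_db.commit()

test_db = sqlite3.connect(":memory:")

# Step 1: copy production data as is to the test database.
prod_db.backup(test_db)

# Step 2: remove rows the tests don't need.
test_db.execute("DELETE FROM emp WHERE empno > 2")

# Step 3: overwrite PII in the copy itself (the in-place masking step).
test_db.create_function("mask_card", 1, mask_card)
test_db.execute("UPDATE emp SET card = mask_card(card)")
test_db.commit()

print(test_db.execute("SELECT * FROM emp ORDER BY empno").fetchall())
```

Unlike the static approach, the real values do exist in the test database between steps 1 and 3, so that window needs to be protected.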
Conditions Data Masking Should Meet
As mentioned earlier, masked data has to remain meaningful, and it has to meet several conditions:
- The data has to remain meaningful and valid for the application logic.
- The data must undergo enough changes so that it can’t be reverse-engineered.
- The obfuscated data may be required to be consistent across multiple databases within an organization when the databases each contain the specific data element being masked.
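One common way to get the consistency described in the last condition is deterministic masking: the same input always produces the same masked output, so the value lines up across databases. The sketch below derives replacement digits from a keyed hash (HMAC-SHA256). This is an illustrative approach, not one the article prescribes; the key and the function name are hypothetical.

```python
import hashlib
import hmac

SECRET_KEY = b"replace-with-a-secret-key"  # hypothetical; keep it out of source control


def consistent_mask_digits(value: str, key: bytes = SECRET_KEY) -> str:
    """Deterministically replace each digit; keep length and separators."""
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).digest()
    out = []
    i = 0
    for ch in value:
        if ch.isdigit():
            # Derive a replacement digit from the keyed hash of the whole value.
            out.append(str(digest[i % len(digest)] % 10))
            i += 1
        else:
            out.append(ch)
    return "".join(out)


first = consistent_mask_digits("4024-0071-8423-6700")
second = consistent_mask_digits("4024-0071-8423-6700")
print(first == second)  # True
```

Without the key the mapping is hard to reverse, which speaks to the second condition. The output keeps the original format, but it will not necessarily pass application-level checks such as a card checksum, so the first condition still has to be verified separately.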
Data Masking Techniques
The following techniques may be used for masking (obfuscating) data:
- Substitution is one of the most popular and effective methods of data masking. With this method, real data is substituted with fake but still authentic-looking data. Substitution is usually applied to phone numbers, zip codes, credit card numbers, Social Security and Medicare numbers, etc. When applying substitution to names, real-life names can be randomly substituted from a supplied or customized lookup file.
- Shuffling is another very popular way of masking data. It is very similar to the substitution method mentioned above, except that the substitution set is taken from the same column of data that is being masked. To put it simply, the data is randomly shuffled within the column.
- Encryption is one of the most complex methods of data obfuscation. The data is encrypted, and a “key” is required to view it, depending on user rights and privileges. Moreover, if you want to see the actual database content, you need to access the database through the DataSunrise Database Proxy.
- Nulling values out or deleting them. Applying a null value to a particular field may look like a simple yet efficient way to mask data. However, this approach only prevents direct visibility of the data. In most cases it is not as effective as it may seem, because it breaks the logic of most applications.
- Number and date variance. Done right, number and date variance gives you a useful set of data without disclosing important financial information or transaction details. Let’s imagine you need to mask your employees’ salaries. To preserve the range between the highest- and lowest-paid employees, you can apply the same variance to all salaries in the set; that way the range doesn’t change.
- Character scrambling. This is a very simple technique in which characters are jumbled into a random order so that the original content is hidden. For example, using this technique you can change an employee’s ID #244536 in a production database to read #642345 for everyone not allowed to see the real data.
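Several of the techniques above fit in a few lines each. The Python sketch below illustrates substitution, shuffling, number variance and character scrambling on toy data. All function names, the lookup list and the `max_offset` default are assumptions for the example, and the variance is implemented as one shared additive offset, which is one way to keep the highest-to-lowest range unchanged.

```python
import random


def substitute(values, lookup, rng):
    """Substitution: replace each value with a random entry from a lookup list."""
    return [rng.choice(lookup) for _ in values]


def shuffle_column(values, rng):
    """Shuffling: keep the same values but reassign them to rows at random."""
    shuffled = list(values)
    rng.shuffle(shuffled)
    return shuffled


def shift_numbers(values, rng, max_offset=500):
    """Number variance: add one shared, non-zero random offset to every value."""
    offset = rng.randint(1, max_offset) * rng.choice((-1, 1))
    return [v + offset for v in values]


def scramble(value, rng):
    """Character scrambling: jumble the characters of a single value."""
    chars = list(value)
    rng.shuffle(chars)
    return "".join(chars)


rng = random.Random(42)                 # fixed seed only to make the demo repeatable
names = ["SMITH", "SCOTT", "JONES"]
lookup = ["BAKER", "MILLER", "TURNER"]  # hypothetical lookup file contents
salaries = [3000, 4500, 8000]

print(substitute(names, lookup, rng))
print(shuffle_column(names, rng))
print(shift_numbers(salaries, rng))
print(scramble("244536", rng))
```

The same shared-offset idea carries over to dates by adding one `datetime.timedelta` to every value. Keep in mind that scrambling and shuffling are weak on short or small data sets, since the original characters and values are all still present.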
Static data masking, dynamic data masking (including the ability to mask XML, JSON, CSV files, and unstructured text on Amazon S3), and sensitive data discovery powered by table relations are all included in DataSunrise Database Security Suite, so you can choose the most suitable solution for your company. Whichever you choose, your sensitive data will be masked.