What is Data Masking?
Data masking, sometimes called data obfuscation, is the process of hiding original data behind modified content. Its main purpose is to protect sensitive data, such as personal data, stored in proprietary databases. At the same time, the masked data has to remain usable for other corporate activities, for example testing and further application development.
Data masking is a very useful tool when:
- a company needs to give outsourcing and third-party IT companies access to its database(s). When masking data, it’s very important that the result looks consistent and realistic, so that hackers and other malicious actors think they’re dealing with genuine data.
- a company needs to mitigate operator errors. Companies usually trust their employees to make good and secure decisions; however, many breaches result from operator error. If data is masked, the consequences of such errors are far less severe. It’s also worth noting that not every database operation needs entirely real, accurate data.
Data masking can be useful for all companies dealing with the following types of data:
- Personally identifiable information (PII)
- Protected health information (PHI)
- Payment card information (subject to PCI DSS)
- Intellectual property (for example, data subject to ITAR)
Steps of Data Masking
To mask data in practice, you need a strategy that fits your organization. The steps below make data masking effective:
- Find your sensitive data. The first step is to discover and identify data that may be sensitive and require protection. An automated discovery tool is the better choice for this.
- Analyze the situation. At this stage, the data security team should understand where the sensitive data is, who needs access to it, and who doesn’t.
- Apply masking. In very large organizations, it isn’t realistic to assume that a single masking tool can be used across the entire company. Instead, you might need different data masking types.
- Test data masking results. This is the final step in the data masking process. Quality assurance and testing are required to ensure that the data masking configurations give the required results.
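The first step above, finding sensitive data, can be illustrated with a small sketch. It assumes rows are available as Python dictionaries and uses two simplified, hypothetical regular expressions (`PATTERNS`) plus a hypothetical helper `find_sensitive_columns`; real discovery tools rely on much richer rules, so treat this only as an outline of the idea.

```python
import re

# Hypothetical, deliberately simple patterns; not taken from any product.
PATTERNS = {
    "email": re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$"),
    "card": re.compile(r"^(?:\d[ -]?){13,19}$"),
}

def find_sensitive_columns(rows, threshold=0.8):
    """Return {column: kind} for columns where most non-empty values match a pattern."""
    if not rows:
        return {}
    findings = {}
    for column in rows[0]:
        values = [str(r[column]) for r in rows if r[column] not in (None, "")]
        if not values:
            continue
        for kind, pattern in PATTERNS.items():
            hits = sum(1 for v in values if pattern.match(v))
            if hits / len(values) >= threshold:
                findings[column] = kind
                break  # first matching kind wins
    return findings

sample = [
    {"id": 1, "contact": "ann@example.com", "card": "4111 1111 1111 1111"},
    {"id": 2, "contact": "bob@example.org", "card": "5500-0000-0000-0004"},
]
print(find_sensitive_columns(sample))
```

The columns flagged this way would then feed the analysis and masking steps.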
Static data masking works on a copy of the data. Database administrators duplicate the content of a database into a test environment and replace the sensitive values in the copy with fake data, so that the copy can be shared with third-party contractors and others. As a result, the original sensitive data stays protected in the production database, and only the masked copy is moved into the test environment. However well static masking suits work with third-party contractors, statically masked data can be a big problem for applications that need real data from production databases.
In the picture below you can see how the table containing sensitive data looks before Static Masking from DataSunrise.
And after Static Masking from DataSunrise:
As you can see, the columns LastName, Address, and Card, which contain sensitive data, have been masked statically using the DataSunrise Static Masking tool.
A variation of static masking is called in-place masking. Its distinguishing feature is that the database, schema, or table being masked is both the source and the target.
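The idea of static masking can be sketched in a few lines. This is not how DataSunrise implements it; it is a minimal illustration that assumes two in-memory SQLite databases, a hypothetical `customers` table, and two hypothetical masking rules (`mask_last_name`, `mask_card`).

```python
import sqlite3

def mask_last_name(value):
    # Keep the first letter and hide the rest (one hypothetical rule).
    return value[:1] + "*" * (len(value) - 1)

def mask_card(value):
    # Keep only the last four digits visible.
    digits = [c for c in value if c.isdigit()]
    return "*" * (len(digits) - 4) + "".join(digits[-4:])

def static_mask(source, target):
    """Copy the customers table from source to target, masking sensitive columns."""
    target.execute("CREATE TABLE customers (id INTEGER, last_name TEXT, card TEXT)")
    rows = source.execute("SELECT id, last_name, card FROM customers").fetchall()
    masked = [(i, mask_last_name(n), mask_card(c)) for i, n, c in rows]
    target.executemany("INSERT INTO customers VALUES (?, ?, ?)", masked)
    target.commit()

# "Production" keeps the real values; the test environment gets the masked copy.
production = sqlite3.connect(":memory:")
production.execute("CREATE TABLE customers (id INTEGER, last_name TEXT, card TEXT)")
production.execute("INSERT INTO customers VALUES (1, 'Smith', '4111111111111111')")
test_env = sqlite3.connect(":memory:")
static_mask(production, test_env)
print(test_env.execute("SELECT * FROM customers").fetchall())
```

In-place masking would differ only in that the masked values are written back to the same table they were read from.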
When masking data dynamically, data is obfuscated on the fly whenever an unauthorized database user tries to retrieve data not intended for that user. Real-time masking also means that the data never leaves the production database and, as a result, is less susceptible to security threats. The real values are never exposed because they are masked in real time.
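A rough sketch of the dynamic approach, under stated assumptions: the "database" is a list of dictionaries, and `UNMASKED_ROLES`, `MASKERS`, and `fetch_rows` are hypothetical names invented for this example. The point is only that masking is applied to the query result per user, while the stored data stays untouched.

```python
def mask_card(value):
    # Show only the last four characters of the card number.
    return "*" * (len(value) - 4) + value[-4:]

# Hypothetical policy: roles allowed to see each column unmasked.
UNMASKED_ROLES = {"card": {"billing_admin"}}
MASKERS = {"card": mask_card}

def fetch_rows(rows, role):
    """Return rows as the given role should see them; stored rows are not modified."""
    result = []
    for row in rows:
        visible = dict(row)  # work on a copy of the stored row
        for column, masker in MASKERS.items():
            if column in visible and role not in UNMASKED_ROLES.get(column, set()):
                visible[column] = masker(visible[column])
        result.append(visible)
    return result

table = [{"id": 1, "card": "4111111111111111"}]
print(fetch_rows(table, "analyst"))        # masked view
print(fetch_rows(table, "billing_admin"))  # real values
```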
In the picture below you can see how the Card column looked before Dynamic Masking from DataSunrise:
And after Dynamic Masking from DataSunrise:
You can mask different fields using different methods. With DataSunrise you can use the following methods:
- Credit card masking
- Email masking
- Email masking full
- Mask username of Email
- Empty value
- Fixed Number
- Fixed String
- Function call
- Mask first chars
- Mask last chars
- Mask URL
- Mask first and last chars
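To give a feel for what a few of these methods do, here is a hedged sketch in Python. The function names and exact output formats are assumptions made for illustration; they are not the DataSunrise implementations, whose behavior may differ.

```python
def mask_first_chars(value, count, fill="*"):
    # Replace the first `count` characters.
    count = min(count, len(value))
    return fill * count + value[count:]

def mask_last_chars(value, count, fill="*"):
    # Replace the last `count` characters.
    count = min(count, len(value))
    return value[:len(value) - count] + fill * count

def mask_email_username(email, fill="*"):
    # Hide the part before "@" and keep the domain.
    username, sep, domain = email.partition("@")
    return fill * len(username) + sep + domain

def mask_credit_card(number, fill="X"):
    # Replace every digit except the last four; separators stay in place.
    total = sum(c.isdigit() for c in number)
    seen = 0
    out = []
    for c in number:
        if c.isdigit():
            seen += 1
            out.append(c if seen > total - 4 else fill)
        else:
            out.append(c)
    return "".join(out)

print(mask_first_chars("Johnson", 3))
print(mask_last_chars("Johnson", 3))
print(mask_email_username("ann@example.com"))
print(mask_credit_card("4111-1111-1111-1234"))
```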
Both static and dynamic masking have their pros and cons, and teams responsible for database protection have to choose the most appropriate method of protecting sensitive data. The advantages and disadvantages of each masking method, along with detailed instructions on how to mask data using DataSunrise Database Security Suite, are described in the other articles in this data masking section.
Conditions Data Masking Should Meet
As mentioned earlier, any data involved in data masking has to remain meaningful at several levels:
- The data has to remain meaningful and valid for the application logic.
- The data must undergo enough changes so that it can’t be reverse-engineered.
- The obfuscated data may need to be consistent across multiple databases within an organization when each of those databases contains the data element being masked.
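One common way to approach all three conditions at once is keyed, deterministic masking. The sketch below is an assumption for illustration, not a method described in this article: it uses HMAC-SHA256 from the Python standard library with a hypothetical `SECRET_KEY`, and maps a numeric string to another numeric string of the same length. The same input and key always give the same output, so the masked value matches in every database that uses the key, and the original cannot be recomputed without it.

```python
import hashlib
import hmac

SECRET_KEY = b"replace-with-a-secret-from-a-vault"  # hypothetical key

def pseudonymize_digits(value, key=SECRET_KEY):
    """Map a numeric string to another numeric string of the same length."""
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()
    # Convert the hex digest to decimal digits and keep as many as the input has.
    digits = str(int(digest, 16))
    return digits[:len(value)].rjust(len(value), "0")

masked_in_crm = pseudonymize_digits("123456789")
masked_in_billing = pseudonymize_digits("123456789")
print(masked_in_crm == masked_in_billing)  # same input, same key, same mask
```

This sketch suits inputs of a few dozen digits at most; it also does not preserve check digits, which some application logic may require.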
Data Masking Techniques
The following techniques may be used for masking (obfuscating) data:
- Substitution is one of the most popular and effective methods of data masking. With this method, real data is replaced with fake but authentic-looking data. Substitution is usually applied to phone numbers, zip codes, credit card numbers, Social Security and Medicare numbers, etc. When it is applied to names, real-life names can be randomly substituted from a supplied or customized lookup file.
- Shuffling is another very popular way of masking data. It is very similar to substitution, with the one difference that the replacement values are taken from the same column that is being masked. Put simply, the data is randomly shuffled within the column.
- Encryption is one of the most complex methods of data obfuscation. The data is encrypted, and viewing it requires a “key” that is granted based on user rights and privileges.
- Nulling values out or deleting them. Applying a null value to a particular field may look like a simple yet efficient way to mask data. However, this approach only prevents direct visibility of the data, and in most cases it is not as effective as it seems because it breaks the logic of most applications.
- Number and date variance. Done right, number and date variance gives you a useful set of data without disclosing important financial information or transaction details. Imagine you need to mask your employees’ salaries. To keep the range between the highest- and lowest-paid employees accurate after masking, you can apply the same variance to all salaries in the set; that way the range doesn’t change.
- Character scrambling. This is a very simple technique in which characters are jumbled into a random order so that the original content is hidden. For example, you can change an employee’s ID #244536 in a production database to read #642345 for everyone not allowed to see the real data.
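Several of the techniques above can be sketched with the Python standard library. The function names are hypothetical, and "the same variance" is interpreted here as one shared additive offset, which is an assumption; it keeps the gap between the highest and lowest value unchanged. Encryption and nulling are left out because they need either a cryptographic library or no code at all.

```python
import random

def substitute(values, lookup, rng):
    # Substitution: replace each value with an entry from a lookup list.
    return [rng.choice(lookup) for _ in values]

def shuffle_column(values, rng):
    # Shuffling: reorder the existing values within the same column.
    shuffled = list(values)
    rng.shuffle(shuffled)
    return shuffled

def apply_variance(numbers, rng, max_shift=5000):
    # Variance: add one shared random offset to every number,
    # so the difference between any two values is preserved.
    offset = rng.randint(-max_shift, max_shift)
    return [n + offset for n in numbers]

def scramble(text, rng):
    # Character scrambling: put the characters in a random order.
    chars = list(text)
    rng.shuffle(chars)
    return "".join(chars)

rng = random.Random(42)  # seeded only to make the demo repeatable
names = ["Smith", "Jones", "Brown"]
salaries = [42000, 58000, 91000]
print(substitute(names, ["Doe", "Roe", "Poe"], rng))
print(shuffle_column(names, rng))
print(apply_variance(salaries, rng))
print(scramble("244536", rng))
```

Note that shuffling and scrambling can return the original order by chance, especially on short inputs, so real tools typically check for that.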
Static data masking, dynamic data masking, and sensitive data discovery tools are all included in DataSunrise Database Security Suite, so you can choose the most suitable solution for your company. Whichever you choose, your sensitive data will be masked.