Replacing sensitive information with fake values in order to protect the actual data is referred to as data masking. In simple terms, data masking confuses an intruder by hiding the real data behind a protective layer of realistic-looking but useless data.
Many people confuse data masking with data access restriction, but the two are entirely different concepts. Access restriction prevents users from seeing the data, and users clearly realize that something is hidden. Data masking, by contrast, is supposed to present users with fake data of some kind.
Why Is Data Masking Important?
A data leak, or inappropriate exposure of sensitive information, can affect a company on multiple levels.
Legally: Every organization is responsible for its clients' private data. If the company loses that data in any way, any affected client can take legal action against it.
Defamation: Public exposure of production or private data stored in a company's database can damage the company's reputation.
Loss of Future Prospects: If competitors obtain your company's information, they can learn your future plans and act to beat you. They may also twist the information and use it against you.
What Is Data Masking Used For?
A production database may contain real sensitive information, but that does not mean the databases used for testing should contain it as well. Various data masking routines are used to limit how widely such data is exposed.
Level-I Masking or Compound Masking
A set of related columns is masked as a group so that the masked data keeps the same relationships across those columns. For instance, ZIP code, city and state entries must remain consistent with each other after masking is applied.
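A minimal sketch of compound masking, assuming the masking tool has access to a reference table of ZIP/city/state combinations that are already consistent with each other. The table contents, the `Address` type and the `mask_address_group` function are hypothetical illustrations, not part of any particular product. The key point is that all three columns are replaced from a single reference row, so they can never contradict each other.

```python
import random
from typing import NamedTuple


class Address(NamedTuple):
    zip_code: str
    city: str
    state: str


# Hypothetical reference data: each row is an internally consistent triple.
REFERENCE_ADDRESSES = [
    Address("10001", "New York", "NY"),
    Address("60601", "Chicago", "IL"),
    Address("94105", "San Francisco", "CA"),
    Address("78701", "Austin", "TX"),
]


def mask_address_group(row: dict, rng: random.Random) -> dict:
    """Replace zip, city and state together, taking all three from one reference row."""
    replacement = rng.choice(REFERENCE_ADDRESSES)
    masked = dict(row)  # copy so the original record is left untouched
    masked["zip"] = replacement.zip_code
    masked["city"] = replacement.city
    masked["state"] = replacement.state
    return masked


customer = {"name": "Jane Doe", "zip": "02139", "city": "Cambridge", "state": "MA"}
print(mask_address_group(customer, random.Random(42)))
```

Masking each column independently (a random ZIP, a random city, a random state) would be simpler, but it could produce combinations such as a Chicago ZIP code paired with Texas, which breaks any test that validates addresses.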
Level-II Masking or Deterministic Masking
Level-II masking ensures that a given value is always masked to the same value across all databases, for instance a customer number or ID.
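One common way to get deterministic masking is a keyed hash. The sketch below assumes every environment shares the same secret key (`MASKING_KEY` is a hypothetical placeholder) and uses HMAC-SHA256 from the Python standard library to turn a customer ID into a fixed-length numeric pseudonym. Because the output depends only on the input and the key, the same customer ID masks to the same value in every database, so joins on the masked column still line up.

```python
import hashlib
import hmac

# Hypothetical secret shared by every environment that must produce identical masks.
MASKING_KEY = b"replace-with-a-secret-from-your-key-store"


def mask_customer_id(customer_id: str, key: bytes = MASKING_KEY, digits: int = 10) -> str:
    """Map a customer ID to a fixed-length numeric pseudonym.

    The same input and key always yield the same output, so the mapping is
    repeatable across databases without storing a lookup table.
    """
    digest = hmac.new(key, customer_id.encode("utf-8"), hashlib.sha256).digest()
    # Read the digest as a big integer and keep its last `digits` decimal digits.
    return str(int.from_bytes(digest, "big") % 10**digits).zfill(digits)


print(mask_customer_id("CUST-000123"))
print(mask_customer_id("CUST-000123"))  # identical to the previous line
```

Truncating to 10 digits keeps the masked value looking like a customer number, at the cost of a small chance that two different IDs collide; a longer output reduces that risk. Using a keyed HMAC rather than a plain hash means someone without the key cannot rebuild the mapping by hashing guessed IDs.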
Level-III Masking or Lock-Key Masking
When a company has to send its data to another company or any other third party for reporting, analysis or another business process, lock-key masking is used. The original data is masked with a secure, key-based masking function. Once the company gets the data back from the third party, it can recover the original data using the same key that was used to mask it. For this reason it is also called key-based reversible masking.
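The sketch below illustrates the lock-key idea using only the Python standard library: a keystream derived from the key (HMAC-SHA256 in counter mode, with a random nonce per value) is XORed with the data, and XORing again with the same keystream restores it. The function names `lock_mask` and `lock_unmask` are hypothetical. This is an illustration of key-based reversibility under stated assumptions, not a vetted scheme: it provides no integrity protection, and its hex output does not look like the original data. A production system would more likely use authenticated encryption (for example AES-GCM from the `cryptography` package) or format-preserving encryption such as NIST FF1.

```python
import hashlib
import hmac
import secrets

NONCE_SIZE = 16  # bytes of randomness prepended to every masked value


def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Derive `length` pseudorandom bytes from key and nonce (HMAC-SHA256, counter mode)."""
    blocks = (
        hmac.new(key, nonce + i.to_bytes(8, "big"), hashlib.sha256).digest()
        for i in range((length + 31) // 32)  # each HMAC-SHA256 block is 32 bytes
    )
    return b"".join(blocks)[:length]


def lock_mask(value: str, key: bytes) -> str:
    """Mask a value so that only holders of `key` can recover it."""
    data = value.encode("utf-8")
    nonce = secrets.token_bytes(NONCE_SIZE)
    stream = _keystream(key, nonce, len(data))
    masked = bytes(a ^ b for a, b in zip(data, stream))
    return (nonce + masked).hex()


def lock_unmask(token: str, key: bytes) -> str:
    """Recover the original value using the same key that lock_mask used."""
    raw = bytes.fromhex(token)
    nonce, masked = raw[:NONCE_SIZE], raw[NONCE_SIZE:]
    stream = _keystream(key, nonce, len(masked))
    return bytes(a ^ b for a, b in zip(masked, stream)).decode("utf-8")


key = secrets.token_bytes(32)  # the "lock key" the company keeps to itself
token = lock_mask("Jane Doe", key)
print(token)                    # what the third party receives
print(lock_unmask(token, key))  # recovered by the company with its key
```

The random nonce means the same value masks differently each time, so the third party cannot spot repeated values; if repeatability is needed, that requirement points back to deterministic (Level-II) masking instead.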
Data Masking Techniques
Substitution: Database content is randomly replaced with values that are similar but not identical. For example, real surnames are replaced with surnames picked at random from a list.
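A minimal sketch of substitution, assuming a hypothetical pool of replacement surnames (`SURNAME_POOL`) stands in for the random list mentioned above. Each real surname is swapped for a randomly chosen pool entry, excluding the real name itself so that no original value survives masking.

```python
import random

# Hypothetical lookup list of replacement surnames.
SURNAME_POOL = ["Anderson", "Brooks", "Chen", "Diaz", "Evans", "Fischer", "Garcia", "Hughes"]


def substitute_surnames(surnames: list[str], rng: random.Random) -> list[str]:
    """Replace each real surname with a randomly chosen one from the pool."""
    masked = []
    for real in surnames:
        # Exclude the real surname so it is never "replaced" with itself.
        candidates = [s for s in SURNAME_POOL if s != real]
        masked.append(rng.choice(candidates))
    return masked


real_names = ["Smith", "Chen", "Okafor"]
print(substitute_surnames(real_names, random.Random(7)))
```

Passing a seeded `random.Random` makes a test run reproducible; for masking real data an unseeded generator is the more natural choice.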
Shuffling: In substitution, the replacement data is fetched from an external source, whereas in shuffling the replacement data is taken from the column i