Do-It-Yourself Data Masking

This article describes how to create a VIEW that shows depersonalized data. The solution described here is based on a solution from this article (getting a random row out of a table).

The main purpose of data masking is to obfuscate the real data and make it unrecoverable. But hiding the real data is not enough: very often the masked data must also look as realistic as possible.

This requirement arises because data masking is used mostly for application testing, where the data should look as realistic as possible. A good way to achieve this is to use data from the actual table but take it from random rows.

Let’s begin.

I will use a table named “connections” as an example. It includes the “ID” and “client_port” columns, which should be masked; “ID” is the table’s primary key.

Since some rows may have been deleted, the ID values are not strictly consecutive, so let’s create a table that links each ID to a row number. Essentially, this is the quickest way for PostgreSQL to select data by row number. If you’re using an Oracle database, you can skip this step.

create table client_port_ids (
	    rowid serial PRIMARY KEY,
	    id integer
	);
	-- fill the table with the existing id values; this must be done before masking
	INSERT INTO client_port_ids (id) SELECT id FROM connections ORDER BY id;
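To illustrate why the helper table is useful, here is a minimal sketch of picking one random existing id through it. It assumes that client_port_ids has already been filled as shown above, that it contains at least one row, and that rowid runs without gaps from 1 to MAX(rowid); the table alias c is only for illustration.

```sql
-- Sketch: pick one random existing id via the gap-free rowid column.
-- Assumes client_port_ids is filled and rowid runs from 1 to max(rowid).
SELECT c.id
FROM client_port_ids AS c
WHERE c.rowid = (
    -- Uncorrelated scalar subquery: evaluated once per query,
    -- so random() is drawn a single time, giving a value in 1..max(rowid).
    SELECT 1 + floor(random() * max(rowid))::integer
    FROM client_port_ids
);
```

The lookup goes through the primary key of client_port_ids, so it stays cheap on large tables. A simpler alternative such as ORDER BY random() LIMIT 1 on “connections” would also work, but it has to read and sort the whole table for every masked value.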
Since the database should return the same masked value for a given row in every SELECT query, we need a table that stores the link between the real data and its substitute.
create table client_port_map (
	    src integer PRIMARY KEY,
	    dst integer
	);
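The following sketch shows how such a mapping table keeps the masked value stable once it has been stored. The port numbers are hypothetical and chosen only for illustration, and ON CONFLICT requires PostgreSQL 9.5 or later.

```sql
-- Hypothetical values: the real port 51234 gets the substitute 40022.
-- If a mapping for 51234 already exists, it is kept unchanged.
INSERT INTO client_port_map (src, dst) VALUES (51234, 40022)
ON CONFLICT (src) DO NOTHING;

-- A later attempt to store a different substitute for the same
-- source value is ignored because src is the primary key...
INSERT INTO client_port_map (src, dst) VALUES (51234, 40999)
ON CONFLICT (src) DO NOTHING;

-- ...so every lookup keeps returning the substitute stored first.
SELECT dst FROM client_port_map WHERE src = 51234;
```

Because src is the primary key, each real value can have only one substitute, which is what makes repeated SELECT queries show the same masked data.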
Let’s create a masking function that first checks whether a value has already been masked. If no mapping exists yet, the function takes the data from a random row.
CREATE OR REPLACE FUNCTION public.hide_client_port(
	val integer)
	RETURNS integer AS
	$BODY$
	DECLARE
	res integer;
	sed float;
	row_count integer;
	rand_row integer;
	BEGIN
	-- check for an existing mapping and reuse it if found
	SELECT dst INTO res FROM client_port_map WHERE src = val;
	IF FOUND THEN RETURN res; END IF;
	-- pick a random row; serial rowid values start at 1, hence the + 1
	SELECT MAX(rowid) INTO row_count FROM client_port_ids;
	SELECT floor(random()*row_count) + 1 INTO rand_row;
	SELECT client_port INTO res FROM connections WHERE id = (SELECT id FROM client_port_ids WHERE rowid = rand_row);
	-- saving new value to mapping
	INSERT INTO client_port