Masking XML, JSON, CSV and Unstructured Text on Amazon S3

Data in Clouds

We live in a world where data is one of the most valuable assets, and the IT industry keeps developing more convenient ways to store it.

Cloud storage is one of the most popular options. Most of us have heard of or used platforms such as Amazon Web Services, Alibaba OSS and MinIO.

However, as more data moves to the cloud, attackers increasingly target cloud storage. Data owners may think that their sensitive data is completely safe there. Let’s discuss whether that is true.

Cloud security is a shared responsibility between the cloud provider and the customer: AWS manages the security of the cloud, and customers are responsible for managing security in the cloud.

However, some types of documents are hard to protect because the data inside them is just plain text: unstructured text, CSV, XML and JSON files. DataSunrise allows you to control access to these files and mask their content if necessary.

Masking Possibilities

XML

XML is widely used in programs and devices to handle, structure, store, transmit and display data online. No wonder that whatever we keep online in XML is vulnerable to leaks and hacking.

Below you can see how an XML file protected by DataSunrise looks.

<people_test>
    <record>
        <id>1</id>
        <first_name>********</first_name>
        <last_name>*****</last_name>
        <email>tguess0@washington.edu</email>
        <gender>Male</gender>
        <ip_address>181.236.58.217</ip_address>
    </record>
    <record>
        <id>2</id>
        <first_name>*******</first_name>
        <last_name>******</last_name>
        <email>wculpan1@nature.com</email>
        <gender>Male</gender>
        <ip_address>201.187.144.70</ip_address>
    </record>
    <record>
        <id>3</id>
        <first_name>*******</first_name>
        <last_name>****</last_name>
        <email>klace2@etsy.com</email>
        <gender>Female</gender>
        <ip_address>113.21.227.26</ip_address>
    </record>
</people_test>

As you can see, we have hidden the sensitive first name and last name values. In the XmlPath field of the tabular form in DataSunrise, you can specify the XML tags to be masked. To mask all data, leave the XmlPath field empty. After that, you can choose the masking method and mask value.
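
The idea of path-based masking can be illustrated with a minimal Python sketch. It is not DataSunrise code and does not reproduce its XmlPath syntax: the mask_xml function, the paths used, the sample record and the length-preserving asterisk mask are all assumptions made for illustration. It relies only on the limited XPath support in Python's standard xml.etree.ElementTree module.

```python
import xml.etree.ElementTree as ET

# Made-up sample record; the names and address are placeholders, not real data.
SAMPLE = """<people_test>
    <record>
        <id>1</id>
        <first_name>Alice</first_name>
        <last_name>Smith</last_name>
        <email>alice@example.com</email>
    </record>
</people_test>"""


def mask_xml(xml_text, paths, mask_char="*"):
    """Replace the text of every element matched by `paths` with mask characters.

    An empty `paths` list masks the text of every leaf element, mirroring the
    "leave the field empty to mask everything" behaviour described above.
    """
    root = ET.fromstring(xml_text)
    if paths:
        targets = [el for path in paths for el in root.findall(path)]
    else:
        # Leaf elements are those without child elements.
        targets = [el for el in root.iter() if len(el) == 0]
    for el in targets:
        if el.text:
            # Assumption: keep the original length, one mask character per character.
            el.text = mask_char * len(el.text)
    return ET.tostring(root, encoding="unicode")


masked = mask_xml(SAMPLE, ["./record/first_name", "./record/last_name"])
print(masked)
```

Passing an empty list instead of the two paths masks the id and email values as well.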

JSON

JSON stands for JavaScript Object Notation. It is a very popular format for exchanging data between a browser and a server. JSON is a text format, so data stored in JSON files is kept as plain text as well. When masking JSON files with DataSunrise, you can use the jsonPath field in the tabular form to specify the attributes whose values should be hidden. If you leave the jsonPath field blank, all values will be masked. As you can see below, we have decided to mask the “first_name” and “last_name” values.

[
   {
      "id":1,
      "first_name":"masked",
      "last_name":"masked",
      "email":"lwankel0@time.com",
      "gender":"Male",
      "ip_address":"252.132.213.37",
      "date":"2019-08-24"
   },
   {
      "id":2,
      "first_name":"masked",
      "last_name":"masked",
      "email":"jhenrych1@ucoz.com",
      "gender":"Female",
      "ip_address":"184.85.69.129",
      "date":"2019-07-23"
   },
   {
      "id":3,
      "first_name":"masked",
      "last_name":"masked",
      "email":"aarthur2@google.fr",
      "gender":"Female",
      "ip_address":"16.195.117.101",
      "date":"2020-03-13"
   }
]
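
The same kind of result can be reproduced with a short Python sketch. It is only an illustration, not the DataSunrise implementation: it matches plain attribute names rather than real jsonPath expressions, and the mask_json function, the sample record and the fixed "masked" value are assumptions.

```python
import json

# Made-up sample data; names and addresses are placeholders.
SAMPLE = """[
   {"id": 1, "first_name": "Alice", "last_name": "Smith",
    "email": "alice@example.com", "gender": "Female"}
]"""


def mask_json(value, keys=(), mask="masked"):
    """Return a copy of `value` with the values of the named keys replaced.

    An empty `keys` collection masks every scalar value, mirroring the
    "leave the field blank to mask everything" behaviour described above.
    """
    if isinstance(value, dict):
        return {
            k: mask if k in keys else mask_json(v, keys, mask)
            for k, v in value.items()
        }
    if isinstance(value, list):
        return [mask_json(item, keys, mask) for item in value]
    # Scalar reached: keep it when specific keys were requested, mask it otherwise.
    return value if keys else mask


records = json.loads(SAMPLE)
masked = mask_json(records, keys={"first_name", "last_name"})
print(json.dumps(masked, indent=3))
```

Because the function matches attribute names at any depth, a nested object with a "first_name" attribute would be masked too; a real path language would let you be more selective.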

CSV

CSV (comma-separated values) files store tabular data as plain text. To mask a CSV file using DataSunrise, you need to specify the column numbers, then choose the masking method and mask value. Below you can see how data looks in a masked CSV file. Much of the sensitive data has been masked: column 1 (IDs), column 3 (last names), column 4 (e-mails) and column 6 (IP addresses).

id  first_name  last_name email  gender  ip_address
*   Gilfoyle    ********* *****  Female  **********
*   Chilcotte   ********* *****  Male    **********
*   Terrell     ********* *****  Male    **********
*   Pearle      ********* *****  Female  **********
*   Kits        ********* *****  Male    **********
*   McAlpine    ********* *****  Male    **********
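
Masking by column number can be sketched in a few lines of Python using the standard csv module. This is an illustration under stated assumptions, not DataSunrise code: the mask_csv function, the sample row, the 1-based column numbering and the single-asterisk mask value are all hypothetical.

```python
import csv
import io

# Made-up sample data; the values are placeholders.
SAMPLE = (
    "id,first_name,last_name,email,gender,ip_address\n"
    "1,Alice,Smith,alice@example.com,Female,192.0.2.10\n"
)


def mask_csv(csv_text, columns, mask="*", has_header=True):
    """Replace the cells in the given 1-based column numbers with `mask`.

    The header row, if present, is copied unchanged.
    """
    reader = csv.reader(io.StringIO(csv_text))
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for row_number, row in enumerate(reader):
        if not (has_header and row_number == 0):
            row = [mask if i + 1 in columns else cell for i, cell in enumerate(row)]
        writer.writerow(row)
    return out.getvalue()


masked = mask_csv(SAMPLE, columns={1, 3, 4, 6})
print(masked)
```

Using the csv module rather than splitting on commas keeps quoted cells that contain commas in the right column.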

Unstructured Text

Unstructured text (data) doesn’t have a pre-defined data model and is not organized in a pre-defined manner. Unstructured data is usually text-heavy, but may contain dates, numbers and other sensitive data. Unstructured data lacks metadata and cannot readily be indexed or mapped. Below is an example of how DataSunrise can mask unstructured text. As you can see, sensitive data is masked. The data to be masked is taken from DataSunrise built-in dictionaries (Lexicon).

Procedure Findings. The patient, **************, is a ** year old male born on October *,
****. He has a * mm sessile polyp that was found in the ascending colon and removed by
snare, no cautery. *******'s address is ** *********. ************ *****.

His SSN is **********. He experienced the polyp after getting out of his blue
************ with a license number of WDR-***. We were able to control the bleeding.
Moderate diverticulosis and hemorrhoids were incidentally noted.
Recurrent GI bleed of unknown etiology;
hypotension perhaps secondary to this but as likely secondary to polypharmacy.
He reports first experiencing hypotension while eating queso ***********.
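
Dictionary-based masking of free text can be approximated with a small Python sketch. It does not show how the DataSunrise Lexicon works internally: the LEXICON list, the mask_text function and the sample sentence are assumptions for illustration, and a real dictionary would be far larger and would be combined with rules for dates and numbers.

```python
import re

# Hypothetical dictionary of sensitive terms; placeholder values only.
LEXICON = ["John Doe", "Doe", "Springfield"]


def mask_text(text, lexicon, mask_char="*"):
    """Replace every whole-word occurrence of a lexicon term with mask characters."""
    # Longest terms first, so that "John Doe" is matched before "Doe".
    terms = sorted((t for t in lexicon if t), key=len, reverse=True)
    if not terms:
        return text
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(t) for t in terms) + r")\b",
        re.IGNORECASE,
    )
    # Assumption: keep the original length, one mask character per character.
    return pattern.sub(lambda m: mask_char * len(m.group(0)), text)


masked = mask_text("The patient, John Doe, lives in Springfield.", LEXICON)
print(masked)
```

The word-boundary anchors keep the sketch from masking a term that only appears inside a longer word.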
 

DataSunrise Masking Rule for AWS S3

To mask data dynamically using DataSunrise, you first need to create a database instance, that is, to specify which database you want to protect. In the picture below you can see a list of database instances, which includes an AWS S3 database. Click Add New if you want to create a new database instance.

List of database instances

To set up a masking rule, go to the Masking section of the UI and select Add Rule.

List of masking rules

Specify all the necessary information about the new rule in the window that pops up, then scroll down to the bottom of the page.

Configuring main section of masking rule

In the Masking Settings section you can choose what type of document you want to mask. It can be CSV, XML, JSON or unstructured text.

Choose type of text document in the masking rule

Then, depending on your needs, tick the type of document you want protected in your S3 bucket. This article will guide you through the four available document types, starting with XML files.

XML

In the picture below we want to protect an XML file, so we tick this file type. After that, specify the full file name in the S3 bucket in the format shown below.

Configuring rule to mask XML file

CSV

In the picture below we want to protect a CSV file, so we tick this file type. After that, click “Add File” and specify the CSV file in the S3 bucket that we want protected.

Configuring rule to mask CSV file

Now scroll down and specify the masking method and masking value (an asterisk in the picture). After that, click Save Rule to save and activate the new rule.

Configuring masking method to mask values inside CSV file

JSON

If you want to protect a JSON file, choose this option and specify the full file name in the format shown below. Click Save Rule to activate the rule.

Configuring rule to mask JSON file

Unstructured Text

If you want to mask an unstructured text file, choose this option, enter the full file name in the format shown in the picture below and click Save Rule to save and activate the rule.

Configuring rule to mask unstructured text file

Conclusion

DataSunrise Database Security Suite is a powerful tool for protecting your data both on-premises and in the cloud. You can download a trial version of DataSunrise and see how it protects sensitive data inside XML, JSON and CSV files and unstructured text.
