Information Type: Data-Inspired Security Basics

Introduction

DataSunrise features Data-Inspired Security, which performs quick data discovery on every data request to a given data source. This approach adds some execution-time overhead, but it provides extremely flexible database protection.

Information Types were first introduced in DataSunrise with the Sensitive Data Discovery feature, which scans databases and storage systems like S3 for sensitive data.

Information Types offer functionality beyond basic Data Discovery. They enable real-time data type detection and can automatically trigger protection or audit rules through Data-Inspired Security with each database query. Furthermore, they facilitate the labeling of audit trails, making it easier to track specific data access events in both Transactional Trails and log files.

Discovering Information Types

Data properties need to be stored in an entity for analysis. Sometimes this data has a strict structure with column names, table names, and types. Other times, it may appear as JSON, CSV files, plain text, or even scanned document images. DataSunrise enables searching all these objects for sensitive data.

This leads to flexible descriptions of Information Types. For example, email data can have several properties:

  • Column name contains “email”
  • Table name contains “email”
  • Data matches the regular expression “.*@.*”

To be classified as email information, data must meet multiple requirements. This introduces another important DataSunrise entity called an Information Attribute. An Information Type is essentially a collection of attributes that data must match to be considered a specific type.

When searching for sensitive data in a simple text file without columns and tables, the attribute set may be different. For instance, a plain-text email Information Type might only require regex matching, with no additional attributes needed.

Available Information Types in DataSunrise

DataSunrise includes numerous built-in Information Types, each associated with popular Security Standards (GDPR, HIPAA, SOX, and others). While this association isn’t mandatory for custom types, it helps track activity and collect usage data for Compliance audits.

Users can create custom Information Types, ranging from simple to complex. The built-in email type, for example, includes attributes matching column names and complex pattern matching for email content.

DataSunrise offers multiple Date Information Types to accommodate various date formatting conventions. It’s worth noting that different Information Types can be used to identify the same type of data when it’s stored in different formats.
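To illustrate why several Date Information Types may be needed for the same kind of data, the following Python sketch tries a few date layouts against values that all represent the same calendar date. The format list and the `detect_date_format` helper are hypothetical and do not reflect DataSunrise's built-in definitions.

```python
from datetime import datetime

# Hypothetical format list; DataSunrise's built-in Date types may differ.
DATE_FORMATS = {
    "ISO (YYYY-MM-DD)": "%Y-%m-%d",
    "US (MM/DD/YYYY)": "%m/%d/%Y",
    "European (DD.MM.YYYY)": "%d.%m.%Y",
}


def detect_date_format(value):
    """Return the label of the first format that parses `value`, or None."""
    for label, fmt in DATE_FORMATS.items():
        try:
            datetime.strptime(value, fmt)
            return label
        except ValueError:
            continue
    return None


# The same date written three ways is recognized by three different formats.
for sample in ("2023-01-15", "01/15/2023", "15.01.2023", "not a date"):
    print(sample, "->", detect_date_format(sample))
```

Each format plays the role of a separate Information Type here: the data is the same, only its stored representation changes.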

For simpler Information Type matching, we recommend creating custom types with more flexible attributes. In our experience, most users create a custom email data type that only requires the data content to match a basic regular expression pattern like “@.*”.

Custom types allow you to both add new Information Types for detection and control how strictly these Information Types match your data patterns. You can create Information Types that require multiple Attributes, each containing column names and data patterns, or alternatively, create simpler Information Types with just a single Attribute that only checks the data pattern.

How to Create a Custom Information Type

Step 1 – Add New Information Type

  1. Navigate to Data Discovery – Information Types
  2. Press the “+ Add Information Type” button and enter a suitable name for your custom Information Type.
  3. After setting the Information Type name, it will appear in the list on the “Information Types” page. Click the new Information Type to edit its internal parameters and Attributes. Let’s use the ‘Custom Email Information Type’ as an example.

The Information Type Edit page contains three main subsections:

  • The Attributes section lets you set actual matching parameters for the Information Type. You can create one or more attributes, and each attribute can contain requirements for the database object, column name, and data pattern. The Information Type matches if any single attribute matches. For included attributes, all conditions (object, column name pattern, and data pattern) must be met if specified.
  • The Security Standards section allows you to link the Information Type to one or more Security Standards. This is used by the Compliance feature during discovery tasks, as the Compliance feature operates based on security standards.
  • The Manage Tags section helps you easily find rule log entries in logs or reports. You can create custom tags for this purpose.
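The matching rule described for the Attributes section (any one attribute is enough, but every condition specified inside an attribute must hold) can be sketched in Python. The `Attribute` and `InformationType` classes, their field names, and the sample values are hypothetical; this models the logic only and is not DataSunrise's implementation.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Attribute:
    """Hypothetical attribute model; filters left as None are ignored."""
    name: str
    object_pattern: Optional[str] = None
    column_pattern: Optional[str] = None
    data_pattern: Optional[str] = None

    def matches(self, obj, column, value):
        checks = (
            (self.object_pattern, obj),
            (self.column_pattern, column),
            (self.data_pattern, value),
        )
        # Every specified condition must hold (logical AND).
        return all(
            re.search(pattern, text or "", re.IGNORECASE) is not None
            for pattern, text in checks
            if pattern is not None
        )


@dataclass
class InformationType:
    name: str
    attributes: List[Attribute]

    def matches(self, obj, column, value):
        # One matching attribute is enough (logical OR).
        return any(a.matches(obj, column, value) for a in self.attributes)


email_type = InformationType(
    name="Custom Email Information Type",
    attributes=[
        Attribute("by column name", column_pattern="email", data_pattern=r".*@.*"),
        Attribute("by table name", object_pattern="email", data_pattern=r".*@.*"),
    ],
)

print(email_type.matches("employees", "contact_email", "a@b.example"))
print(email_type.matches("employees", "notes", "a@b.example"))
```

In this sketch the second call does not match: the value looks like an email, but neither the column name nor the table name condition is satisfied, so no single attribute passes all of its checks.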

Step 2 – Add Attribute to Information Type

  1. Let’s create a simple attribute for the “Custom Email Information Type.” Press the “+ Add Attribute” button to begin. We’ll set it to match data that follows a basic email pattern: “.*@.*”

Note: This pattern is overly simplistic and will incorrectly match invalid entries like “@.” or “!!!@…”. A more robust email validation pattern should be used in production environments.
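The difference can be checked quickly in Python. The stricter pattern below is an illustrative assumption, not a DataSunrise built-in and not a full RFC-compliant validator; the sketch also assumes the whole value must match, which may differ from how a given search method applies the template.

```python
import re

SIMPLE = re.compile(r".*@.*")
# Hypothetical stricter pattern: local part, "@", one or more
# dot-separated domain labels, then an alphabetic top-level domain.
STRICTER = re.compile(
    r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(\.[A-Za-z0-9-]+)*\.[A-Za-z]{2,}"
)


def classify(value):
    """Return (simple_match, stricter_match) for a whole-string match."""
    return (
        SIMPLE.fullmatch(value) is not None,
        STRICTER.fullmatch(value) is not None,
    )


for sample in ("john.doe@example.com", "@.", "!!!@...", "no at sign"):
    print(repr(sample), classify(sample))
```

The simple pattern accepts every sample that contains an “@”, while the stricter one accepts only the well-formed address.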

The new attribute dialog appears with two panels: “Attribute” on the left and “Testing” on the right. The Attribute panel is used to configure your new attribute settings, while the Testing panel allows you to verify these settings as you create them.

  2. In the “Attribute” panel, enter “Custom Email Attribute” in the Name field. For Attribute Template, we’ll keep the default “New” option, since we don’t have any other templates available yet.

Next, we’ll focus on two key areas: Attribute Filters and Default Masking Method.

  3. For Attribute Filters, leave both Object Name and Column Name fields unchanged. Click only the Column Data option. This means our attribute will ignore database Object Name and Column Name patterns, focusing solely on checking if the data string contains an email-like pattern (with an @ character).
  4. Set the Column Data Type to ‘Strings Only’ and Search Method to ‘Unstructured Text’.
  5. Enter “.*@.*” into the “Template for Column Contents (start each template from new line)” field. Before saving the attribute or using it in any rules or tasks, we can test whether it correctly matches email patterns.
  6. Now let’s use the Testing panel on the right. Leave all preceding fields unchanged and enter a sample email address such as “user@example.com” in the Column Value field. Click the Test button – the results should show that the Attribute successfully detects emails in the column data.

Step 3 – Attribute Masking Settings

  1. Click ‘2. Default Masking Method’ to continue with the masking method setup. This switches the Attribute panel to the masking settings.
  2. Set the Masking Method dropdown to Email Masking. The Default Masking Method is used by the Data Compliance tool, Data-Inspired Dynamic Masking, and Static Masking. DataSunrise applies this method when no key constraints need to be maintained to preserve the masked database’s integrity.
  3. Set the Alternative Masking Method dropdown to FP Encryption FF3 Email. Alternative masking methods are necessary for maintaining referential integrity between tables. You cannot simply mask foreign keys with random strings, as this would break references between tables. The masking must ensure that references from other tables still point to the correct rows after masking is applied. Similarly, primary keys must remain unique and keep their referential relationships with other tables after masking.

In the Static Masking example below, the ‘email’ and ‘contact_email’ columns are masked using different methods due to key constraints. For example, John Doe’s email address, which has a foreign key constraint, is masked as ‘[email protected]’. Meanwhile, his contact_email, which has no constraints, is masked using the default method, appearing as ‘j***.***@.**m’.

Important: While this setup is compatible with traditional databases, some storage systems like Amazon S3 and unstructured text files don’t follow typical database organization with columns and objects. For these storage types, be careful when using object name and column name filters, as they may prevent proper data matching.

  4. Click the ‘Add’ button to link the Attribute to the Information Type.

Testing Information Types with Alternative Masking Methods

The following example demonstrates how DataSunrise implements both Default and Alternative masking methods. Let’s examine this through the creation of sample tables:

-- Creating the emails table (Parent table)
CREATE TABLE emails (
    email VARCHAR(50) PRIMARY KEY,
    email_type VARCHAR(20),
    created_date DATE
);
-- Creating the employees table with additional contact_email
CREATE TABLE employees (
    employee_id INT PRIMARY KEY,
    first_name VARCHAR(30),
    last_name VARCHAR(30),
    email VARCHAR(50),
    contact_email VARCHAR(50),
    hire_date DATE,
    FOREIGN KEY (email) REFERENCES emails(email)
);
-- Inserting data into emails table
INSERT INTO emails VALUES
('[email protected]', 'corporate', '2023-01-15'),
('[email protected]', 'corporate', '2023-02-20'),
-- … (additional rows omitted)
('[email protected]', 'corporate', '2023-05-01');
-- Inserting data into employees table with contact emails
INSERT INTO employees VALUES
(1, 'John', 'Doe', '[email protected]', '[email protected]', '2023-01-16'),
(2, 'Sarah', 'Smith', '[email protected]', '[email protected]', '2023-02-21'),
-- … (additional rows omitted)
(5, 'Peter', 'Jones', '[email protected]', '[email protected]', '2023-05-02');

Static Masking with Default and Alternative Masking

Static masking allows users to automatically mask sensitive data identified during the discovery process. This automated approach, known as Automatic Mode, determines which source tables to transfer and which columns to mask. The system implements two distinct masking strategies: one for standard data fields and another for relational data elements (such as foreign and primary keys). This dual approach ensures data consistency while maintaining referential integrity across database relationships.

During the execution of the Static Masking Task on the ‘employees’ table, different masking methods were applied to the email fields. The constrained ‘email’ column, which serves as a foreign key, kept its format through format-preserving masking. Meanwhile, the unconstrained ‘contact_email’ column underwent simple character masking, where only the middle portion of the email addresses was obscured.
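The reason key columns need a different method can be sketched in Python. This is not FF3 and not DataSunrise's implementation: it swaps in an HMAC-SHA256 pseudonym as a stand-in for a deterministic, format-preserving method, and a simple star-out function as a stand-in for default masking. The function names, the key, the sample addresses, and the output formats are all hypothetical.

```python
import hashlib
import hmac

SECRET_KEY = b"demo-key"  # hypothetical key, for illustration only


def mask_default(email):
    """Character masking: keep the first character of the local part."""
    local, _, domain = email.partition("@")
    return local[:1] + "*" * (len(local) - 1) + "@" + domain


def mask_key_safe(email):
    """Deterministic pseudonym: the same input always gives the same output.

    Stand-in for a format-preserving method such as FF3, not FF3 itself.
    """
    _, _, domain = email.partition("@")
    digest = hmac.new(
        SECRET_KEY, email.lower().encode("utf-8"), hashlib.sha256
    ).hexdigest()
    return "user_" + digest[:12] + "@" + domain


parent_keys = ["john.doe@example.com", "jane.doe@example.com"]
child_refs = ["jane.doe@example.com", "john.doe@example.com"]

masked_parent = {mask_key_safe(e) for e in parent_keys}
masked_child = [mask_key_safe(e) for e in child_refs]

# Every masked foreign key still points at a masked primary key.
print(all(ref in masked_parent for ref in masked_child))
# Character masking can collapse distinct keys into a single value.
print(len({mask_default(e) for e in parent_keys}))
```

In this sketch both sample addresses collapse to the same value under character masking, which would violate a primary key, while the deterministic mapping keeps them distinct and keeps child rows pointing at their parents. A real format-preserving cipher is a keyed permutation and therefore guarantees uniqueness; the truncated HMAC used here only makes collisions very unlikely.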

Conclusion

This article provided an in-depth exploration of DataSunrise Information Types. We guided you through the main steps of creating Information Types and defining their attributes. Information Types are primarily used in Data Discovery to identify sensitive data based on specific properties defined by their attributes. Additionally, these Information Types are actively used during data access through a feature called Data-Inspired Security, which enables data masking, blocking, and event log tagging.

We also examined use cases for Default masking and discussed how alternative masking methods can be applied when static masking tasks encounter key constraints in the masked database.
