Sensitive Data Discovery for Parquet File Format
DataSunrise Sensitive Data Discovery supports the Parquet file format for fast data search, classification, and management. Parquet is an open-source file format that stores data, including nested structures, in a columnar layout. This layout has several advantages: for example, a query that needs only specific columns from a large table can read those columns without scanning the rest. Because values of the same type are stored together, Parquet also compresses data efficiently, which saves disk space.
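The column-read advantage can be illustrated with a toy sketch in plain Python. It is not the Parquet format itself (real Parquet files add row groups, encodings, and compression), and the sample records and the `to_columnar` helper are hypothetical names chosen for illustration.

```python
# Toy illustration only: contrasts a row layout with a columnar layout.
# The records and helper below are made up for this example.
rows = [
    {"id": 1, "name": "Ann", "city": "Austin"},
    {"id": 2, "name": "Bob", "city": "Boston"},
    {"id": 3, "name": "Cy", "city": "Chicago"},
]


def to_columnar(records):
    """Pivot a list of row dicts into a dict of column lists."""
    columns = {}
    for record in records:
        for key, value in record.items():
            columns.setdefault(key, []).append(value)
    return columns


columns = to_columnar(rows)

# Row layout: every full record is visited to extract a single field.
cities_from_rows = [record["city"] for record in rows]

# Columnar layout: the requested column is already stored contiguously.
cities_from_columns = columns["city"]
```

Both lookups return the same values, but in the columnar layout the `id` and `name` data never has to be touched.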
Because of these advantages and its flexibility, Parquet is often used for permanent and temporary data storage, for import and export between various sources, and for data transfer between applications and services. The amount of such data grows rapidly every year. As it grows, software for analyzing and protecting it is becoming more widespread. Tools such as the Apache Hive data warehouse and the Amazon Athena interactive query service let you analyze large datasets residing in distributed storage using SQL.
DataSunrise version 7.3 supports the Parquet file format, along with CSV, XML, JSON, and unstructured text files, when performing Sensitive Data Discovery across Amazon S3 buckets.
Data is searched using a set of predefined filters that can be customized. By default, the filters are set to find the following types of data:
- Financial (codes, credit card numbers, PIN codes, etc.);
- Geographic (names of cities, countries, ZIP codes, etc.);
- Medical (medical records);
- Numbers (account numbers, certificates, license plates, etc.);
- Social Security numbers.
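To make the idea of pattern-based filters concrete, here is a minimal sketch of how a few of the categories above could be detected in column values. It is an assumption-laden illustration, not DataSunrise's actual filters: the regular expressions, the `classify_value` and `classify_column` helpers, and the 0.8 threshold are all hypothetical, and the patterns check format only (for example, they do not validate that an SSN was really issued).

```python
import re
from collections import Counter

# Hypothetical, format-only patterns for illustration.
SSN_RE = re.compile(r"\d{3}-\d{2}-\d{4}")
ZIP_RE = re.compile(r"\d{5}(?:-\d{4})?")
CARD_RE = re.compile(r"\d(?:[ -]?\d){12,18}")  # 13 to 19 digits, optional separators


def luhn_valid(number):
    """Return True if the digits in `number` pass the Luhn checksum."""
    digits = [int(ch) for ch in number if ch.isdigit()]
    total = 0
    for index, digit in enumerate(reversed(digits)):
        if index % 2 == 1:  # double every second digit from the right
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0


def classify_value(value):
    """Return a category label for one value, or None if nothing matches."""
    text = value.strip()
    if SSN_RE.fullmatch(text):
        return "Social Security Number"
    if CARD_RE.fullmatch(text) and luhn_valid(text):
        return "Financial"
    if ZIP_RE.fullmatch(text):
        return "Geographic"
    return None


def classify_column(values, threshold=0.8):
    """Label a column when at least `threshold` of its non-empty values share a label."""
    labels = [classify_value(v) for v in values if v and v.strip()]
    counts = Counter(label for label in labels if label is not None)
    if not counts:
        return None
    label, hits = counts.most_common(1)[0]
    return label if hits / len(labels) >= threshold else None
```

Classifying a whole column rather than single values reduces false positives: one five-digit number in a free-text column does not mark it as geographic data, while a column made up almost entirely of such values does.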
Searching and analyzing data in your data stores lets you pinpoint sensitive data in Amazon S3 promptly and with little effort. With DataSunrise you can be sure that your data is completely protected from data leakage.