Smarter Data Discovery with AI Score & GPU Speed
The Data Discovery feature just got a major upgrade. DataSunrise now includes an AI-powered scoring model that improves the accuracy of sensitive data detection, plus optional GPU acceleration to speed up scans.
The Challenge: False Positives in Data Discovery
Traditional pattern-based discovery relies on regular expressions and predefined rules. While effective for clear-cut cases, these methods often flag data that merely looks like sensitive information. A product SKU might match a credit card pattern. A random alphanumeric string could trigger a passport number alert. The result? Security teams waste time reviewing false positives instead of focusing on real risks.
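To make the problem concrete, here is a minimal Python sketch of a pattern-only check. The regular expression and the sample values are illustrative assumptions, not DataSunrise's actual rules; the point is only that a product code shaped like a card number passes the same pattern.

```python
import re

# Hypothetical, deliberately simple card-number pattern: 16 digits,
# optionally grouped in fours by spaces or hyphens.
CARD_PATTERN = re.compile(r"\b(?:\d{4}[ -]?){3}\d{4}\b")


def pattern_hits(values):
    """Return the values that the pattern alone would flag."""
    return [v for v in values if CARD_PATTERN.search(v)]


samples = [
    "4111 1111 1111 1111",      # widely used test card number
    "SKU 2024-0815-3300-0042",  # product code that merely looks like a card
    "order 12345",              # too short to match
]

flagged = pattern_hits(samples)
for value in flagged:
    print("flagged:", value)
```

Both the card number and the SKU are flagged, so a reviewer has to tell them apart by hand.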
The Solution: AI Score with Confidence Metrics
Our new AI Score feature adds a machine learning layer to data discovery. Instead of relying solely on pattern matching, the system now:
- Analyzes context — Examines surrounding data to understand what a value actually represents
- Considers metadata — Uses column names and data structure as classification hints
- Assigns confidence scores — Provides a 0-100 score indicating how likely a match is genuine
The AI model uses seven classification labels (PERSON, CREDITCARDNUMBER, EMAIL, LOCATION, PASSPORT, ZIPCODE, and DATE-TIME) to cross-validate pattern matches against ML predictions. When the pattern and the model agree, you can trust the result. When they disagree, you know to investigate further.
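The cross-validation idea can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the product's implementation: the rule names, the `Finding` structure, and the threshold of 70 are hypothetical, and only the label names come from the list above.

```python
from dataclasses import dataclass

# Hypothetical mapping from a pattern rule to the ML label that should confirm it.
RULE_TO_LABEL = {
    "credit_card": "CREDITCARDNUMBER",
    "email": "EMAIL",
    "passport": "PASSPORT",
}


@dataclass
class Finding:
    value: str
    rule: str       # name of the pattern rule that fired (assumed naming)
    ml_label: str   # label predicted by the model
    ml_score: int   # model confidence on a 0-100 scale


def cross_validate(finding, threshold=70):
    """Confirm a match only when pattern and model agree with enough confidence."""
    expected = RULE_TO_LABEL.get(finding.rule)
    agrees = expected is not None and finding.ml_label == expected
    if agrees and finding.ml_score >= threshold:
        return "confirmed"
    return "needs_review"


card = Finding("4111 1111 1111 1111", "credit_card", "CREDITCARDNUMBER", 93)
sku = Finding("2024-0815-3300-0042", "credit_card", "DATE-TIME", 41)
print(cross_validate(card), cross_validate(sku))
```

Anything that comes back as `needs_review` is a disagreement between the two signals, which is where manual attention is best spent.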
Vertical Snippets: Column-Level Intelligence
AI Score goes beyond individual value analysis with Vertical Snippets. This feature examines multiple values within a column to build a complete picture:
- A column named "customer_email" containing email-like strings? High confidence.
- A column named "product_code" with the same patterns? Likely false positive.
This column-level context dramatically reduces noise in discovery reports.
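A rough way to picture column-level scoring is the sketch below. It is an assumption-laden illustration, not how Vertical Snippets is implemented: the email pattern, the `name_hints` parameter, and the choice to halve the score when the column name gives no hint are all hypothetical.

```python
import re

# Simplified email shape check, for illustration only.
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def column_confidence(column_name, values, name_hints=("email",)):
    """Score a column 0-100 from the share of matching values and the column name."""
    if not values:
        return 0
    matches = sum(1 for v in values if EMAIL_PATTERN.match(v))
    match_ratio = matches / len(values)
    # Assumed rule: a column name without a supporting hint halves the score.
    name_factor = 1.0 if any(h in column_name.lower() for h in name_hints) else 0.5
    return round(100 * match_ratio * name_factor)


values = ["ann@example.com", "bob@example.org", "eve@example.net"]
print(column_confidence("customer_email", values))
print(column_confidence("product_code", values))
```

The same values score lower under `product_code` than under `customer_email`, mirroring the two cases in the list above.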
GPU Acceleration for AI Score
For organizations scanning large data volumes, AI Score supports NVIDIA CUDA acceleration. GPU-powered inference processes discovery tasks significantly faster than CPU-only deployments—without sacrificing accuracy.
No GPU? No problem. The feature also runs on CPU with the ONNX runtime, so AI-enhanced discovery remains available in any deployment.
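The GPU-with-CPU-fallback behavior can be illustrated with ONNX Runtime's execution provider names. The helper below is a hypothetical sketch, not DataSunrise configuration; it only shows the selection logic and needs nothing beyond the standard library.

```python
def choose_providers(available):
    """Prefer CUDA when it is available and always keep CPU as the fallback."""
    preferred = ["CUDAExecutionProvider", "CPUExecutionProvider"]
    chosen = [p for p in preferred if p in available]
    # If nothing recognizable was reported, fall back to CPU.
    return chosen or ["CPUExecutionProvider"]


# With the onnxruntime Python package installed, `available` could come from
# onnxruntime.get_available_providers(), and the result could be passed as the
# `providers` argument of onnxruntime.InferenceSession.
print(choose_providers(["CUDAExecutionProvider", "CPUExecutionProvider"]))
print(choose_providers(["CPUExecutionProvider"]))
```

Listing CPU after CUDA means the same code path works on hosts with and without a GPU.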
CUDA Acceleration Across ML Features
GPU acceleration isn't limited to AI Score. DataSunrise now supports CUDA across all ML-powered features:
- NLP Data Discovery — Natural language processing for detecting sensitive data in unstructured text now runs on GPU for faster scans of large document repositories
- ML-Based User Suspicious Behavior Detection — Real-time behavioral analysis benefits from GPU acceleration, enabling faster model training and validation against database activity patterns
AI Score and the two features above share the same ONNX runtime infrastructure, so a single CUDA setup accelerates your entire ML pipeline. Configure once, benefit everywhere.
Flexible Deployment
AI Score integrates directly into existing Data Discovery workflows:
- Enable through report type settings—no infrastructure changes required
- Works with CSV, XML, JSON, PDF, Parquet, and unstructured text files
- Customize scoring weights to match your organization's risk tolerance
- Train custom ONNX models for region/industry-specific data patterns
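To show what adjusting scoring weights could mean in practice, here is a small Python sketch. The signal names (`pattern`, `context`, `metadata`), the weight values, and the formula are assumptions made for illustration; the actual settings are described in the User Guide.

```python
# Hypothetical default weights for three signals, each signal in the range 0.0-1.0.
DEFAULT_WEIGHTS = {"pattern": 4, "context": 4, "metadata": 2}


def weighted_score(signals, weights=DEFAULT_WEIGHTS):
    """Combine per-signal strengths into a single 0-100 confidence score."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    raw = sum(weights[name] * signals.get(name, 0.0) for name in weights)
    return round(100 * raw / total)


# A value that matches the pattern fully, has mixed context, and no metadata hint.
signals = {"pattern": 1.0, "context": 0.5, "metadata": 0.0}

# A stricter profile that trusts context more than the raw pattern match.
STRICT_WEIGHTS = {"pattern": 2, "context": 6, "metadata": 2}

print(weighted_score(signals))
print(weighted_score(signals, STRICT_WEIGHTS))
```

Shifting weight from the pattern to the context lowers the score for the same value, which is the kind of trade-off a lower risk tolerance for false positives would call for.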
Key Benefits of Data Discovery with AI Score
| Before | After |
|---|---|
| High false positive rates | ML-validated matches with confidence scores |
| Manual review of every alert | Focus only on low-confidence items |
| Pattern matching only | Context-aware classification |
| CPU-bound processing | Optional GPU acceleration |
Get Started
AI Score is available now in DataSunrise. Enable it in your Data Discovery task settings and start seeing cleaner, more actionable results immediately.
For detailed configuration options, see the new and updated sections of our User Guide.
Protect Your Data with DataSunrise
Secure your data across every layer with DataSunrise. Detect threats in real time with Activity Monitoring, Data Masking, and Database Firewall. Enforce Data Compliance, discover sensitive data, and protect workloads across 50+ supported cloud, on-prem, and AI system data source integrations.