InsightPipelines — Governed & Secure Data Warehouse using Snowflake

VerticalServe Blogs
Dec 8, 2024 · 5 min read


Introduction

In the insurance industry, data-driven decision-making is no longer optional; it’s essential. The surge in policyholder information, claims records, broker communications, and regulatory documentation demands a robust, secure, and governed data warehouse. Insurance firms need to ingest, cleanse, transform, and deliver structured and unstructured data to multiple business units — including Policy, Claims, Brokers, Reinsurance, Finance, Marketing, and Legal — to drive Business Intelligence (BI), Data Science, and AI/ML initiatives.

To achieve this, a Medallion Data Architecture is a proven framework that segments data into Bronze (raw), Silver (cleaned), and Gold (curated) layers. Leveraging Snowflake on AWS as the data warehouse, companies can centralize data for downstream consumption. But how do insurers create a pipeline to move data securely and efficiently from various sources into this Medallion architecture?

This is where InsightPipelines plays a transformative role. InsightPipelines is a low-code/no-code ETL platform that simplifies data ingestion, transformation, and governance. By combining InsightPipelines and Snowflake, insurers can establish a secure, unified, and governed data warehouse to empower business units and unlock data’s full potential.

Challenges Faced by Insurance Companies

  1. Data Silos Across Business Units: Policies, claims, brokers, finance, marketing, and legal teams operate in isolation, leading to duplicate data and inefficiencies.
  2. Unstructured Data Complexity: ACORD forms, policy binders, sticky notes, and claim attachments are unstructured and difficult to analyze.
  3. Regulatory Compliance & Security: GDPR, HIPAA, and other regulations require strict data access control, encryption, and secure sharing of sensitive data.
  4. Data Quality Issues: Inconsistent policyholder names, addresses, and claim details impact business decision-making and model accuracy.
  5. Scaling Analytics & AI/ML Models: Traditional data pipelines are slow to process and prepare large datasets for predictive modeling and Generative AI.

What is InsightPipelines?

InsightPipelines is a comprehensive low-code/no-code ETL solution that enables insurance companies to collect, transform, and deliver data to a governed and secure data warehouse on Snowflake. It simplifies the creation of a Medallion Data Architecture (Bronze, Silver, Gold) to unify structured and unstructured data from multiple sources.

Key Features of InsightPipelines

  • Multi-Source Data Ingestion: Ingests data from Policy, Claims, Brokers, Reinsurance, Finance, Marketing, Legal, and third-party sources.
  • Data Transformation: Uses dbt, Spark, and Python to clean, normalize, and standardize data.
  • Governance & Security: Provides data masking, tokenization, and role-based access control.
  • Data Lineage & Traceability: Tracks changes to support audit trails and compliance.
  • Data Product & API Creation: Supports secure API and SFTP data sharing for brokers, reinsurers, and auditors.
  • Unstructured Data Support: Handles images, PDFs, ACORD forms, claim sticky notes, and other document types.

Building a Medallion Data Architecture Using InsightPipelines & Snowflake

The Medallion Architecture is a proven approach for structuring data into three stages — Bronze, Silver, and Gold — to increase data quality, security, and consumption efficiency. Here’s a detailed breakdown of each layer and how InsightPipelines facilitates data flow from source to Snowflake.

1. Bronze Layer (Raw Data)

  • Location: AWS S3
  • Data Type: Raw data (structured, semi-structured, and unstructured)
  • Purpose: Ingests immutable, raw data from multiple sources (Policy, Claims, Finance, Marketing, Brokers, Reinsurance, Legal).
  • Usage: Used as the primary archive for audit trails, incident investigations, and unaltered source data.

Example Use Case: Ingesting ACORD forms, claim documents, and policy binder letters uploaded by brokers and underwriters directly into the Bronze layer (AWS S3) for downstream processing.

Tools Used: InsightPipelines (ETL), AWS S3 (Data Lake)
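To make the landing step concrete, here is a minimal Python sketch of how a raw document could be copied into the Bronze layer. It assumes the boto3 AWS SDK, and the bucket, prefix, and function names are illustrative; InsightPipelines would perform this step through its own connectors.

```python
import datetime
import boto3  # AWS SDK for Python

# Hypothetical bucket and prefix names for the Bronze layer
BRONZE_BUCKET = "insurer-datalake"
BRONZE_PREFIX = "bronze/claims/acord_forms"

def land_raw_document(local_path: str, source_system: str) -> str:
    """Copy a raw ACORD form or claim attachment into the Bronze layer, unmodified."""
    s3 = boto3.client("s3")
    # Partition by ingestion date so raw files stay immutable and auditable
    ingest_date = datetime.date.today().isoformat()
    file_name = local_path.split("/")[-1]
    key = f"{BRONZE_PREFIX}/source={source_system}/ingest_date={ingest_date}/{file_name}"
    s3.upload_file(local_path, BRONZE_BUCKET, key)
    return f"s3://{BRONZE_BUCKET}/{key}"

# Example: a broker-uploaded ACORD form lands as-is for downstream Silver processing
# land_raw_document("/tmp/acord_125_policy_app.pdf", source_system="broker_portal")
```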

2. Silver Layer (Clean Data)

  • Location: Snowflake (Data Warehouse)
  • Data Type: Clean, deduplicated, and normalized data.
  • Purpose: Applies transformations like deduplication, type casting, standardization, and basic aggregations.
  • Usage: Data becomes suitable for business intelligence dashboards, operational reporting, and predictive underwriting models.

Example Use Case: Extracting text from claim sticky notes, normalizing claim type and claimant information, and creating a unified view of claim history.

Tools Used: InsightPipelines (ETL), dbt (Data Transformation), Snowflake (Data Warehouse)
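As an illustration of a Silver-layer transformation, the sketch below runs a deduplication and standardization query against Snowflake using snowflake-connector-python. The database, schema, and column names are assumptions; in practice, InsightPipelines would generate the equivalent dbt model.

```python
import snowflake.connector  # snowflake-connector-python

# Illustrative connection parameters; real pipelines would use a secrets manager
conn = snowflake.connector.connect(
    account="my_account", user="etl_user", password="***",
    warehouse="TRANSFORM_WH", database="INSURANCE_DW",
)

# Deduplicate, type-cast, and standardize raw claims into the Silver layer
SILVER_CLAIMS_SQL = """
CREATE OR REPLACE TABLE SILVER.CLAIMS AS
SELECT
    claim_id,
    UPPER(TRIM(claimant_name))         AS claimant_name,   -- standardize names
    TRY_TO_DATE(loss_date)             AS loss_date,       -- type casting
    TRY_TO_NUMBER(claim_amount, 12, 2) AS claim_amount,
    LOWER(claim_type)                  AS claim_type
FROM BRONZE.RAW_CLAIMS
QUALIFY ROW_NUMBER() OVER (PARTITION BY claim_id ORDER BY ingested_at DESC) = 1  -- deduplicate
"""

cur = conn.cursor()
cur.execute(SILVER_CLAIMS_SQL)
```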

3. Gold Layer (Curated Data)

  • Location: Snowflake (Data Warehouse)
  • Data Type: Business-ready datasets, optimized for consumption.
  • Purpose: Supplies curated datasets for predictive underwriting, claims analysis, operational dashboards, and audit reporting.
  • Usage: Used by AI/ML models, BI tools (Tableau, Power BI), and self-service analytics platforms.

Example Use Case: A Predictive Claims Model requires a “Claims Risk Score” that considers historical claims, policyholder data, and broker quality. InsightPipelines curates this risk score in the Gold layer and exposes it via APIs for underwriting workstations.

Tools Used: InsightPipelines, Snowflake, Python (for advanced transformations)
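Below is a simplified sketch of how such a curated "Claims Risk Score" table could be built in the Gold layer from Silver tables. The table names, columns, and scoring weights are illustrative only, not the actual model.

```python
import snowflake.connector

# Illustrative Gold-layer curation: blend claim history, policyholder, and broker signals
GOLD_RISK_SCORE_SQL = """
CREATE OR REPLACE TABLE GOLD.CLAIMS_RISK_SCORE AS
SELECT
    p.policy_id,
    p.policyholder_id,
    COUNT(c.claim_id)                AS claim_count_3y,
    COALESCE(SUM(c.claim_amount), 0) AS total_claimed_3y,
    b.broker_quality_score,
    -- Illustrative weighted score, capped at 100; not a production model
    LEAST(100,
          COUNT(c.claim_id) * 10
        + COALESCE(SUM(c.claim_amount), 0) / 10000
        + (100 - b.broker_quality_score) * 0.2) AS claims_risk_score
FROM SILVER.POLICIES p
LEFT JOIN SILVER.CLAIMS  c ON c.policy_id = p.policy_id
                          AND c.loss_date >= DATEADD(year, -3, CURRENT_DATE)
LEFT JOIN SILVER.BROKERS b ON b.broker_id = p.broker_id
GROUP BY p.policy_id, p.policyholder_id, b.broker_quality_score
"""

conn = snowflake.connector.connect(account="my_account", user="etl_user",
                                   password="***", warehouse="TRANSFORM_WH",
                                   database="INSURANCE_DW")
conn.cursor().execute(GOLD_RISK_SCORE_SQL)
```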

Data Transformation with InsightPipelines

InsightPipelines enables data cleansing, normalization, and enrichment at every stage of the Medallion architecture.

Data Transformation Technologies

  • dbt: Data modeling, SQL transformations, and business logic.
  • Spark: Scalable processing for large datasets.
  • Python: Custom transformations for unstructured documents, such as OCR and NLP on claim notes (see the sketch below).
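For the unstructured side, here is a minimal Python sketch of OCR on a scanned claim note stored in the Bronze layer. It assumes the boto3, Pillow, and pytesseract libraries, and the bucket and key names are hypothetical; InsightPipelines would wire an equivalent step into the Silver stage.

```python
import io
import boto3
from PIL import Image   # Pillow
import pytesseract      # requires the Tesseract OCR engine installed locally

s3 = boto3.client("s3")

def extract_claim_note_text(bucket: str, key: str) -> str:
    """OCR a scanned claim sticky note from the Bronze layer into plain text."""
    obj = s3.get_object(Bucket=bucket, Key=key)
    image = Image.open(io.BytesIO(obj["Body"].read()))
    text = pytesseract.image_to_string(image)
    # Light normalization before loading into the Silver layer
    return " ".join(text.split())

# Example (illustrative paths):
# note_text = extract_claim_note_text("insurer-datalake", "bronze/claims/notes/note_123.png")
```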

Data Governance & Security

For insurance companies, compliance with GDPR, HIPAA, and CCPA requires robust data governance. Here is how InsightPipelines addresses these requirements; a brief Snowflake sketch follows the list.

  • Data Masking: Masks sensitive fields such as Social Security numbers (SSNs) so they are hidden from unauthorized roles.
  • Tokenization: Replaces PII (personally identifiable information) such as names, emails, and addresses with non-sensitive tokens.
  • Lineage Tracking: Monitors data changes from ingestion to the Gold layer.
  • Role-Based Access Control (RBAC): Restricts access to data by user role (e.g., Claims, Underwriting, Legal).
  • Audit Logs: Tracks every action performed on the data for regulatory compliance.
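To show how these controls typically land in Snowflake, the sketch below creates a masking policy for SSNs and grants role-based access. The role, schema, and column names are assumptions; InsightPipelines would apply equivalent controls through its governance layer.

```python
import snowflake.connector

# Illustrative Snowflake statements; the role, schema, and column names are assumptions
MASKING_AND_RBAC_SQL = [
    # Mask SSNs for everyone except the compliance role
    """
    CREATE MASKING POLICY IF NOT EXISTS GOVERNANCE.MASK_SSN AS (val STRING)
    RETURNS STRING ->
      CASE WHEN CURRENT_ROLE() IN ('COMPLIANCE_ADMIN') THEN val
           ELSE 'XXX-XX-' || RIGHT(val, 4)
      END
    """,
    # Attach the policy to the sensitive column in the Silver layer
    "ALTER TABLE SILVER.POLICYHOLDERS MODIFY COLUMN ssn SET MASKING POLICY GOVERNANCE.MASK_SSN",
    # Role-based access: claims analysts can read curated Gold-layer tables
    "GRANT SELECT ON ALL TABLES IN SCHEMA GOLD TO ROLE CLAIMS_ANALYST",
]

conn = snowflake.connector.connect(account="my_account", user="governance_admin",
                                   password="***", database="INSURANCE_DW")
cur = conn.cursor()
for stmt in MASKING_AND_RBAC_SQL:
    cur.execute(stmt)
```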

How InsightPipelines Powers Analytics & AI

  1. Predictive Claims Analysis: Uses curated claims data to predict settlement times and fraud probability (a minimal training sketch follows this list).
  2. AI-Driven Underwriting: Uses policyholder behavior, claims data, and broker history to create underwriting risk models.
  3. Chatbots & Generative AI: Extracts insights from claims notes to drive AI-based customer support assistants.
  4. Real-Time Alerts: Sends risk alerts to underwriters based on broker quality, policyholder risk, or claim risk scores.
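As a minimal illustration of the AI/ML hand-off, the sketch below trains a simple fraud indicator model on a Gold-layer table using scikit-learn. The table, columns, and model choice are assumptions for illustration, not a production model.

```python
import pandas as pd
import snowflake.connector
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Pull curated Gold-layer features (illustrative table and columns)
conn = snowflake.connector.connect(account="my_account", user="ds_user",
                                   password="***", database="INSURANCE_DW")
cur = conn.cursor()
cur.execute("""
    SELECT claim_count_3y, total_claimed_3y, broker_quality_score, is_fraud
    FROM GOLD.CLAIMS_TRAINING_SET
""")
df = pd.DataFrame(cur.fetchall(),
                  columns=["claim_count_3y", "total_claimed_3y",
                           "broker_quality_score", "is_fraud"])

X = df.drop(columns=["is_fraud"])
y = df["is_fraud"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)
print("Holdout accuracy:", model.score(X_test, y_test))
```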

Data Product & API Creation

InsightPipelines allows the creation of data products and APIs to enable self-service data access for business users and external partners.

  • SFTP Exports: Shares data extracts with reinsurers, brokers, and legal teams.
  • REST APIs: Exposes policy, claims, and operational data as APIs for underwriting workstations (a minimal API sketch follows this list).
  • BI Dashboards: Delivers live data feeds to Tableau, Power BI, and custom dashboards.
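Here is a minimal sketch of what such a REST API could look like, using FastAPI and the illustrative GOLD.CLAIMS_RISK_SCORE table from earlier; the endpoint and connection details are assumptions, not the actual InsightPipelines API.

```python
# Minimal, illustrative read-only API over a Gold-layer table (not the product API)
from fastapi import FastAPI, HTTPException
import snowflake.connector

app = FastAPI(title="Claims Risk Score API (illustrative)")

def get_connection():
    # Hypothetical connection details; production code would use a secrets manager
    return snowflake.connector.connect(account="my_account", user="api_user",
                                       password="***", database="INSURANCE_DW")

@app.get("/risk-score/{policy_id}")
def read_risk_score(policy_id: str):
    """Return the curated Claims Risk Score for a policy from the Gold layer."""
    cur = get_connection().cursor()
    cur.execute(
        "SELECT claims_risk_score FROM GOLD.CLAIMS_RISK_SCORE WHERE policy_id = %s",
        (policy_id,),
    )
    row = cur.fetchone()
    if row is None:
        raise HTTPException(status_code=404, detail="Policy not found")
    return {"policy_id": policy_id, "claims_risk_score": row[0]}

# Run locally with: uvicorn risk_api:app --reload  (assuming this file is risk_api.py)
```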

Benefits of Using InsightPipelines & Snowflake

  • Centralized Data Warehouse: All business units (Policy, Claims, Brokers) work with the same governed data.
  • Reduced Cost: No need for multiple ETL tools, and Snowflake’s consumption-based pricing optimizes costs.
  • Agility & Scalability: Add new data sources (Policy, Brokers, Marketing) quickly.
  • Audit-Ready: Lineage and audit logs ensure data quality and traceability.
  • Supports AI/ML Models: Train underwriting and claims risk models using Gold-layer datasets.

Conclusion

InsightPipelines + Snowflake on AWS is a winning combination for insurance companies looking to establish a governed and secure data warehouse. By following the Medallion Architecture (Bronze, Silver, Gold), insurers can provide clean, accessible, and secure data to business units and AI/ML models. This framework also ensures compliance, data security, and audit readiness.

Want to future-proof your insurance data warehouse? It’s time to leverage the power of InsightPipelines & Snowflake. Schedule a demo to see how your underwriting, claims, and reinsurance data can drive AI, automation, and smarter decision-making.
