Category Archives: Data Warehouse

Data Loss Prevention (DLP) for Structured Data Sources

When people think of Data Loss Prevention, we usually think of Endpoint protection, such as Symantec Endpoint Security solution, preventing the upload of data to web sites, or downloaded to a USB device. The data being “illegally” transferred typically conforms to a particular pattern such as Personal Identifiable Information (PII), i.e. Social Security numbers.

Using a client for local monitoring of the endpoint, the agent detects the transfer of information as a last line of defense for external distribution. EndPoint solutions could monitor suspicious activity and/or proactively cancel the data transfer in progress.

Moving closer to the source of the data loss, monitoring databases filled with Personal Identifying Information (PII) has its advantages and disadvantages. One may argue there is no data loss until the employee attempts to export the data outside the corporate network, and the data is in-flight. In addition, extracted PII data may be “properly utilized” within the corporate network for analysis.

There is a database solution that provides similar “endpoint” monitoring and protection, e.g. identifying PII data extraction, with real-time query cancellation upon detection, leveraging “out of the box” data patterns, Teleran Technologies. Teleran supports relational databases such as Oracle, and Microsoft SQL Server, both on-prem, and cloud solutions.

Updates in Data Management Policies

Identifying the data loss points of origination provides opportunities to update the gaps in data management policy and the implementation of additional controls over data. Data classification is done dynamically based on common data mask structures. Users may build additional rules to cover custom structures. So, for example, a business analyst executes a query against a database that appears to fit predefined data masks, such as SSN, the query may be canceled before it’s even executed, and/or this “suspicious” activity can be flagged for the Chief Information Officer and/or Chief Security Officer (CSO)

Bar none, I’ve seen only one firm that defends a company’s data assets closer to the probable leak of information, the database, Teleran Technologies, See what they have to offer your organization for data protection and compliance.

Prevalent Remote Work Changes Endpoint Strategy

Endpoints in our corporate environments of prevalent remote working may highlight the need that relying on endpoints may be too late to enforce data protection. We may need to bring potential data loss detection into the inner sanctum of the corporate networks and need prevention closer to the source of data being extracted. How are “semi-trusted” third parties such as staff augmentation from offshore dealt?

Endpoint DLP – Available Breach Tactics

Endpoint DLP may capture and contain attempts to extract PII data, for example, parsing text files for SSNs, or other data masks. However, there are ways around the transfer detection, making it lofty to identify, such as screen captures of data, converting from text into images. Some Endpoint providers boast about their Optical Character Recognition (OCR), however, turning on this feature may produce many false positives, too many to sift through in monitoring, and unmanageable to control. The best DLP defense is to monitor and control closer to the data source, and perhaps, flag data requests from employees, e.g. after SELECT statement entered, UI Pops up a “Reason for Request?” if PII extraction is identified in real-time, with auditable events that can flow into Splunk.

People Turn Toward “Data Banks” to Commoditize on their Purchase and User Behavior Profiles

Anyone who is anti “Big Brother”, this may not be the article for you, in fact, skip it. 🙂

 

The Pendulum Swings Away from GDPR

In the not so distant future, “Data Bank” companies consisting of Subject Matter Experts (SME) across all verticals,  may process your data feeds collected from your purchase and user behavior profiles.  Consumers will be encouraged to submit their data profiles into a Data Bank who will offer incentives such as a reduction of insurance premiums to cash back rewards.

 

Everything from activity trackers, home automation, to vehicular automation data may be captured and aggregated.    The data collected can then be sliced and diced to provide macro and micro views of the information.    On the abstract, macro level the information may allow for demographic, statistical correlations, which may contribute to corporate strategy. On a granular view, the data will provide “data banks” the opportunity to sift through data to perform analysis and correlations that lead to actionable information.

 

Is it secure?  Do you care if a hacker steals your weight loss information? May not be an issue if collected Purchase and Use Behavior Profiles aggregate into a Blockchain general ledger.  Data Curators and Aggregators work with SMEs to correlate the data into:

  • Canned, ‘intelligent’ reports targeted for a specific subject matter, or across silos of data types
  • ‘Universes’ (i.e.  Business Objects) of data that may be ‘mined’ by consumer approved, ‘trusted’ third party companies, e.g. your insurance companies.
  • Actionable information based on AI subject matter rules engines and consumer rule transparency may be provided.

 

 “Data Banks” may be required to report to their customers who agreed to sell their data examples of specific rows of the data, which was sold on a “Data Market”.

Consumers may have the option of sharing their personal data with specific companies by proxy, through a ‘data bank’ granular to the data point collected.  Sharing of Purchase and User Behavior Profiles:

  1. may lower [or raise] your insurance premiums
  2. provide discounts on preventive health care products and services, e.g. vitamins to yoga classes
  3. Targeted, affordable,  medicine that may redirect the choice of the doctor to an alternate.  The MD would be contacted to validate the alternate.

 

The curriated data collected may be harnessed by thousands of affinity groups to offer very discrete products and services.  Purchase and User Behavior Profiles,  correlated information stretches beyond any consumer relationship experienced today.

 

At some point, health insurance companies may require you to wear a tracker to increase or slash premiums.  Auto Insurance companies may offer discounts for access to car smart data to make sure suggested maintenance guidelines for service are met.

 

You may approve your “data bank” to give access to specific soliciting government agencies or private firms looking to analyze data for their studies. You may qualify based on the demographic, abstracted data points collected for incentives provided may be tax credits, or paying studies.

Purchase and User Behavior Profiles:  Adoption and Affordability

If ‘Data Banks’ are allowed to collect Internet of Things (IoT) device profile and the devices themselves are cost prohibitive.  here are a few ways to increase their adoption:

  1.  [US] tax coupons to enable the buyer, at the time of purchase, to save money.  For example, a 100 USD discount applied at the time of purchase of an Activity Tracker, with the stipulation that you may agree,  at some point, to participate in a study.
  2. Government subsidies: the cost of aggregating and archiving Purchase and Behavioral profiles through annual tax deductions.  Today, tax incentives may allow you to purchase an IoT device if the cost is an itemized medical tax deduction, such as an Activity Tracker that monitors your heart rate, if your medical condition requires it.
  3. Auto, Life, Homeowners, and Health policyholders may qualify for additional insurance deductions
  4. Affinity branded IoT devices, such as American Lung Association may sell a logo branded Activity Tracker.  People may sponsor the owner of the tracking pedometer to raise funds for the cause.

The World Bank has a repository of data, World DataBank, which seems to store a large depth of information:

World Bank Open Data: free and open access to data about development in countries around the globe.”

Here is the article that inspired me to write this article:

http://www.marketwatch.com/story/you-might-be-wearing-a-health-tracker-at-work-one-day-2015-03-11

 

Privacy and Data Protection Creates Data Markets

Initiatives such as General Data Protection Regulation (GDPR) and other privacy initiatives which seek to constrict access to your data to you as the “owner”, as a byproduct, create opportunities for you to sell your data.  

 

Blockchain: Purchase, and User Behavior Profiles

As your “vault”, “Data Banks” will collect and maintain your two primary datasets:

  1. As a consumer of goods and services, a Purchase Profile is established and evolves over time.  Online purchases are automatically collected, curated, appended with metadata, and stored in a data vault [Blockchain].  “Offline” purchases at some point, may become a hybrid [on/off] line purchase, with advances in traditional monetary exchanges, and would follow the online transaction model.
  2. User Behavior (UB)  profiles, both on and offline will be collected and stored for analytical purposes.  A user behavior “session” is a use case of activity where YOU are the prime actor.  Each session would create a single UB transaction and are also stored in a “Data Vault”.   UB use cases may not lead to any purchases.

Not all Purchase and User Behavior profiles are created equal.  Eg. One person’s profile may show a monthly spend higher than another.  The consumer who purchases more may be entitled to more benefits.

These datasets wholly owned by the consumer, are safely stored, propagated, and immutable with a solution such as with a Blockchain general ledger.

Blended Data Warehouse SW/HW Solutions Phased Into the Cloud

Relational Database Solutions “In a Box”

Several of the relational database software vendors, such as IBM, Oracle, and Teradata have developed proprietary data warehouse software to be tightly coupled with server hardware to maximize performance.  These solutions have been developed and refined as “on-prem” solutions for many years.

We’ve seen the rise of “Database (DW)  as a Service” from companies like Amazon, who sell Redshift services.

Amazon Redshift is a fast, fully managed data warehouse that makes it simple and cost-effective to analyze all your data using standard SQL and your existing Business Intelligence (BI) tools.  It allows you to run complex analytic queries against petabytes of structured data, using sophisticated query optimization, columnar storage on high-performance local disks, and massively parallel query execution. Most results come back in seconds.

RDB Complex Software/Hardware Maintenance

In recent times, the traditional relational database software vendors shifted gears to become service providers offering maximum performance from a solution hosted by them, the vendor, in the Cloud.    On the positive side, the added complexity of configuring and tuning a blended software/hardware data warehouse has been shifted from the client’s team resources such as Database Administrators (DBAs), Network Administrators,  Unix/Windows Server Admins,… to the database software service provider.  The complexity of tuning for scalability, and other maintenance challenges shifts to the software vendor’s expertise, if that’s the abstraction you select.  There is some ambiguity in the delineation of responsibilities with the RDBMS vendor’s cloud offerings.

Total Cost of Ownership

Quantifying the total cost of ownership of a solution may be a bit tricky, especially if you’re trying to quantify the RDBMS hybrid software/hardware “on-prem” solution versus the same or similar capabilities brought to the client via “Database (DW) as a Service”.

“On-Prem”, RDB Client Hosted Solution

Several factors need to be considered when selecting ANY software and/or Hardware to be hosted at the client site.

  • Infrastructure “when in Rome”
    • Organizations have a quantifiable cost related to hosting physical or virtual servers in the client’s data center and may be boiled down to a number that may include things like HVAC, or new rack space.
    • Resources used to maintain/monitor DC usage, there may be an abstracted/blended figure.
  • Database Administrators maintain and monitor RDB solutions.
    • Activities may range from RDB patches/upgrades to resizing/scaling the DB storage “containers”.
    • Application Database Admins/Developers may be required to maintain the data warehouse architecture, such as new requirements, e.g. creating aggregate tables for BI analysis.
  • Network Administrators
    • Firewalls, VPN
    • Port Scanning
  • Windows/Unix Server Administrators
    • Antivirus
    • OS Patches

Trying to correlate these costs in some type of “Apples to Apples” comparison to the “Data Warehouse as a Service” may require accountants and technical folks to do extensive financial modeling to make the comparison.   Vendors, such as Oracle, offer fully managed services to the opposite end of the spectrum, the “Bare Metal”, essentially the “Infra as a Service.”  The Oracle Exadata solution can be a significant investment depending on the investment in redundancy and scalability leveraging Oracle Real Application Clusters (RAC). 

Support and Staffing Models for DW Cloud Vendors

In order for the traditional RDB software vendors to accommodate a “Data Warehouse as a Service” model, they may need to significantly increase staff for a variety of technical disciplines, as outlined above with the Client “On-Prem” model.  A significant ramp-up of staff and the organizational challenges of developing and implementing a support model based on a variety of factors may have relational database vendors ask: Should they leverage a top tier consulting agency such as Accenture, or Deloitte to define, implement, and refine a managed service?  It’s certainly a tall order to go from a software vendor to offering large scale services.  With corporate footprints globally and positive track records implementing managed services of all types, it’s an attractive proposition for both the RDB vendor and the consulting agency who wins the bid.  Looking at the DW Service billing models don’t seem sensical on some level.  Any consulting agency who implements a DW managed service would be responsible to ensure ROI both for the RDS vendor and their clients.  It may be opaque to the end client leveraging the Data Warehouse as a Service, but certainly, the quality of service provided should be nothing less than if implemented by the RDB vendor itself.  If the end game for the RDB vendor is for the consulting agency to implement, and mature the service then at some point bring the service in-house, it could help to keep costs down while maturing the managed service.

Oracle Exadata

Here are URLs for reference to understand the capabilities that are realized through Oracle’s managed services.

https://cloud.oracle.com/en_US/database

https://cloud.oracle.com/en_US/database/exadata/features

https://www.oracle.com/engineered-systems/exadata/index.html

Teradata

https://www.teradata.com/products-and-services/intellicloud

https://www.teradata.com/products-and-services/cloud-overview

Teradata
Teradata

DB2

https://www.ibm.com/cloud/db2-warehouse-on-cloud

IBM Mainframe
IBM Mainframe


Note: The opinions shared here are my own.