Tag Archives: DB2

Data Loss Prevention (DLP) for Structured Data Sources

When people think of Data Loss Prevention (DLP), we usually think of endpoint protection, such as the Symantec Endpoint Security solution, preventing the upload of data to web sites or its download to a USB device. The data being “illegally” transferred typically conforms to a particular pattern, such as Personally Identifiable Information (PII), e.g. Social Security numbers.

Using a client for local monitoring, the endpoint agent detects the transfer of information as a last line of defense against external distribution. Endpoint solutions can monitor suspicious activity and/or proactively cancel a data transfer in progress.

Moving closer to the source of the data loss, monitoring databases filled with PII has its advantages and disadvantages. One may argue there is no data loss until an employee attempts to export the data outside the corporate network and the data is in flight. In addition, extracted PII data may be “properly utilized” within the corporate network for analysis.

Teleran Technologies offers a database-side solution that provides similar “endpoint” monitoring and protection, e.g. identifying PII data extraction with real-time query cancellation upon detection, leveraging “out of the box” data patterns. Teleran supports relational databases such as Oracle and Microsoft SQL Server, both on-prem and in the cloud.

Updates in Data Management Policies

Identifying the points of origination for data loss provides opportunities to close gaps in data management policy and to implement additional controls over data. Data classification is done dynamically based on common data mask structures, and users may build additional rules to cover custom structures. If, for example, a business analyst executes a query that appears to fit a predefined data mask, such as an SSN, the query may be canceled before it even executes, and/or the “suspicious” activity can be flagged for the Chief Information Officer (CIO) and/or Chief Security Officer (CSO).

Bar none, I’ve seen only one firm that defends a company’s data assets this close to the probable leak point, the database: Teleran Technologies. See what they have to offer your organization for data protection and compliance.

Prevalent Remote Work Changes Endpoint Strategy

In corporate environments where remote work is prevalent, relying on endpoints may simply be too late to enforce data protection. We may need to bring data loss detection into the inner sanctum of the corporate network, with prevention closer to the source of the data being extracted. And how are “semi-trusted” third parties, such as offshore staff augmentation, dealt with?

Endpoint DLP – Available Breach Tactics

Endpoint DLP may capture and contain attempts to extract PII data, for example, by parsing text files for SSNs or other data masks. However, there are ways around transfer detection that make it difficult to identify, such as screen captures that convert data from text into images. Some endpoint providers boast about their Optical Character Recognition (OCR); however, turning on this feature may produce many false positives, too many to sift through in monitoring and unmanageable to control. The best DLP defense is to monitor and control closer to the data source, and perhaps flag data requests from employees, e.g. after a SELECT statement is entered, a UI pops up a “Reason for Request?” prompt if PII extraction is identified in real time, with auditable events that can flow into Splunk.
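As a rough illustration of the data-mask idea (my own sketch, not any vendor’s implementation), scanning text for SSN-style patterns can be as simple as a regular expression; real DLP engines layer context and checksum rules on top to tame false positives:

```python
import re

# Simple SSN-style data mask: 3-2-4 digits separated by dashes.
# Real engines add contextual rules to reduce false positives.
SSN_MASK = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

def scan_for_ssn(text):
    """Return all SSN-like matches found in a block of text."""
    return SSN_MASK.findall(text)

sample = "Export: John Doe, 123-45-6789\nJane Roe, 987-65-4321"
print(scan_for_ssn(sample))  # ['123-45-6789', '987-65-4321']
```

This also hints at why the screen-capture bypass works: once the text becomes pixels, a pattern match like this has nothing to match against.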

Blended Data Warehouse SW/HW Solutions Phased Into the Cloud

Relational Database Solutions “In a Box”

Several of the relational database software vendors, such as IBM, Oracle, and Teradata have developed proprietary data warehouse software to be tightly coupled with server hardware to maximize performance.  These solutions have been developed and refined as “on-prem” solutions for many years.

We’ve seen the rise of “Database (DW)  as a Service” from companies like Amazon, who sell Redshift services.

Amazon Redshift is a fast, fully managed data warehouse that makes it simple and cost-effective to analyze all your data using standard SQL and your existing Business Intelligence (BI) tools.  It allows you to run complex analytic queries against petabytes of structured data, using sophisticated query optimization, columnar storage on high-performance local disks, and massively parallel query execution. Most results come back in seconds.

RDB Complex Software/Hardware Maintenance

In recent times, the traditional relational database software vendors have shifted gears to become service providers, offering maximum performance from a solution hosted by them, the vendor, in the cloud. On the positive side, the complexity of configuring and tuning a blended software/hardware data warehouse shifts from the client’s team resources, such as Database Administrators (DBAs), Network Administrators, and Unix/Windows Server Admins, to the database software service provider. The complexity of tuning for scalability, and other maintenance challenges, shifts to the software vendor’s expertise, if that’s the abstraction you select. There is, however, some ambiguity in the delineation of responsibilities with the RDBMS vendors’ cloud offerings.

Total Cost of Ownership

Quantifying the total cost of ownership of a solution may be a bit tricky, especially if you’re trying to quantify the RDBMS hybrid software/hardware “on-prem” solution versus the same or similar capabilities brought to the client via “Database (DW) as a Service”.

“On-Prem”, RDB Client Hosted Solution

Several factors need to be considered when selecting ANY software and/or Hardware to be hosted at the client site.

  • Infrastructure “when in Rome”
    • Organizations have a quantifiable cost for hosting physical or virtual servers in their data center, which may be boiled down to a number that includes items like HVAC or new rack space.
    • Resources used to maintain/monitor data center usage may be tracked as an abstracted/blended figure.
  • Database Administrators maintain and monitor RDB solutions.
    • Activities may range from RDB patches/upgrades to resizing/scaling the DB storage “containers”.
    • Application Database Admins/Developers may be required to maintain the data warehouse architecture, such as new requirements, e.g. creating aggregate tables for BI analysis.
  • Network Administrators
    • Firewalls, VPN
    • Port Scanning
  • Windows/Unix Server Administrators
    • Antivirus
    • OS Patches

Trying to correlate these costs into some type of “apples to apples” comparison with “Data Warehouse as a Service” may require accountants and technical folks to do extensive financial modeling. Vendors such as Oracle offer everything from fully managed services to the opposite end of the spectrum, “bare metal,” essentially Infrastructure as a Service. The Oracle Exadata solution can be a significant investment, depending on the level of redundancy and scalability achieved by leveraging Oracle Real Application Clusters (RAC).
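To make that modeling concrete, here is a minimal sketch of the kind of arithmetic involved; every figure below is hypothetical and would be replaced by your own finance and infrastructure numbers:

```python
# Hypothetical, illustrative numbers only -- a real TCO model needs your
# own staffing, licensing, and data center figures.
def annual_tco(infra, dba_staff, network_staff, os_staff, licenses):
    """Sum the yearly cost components of an on-prem data warehouse."""
    return infra + dba_staff + network_staff + os_staff + licenses

on_prem = annual_tco(infra=120_000, dba_staff=300_000,
                     network_staff=80_000, os_staff=80_000, licenses=250_000)
dw_service = 550_000  # assumed all-in "DW as a Service" subscription

print(f"On-prem: ${on_prem:,}  Service: ${dw_service:,}  "
      f"Delta: ${on_prem - dw_service:,}")
```

The hard part is not the addition; it is deciding which blended internal costs (HVAC, shared admin time) legitimately belong on the on-prem side of the ledger.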

Support and Staffing Models for DW Cloud Vendors

In order for the traditional RDB software vendors to accommodate a “Data Warehouse as a Service” model, they may need to significantly increase staff across a variety of technical disciplines, as outlined above in the client “on-prem” model. A significant ramp-up of staff, and the organizational challenge of developing and implementing a support model, may have relational database vendors asking: should they leverage a top-tier consulting agency such as Accenture or Deloitte to define, implement, and refine a managed service? It’s certainly a tall order to go from software vendor to large-scale service provider. With global corporate footprints and positive track records implementing managed services of all types, it’s an attractive proposition for both the RDB vendor and the consulting agency that wins the bid. On some level, though, the DW service billing models don’t seem sensible: any consulting agency that implements a DW managed service would be responsible for ensuring ROI both for the RDB vendor and its clients. This arrangement may be opaque to the end client leveraging the Data Warehouse as a Service, but certainly the quality of service should be nothing less than if the RDB vendor had implemented it itself. If the end game for the RDB vendor is for the consulting agency to implement and mature the service, then at some point bring it in-house, that could help keep costs down while the managed service matures.

Oracle Exadata

Here are URLs for reference to understand the capabilities that are realized through Oracle’s managed services.

https://cloud.oracle.com/en_US/database

https://cloud.oracle.com/en_US/database/exadata/features

https://www.oracle.com/engineered-systems/exadata/index.html

Teradata

https://www.teradata.com/products-and-services/intellicloud

https://www.teradata.com/products-and-services/cloud-overview

Teradata

DB2

https://www.ibm.com/cloud/db2-warehouse-on-cloud

IBM Mainframe


Note: The opinions shared here are my own.

Applying Artificial Intelligence & Machine Learning to Data Warehousing

Protecting the Data Warehouse with Artificial Intelligence

Teleran is a middleware company whose software monitors and governs OLAP activity between the data warehouse and Business Intelligence tools like Business Objects and Cognos. Teleran’s suite of tools encompasses a comprehensive analytical and monitoring solution called iSight. In addition, Teleran has a product, iGuard, that leverages artificial intelligence and machine learning to impose real-time query and data access controls. The architecture also allows Teleran’s agent to run on a separate host from the database, for additional security and to avoid consuming resources on the database host.

Key Features of iGuard:
  • Policy engine prevents “bad” queries before reaching database
  • Patented rule engine resides in-memory to evaluate queries at database protocol layer on TCP/IP network
  • Patented rule engine prevents inappropriate or long-running queries from reaching the data
70 Customizable Policy Templates
SQL Query Policies
  • Create policies using policy templates based on SQL Syntax:
    • Require JOIN to Security Table
    • Column Combination Restriction – e.g. prevents combining customer name and Social Security # in the same query
    • Table JOIN Restriction – e.g. prevents joining two specific tables in the same query
    • Equi-literal Compare Requirement – tightly constrains a query, e.g. prevents hunting for sensitive data by requiring an ‘=’ condition
    • DDL/DCL restrictions (Create, Alter, Drop, Grant)
    • DQL/DML restrictions (Select, Insert, Update, Delete)
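To illustrate what a “column combination restriction” does (a toy sketch of my own, not Teleran’s patented rule engine), a naive check might tokenize the SQL and refuse any query that references the full restricted set of columns:

```python
import re

# Toy policy: block queries that reference both a customer-name column and
# an SSN column in the same statement (a "column combination restriction").
# Column names are made up for illustration.
RESTRICTED_COMBO = {"customer_name", "ssn"}

def violates_combo_policy(sql):
    """True if the query references every column in the restricted set."""
    tokens = set(re.findall(r"[a-z_]+", sql.lower()))
    return RESTRICTED_COMBO.issubset(tokens)

print(violates_combo_policy("SELECT customer_name, ssn FROM customers"))  # True
print(violates_combo_policy("SELECT customer_name FROM customers"))       # False
```

A production engine evaluates the query at the database protocol layer, before it reaches the database at all, which is the point of the in-memory rule engine described above.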
Data Access Policies

Blocks access to sensitive database objects

  • By user or user groups and time of day (shift) (e.g. ETL)
    • Schemas
    • Tables/Views
    • Columns
    • Rows
    • Stored Procs/Functions
    • Packages (Oracle)
Connection Policies

Blocks connections to the database

  • White list or black list by
    • DB User Logins
    • OS User Logins
    • Applications (BI, Query Apps)
    • IP addresses
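A connection allow/deny policy of this kind reduces to a membership test against the configured lists; the sketch below is illustrative only, with made-up users and addresses:

```python
# Toy connection policy: permit a database login attempt only when both the
# DB user and the source IP appear on the allowlists (white lists).
ALLOWED_DB_USERS = {"etl_batch", "bi_reader"}
ALLOWED_IPS = {"10.0.1.15", "10.0.1.16"}

def connection_allowed(db_user, ip):
    """True if this login attempt passes both allowlists."""
    return db_user in ALLOWED_DB_USERS and ip in ALLOWED_IPS

print(connection_allowed("bi_reader", "10.0.1.15"))    # True
print(connection_allowed("bi_reader", "203.0.113.9"))  # False
```

A blacklist variant simply inverts the test; time-of-day (shift) rules would add a clock check alongside the membership tests.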
Rule Templates Contain Customizable Messages

Each of the “Policy Templates” can send the user querying the database a customized message based on the defined policy. The message back from Teleran should be seamless to the application user’s experience.

iGuard Rules Messaging

 

Machine Learning: Curbing Inappropriate, or Long Running Queries

iGuard can analyze all of the historical SQL passed through to the data warehouse and suggest new, customized policies to cancel queries with certain SQL characteristics. The Teleran administrator sets parameters, such as rows or bytes returned, and then runs the induction process. New rules will be suggested for queries that exceed these defined parameters. The induction engine is “smart” enough to look at the repository of queries holistically and not make determinations based on a single query.
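A drastically simplified sketch of that induction idea (my own illustration, not Teleran’s algorithm): scan the query history and propose a cancel rule only when multiple queries exceed the administrator’s threshold, so a single outlier doesn’t drive a policy:

```python
# Hypothetical query log: each entry records the SQL and rows returned.
QUERY_LOG = [
    {"sql": "SELECT * FROM sales", "rows": 5_000_000},
    {"sql": "SELECT * FROM sales s JOIN items i ON s.id = i.sale_id",
     "rows": 7_500_000},
    {"sql": "SELECT region, SUM(amt) FROM sales GROUP BY region", "rows": 12},
]

def suggest_row_limit_rule(log, max_rows, min_offenders=2):
    """Suggest a cancel policy only if several queries exceed the limit."""
    offenders = [q for q in log if q["rows"] > max_rows]
    if len(offenders) >= min_offenders:
        return {"policy": "cancel", "when_rows_exceed": max_rows,
                "evidence": len(offenders)}
    return None  # one outlier is not enough to induce a rule

print(suggest_row_limit_rule(QUERY_LOG, max_rows=1_000_000))
```

The `min_offenders` guard is the toy version of looking at the repository “holistically” rather than reacting to one bad query.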

Finally, here is a high-level overview of the implementation architecture of iGuard. For sales or pre-sales technical questions, please contact www.teleran.com.

Teleran Logical Architecture

 

Currently Featured Clients

Teleran Featured Clients

 

People Turn Toward “Data Banks” to Commoditize Purchase and User Behavior Profiles

If you are anti “Big Brother”, this may not be the article for you; in fact, skip it. 🙂

The Pendulum Swings Away from GDPR

In the not-so-distant future, “Data Bank” companies, consisting of Subject Matter Experts (SMEs) across all verticals, may process data feeds collected from your purchase and user behavior profiles. Consumers will be encouraged to submit their data profiles to a Data Bank, which will offer incentives ranging from reduced insurance premiums to cash-back rewards.

 

Everything from activity trackers and home automation to vehicular automation data may be captured and aggregated. The data collected can then be sliced and diced to provide macro and micro views of the information. At the abstract, macro level, the information may allow for demographic, statistical correlations, which may contribute to corporate strategy.

On a granular view, the data will provide “data banks” the opportunity to sift through data to perform analysis and correlations that lead to actionable information.

 

Is it secure? Do you care if a hacker steals your weight-loss information? It may not be an issue if the collected Purchase and User Behavior Profiles are aggregated into a blockchain general ledger. Data Curators and Aggregators work with SMEs to correlate the data into:

  • Canned, ‘intelligent’ reports targeted to specific subject matter, or across silos of data types
  • ‘Universes’ of data (e.g. Business Objects) that may be ‘mined’ by consumer-approved, ‘trusted’ third-party companies, e.g. your insurance companies.
  • Actionable information based on AI subject matter rules engines

 

Consumers may have the option of sharing their personal data with specific companies by proxy, through a ‘data bank’ granular to the data point collected.  Sharing of Purchase and User Behavior Profiles:

  1. may lower [or raise] your insurance premiums
  2. may provide discounts on preventive health care products and services, e.g. vitamins to yoga classes
  3. may enable targeted, affordable medicine that redirects the doctor’s choice to an alternative; the MD would be contacted to validate the alternative

The curated data collected may be harnessed by thousands of affinity groups to offer very discrete products and services. Correlated Purchase and User Behavior Profile information stretches beyond any consumer relationship experienced today.

 

At some point, health insurance companies may require you to wear a tracker to increase or slash premiums.  Auto Insurance companies may offer discounts for access to car smart data to make sure suggested maintenance guidelines for service are met.

You may approve your “data bank” to give access to specific soliciting government agencies or private research firms looking to analyze data for their studies. You may qualify based on the demographic, abstracted data points collected; incentives provided may be tax credits or paid studies.

 

Purchase and User Behavior Profiles:  Adoption and Affordability

If the Internet of Things (IoT) devices that feed ‘Data Banks’ are cost inhibiting, here are a few ways to increase their adoption:

  1. [US] tax coupons that enable the buyer to save money at the time of purchase. For example, a 100 USD discount applied at the time of purchase of an activity tracker, with the stipulation that you may agree, at some point, to participate in a study.
  2. Government subsidies offsetting the cost of aggregating and archiving Purchase and Behavioral profiles through annual tax deductions. Today, tax incentives may allow you to purchase an IoT device if the cost is an itemized medical tax deduction, such as an activity tracker that monitors your heart rate, if your medical condition requires it.
  3. Additional insurance deductions for Auto, Life, Homeowners, and Health policyholders.
  4. Affinity-branded IoT devices; for example, the American Lung Association may sell a logo-branded activity tracker. People may sponsor the owner of the tracking pedometer to raise funds for the cause.

The World Bank has a repository of data, World DataBank, which stores a large depth of information:

“World Bank Open Data: free and open access to data about development in countries around the globe.”

Here is the article that inspired this post:

http://www.marketwatch.com/story/you-might-be-wearing-a-health-tracker-at-work-one-day-2015-03-11

Privacy and Data Protection Creates Data Markets

Initiatives such as the General Data Protection Regulation (GDPR) and other privacy efforts, which seek to restrict access to your data to you as the “owner,” create, as a byproduct, opportunities for you to sell your data.

Blockchain: Purchase, and User Behavior Profiles

As your “vault”, “Data Banks” will collect and maintain your two primary datasets:

  1. As a consumer of goods and services, a Purchase Profile is established and evolves over time. Online purchases are automatically collected, curated, appended with metadata, and stored in a data vault [blockchain]. “Offline” purchases may at some point become hybrid [on/off]line purchases, with advances in traditional monetary exchanges, and would follow the online transaction model.
  2. User Behavior (UB) profiles, both on- and offline, will be collected and stored for analytical purposes. A user behavior “session” is a use case of activity where YOU are the prime actor. Each session would create a single UB transaction, also stored in a “data vault.” UB use cases may not lead to any purchases.

These datasets, wholly owned by the consumer, are safely stored, propagated, and made immutable with a solution such as a blockchain general ledger.
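A minimal hash-chain sketch shows why such a ledger is tamper-evident: each entry commits to the hash of its predecessor, so altering an earlier record invalidates everything after it. This is illustrative only, not a full blockchain (no consensus, no distribution):

```python
import hashlib
import json

def add_entry(chain, record):
    """Append a record whose hash covers both the record and the prior hash."""
    prev_hash = chain[-1]["hash"] if chain else "0" * 64
    payload = json.dumps({"record": record, "prev": prev_hash}, sort_keys=True)
    chain.append({"record": record, "prev": prev_hash,
                  "hash": hashlib.sha256(payload.encode()).hexdigest()})
    return chain

ledger = []
add_entry(ledger, {"type": "purchase", "item": "activity tracker"})
add_entry(ledger, {"type": "behavior", "session": "morning run"})

# Each entry links back to its predecessor's hash:
print(ledger[1]["prev"] == ledger[0]["hash"])  # True
```

Rewriting the first purchase record would change its hash, breaking the `prev` link in the second entry, which is what makes the consumer’s profile history verifiable.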

Business Intelligence, Analogies, and Articulation of Data on Mediums

As I was reading the New York Times article, As Boom Lures App Creators, Tough Part Is Making a Living, I noticed the typical doom-and-gloom story about getting rich quick by creating tablet applications is true of any start-up, be it a restaurant, clothing shop, or other venture. You have an idea, Sally has an idea, and so does Fred, and the likelihood that everyone will be elated about every bar, restaurant, clothing store, or application is ridiculous. Simple economics and opportunity cost: you cannot go to every restaurant in parallel every night, and one USD spent trades off the opportunity to spend it somewhere else.

One area I suspect has massive opportunities in the coming weeks, months, and years is Business Intelligence: analogies, and the articulation of data, on a tablet medium. Yes, it is true there are established players in the marketplace, but being established also makes you less nimble to change. Being able to look at a client’s data warehouse and create mediums for analogies expressing where their customers have been spending their money and why, and to help predict trends in a KISS fashion for any level of a business organization, is key. That is why the innate talents of user interface design (user interface engineering, or, way back, “industrial design”) matter. In short, part of the appetite for corporate spending will always come from “how do I make more money with the product I just bought,” i.e. Return on Investment (ROI).

Business Intelligence is one area I have been studying for years, and as we all know, we find it difficult to express or analogize thoughts, and specifically to dive into ‘data’ and turn it into information a CEO or business analyst can understand, then turn that ‘information’ into a new marketing campaign; hence, business intelligence. Until we can all read minds and transfer like-for-like information, BI, and improving upon this space, will be an area to derive income.


Tablet Developers Make Business Intelligence Tools Using Google as a Data Warehouse: Competing with Oracle, IBM, and Microsoft SQL Server

And, he shoots, and scores. I called it, sort of. Google came out of the closet today as a data warehouse vendor; at the least, it needs a community of developers to connect the dots and help build an amazing Business Intelligence suite.

Google came out with a Google Docs API today, with support for languages from Objective-C (iOS) and C# to Java, so you can use Google as the data warehouse for any size business. All you need to do is write an ETL program that uploads and downloads tables between your local database and Google Docs, then create your own Business Intelligence user interface for creating and viewing charts and graphs. It looks like they’ve changed strategies, or this was the plan all along.

Initially, I thought Google Fusion was going to be the table-editing tool for manipulating the data transferred from your transactional database using the Google Docs API. Today they released the Google Docs API, and developers can create their own ETL drivers and a Business Intelligence user interface that can run on any platform: an Android tablet, iPad, or Windows tablet.

A few days ago, I wrote an article in which it looked like they were going to use a tool called Google Fusion, then in beta, to manipulate tabular data and eventually extend it to create common BI components such as graphs, charts, and editable tables.

A few gotchas: Google Docs on the Apple iPad is at version 1.1.1, released 9/28/12, so we are talking very early days, and the Google Docs API was released today. Since you can also use C#, someone could make a Windows desktop application to manipulate the data tables and create and view graphs, so a Windows tablet can be used. The API also has Java compatibility, so from any Unix box, or any platform (Java is write once, run anywhere), wherever your transactional database lives, a developer can write a driver to transfer the data to Google Docs dynamically and then use the Google Docs API for Business Intelligence. You could even write an ETL driver that does nothing but rapidly transfer data, like an ODBC or JDBC driver, and use whatever business intelligence tools you have on your desktop, or run a nightly ETL. I can also see developers creating business intelligence tools on Android, iPad, or Windows tablets to modify tables, create and view charts, and so on, using custom BI tool sets, with Google Docs as the data warehouse.
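A hedged sketch of the extract/serialize half of such an ETL driver, using an in-memory SQLite table to stand in for the “transactional database”; the upload step is a placeholder, since I’m not reproducing the actual Google Docs API calls here:

```python
import csv
import io
import sqlite3

# Stand-in transactional database with a small sales table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?)",
                 [("East", 120.0), ("West", 95.5)])

# Extract and serialize the table to CSV for upload.
buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(["region", "amount"])
writer.writerows(conn.execute("SELECT region, amount FROM sales"))

def upload_to_cloud_docs(name, payload):
    # Placeholder: here you would call the vendor's document-upload API.
    return len(payload)  # bytes "sent"

print(upload_to_cloud_docs("sales.csv", buf.getvalue()))
```

The download direction would mirror this: fetch the document, parse the CSV back into rows, and load them into local tables for your BI front end.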

Please reference an article I wrote a few days back, “Google is Going to be the Next Public and Private Data Warehouse“.

At that time, Google Fusion was marked as beta (10/13/2012). Google has since stripped off the word beta, but it doesn’t matter; it’s even better with the Google API to Google Docs. Google Fusion could be your starter user interface; however, if Android, iOS (Apple iPad), and Windows developers really embrace this API, the big database companies like IBM, Oracle, and Microsoft may have their market share eroded to some extent, if not a great extent.

Update 10/19:

Hey Gs (Guys and Gals), I forgot to mention: you could perhaps also make your own video or music streaming applications using the basic file get and send calls, as other companies such as AWS and Box are already doing. It’s a simple get/send API, so I’m not sure it’s applicable to ‘streaming’ at this stage; it’s just another storage location in the ‘cloud,’ which would be quite boring. Although, thinking of it now, aren’t all the put/send cloud solutions potential data warehouses using ETL and the APIs discussed above? Also, it’s ironic that Google would be competing with itself if it were a file share that could ‘stream’ videos, alongside YouTube.