Tag Archives: AI

Data Loss Prevention (DLP) for Structured Data Sources

When people think of Data Loss Prevention (DLP), we usually think of endpoint protection, such as the Symantec Endpoint Security solution, preventing the upload of data to web sites or its download to a USB device. The data being “illegally” transferred typically conforms to a particular pattern of Personally Identifiable Information (PII), e.g. Social Security numbers.

Using a client for local monitoring of the endpoint, the agent detects the transfer of information as a last line of defense against external distribution. Endpoint solutions can monitor suspicious activity and/or proactively cancel a data transfer in progress.

Moving closer to the source of the data loss, monitoring databases filled with PII has its advantages and disadvantages. One may argue there is no data loss until an employee attempts to export the data outside the corporate network and the data is in flight. In addition, extracted PII data may be “properly utilized” within the corporate network for analysis.

There is a database solution, Teleran Technologies, that provides similar “endpoint” monitoring and protection, e.g. identifying PII data extraction, with real-time query cancellation upon detection, leveraging “out of the box” data patterns. Teleran supports relational databases such as Oracle and Microsoft SQL Server, both on-prem and in the cloud.

Updates in Data Management Policies

Identifying the points of origination for data loss provides opportunities to close gaps in data management policy and implement additional controls over data. Data classification is done dynamically, based on common data mask structures, and users may build additional rules to cover custom structures. For example, if a business analyst executes a query against a database and the results appear to fit a predefined data mask, such as an SSN, the query may be canceled before it is even executed, and/or this “suspicious” activity can be flagged for the Chief Information Officer (CIO) and/or Chief Security Officer (CSO).

Bar none, I’ve seen only one firm that defends a company’s data assets this close to the probable leak point, the database: Teleran Technologies. See what they have to offer your organization for data protection and compliance.

Prevalent Remote Work Changes Endpoint Strategy

With remote work now prevalent in our corporate environments, relying on endpoints to enforce data protection may simply be too late. We may need to bring data loss detection into the inner sanctum of the corporate network, with prevention closer to the source of the data being extracted. And how are “semi-trusted” third parties, such as offshore staff augmentation, dealt with?

Endpoint DLP – Available Breach Tactics

Endpoint DLP may capture and contain attempts to extract PII data, for example, by parsing text files for SSNs or other data masks. However, there are ways around transfer detection that make leaks difficult to identify, such as screen captures that convert text into images. Some endpoint providers boast about their Optical Character Recognition (OCR); however, turning on this feature may produce many false positives, too many to sift through in monitoring and unmanageable to control. The best DLP defense is to monitor and control closer to the data source, and perhaps flag data requests from employees, e.g. after a SELECT statement is entered, a UI pops up asking “Reason for Request?” if PII extraction is identified in real time, with auditable events that can flow into Splunk.
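
To make the idea concrete, here is a minimal sketch (my own illustration, not Teleran’s actual engine) of pattern-based detection: scan a result set against a couple of data masks and emit auditable JSON events that could flow into a SIEM such as Splunk. The masks, field names, and "flagged" action are assumptions for the example.

```python
import json
import re
from datetime import datetime, timezone

# Common "data mask" patterns; the SSN mask is the classic example.
# These patterns are illustrative, not a vendor-accurate rule set.
DATA_MASKS = {
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "CREDIT_CARD": re.compile(r"\b(?:\d[ -]?){13,16}\b"),
}

def inspect_result_set(user: str, query: str, rows: list[str]) -> list[dict]:
    """Scan query results for data-mask hits and build auditable events."""
    events = []
    for mask_name, pattern in DATA_MASKS.items():
        hits = sum(len(pattern.findall(row)) for row in rows)
        if hits:
            events.append({
                "timestamp": datetime.now(timezone.utc).isoformat(),
                "user": user,
                "query": query,
                "mask": mask_name,
                "hits": hits,
                "action": "flagged",  # a real engine might cancel the query instead
            })
    return events

# Each JSON line could be forwarded to a SIEM for monitoring and audit.
rows = ["John Doe, 123-45-6789", "Jane Roe, 987-65-4321"]
for event in inspect_result_set("banalyst", "SELECT name, ssn FROM employees", rows):
    print(json.dumps(event))
```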

AR Sudoku Solver Uses Machine Learning To Solve Puzzles Instantly

A very novel concept: applying Augmented Reality and Artificial Intelligence (i.e. Machine Learning) to solving puzzles, such as Sudoku. Maybe not so novel considering AR’s uses in manufacturing.

Next, we’ll be using similar technology for human-to-human negotiations: reading body language, understanding logical arguments, reading human emotion, and rebutting remarks in a debate.

Litigators watch out… Or, co-counsel?   Maybe a hand of Poker?

Source: AR Sudoku Solver Uses Machine Learning To Solve Puzzles Instantly

Cloud-native as the Future of Data Loss Prevention – Nightfall AI

An interesting approach to Data Loss Prevention (DLP):

Data loss prevention (DLP) is one of the most important tools that enterprises have to protect themselves from modern security threats like data exfiltration, data leakage, and other types of sensitive data and secrets exposure. Many organizations seem to understand this, with the DLP market expected to grow worldwide in the coming years. However, not all approaches to DLP are created equal. DLP solutions can vary in the scope of remediation options they provide as well as the security layers that they apply to. Traditionally, data loss prevention has been an on-premise or endpoint solution meant to enforce policies on devices connected over specific networks. As cloud adoption accelerates, though, the utility of these traditional approaches to DLP will substantially decrease.

Established data loss prevention solution providers have attempted to address these gaps with developments like endpoint DLP and cloud access security brokers (CASBs) which provide security teams with visibility of devices and programs running outside of their walls or sanctioned environments. While both solutions minimize security blind spots, at least relative to network layer and on-prem solutions, they can result in inconsistent enforcement. Endpoint DLPs, for example, do not provide visibility at the application layer, meaning that policy enforcement is limited to managing what programs and data are installed on a device. CASBs can be somewhat more sophisticated in determining what cloud applications are permissible on a device or network, but may still face similar shortfalls surrounding behavior and data within cloud applications.

Cloud adoption was expected to grow nearly 17% between 2019 and 2020; however, as more enterprises embrace cloud-first strategies for workforce management and business continuity during the COVID-19 pandemic, we’re likely to see even more aggressive cloud adoption. With more data in the cloud, the need for policy remediation and data visibility at the application layer will only increase and organizations will begin to seek cloud-native approaches to cloud security.

Source: Cloud-native as the Future of Data Loss Prevention – Nightfall AI

Going Solo – Gig to Gig

Having the Stamina to Last…

Going the consulting path, on your own, is no small feat. Do you have what it takes to persist, survive, and thrive?

  • Army of One – Not only do you need to perform your consultancy role, you also have to be the bookkeeper and the sales and marketing team, always looking for new opportunities.
  • The Gap Between Gigs – To all recruiters and hiring managers: it’s not a bad thing to have gaps in a candidate’s resume. It’s the way of life in our gig economy. We are constantly hunting for just the right opportunity, in a sea of hundreds or thousands of candidates per role.
  • Keeping Up With Market Trends – Online learning platforms such as Pluralsight keep their content fresh, relevant, and in line with your career path.
  • Networking, Networking, Networking – At every opportunity, build your network of contacts and keep them in the know.

Follow the Breadcrumbs: Identify and Transform

Trends – High Occurrence, Word Associations

Over the last two decades, I’ve been involved in several solutions that incorporated artificial intelligence and, in some cases, machine learning. I’ve understood them at the architectural level, and in some cases taken a deeper dive.

I’ve had the urge to perform a data trending exercise, where not only do we identify existing trends, similar to Twitter’s “out of the box” capabilities, but we can also augment “the message” as trends unfold. This is probably AI 101, but I wanted to submerge myself in understanding this data science project. My solution statement: given a list of my interests, we can derive sentence fragments from Twitter, traversing each tweet and parsing each word off as a possible “breadcrumb”. Then remove the stop words, and voilà: words that can identify trends, and that can be used to create or modify trends.

Finally, to give the breadcrumbs, those “words of interest”, greater depth, we can use the Oxford Dictionaries API to enrich the data with thesaurus entries such as synonyms.
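
A minimal sketch of that enrichment step, assuming an Oxford Dictionaries API account: the endpoint path and the app_id/app_key headers follow Oxford’s published v2 conventions as I recall them, so verify against the current API reference; the credentials below are placeholders.

```python
import requests

# Placeholder credentials; obtain real ones from the Oxford Dictionaries developer portal.
APP_ID = "OXFORD_APP_ID"
APP_KEY = "OXFORD_APP_KEY"
BASE_URL = "https://od-api.oxforddictionaries.com/api/v2"

def fetch_synonyms(word: str) -> list[str]:
    """Return a flat list of synonyms for a word, or an empty list on failure."""
    resp = requests.get(
        f"{BASE_URL}/thesaurus/en/{word.lower()}",
        headers={"app_id": APP_ID, "app_key": APP_KEY},
        timeout=10,
    )
    if resp.status_code != 200:
        return []
    synonyms = []
    # Walk the nested results -> lexicalEntries -> entries -> senses structure.
    for result in resp.json().get("results", []):
        for lex in result.get("lexicalEntries", []):
            for entry in lex.get("entries", []):
                for sense in entry.get("senses", []):
                    synonyms.extend(s["text"] for s in sense.get("synonyms", []))
    return sorted(set(synonyms))

print(fetch_synonyms("learning"))
```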

Gotta Have a Hobby

It’s been a while now that I’ve been hooked on Microsoft Power Automate, formerly known as Microsoft Flow. It’s relatively inexpensive and has the capability to be a tremendous resource for almost ANY project. There is a FREE version, and the paid version is $15 per month; picking the $15 tier with the bonus data connectors is a no-brainer.

I’ve had the opportunity to explore the platform and create workflows. Some fun examples: initially, using MS Flow, I parsed RSS feeds, and if a criterion was met, I’d get an email. I did the same with a Twitter feed. I then kicked it up a notch and inserted these records of interest into a database. The library of templates and connectors is staggering, and I suggest you take a look if you need to collect and transform data, followed by a load and a notification process.

What Problem are we Trying to Solve?

How are trends formed, and what factors influence them? Who are the most influential people providing input to a trend? Is influence based on location? Does language play a factor in how trends develop? The end goal: driving trends, not just observing them.

Witches’ Brew – Experiment Ingredients:

Obtaining and Scrubbing Data

Articles I’ve read regarding data science projects revolve around 5 steps:

  1. Obtain Data
  2. Scrub Data
  3. Explore Data
  4. Model Data
  5. Interpret Data

The rest of this post will mostly revolve around steps 1 and 2. Here is a great article that goes through each of the steps in more detail: 5 Steps of a Data Science Project Lifecycle

Capturing and Preparing the Data

The data set is arguably the most important aspect of machine learning. A data set that does not conform to the bell curve and instead consists largely of outliers will produce an inaccurate reflection of the present and a poor prediction of the future.

First, I created a table of search criteria based on topics that interest me.

Search Criteria List

Then I created a Microsoft Flow for each of the search criteria to capture tweets containing the search text and insert the results into a database table.

MS Flow – Twitter: Ingestion of Learning Tweets

Of the 7,450 total tweets collected across all the search criteria, 548 tweets came from the search criterion “Learning” (22).

Data Ingestion – Twitter

After you’ve obtained the data, you will need to parse the tweet text into “breadcrumbs”, which “lead a path” back to the search criterion.
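
Here is a minimal Python sketch of that parsing step, standing in for the no-code Flow. The stop-word list is deliberately tiny (a real pipeline would use a fuller set, e.g. NLTK’s English stop words), and the tweets table schema is hypothetical.

```python
import re
import sqlite3

# A tiny illustrative stop-word list.
STOP_WORDS = {"a", "an", "and", "are", "for", "from", "in", "is",
              "of", "on", "the", "this", "to", "with"}

def breadcrumbs(tweet_text: str) -> list[str]:
    """Lowercase, strip URLs, tokenize, and drop stop words."""
    text = re.sub(r"https?://\S+", " ", tweet_text.lower())
    return [w for w in re.findall(r"[a-z']+", text) if w not in STOP_WORDS]

# Hypothetical schema: tweets(id, tweet_text), already loaded by the Flow.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tweets (id INTEGER, tweet_text TEXT)")
conn.execute("INSERT INTO tweets VALUES "
             "(1, 'Learning is a journey, not a destination https://t.co/x')")
conn.execute("CREATE TABLE breadcrumbs (tweet_id INTEGER, word TEXT)")

for tweet_id, text in conn.execute("SELECT id, tweet_text FROM tweets").fetchall():
    conn.executemany("INSERT INTO breadcrumbs VALUES (?, ?)",
                     [(tweet_id, w) for w in breadcrumbs(text)])

print(conn.execute("SELECT word FROM breadcrumbs").fetchall())
# [('learning',), ('journey',), ('not',), ('destination',)]
```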

Machine Learning and Structured Query Language (SQL)

This entire predictive trend analysis could be much easier with a more restrictive syntax than English tweets, such as SQL. Parsing SQL statements makes it easier to draw correlations, because the structure is fixed, e.g. SELECT Col1, Col2 FROM TableA WHERE Col2 = ‘ABC’. Based on the data set size, we may be able to extrapolate and correlate rows returned to provide valuable insights, e.g. the projected performance impact of a query on the data warehouse.
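
To illustrate how much a restrictive grammar helps, here is a naive sketch that pulls the pieces out of the example statement with a single regular expression; a production tool would use a real SQL parser rather than a regex.

```python
import re

# Matches only simple single-table SELECTs; enough for the illustration.
SELECT_RE = re.compile(
    r"SELECT\s+(?P<cols>.+?)\s+FROM\s+(?P<table>\w+)"
    r"(?:\s+WHERE\s+(?P<where>.+))?$",
    re.IGNORECASE,
)

def parse_select(sql: str) -> dict:
    """Extract columns, table, and WHERE clause from a simple SELECT."""
    m = SELECT_RE.search(sql.strip())
    if not m:
        raise ValueError("not a simple SELECT")
    return {
        "columns": [c.strip() for c in m.group("cols").split(",")],
        "table": m.group("table"),
        "where": m.group("where"),
    }

print(parse_select("SELECT Col1, Col2 FROM TableA WHERE Col2 = 'ABC'"))
# {'columns': ['Col1', 'Col2'], 'table': 'TableA', 'where': "Col2 = 'ABC'"}
```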

R language and R Studio

Preparing Data Sets Using Tools Designed to Perform Data Science.

The R language and RStudio seem to be very powerful when dealing with large data sets, and the syntax makes it easy to “clean” the data set. However, I still prefer SQL Server and a decent query tool. Maybe my opinion will change over time. The most helpful thing I’ve seen in RStudio is the ability to create new data frames and roll back to a point in time, i.e. a previous version of the data set.

Changing a column’s data type on the fly in RStudio is also immensely valuable. For example, suppose the data in a column are integers but the table/column definition is a string or varchar. In SQL, the user would have to drop the table, recreate it with the new data type, and then reload the data. Not so with R.

Email Composer: Persona Point of View (POV) Reviews

First there was spell check, next the thesaurus, synonyms, and contextual grammar suggestions, and now persona point-of-view (POV) reviews. Between the immensely accurate and omnipresent #Grammarly and #Google’s #Gmail predictive text, I started thinking about the next step in the AI and human partnership in crafting communications.

Google Gmail Predictive Text

Google Gmail’s predictive text had me thinking about AI possibilities within email, and it occurred to me: I understand what I’m trying to communicate to my email recipients, but do I really know how my message is being interpreted?

Google Gmail has an eerily accurate auto-suggestion capability: as you type out a sentence, Gmail suggests the next word or words you plan on typing, and the suggested sentence fragments appear to the right of the cursor. It’s like reading your mind, predicting the most common word or words to come next in the composer’s email.

Personas

In the software development world, a persona is a categorization or grouping of people who play a similar role and behave in a consistent fashion. For example, we may have a lifecycle of parking meters, where the primary goal is the collection of parking fees. In this case, personas may include the “meter attendant” and “the consumer”. These two personas have different goals, and how they behave can be categorized. There are many such roles within and outside a business context.

In many software development tools that enable people to collect and track user stories or requirements, the tools also allow you to define and correlate personas with user stories.

As in the case of email composition, once the email has been written, the composer may choose to select a category of people whose perspective they would like to view it from. Can the email application define categories of recipients, and then preview the email from their respective viewpoints?

What will the selected persona derive from the words arranged in a particular order? What meaning will they attribute to the email?

Use personas in the formulation of user stories/requirements; understand how personas will react to “the system” and to changes to the system.

Finally, consider the use of the [email composer] solution based on “actors” or “personas”. Which personas are available “out of the box”? Which personas will need to be defined by the email composer during setup of these categories of people? Wizard-based persona definitions?

There are already software development tools, like Azure DevOps (ADO), that empower teams to manage product backlogs and correlate “User Stories”, or “Product Backlog Items”, with personas. These are static, completely user-defined personas, with no intelligence to correlate user stories with personas; users of ADO must create these links themselves.

Now, technology can assist us in considering the intended audience: a systematic, deliberately biased review that uses Artificial Intelligence to inspect your email from a selected “point of view” (a persona) of the intended recipient. Maybe your email would otherwise be misconstrued as abrasive and not draw the intended response.
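
Since no such product exists in this post, here is only a toy sketch of the idea: reviewing a draft from a selected persona’s point of view. The personas and their trigger words are pure invention for illustration; a real solution would use an ML tone classifier rather than word lists.

```python
import re

# Hypothetical personas mapped to words each audience may read as abrasive;
# both the persona names and the word lists are invented for illustration.
PERSONAS = {
    "new_hire": {"obviously", "asap", "wrong", "unacceptable"},
    "executive_sponsor": {"cheap", "blame", "failure"},
}

def review_from_persona(draft: str, persona: str) -> list[str]:
    """Return warnings for words this persona may interpret as abrasive."""
    words = set(re.findall(r"[a-z']+", draft.lower()))
    return [f"'{w}' may read as abrasive to a {persona}"
            for w in sorted(words & PERSONAS[persona])]

draft = "Obviously the report is wrong. Fix it ASAP."
for warning in review_from_persona(draft, "new_hire"):
    print(warning)
# 'asap' may read as abrasive to a new_hire
# 'obviously' may read as abrasive to a new_hire
# 'wrong' may read as abrasive to a new_hire
```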

Deep Learning vs Machine Learning – Overview & Differences – Morioh

Machine learning and deep learning are two subsets of artificial intelligence which have garnered a lot of attention over the past two years. If you’re here looking to understand both the terms in the simplest way possible, there’s no better place to be.
— Read on morioh.com/p/78e1357f65b0

Help Wanted: Civil War Reenactment Soldiers to Improve AI Models

I just read an article in PC Magazine, “Human Help Wanted: Why AI Is Terrible at Content Moderation”, which got my neurons firing.

Problem Statement

Every day, Facebook’s artificial intelligence algorithms tackle the enormous task of finding and removing millions of posts containing spam, hate speech, nudity, violence, and terrorist propaganda. And though the company has access to some of the world’s most coveted talent and technology, it’s struggling to find and remove toxic content fast enough.

Ben Dickson
July 10, 2019 1:36PM EST

I’ve worked at several software companies that leveraged Artificial Intelligence and Machine Learning to recognize patterns and correlations. In general, the larger the data set, the higher the accuracy of the predictions; the outliers in the data, the noise, “fall out” of the data set. Without large, quality training data, Artificial Intelligence makes more mistakes.

In terms of speech recognition, image classification, and natural language processing (NLP), programs like chatbots and digital assistants are, in general, becoming more accurate because their training data sets are large, and there is no shortage of these data types. For example, there are many ways I can ask my digital assistant for something, like “Get the movie times”. Training a digital assistant, at a high level, means cataloging how many ways I can ask for “something” and still achieve my goal. I could go and create that list myself, but even after writing a few dozen questions, my sample data set would be too small. Amazon has a crowdsourcing platform, Amazon Mechanical Turk, through which I can request that a global workforce build me the data sets: thousands of questions and their correlated goals.

MTurk enables companies to harness the collective intelligence, skills, and insights from a global workforce to streamline business processes, augment data collection and analysis, and accelerate machine learning development.

Amazon Mechanical Turk: Access a global, on-demand, 24×7 workforce
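
As a toy illustration of why those crowdsourced utterance/goal pairs matter, here is a trivial intent classifier trained on a handful of made-up phrasings; a real assistant would train on thousands of MTurk-sourced examples, not eight.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Invented examples: many phrasings ("utterances") mapped to the same goal.
utterances = [
    "get the movie times", "when is the movie playing",
    "what time does the film start", "show me tonight's showtimes",
    "what's the weather like", "will it rain today",
    "give me the forecast", "is it cold outside",
]
goals = ["movie_times"] * 4 + ["weather"] * 4

# Bag-of-words features feeding a simple linear classifier.
model = make_pipeline(CountVectorizer(), LogisticRegression())
model.fit(utterances, goals)

print(model.predict(["what time is the movie"]))  # -> ['movie_times']
print(model.predict(["will it rain tomorrow"]))   # -> ['weather']
```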

Video “Scene” Recognition – Annotated Data Sets for a Wide Variety of Scene Themes

In silent films, the plot was conveyed by the use of title cards: written indications of the plot and key dialogue lines. Unfortunately, silent films are not making a comeback. In order to achieve a high rate of successful identification of activities within a given video clip, video libraries of metadata need to be created that capture the following (a sketch of such an annotation record follows the list):

  • Media / Video Asset, Unique Identifier
  • Scene Clip IN and OUT timecodes
  • Scene Theme(s), similar to Natural language processing (NLP), Goals = Utterances / Sentences
    • E.g. Man drinking water; Woman playing Tennis
  • Image recognition, in the context of machine vision, is the ability of software to identify objects, places, people, writing and actions in images. Image recognition is used to perform a large number of machine-based visual tasks, such as labeling the content of images with meta-tags.
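
Here is a minimal sketch of what one such annotation record could look like; the field names are my own invention based on the list above, not a published schema.

```python
from dataclasses import dataclass, field

@dataclass
class SceneAnnotation:
    """One annotated scene within a media/video asset."""
    asset_id: str          # unique identifier of the media/video asset
    tc_in: str             # scene clip IN timecode (HH:MM:SS:FF)
    tc_out: str            # scene clip OUT timecode
    themes: list[str] = field(default_factory=list)  # scene theme(s)

clip = SceneAnnotation(
    asset_id="ASSET-0001",
    tc_in="00:04:12:05",
    tc_out="00:04:31:19",
    themes=["Woman playing Tennis"],
)
print(clip)
```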

Not Enough Data

Here is an example of how Social Media, such as Facebook, attempts to deal with video deemed inappropriate for their platform:

In March, a shooter in New Zealand live-streamed the brutal killing of 51 people in two mosques on Facebook. But the social-media giant’s algorithms failed to detect the gruesome video. It took Facebook an hour to take the video down, and even then, the company was hard-pressed to deal with users who reposted the video.

Ben Dickson
July 10, 2019 1:36PM EST

…in many cases, such as violent content, there aren’t enough examples to train a reliable AI model. “Thankfully, we don’t have a lot of examples of real people shooting other people,” Yann LeCun, Facebook’s chief artificial-intelligence scientist, told Bloomberg.

Ben Dickson
July 10, 2019 1:36PM EST

Opportunities for Actors and Curators of Video Content: Dramatizations

Think of all those thousands of people who perform, creating videos of content that runs the gamut from playing video games to “unboxing” collectible items. Actors who perform dramatizations could add tags to their videos, as per the metadata above, documenting the themes of a given skit. If actors post their videos on YouTube or proprietary crowdsourcing platforms, they would be entitled to some revenue for the use of their licensed video.

Disclosure Regarding Flag Controversy

I now realize there are politics around Nike “tipping their hat” toward the Betsy Ross flag. However, when I referenced the flag in this blog post, I was thinking of the American Revolution, and the 13 colonies flag. I didn’t think the title would resonate with readers, “Help Wanted: Amerian Revolutionary war Reenactment Soldiers to Improve AI Models.”, so I took some creative liberty.