Tag Archives: Big Data

Follow the Breadcrumbs: Identify and Transform

Trends – High Occurrence, Word Associations

Over the last two decades, I’ve been involved in several solutions that incorporated artificial intelligence and, in some cases, machine learning. I’ve understood these solutions at the architectural level and, in some cases, have taken a deeper dive.

I’ve had the urge to perform a data trending exercise where we not only identify existing trends, similar to Twitter’s “out of the box” capabilities, but also augment “the message” as trends unfold. This is probably AI 101, but I wanted to immerse myself in understanding this data science project. My solution statement: given a list of my interests, derive sentence fragments from Twitter, traverse each tweet, and parse each word off as a possible “breadcrumb”. Then remove the stop words, and voilà: words that can identify trends, and that can be used to create or modify trends.
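To make the idea concrete, here is a minimal Python sketch of the parse-and-strip step. The stop-word list is a tiny illustrative subset; a real run would use a full list such as NLTK’s.

    import re

    # Tiny illustrative subset; a real stop-word list has hundreds of entries.
    STOP_WORDS = {"a", "an", "and", "are", "for", "in", "is", "of", "on", "the", "to", "we"}

    def breadcrumbs(tweet: str) -> list[str]:
        """Parse a tweet into lowercase word tokens, minus stop words."""
        words = re.findall(r"[a-z']+", tweet.lower())
        return [w for w in words if w not in STOP_WORDS]

    print(breadcrumbs("Machine learning is transforming how we identify trends"))
    # ['machine', 'learning', 'transforming', 'how', 'identify', 'trends']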

Finally, to give the breadcrumbs, and those “words of interest”, greater depth, we can use the Oxford Dictionaries API to enrich the data with thesaurus entries and synonyms.
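As a sketch of that enrichment step: the call below assumes the Oxford Dictionaries API v2 thesaurus endpoint and response shape as I recall them from the docs, so treat the URL path and field names as assumptions to verify; APP_ID and APP_KEY are placeholders for your own credentials.

    import requests

    APP_ID = "your-app-id"    # placeholder credential
    APP_KEY = "your-app-key"  # placeholder credential

    def synonyms(word: str) -> list[str]:
        # Endpoint path is an assumption; verify against the current API docs.
        url = f"https://od-api.oxforddictionaries.com/api/v2/thesaurus/en/{word.lower()}"
        resp = requests.get(url, headers={"app_id": APP_ID, "app_key": APP_KEY}, timeout=10)
        resp.raise_for_status()
        found = []
        # Walk the nested response for synonym entries (shape assumed).
        for entry in resp.json().get("results", []):
            for lex in entry.get("lexicalEntries", []):
                for ent in lex.get("entries", []):
                    for sense in ent.get("senses", []):
                        found.extend(s["text"] for s in sense.get("synonyms", []))
        return found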

Gotta Have a Hobby

It’s been a while now that I’ve been hooked on Microsoft Power Automate, formerly known as Microsoft Flow. It’s relatively inexpensive and has the potential to be a tremendous resource for almost ANY project. There is a FREE version, and the paid version is $15 per month; picking the $15 tier with its bonus data connectors is a no-brainer.

I’ve had the opportunity to explore the platform and create workflows. Some fun examples: initially, using MS Flow, I parsed RSS feeds and got an email whenever a criterion was met. I did the same with a Twitter feed. I then kicked it up a notch and inserted these records of interest into a database. The library of templates and connectors is staggering, and I suggest you take a look if you’re in a position where you need to collect and transform data, followed by a load and a notification process.

What Problem are we Trying to Solve?

How are trends formed, and what factors influence them? Who are the most influential people providing input to a trend? Does influence vary by location? Does language play a role in how trends develop? End goal: driving trends, not just observing them.

Witches Brew – Experiment Ingredients:

Obtaining and Scrubbing Data

Articles I’ve read regarding data science projects revolve around five steps:

  1. Obtain Data
  2. Scrub Data
  3. Explore Data
  4. Model Data
  5. Interpret Data

The rest of this post will mostly revolve around steps 1 and 2. Here is a great article that goes through each of the steps in more detail: 5 Steps of a Data Science Project Lifecycle

Capturing and Preparing the Data

The data set is arguably the most important aspect of machine learning. A data set that does not reflect the underlying distribution, or that consists largely of outliers, will produce an inaccurate reflection of the present and a poor prediction of the future.

First, I created a table of search criteria based on topics that interest me.

Search Criteria List

Then I created a Microsoft Flow for each search criterion to capture tweets containing the search text and insert the results into a database table.

MS Flow – Twitter: Ingestion of Learning Tweets

Of the 7,450 tweets collected across all the search criteria, 548 came from the search criterion “Learning” (22).

Data Ingestion – Twitter

After you’ve obtained the data, you will need to parse the tweet text into “breadcrumbs” that lead back to the search criteria, as sketched below.
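Here is a rough sketch of that step in pandas, with hypothetical table and column names standing in for the database: each tweet row is exploded into one row per breadcrumb, keyed to its search criterion, and the words are counted to surface high-occurrence candidates.

    import pandas as pd

    # Stand-in for rows pulled from the ingestion table.
    tweets = pd.DataFrame({
        "search_criteria": ["Learning", "Learning"],
        "tweet_text": [
            "Deep learning models keep improving",
            "Learning to code changed my career",
        ],
    })

    # Tokenize, then explode into one (criterion, breadcrumb) row per word;
    # stop-word removal as sketched earlier would follow.
    tweets["breadcrumb"] = tweets["tweet_text"].str.lower().str.findall(r"[a-z']+")
    crumbs = tweets.explode("breadcrumb")

    counts = (crumbs.groupby(["search_criteria", "breadcrumb"])
                    .size()
                    .sort_values(ascending=False))
    print(counts.head(10))  # high-occurrence words per search criterion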

Machine Learning and Structured Query Language (SQL)

This entire predictive trend analysis would be much easier with a more restrictive syntax, like SQL, instead of English tweets. Parsing SQL statements to make correlations is easier because the structure is fixed, e.g. SELECT Col1, Col2 FROM TableA WHERE Col2 = ‘ABC’. Depending on the data set size, we may be able to extrapolate and correlate rows returned to provide valuable insights, e.g. the projected performance impact of a query on the data warehouse.
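A minimal sketch of why the restricted syntax helps: a single regular expression can pull the selected columns and table out of the simple form above, whereas free-form tweets have no such fixed anatomy. Real SQL needs a proper parser (e.g. the sqlparse library); this handles only the single-table form shown.

    import re

    stmt = "SELECT Col1, Col2 FROM TableA WHERE Col2 = 'ABC'"

    # The fixed SELECT ... FROM ... skeleton makes the parse trivial.
    match = re.match(r"SELECT\s+(?P<cols>.+?)\s+FROM\s+(?P<table>\w+)", stmt, re.IGNORECASE)
    if match:
        columns = [c.strip() for c in match.group("cols").split(",")]
        print(columns, match.group("table"))  # ['Col1', 'Col2'] TableA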

R language and R Studio

Preparing Data Sets Using Tools Designed to Perform Data Science.

The R language and R Studio seem to be very powerful when dealing with large data sets, and the syntax makes it easy to “clean” the data. However, I still prefer SQL Server and a decent query tool; maybe my opinion will change over time. The most helpful things I’ve seen in R Studio are creating new data frames and the ability to roll back to a point in time, i.e. a previous version of the data set.

Changing a column’s data type on the fly in R Studio is also immensely valuable. For example, suppose the data in a column are integers but the table/column definition is a string or varchar. In a SQL database, the user might have to drop the table, recreate it with the new data type, and then reload the data. Not so with R.
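For comparison, here is a rough pandas analogue of that on-the-fly cast (the data frame is a hypothetical example); R itself does the equivalent with as.integer():

    import pandas as pd

    df = pd.DataFrame({"reading": ["1", "2", "3"]})  # loaded as strings
    print(df["reading"].dtype)                       # object (string-like)

    df["reading"] = df["reading"].astype(int)        # cast in place, no reload
    print(df["reading"].dtype)                       # int64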

People Turn Toward “Data Banks” to Commoditize Their Purchase and User Behavior Profiles

If you’re anti “Big Brother”, this may not be the article for you; in fact, skip it. 🙂


The Pendulum Swings Away from GDPR

In the not-so-distant future, “Data Bank” companies consisting of Subject Matter Experts (SMEs) across all verticals may process your data feeds, collected from your purchase and user behavior profiles. Consumers will be encouraged to submit their data profiles to a Data Bank, which will offer incentives ranging from reduced insurance premiums to cash-back rewards.


Everything from activity trackers and home automation to vehicular automation data may be captured and aggregated. The data collected can then be sliced and diced to provide macro and micro views of the information. At the abstract, macro level, the information may allow for demographic, statistical correlations, which may contribute to corporate strategy. At a granular level, the data gives “data banks” the opportunity to sift through it, performing analysis and correlations that lead to actionable information.


Is it secure? Do you care if a hacker steals your weight loss information? It may not be an issue if the collected Purchase and User Behavior Profiles are aggregated into a blockchain general ledger. Data curators and aggregators work with SMEs to correlate the data into:

  • Canned, “intelligent” reports targeted at a specific subject matter, or across silos of data types
  • “Universes” (i.e. Business Objects) of data that may be “mined” by consumer-approved, “trusted” third-party companies, e.g. your insurance company
  • Actionable information based on AI subject-matter rules engines and consumer rule transparency


“Data Banks” may be required to show customers who agreed to sell their data examples of the specific rows that were sold on a “Data Market”.

Consumers may have the option of sharing their personal data with specific companies by proxy, through a “data bank”, granular to the data point collected. Sharing Purchase and User Behavior Profiles:

  1. may lower [or raise] your insurance premiums
  2. may provide discounts on preventive health care products and services, e.g. vitamins or yoga classes
  3. may enable targeted, affordable medicine that redirects the choice of doctor to an alternative; the MD would be contacted to validate the alternative


The curated data collected may be harnessed by thousands of affinity groups to offer very discrete products and services. Purchase and User Behavior Profiles provide correlated information that stretches beyond any consumer relationship experienced today.


At some point, health insurance companies may require you to wear a tracker to increase or slash premiums. Auto insurance companies may offer discounts for access to car smart data, to make sure suggested maintenance guidelines for service are met.


You may approve your “data bank” to give access to specific soliciting government agencies or private firms looking to analyze data for their studies. You may qualify based on the demographic, abstracted data points collected; the incentives provided may be tax credits or paid studies.

Purchase and User Behavior Profiles:  Adoption and Affordability

If “Data Banks” are allowed to collect Internet of Things (IoT) device profiles but the devices themselves are cost-prohibitive, here are a few ways to increase their adoption:

  1. [US] tax coupons that enable the buyer to save money at the time of purchase. For example, a 100 USD discount applied at the time of purchase of an activity tracker, with the stipulation that you may agree, at some point, to participate in a study.
  2. Government subsidies of the cost of aggregating and archiving Purchase and Behavioral Profiles, through annual tax deductions. Today, tax incentives may allow you to purchase an IoT device if the cost is an itemized medical tax deduction, such as an activity tracker that monitors your heart rate, if your medical condition requires it.
  3. Additional insurance deductions, for which Auto, Life, Homeowners, and Health policyholders may qualify.
  4. Affinity-branded IoT devices; for example, the American Lung Association may sell a logo-branded activity tracker, and people may sponsor the owner of the tracking pedometer to raise funds for the cause.

The World Bank has a repository of data, World DataBank, which seems to store a large depth of information:

“World Bank Open Data: free and open access to data about development in countries around the globe.”

Here is the article that inspired this post:

http://www.marketwatch.com/story/you-might-be-wearing-a-health-tracker-at-work-one-day-2015-03-11


Privacy and Data Protection Creates Data Markets

Initiatives such as the General Data Protection Regulation (GDPR) and other privacy efforts, which seek to restrict access to your data to you as the “owner”, create as a byproduct opportunities for you to sell your data.


Blockchain: Purchase and User Behavior Profiles

As your “vault”, “Data Banks” will collect and maintain your two primary datasets:

  1. As a consumer of goods and services, a Purchase Profile is established and evolves over time. Online purchases are automatically collected, curated, appended with metadata, and stored in a data vault [blockchain]. “Offline” purchases may, at some point, become hybrid [on/off]line purchases, with advances in traditional monetary exchanges, and would follow the online transaction model.
  2. User Behavior (UB) profiles, both on- and offline, will be collected and stored for analytical purposes. A user behavior “session” is a use case of activity where YOU are the prime actor. Each session would create a single UB transaction, also stored in the “data vault”. UB use cases may not lead to any purchases.

Not all Purchase and User Behavior Profiles are created equal. E.g., one person’s profile may show a higher monthly spend than another’s; the consumer who purchases more may be entitled to more benefits.

These datasets, wholly owned by the consumer, are safely stored, propagated, and made immutable with a solution such as a blockchain general ledger.
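As an illustration of that immutability claim, here is a minimal, hypothetical hash-chain sketch: each block records the previous block’s hash, so any edit to stored profile history becomes detectable. This is illustrative only, not a production ledger design.

    import hashlib
    import json

    def block_hash(prev: str, record: dict) -> str:
        payload = json.dumps({"prev": prev, "record": record}, sort_keys=True)
        return hashlib.sha256(payload.encode()).hexdigest()

    def append_block(chain: list, record: dict) -> None:
        prev = chain[-1]["hash"] if chain else "0" * 64
        chain.append({"prev": prev, "record": record,
                      "hash": block_hash(prev, record)})

    ledger: list = []
    append_block(ledger, {"type": "purchase", "item": "activity tracker", "usd": 99})
    append_block(ledger, {"type": "user_behavior", "session": "store-visit-42"})

    # Verify: recompute every hash and check the links; an edit to any
    # earlier record breaks the chain.
    intact = all(
        b["hash"] == block_hash(b["prev"], b["record"]) and
        (i == 0 or b["prev"] == ledger[i - 1]["hash"])
        for i, b in enumerate(ledger)
    )
    print("ledger intact:", intact)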

FinTech: End to End Framework for Client, Intermediary, and Institutional Services

Is it all about being the most convenient payment-processing partner, with an affinity to the payment-processing brand? It’s a good place to start: the Amazon Payments partner program.

FinTech noun : an economic industry composed of companies that use technology to make financial systems more efficient

Throughout my career, I’ve worked with several financial services teams to engineer, test, and deploy solutions. Here is a brief list of the FinTech solutions I helped construct, test, and deploy:

  1. 3K global investment bankers – proprietary CRM platform, including business analytics and a Business Objects universe.
  2. Equity research platform, crafted based on business expertise.
    • Custom UI for research analysts that enabled them to create their research and push it into the workflow.
    • Based on a set of rules, a “locked down” part of the report would “build disclosures”, e.g. analyst holds 10% of co.
    • Custom Documentum workflow that would route research to the distribution channels, or direct it to legal review.
  3. (Multiple financial organizations) Data warehouse middleware solutions to assist organizations in managing and monitoring usage of their DW.
  4. Global derivatives firm: migration of a mainframe system to a C# client/server platform.
  5. Investment bankers and Equity Capital Markets (ECMG): built a trading platform so teams could collaborate on deals/trades.
  6. Global asset management firm: onboarding and fund management solutions, with custom UI and workflows in SharePoint.

*****

A “Transaction Management Solution” targets a mixture of FinTech services, primarily “Payments” Processing.

Target State Capabilities of a Transaction Management Solution:

  1. Fraud Detection: The ability to identify and prevent fraud exists at many levels of the transaction, from facilitators of EFT to credit monitoring and scoring agencies. Every touch point of a transaction has its own perspective on possible fraud, and must be evaluated to the extent it can be.
    • Business experts (SMEs) and technologists continue to expand the practical applications of Artificial Intelligence (AI) every day. Extensive AI fraud detection applications exist today, incorporating human-populated rules engines and AI machine learning (independent rule creation); a minimal rules-engine sketch follows this list.
  2. Consumer “Financial Insurance” Products
    • Observing a business transaction end to end may provide visibility into areas of transaction risk. Process and/or technology may be adopted or augmented to minimize the risk.
      • E.g., the eBay auction process has risk around the changing hands of currency and merchandise. A “delayed payment”, holding funds until the merchandise has been exchanged, minimizes the risk, as implemented using PayPal.
    • This spans the product lifecycle of Discovery, Development, and Delivery phases, converting concept to product.
  3. Transaction Data Usage for Analytics
    • The client initiating the transaction, the intermediary parties, and the destination of funds may all tell “a story” about the transaction.
    • Every party within a transaction, beginning to end, may benefit from the use of the transaction data via analytics.
      • e.g. Quicken, a personal finance management tool, collects, parses, and augments transaction data to provide client analytics in the form of charts, graphs, and reports.
    • A clear, consistent, and comprehensive data set should be available at every point in the transaction lifecycle, regardless of platform.
      • e.g. funds transferred between financial institutions may have descriptions that are not user friendly or not actionable, e.g. a cryptic name and no contact details.
      • Normalizing data may occur at an abstracted layer.
    • Abstracted and aggregated data may be used for analytics.
      • e.g. the average car price given specs XYZ;
      • e.g. 2. the average credit score in a particular zip code.
    • Continued growth opportunities, and challenges
      • e.g. data privacy vs. allowable aggregated data
  4. Affinity Brand Opportunities for the Transaction Management Solution
    • eWallet affinity brand promotions,
      • e.g. no shipping cost, based on rules about the transaction’s items
      • e.g. 2. “Cash Back” rewards and/or market points
      • e.g. 3. optional “fundraiser” choices at the time of purchase
  5. Credit Umbrella: Monitoring Use Case
    • Transparency into newly activated accounts enables the Transaction Management Solution (TMS) to trigger a rule to email the cardholder, if eligible, to add the card to an eWallet.
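Here is the minimal rules-engine sketch promised above: a human-populated rule list evaluated against each transaction. The rules, thresholds, and field names are hypothetical; the machine-learning half would propose new rules from labeled history.

    from dataclasses import dataclass

    @dataclass
    class Txn:
        amount_usd: float
        country: str
        new_account: bool

    # Human-populated rules: (name, predicate) pairs; thresholds are made up.
    RULES = [
        ("large amount", lambda t: t.amount_usd > 5_000),
        ("new account + foreign", lambda t: t.new_account and t.country != "US"),
    ]

    def flag(txn: Txn) -> list[str]:
        """Return the names of any fraud rules the transaction trips."""
        return [name for name, rule in RULES if rule(txn)]

    print(flag(Txn(amount_usd=7_200, country="US", new_account=False)))
    # ['large amount']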

Is Intuit an acquisition target because of Quicken’s ability to provide users consistent reporting of transactions across all sources? I just found this note on Wikipedia while writing this post:

Quicken is a personal finance management tool developed by Intuit, Inc. On March 3, 2016, Intuit announced plans to sell Quicken to H.I.G. Capital. Terms of the sale were not disclosed.[1]

For quite some time, companies have attempted to tread in this space with mixed results, either through acquisition or by building out their existing platforms. There seem to be significant opportunities within the services, software, and infrastructure areas. It will be interesting to see how it all plays out.

Inhibitors to enclosing a transaction within an end-to-end Transaction Management Solution (TMS):

  • A higher level of risk (e.g. business, regulatory) in expanding service offerings
  • Stretching too thin, beyond the core vision, and losing sight of it
  • Transforming a tech company into a hybrid financial services firm
  • Automation and streamlining of processes may derive efficiencies that lead to reductions in staff/workforce
  • Multiple platforms performing the same functions provide redundant capabilities, reduced risk, and more consumer choice

Those inhibitors haven’t stopped these firms:

Payments Ecosystem


New PhD program pioneers ‘Big Data’ solutions

UDMessenger – Our Students – Big Data.

OUR STUDENTS | The University of Delaware’s new PhD program in Financial Services Analytics (FSAN) has gotten off to a strong start.

“I love how UD was able to merge finance, data mining, statistics and other areas to create the FSAN program,” said Leonardo De La Rosa Angarita, a current FSAN student. “This is also reflected in the diversity of the students. We come from different fields, and it is wonderful how we are able to complement each other in so many different ways.”

The unique program, a collaborative effort between JPMorgan Chase & Co., the Alfred Lerner College of Business and Economics and the College of Engineering, teaches students to become experts at researching and analyzing large swaths of electronic information, known as “big data” in the business world.

Big Data

Long Chen, another FSAN PhD student, said, “After I joined the program, I expected to learn about a broad range of disciplines to fill my toolkit, and that is exactly what we are doing—taking courses from the areas of finance, statistics and computer science. Although the program just started, I can tell we are heading in the right direction.”

Bintong Chen, Director of the FSAN program, said that students are trained as researchers and professionals who play key roles in interdisciplinary teams, applying their knowledge and skills to convert vast amounts of data into meaningful information for businesses and consumers.

Bintong Chen added that the first semester put students’ skills to the test, “due to the intensity and breadth of the core classes designed for the program, ranging from very technical subjects, such as machine learning and data mining, to very business-oriented topics about financial institutions.”

Students are interacting with JPMorgan’s Corporate and Investment Bank during the spring semester to identify topics for their research projects and potential summer internships.

“I always wondered what would happen if engineers and economists would speak the same language, if professors would be more open to the world outside the walls of their offices and if industry would get more interested in what we study in our classrooms,” said Eriselda Danaj, another student in the program. “It is challenging and I love it.”

Article by Sunny Rosen, AS14

Searching Big Data for ‘Digital Smoke Signals’ – NYTimes.com

Searching Big Data for ‘Digital Smoke Signals’ – NYTimes.com.

Excellent article on how the public sector is being transformed by the private sector. A good read. The article frames out the group structure and the team, but doesn’t go into the output of the statistics, i.e. the group’s output, which is disappointing. It gives you the sense the team is new and is still coming to grips with what to output, whom to present it to, and the advantages presented and opportunities taken as a result of the data. It could be that this new, dynamic group within the United Nations is still trying to integrate with the rest of the organization, that they are still wrestling with the data, or that the data draws dangerous conclusions that are not for public distribution. Give the article a run-through, and you will see the subtext is predicting world economies, and that is sensitive both for the people being analyzed and for the people who invest.

Google is Going to be the Next Public and Private Data Warehouse

In an article I wrote a while back, Google to venture into Cloud, provide Open Source APIs, assist small businesses to be Cloud Solutions Integrators, I was talking in the abstract. But buried deep in the menus of the Google site (under “More”, then “Even More”, at the bottom left of the page under Innovation) you will find Fusion Tables (Beta). Google is more advanced, and more ready to compete with the database vendors, than I thought, with a user-friendly UI. They currently provide a way to upload data to Google Drive; the user then imports the data from the drive and, using table views and business intelligence tools, can manipulate and share the data. The amount of data allowed to be uploaded into tables seems limitless.

Although Fusion Tables is still in beta, and publicly shows users uploading and linking to Google-hosted data rather than connecting to external data sources such as your sales transaction database, there may be an API in the works for third parties to integrate direct connections through drivers such as ODBC or JDBC, streaming data from transactional systems rather than just uploading it. This may be their strategy: host all of the data, and provide a migration utility. At this stage they would like to house the data and own the cloud storage infrastructure; the strategic mid-term goal, however, may be to let you keep your RDBMS transaction data locally, and stream and/or upload it into their data warehouse, apply business intelligence to manipulate the data, and then publish it in multiple formats, e.g. displaying the data for public or private consumption, or publishing charts with commentary into your Google Plus stream for specific “Circles”. Brilliant. Hats off to you guys.

If Google allows streaming of the data, or what we call data transformations, from e.g. your sales transaction system to the Google data warehouse, then they would be competing with IBM, Oracle, and Microsoft.

Update: 12/26/12
After all of that profound scoping and keen insight, a developer pointed out to me in chat that Google’s BigQuery does the job better. I am curious why it has not taken off in the marketplace. Anti-trust? And why create an abstraction layer like Fusion Tables that explicitly calls out Google Docs? Maybe that would help them transition into the market space with a different level of user, the consumer, or a different target user, such as the small business.

Big Data Creates Opportunities for Small to Midsize Retail Vendors

Big Data Creates Opportunities for Small to Midsize Retail Vendors through Collective Affinity Marketing outside Financial Institutions.

In the Harvard Business Review there is an article, Will Big Data Kill All but the Biggest Retailers? One idea to mitigate that risk is to create a collective of independent retailers under affinity programs, such as charities, where every Nth part of a customer’s purchase applies to the charity, toward specific goals defined by the consumer. Merchants in the program decide their own caps, or monetary participation levels. Consumers belong to an affinity group, but it’s not limited to a particular credit card. The key is that the transaction data is available to all merchants participating in the affinity: it spans all merchants within the affinity, not just the transactions executed with a single merchant.

Using trusted, independent marketing data warehouses, independent retail vendors share “big data”, enabling them to compete using the same pool of consumer [habitual] spending data.

Affinity marketing data companies can empower their retail clients/vendors with business intelligence tools that pull from the collection of consumer data. Trusted, independent marketing data warehouses sprout up to collect consumer data and enable their retail vendor clients to mine it.

These trusted loyalty affinity data warehouses are not affiliated with a single financial institution, as previously implemented with credit cards, but are more analogous to supermarket-style loyalty programs; all independent retail vendors may participate, or the affinity program memberships may be capped to retail vendors from small to mid-size companies.

Note: Data obfuscation could be applied so that customer identifiers, such as social security numbers, are not transparent in the shared data, limiting any liability for fraud.
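Assuming salted hashing plus masking, that obfuscation might look like the sketch below; the salt handling is illustrative only, and a real deployment would need proper key management.

    import hashlib

    SALT = b"per-deployment-secret"  # placeholder; manage as a real secret

    def obfuscate_ssn(ssn: str) -> dict:
        digits = ssn.replace("-", "")
        return {
            # Stable token for joins across merchants, without exposing the SSN.
            "ssn_token": hashlib.sha256(SALT + digits.encode()).hexdigest(),
            # Masked form for display, keeping only the last four digits.
            "ssn_masked": "***-**-" + digits[-4:],
        }

    print(obfuscate_ssn("123-45-6789"))
    # {'ssn_token': '<64 hex chars>', 'ssn_masked': '***-**-6789'}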