Apache NiFi on Hortonworks HDF Versus … Microsoft Flow?

Attended a technical discussion last night on Apache NiFi and Hortonworks HDF,  a Meetup @ Honeywell, a Hortonworks client.

Excellent presentations from the Hortonworks team for “NiFi on HDF” solutions architecture and best practices. It’s a powerful solution to process and distribute data in real time, any data, and in large quantities with resiliency.  It’s no wonder the US NSA originally developed the ability to consume data in real time, manipulate it, and then send it on its way.  However, recognizing the commercial applications (benevolent wisdom?), the NSA released the product as open-source software via its technology transfer program.

As a tangent, among other things, I’m currently exploring the capabilities of “Microsoft Flow”, which has recently been promoted from Preview Release to GA.  One resonating question came to mind during the presentations last night:

At its peak maturity (not yet reached), can Microsoft Flow successfully compete with Apache NiFi on Hortonworks HDF?

Discussion Points:

  • The NiFi / HDF solution manages data flows in real-time.  The Microsoft Flow architecture seems to fall short in this capacity. Is it on the product road map for Flow?  Is it a capability Microsoft wants to have?
  • There is a bit of architecture / infrastructure on the Hortonworks HDF side which enables the solution as a whole to ingest, process, and push data in real time.  Not sure Microsoft Flow is currently engineered on the back end to handle that throughput.
  • The current Microsoft Flow UI may need to be updated to handle this ‘slightly altered’ paradigm of real-time content consumption and distribution.

Comparing Microsoft Flow and NiFi on HDF may be a huge stretch.

Cloud Serverless Computing: Why? and With Whom?

What is Cloud Serverless Computing?

Based on your application Use Case(s), Cloud Serverless Computing architecture may reduce ongoing costs for application usage, and provide scalability on demand without the Cloud Server Instance management overhead, i.e. costs and effort.
Note: Cloud Serverless Computing is used interchangeably with Functions as a Service (FaaS), which makes sense from a developer’s standpoint: they are coding Functions (or Methods), and that’s the level of abstraction.
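To make that level of abstraction concrete, here is a minimal, provider-agnostic sketch (in Python, with invented names) of the FaaS unit of deployment: a single stateless function that receives an event and returns a result, while the platform owns the servers, scaling, and routing.

```python
# A provider-agnostic sketch of the FaaS abstraction: one stateless function.
# The event shape and function name are invented for illustration.
def handle(event: dict) -> dict:
    order_total = sum(item["price"] * item["qty"] for item in event["items"])
    return {"status": "ok", "total": order_total}

# The platform invokes the function once per event; locally it is just a call:
print(handle({"items": [{"price": 9.99, "qty": 2}]}))  # {'status': 'ok', 'total': 19.98}
```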

Microsoft Flow

Microsoft Flow Pricing

As listed below, there are three tiers, which include a free tier for personal use or for exploring the platform for your business.  The paid Flow plans seem ridiculously inexpensive based on what business workflow designers receive for 5 USD or 15 USD per month.  Microsoft Flow has abstracted building workflows so almost anyone can build application workflows, or automate manual business workflows, leveraging almost any of the popular applications on the market.

It doesn’t seem like 3rd-party [data] Connector and Template creators receive any direct monetary value from the Microsoft Flow platform, although workflow designers and business owners may be swayed to purchase 3rd-party product licenses for the use of their core technology.

Microsoft Flow Pricing

Microsoft Azure Functions

Process events with a serverless code architecture.  An event-based serverless compute experience to accelerate development. Scale based on demand and pay only for the resources you consume.
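As a rough illustration, here is what that function-level abstraction looks like in the Azure Functions Python programming model. This is only a minimal sketch; it assumes an accompanying function.json HTTP-trigger binding, which is omitted here.

```python
# Minimal Azure Functions HTTP-triggered function (Python programming model).
# Assumes a function.json with an HTTP trigger binding, omitted here.
import azure.functions as func

def main(req: func.HttpRequest) -> func.HttpResponse:
    # Azure invokes main() per request; no server code is written or managed.
    name = req.params.get("name", "world")
    return func.HttpResponse(f"Hello, {name}!", status_code=200)
```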

Google Cloud Serverless

Properly designed microservices have a single responsibility and can independently scale. With traditional applications being broken up into hundreds of microservices, traditional platform technologies can lead to a significant increase in management and infrastructure costs. Google Cloud Platform’s serverless products mitigate these challenges and help you create cost-effective microservices.
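For instance, a single microservice on Google Cloud Functions can be as small as one Python function. Here is a minimal sketch using the functions-framework package; the function and deploy names are my own examples.

```python
# Minimal HTTP microservice for Google Cloud Functions
# (pip install functions-framework).
import functions_framework

@functions_framework.http
def hello_http(request):
    # `request` is a Flask request object supplied by the platform.
    name = request.args.get("name", "world")
    return f"Hello, {name}!"

# Example deploy (names illustrative):
#   gcloud functions deploy hello_http --runtime python311 --trigger-http
```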

Google Serverless Application Development

Google Serverless Analytics and Machine Learning

Google Serverless Use Cases

Amazon AWS Lambda

AWS provides a set of fully managed services that you can use to build and run serverless applications. You use these services to build serverless applications that don’t require provisioning, maintaining, and administering servers for backend components such as compute, databases, storage, stream processing, message queueing, and more. You also no longer need to worry about ensuring application fault tolerance and availability. Instead, AWS handles all of these capabilities for you, allowing you to focus on product innovation and get faster time-to-market. It’s important to note that Amazon was the first contender in this space with a 2014 product launch.
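As a small client-side sketch, invoking an already deployed Lambda function through boto3 looks roughly like this; the function name and payload are hypothetical, and AWS credentials are assumed to be configured.

```python
import json
import boto3

# Hypothetical function name; assumes a deployed Lambda and configured credentials.
client = boto3.client("lambda")
response = client.invoke(
    FunctionName="order-total",
    Payload=json.dumps({"items": [{"price": 9.99, "qty": 2}]}),
)
print(json.loads(response["Payload"].read()))  # whatever the function returns
```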

IBM Bluemix OpenWhisk

Execute code on demand in a highly scalable serverless environment.  Create and run event-driven apps that scale on demand.

  • Focus on essential event-driven logic, not on maintaining servers
  • Integrate with a catalog of services
  • Pay for actual usage rather than projected peaks

The OpenWhisk serverless architecture accelerates development as a set of small, distinct, and independent actions. By abstracting away infrastructure, OpenWhisk frees members of small teams to rapidly work on different pieces of code simultaneously, keeping the overall focus on creating user experiences customers want.
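A minimal OpenWhisk action in Python shows how small those distinct, independent actions can be: a single main() that takes and returns a dictionary (the action name below is an example).

```python
# hello.py -- a minimal OpenWhisk Python action: main() maps a dict to a dict.
def main(args):
    name = args.get("name", "stranger")
    return {"greeting": f"Hello, {name}!"}

# Created and invoked with the OpenWhisk CLI (action name is an example):
#   wsk action create greeting hello.py
#   wsk action invoke greeting --result --param name World
```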

What’s Next?

Adopting Serverless Computing is a decision that needs to be made based on the usage profile of your application.  For the right use case, serverless computing is an excellent choice that is ready for prime time and can provide significant cost savings.

There’s an excellent article, published July 16th, 2017 by Moshe Kranc, called “Serverless Computing: Ready for Prime Time”, which at a high level can help you determine if your application is a candidate for Serverless Computing.


See Also:
  1. “Serverless computing architecture, microservices boost cloud outlook” by Mike Pfeiffer
  2. “What is serverless computing? A primer from the DevOps point of view” by J Steven Perry

Applying Artificial Intelligence & Machine Learning to Data Warehousing

Protecting the Data Warehouse with Artificial Intelligence

Teleran is a middleware company whose software monitors and governs OLAP activity between the Data Warehouse and Business Intelligence tools like Business Objects and Cognos.  Teleran’s suite of tools encompasses a comprehensive analytical and monitoring solution called iSight.  In addition, Teleran has a product, iGuard, that leverages artificial intelligence and machine learning to impose real-time query and data access controls.  The architecture also allows Teleran’s agent to run on a host separate from the database, for additional security and to avoid consuming resources on the database host.

Key Features of iGuard:
  • Policy engine prevents “bad” queries before reaching database
  • Patented rule engine resides in-memory to evaluate queries at database protocol layer on TCP/IP network
  • Patented rule engine prevents inappropriate or long-running queries from reaching the data
70 Customizable Policy Templates
SQL Query Policies
  • Create policies using policy templates based on SQL syntax (illustrated in the sketch after this list):
    • Require JOIN to Security Table
    • Column Combination Restriction –  Ex. Prevents combining customer name and social security #
    • Table JOIN restriction –  Ex. Prevents joining two different tables in same query
    • Equi-literal Compare requirement – Tightly constrains the query.  Ex. Prevents hunting for sensitive data by requiring an ‘=‘ condition
    • DDL/DCL restrictions (Create, Alter, Drop, Grant)
    • DQL/DML restrictions (Select, Insert, Update, Delete)
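Teleran’s rule engine is proprietary, but a toy sketch conveys the idea behind one such template, the “require JOIN to Security Table” policy. The table names and the regex-based check below are invented purely for illustration.

```python
import re

# Invented names for illustration: queries touching CUSTOMER must join SECURITY_TBL.
SENSITIVE_TABLE = "CUSTOMER"
SECURITY_TABLE = "SECURITY_TBL"

def violates_join_policy(sql: str) -> bool:
    """Flag queries that read the sensitive table without the required join."""
    normalized = sql.upper()
    touches_sensitive = re.search(rf"\b{SENSITIVE_TABLE}\b", normalized)
    joins_security = re.search(rf"\bJOIN\s+{SECURITY_TABLE}\b", normalized)
    return bool(touches_sensitive) and not joins_security

# Blocked: reads CUSTOMER with no join to the security table.
assert violates_join_policy("SELECT name, ssn FROM customer")
# Allowed: the required join is present.
assert not violates_join_policy(
    "SELECT c.name FROM customer c JOIN security_tbl s ON s.cust_id = c.id"
)
```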
Data Access Policies

Blocks access to sensitive database objects

  • By user or user groups and time of day (shift) (e.g. ETL)
    • Schemas
    • Tables/Views
    • Columns
    • Rows
    • Stored Procs/Functions
    • Packages (Oracle)
Connection Policies

Blocks connections to the database

  • White list or black list by the following (a small sketch follows this list):
    • DB User Logins
    • OS User Logins
    • Applications (BI, Query Apps)
    • IP addresses
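Again purely as a hypothetical sketch (the logins and IP ranges are invented), a whitelist connection policy reduces to a membership check performed before the connection ever reaches the database.

```python
# Invented whitelist: admit a connection only if login AND source IP both match.
ALLOWED_DB_USERS = {"etl_svc", "bi_reporting"}
ALLOWED_IP_PREFIXES = ("10.1.", "10.2.")

def connection_allowed(db_user: str, ip: str) -> bool:
    return db_user in ALLOWED_DB_USERS and ip.startswith(ALLOWED_IP_PREFIXES)

print(connection_allowed("etl_svc", "10.1.4.7"))     # True  -- whitelisted
print(connection_allowed("adhoc_user", "10.1.4.7"))  # False -- login not allowed
```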
Rule Templates Contain Customizable Messages

Each of the “Policy Templates” has the ability to send the user querying the database a customized message based on the defined policy. The message back to the user from Teleran should feel seamless within the application user’s experience.

iGuard Rules Messaging

Machine Learning: Curbing Inappropriate or Long-Running Queries

iGuard has the ability to analyze all of the historical SQL passed through to the Data Warehouse and suggest new, customized policies to cancel queries with certain SQL characteristics.  The Teleran administrator sets parameters, such as rows or bytes returned, and then runs the induction process.  New rules are suggested for queries that exceed these defined parameters.  The induction engine is “smart” enough to look at the repository of queries holistically and not make determinations based on a single query.
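The induction process itself is Teleran’s secret sauce, but a toy illustration of the idea, suggesting rules only for query patterns that repeatedly exceed an administrator-defined threshold rather than reacting to a single query, might look like this (data and names invented):

```python
from collections import defaultdict

# Invented query history: (normalized query pattern, rows returned).
history = [
    ("SELECT * FROM orders", 9_500_000),
    ("SELECT * FROM orders", 8_700_000),
    ("SELECT id FROM orders WHERE id = ?", 1),
]

MAX_ROWS = 1_000_000  # administrator-defined parameter

def suggest_rules(history, max_rows):
    """Suggest a blocking rule only for patterns that repeatedly exceed the
    threshold -- holistic across the repository, not a single-query reaction."""
    observed = defaultdict(list)
    for pattern, rows in history:
        observed[pattern].append(rows)
    return [
        pattern
        for pattern, rows_seen in observed.items()
        if len(rows_seen) > 1 and min(rows_seen) > max_rows
    ]

print(suggest_rules(history, MAX_ROWS))  # ['SELECT * FROM orders']
```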

Finally, here is a high-level overview of the implementation architecture of iGuard.  For sales or pre-sales technical questions, please contact www.teleran.com.

Teleran Logical Architecture

Currently Featured Clients
Teleran Featured Clients

Google Search Enables Users to Upload Images for Searching with Visual Recognition. Yahoo and Bing…Not Yet

The ultimate goal, in my mind, is to have the capability within a search engine to upload an image, have the engine analyze it, and find comparable images within some degree of variation, as dictated in the search properties.  The search engine may also derive metadata from the uploaded image, such as attributes specific to the image object(s) types.  For example, determine if a person [object] is “Joyful” or “Angry”.

As of the writing of this article, the Yahoo and Microsoft Bing search engines do not have the capability to upload an image, perform image/pattern recognition, and return results.  Behold, Google’s search engine has the ability to use some type of pattern matching and find instances of your image across the world wide web.  From the Google Search “home page”, select “Images”, or after a text search, select the “Images” menu item.  From there, an additional icon appears: a camera with the hint text “Search by Image”.  Select the camera icon, and you are presented with options on how Google can acquire your image, e.g. upload, or an image URL.

Google Search Upload Images

Select the “Upload an Image” tab, choose a file, and upload.  I used a fictional character, Max Headroom.  The search results were very good (see below).  I also attempted an uncommon shape, and it did not meet my expectations.  The poor performance in matching this possibly “unique” shape is most likely due to how the Google Image Classifier Model was defined, and to the training data used to test the classifier model.  If the shape is truly “unique”, the Google Search Image Engine did its job.

Google Image Search Results – Max Headroom
Max Headroom Google Search Results

Google Image Search Results – Odd Shaped Metal Object
Google Search Results – Odd Shaped Metal Object

The Google Search Image Engine was able to “classify” the image as “metal”, so that’s good.  However, I would have liked to see better matches under the “Visually Similar Images” section.  Again, this is probably due to the image classification process, and potentially the diversity of image samples.

A Few Questions for Google

How often is the Classifier Modeling process executed (i.e. training the classifier), and the model tested?  How are new images incorporated into the Classifier model?  Are the user uploaded images now included in the Model (after model training is run again)?    Is Google Search Image incorporating ALL Internet images into Classifier Model(s)?  Is an alternate AI Image Recognition process used beyond Classifier Models?

Behind the Scenes

In addition, Google has provided a Cloud Vision API as part of their Google Cloud Platform.

I’m not sure if the Cloud Vision API uses the same technology as Google’s Search Image Engine, but it’s worth noting.  After reaching the Cloud Vision API starting page, go to the “Try the API” section, and upload your image.  I tried a number of samples, including my odd-shaped metal object.  I think it performed fairly well on the “labels” (i.e. image attributes).
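For anyone who wants to go beyond the “Try the API” page, the same label detection is available programmatically. A minimal sketch using the google-cloud-vision Python client follows; the file name is mine, and application credentials are assumed to be configured.

```python
# Minimal label detection with the google-cloud-vision client
# (pip install google-cloud-vision); credentials assumed configured.
from google.cloud import vision

client = vision.ImageAnnotatorClient()
with open("odd_shaped_metal.jpg", "rb") as f:  # hypothetical local file
    image = vision.Image(content=f.read())

response = client.label_detection(image=image)
for label in response.label_annotations:
    print(label.description, round(label.score, 2))  # e.g. "metal" plus a confidence
```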

Odd Shaped Metal Sample Image

Using the Google Cloud Vision API to determine whether there were any web matches for my odd-shaped metal object, the search came up with no results.  In contrast, using Google’s Search Image Engine produced some “similar” web results.

Odd Shaped Metal Sample Image Web Results

Finally, I tested the Google Cloud Vision API with a self portrait image.  THIS was so cool.

Google Vision API – Face Attributes

The API brought back several image attributes specific to “Faces”.  It attempts to identify certain complex facial attributes, such as emotions, e.g. joy and sorrow.
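The face attributes shown above can be pulled the same way. Here is a minimal sketch, under the same assumptions as the label example: the google-cloud-vision client library and a hypothetical local file.

```python
from google.cloud import vision

client = vision.ImageAnnotatorClient()
with open("self_portrait.jpg", "rb") as f:  # hypothetical local file
    image = vision.Image(content=f.read())

# Each detected face carries likelihood ratings for emotions such as joy and sorrow.
response = client.face_detection(image=image)
for face in response.face_annotations:
    print("joy:", face.joy_likelihood, "sorrow:", face.sorrow_likelihood)
```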

Google Vision API – Labels

The API brought back the “Standard” set of Labels which show how the Classifier identified this image as a “Person”, such as Forehead and Chin.

Google Vision API – Web

Finally, the Google Cloud Vision API brought back the web references: it identified me as a Project Manager, and even surfaced an obscure reference to Zurg in my Twitter bio.

The Google Cloud Vision API, and Google’s own baked-in Search Image Engine, are extremely enticing, yet have a ways to go in terms of accuracy.  Of course, I tried using my face in the Google Search Image Engine, and looking at the “Visually Similar Images” didn’t retrieve any images of me, or even a distant cousin (maybe?).

Google Image Search Engine: Ian Face Image

Smartphone AI Digital Assistant Encroaching on the Virtual Receptionist

Businesses already exist which have developed and sell Virtual Receptionist services that handle many caller needs (e.g. call routing).

However, AI Digital Assistants such as Alexa, Cortana, Google Now, and Siri have an opportunity to stretch their capabilities even further.  Leveraging technologies such as Natural Language Processing (NLP) and Speech Recognition (SR), as well as APIs into the smartphone OS’s answer/calling capabilities, functionality can be expanded to include the following (see the sketch after this list):

  • Call Screening – The digital assistant asks for the name of the caller, the purpose of the call, and whether the matter is “Urgent”
    • A generic “purpose” response, or a list of caller purpose items can be supplied to the caller, e.g. 1) Schedule an Appointment
    • The smartphone user would receive the caller’s name and purpose as a message back in the UI while the call is in a ‘hold’ state.
    • The smartphone user may decide to accept the call, or reject the call and send the caller to voice mail.
  • Call / Digital Assistant Capabilities
    • The digital assistant may schedule a ‘tentative’ appointment within the user’s calendar.  The caller may ask to schedule a meeting; the digital assistant would access the user’s calendar to determine availability.  If the calendar indicates availability, a ‘tentative’ meeting is entered.  The smartphone user would have a list of tasks from the assistant, and one of the tasks is to ‘affirm’ availability for the meetings scheduled.
    • Allow recall of ‘generally available’ information.  If a caller would like to know the address of the smartphone user’s office, the Digital Assistant may access a database of generally available information, and provide it.  The Smartphone user may use applications like Google Keep, and any note tagged with a label “Open Access” may be accessible to any caller.
    • Join the smartphone user’s social network, such as LinkedIn. If the caller knows the phone number of the person, but is unable to find the user through the social network directory, an invite may be requested by the caller.
    • Custom business workflows may also be triggered through the smartphone, such as “Pay by Phone”.
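None of the assistant platforms expose this today, so the following is strictly a hypothetical sketch of the call-screening flow described above; every name and interface in it is invented.

```python
# Hypothetical sketch of the call-screening flow; all interfaces are invented.
from dataclasses import dataclass

PURPOSE_MENU = ["1) Schedule an Appointment", "2) General Inquiry"]

@dataclass
class ScreenedCall:
    caller_name: str
    purpose: str
    urgent: bool

def screen_call(ask) -> ScreenedCall:
    """Gather name, purpose, and urgency while the caller holds; `ask` stands in
    for the assistant's speech-recognition prompt/response loop."""
    name = ask("May I have your name?")
    purpose = ask("What is the purpose of your call? " + "; ".join(PURPOSE_MENU))
    urgent = ask("Is the matter urgent? (yes/no)").strip().lower() == "yes"
    return ScreenedCall(name, purpose, urgent)

def route_call(call: ScreenedCall, user_accepts: bool) -> str:
    # The screening summary is pushed to the smartphone UI; the user decides.
    return "connect" if user_accepts else "voicemail"
```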

Small Business Innovation Research (SBIR) Grants Still Open Thru 2017

Entrepreneurs / Science Guys (and Gals),

Are you ready for a challenge, and 150,000 USD to begin pursuing it?

That’s just SBIR Phase I, Concept Development (~6 months).  The second phase, Prototype Development, may be funded up to 1 MM USD, and last 24 months.

The Small Business Innovation Research (SBIR) program is a highly competitive program that encourages domestic small businesses to engage in Federal Research/Research and Development (R/R&D) that has the potential for commercialization. Through a competitive awards-based program, SBIR enables small businesses to explore their technological potential and provides the incentive to profit from its commercialization. By including qualified small businesses in the nation’s R&D arena, high-tech innovation is stimulated and the United States gains entrepreneurial spirit as it meets its specific research and development needs.

The program’s goals are four-fold:
  1. Stimulate technological innovation.
  2. Meet Federal research and development needs.
  3. Foster and encourage participation in innovation and entrepreneurship by socially and economically disadvantaged persons.
  4. Increase private-sector commercialization of innovations derived from Federal research and development funding.

For more information on the program, please click here to download the latest SBIR Overview, which should have everything you need to know about the initiative.

Time is quickly running out to 1) Pick one of the Solicitation Topics provided by the US government; and 2) Submit your Proposal.

For my query of the SBIR database of topics up for Contracts and Grants:  Phase I; Program = SBIR; Year = 2017

That query produced 18 Contract / Grant opportunities.  Here are a few I thought would be interesting:

PAS-17-022
PAR-17-108
RFA-ES-17-004
RFA-DA-17-010

Click Here for the current, complete list of topics by the SBIR.

Autonomous Software Layer for Vehicles through 3rd Party Integrators / Vendors

It seems that car manufacturers, among others, are building autonomous hardware (i.e. vehicle and other sensors) as well as the software to govern its usage.  Few companies are separating the hardware and software layers to explicitly carve out the autonomous software.

Yes, there are benefits to tightly couple the autonomous hardware and software:

1. Proprietary implementations and intellectual property – Implementing autonomous vehicles within a single corporate entity may ‘fast track’ patents, and mitigate NDA challenges / risks

2. Synergies with two (or more) teams working in unison to implement functional goals.  However, this may also be accomplished through two organizations with tightly coupled teams.  Engaged, strong team leadership must be in place to help eliminate corp-to-corp blockers and ensure deliverables.

There are also advantages with two separate organizations, one building the software layer and the other the vehicle hardware implementation, i.e. sensors:

1. Separating the implementation of autonomous vehicle hardware from the AI software enables multiple, strong, alternate corporate perspectives.  These perspectives allow for a stronger, yet balanced, approach to implementation.

2.  The AI Software for Autonomous vehicles, if contractually allowed, may work with multiple brand vehicles, implementing similar capabilities.  Vehicles now have capabilities / innovations shared across the car industry.  The AI Software may even become a standard in implementing Autonomous vehicles across the industry.

3. Working with multiple hardware / vehicle manufacturers may enable software APIs as a layer of implementation abstraction.  These APIs may enable similar approaches to implementation, reduce redundancy, and serve as ‘the gold standard’ in the industry.

4. We see commercial adoption of autonomous vehicle features such as “Auto Lane Change” and “Automatic Emergency Braking”, so it makes sense to adopt standards through 3rd-party AI software integrators / vendors.

5. Incorporating Checks and Balances to instill quality into the product and the process that governs it.

In summation, car parts are typically not built in one geographic location, but through global collaboration.  Autonomous software for vehicles should likewise be externalized, so that unbiased safety and security requirements can be met.  A standards organization “with teeth” could orchestrate input from the industry, and collectively devise “best practices” for autonomous vehicles.

Amazon X-Ray Studios for Indie Movie Producers

I remember building a companion app for the Windows desktop that pulled music data from iTunes and Gracenote.   Gracenote boasts:

“Gracenote technology is at the heart of every great entertainment experience, and is supported by the largest source of music metadata on the planet.”

Gracenote, in conjunction with the iTunes API / data, allowed me to personalize the user experience beyond what iTunes provided out of the box.  X-Ray, powered by IMDb, similarly enriches the experience of watching movies and television hosted on Amazon Video.

While watching a movie using Amazon Video, you can tap the screen, and get details about the specific scene, shown in the foreground as the media continues to play.

“Go behind the scenes of your favorite movies and TV shows with X-Ray, powered by IMDb.  Get instant access to cast photos, bios, and filmographies, soundtrack info, and trivia.”

IMDb is an Amazon company: in his infinite foresight, Jeff Bezos, founder, owner, and CEO of Amazon.com, struck a deal in 1998 to buy IMDb outright for approximately $55 million and attach it to Amazon as a subsidiary, private company.

The Internet Movie Database (abbreviated IMDb) is an online database of information related to films, television programs and video games, including cast, production crew, fictional characters, biographies, plot summaries, trivia and reviews, operated by IMDb.com, Inc., a subsidiary of Amazon. As of June 2017, IMDb has approximately 4.4 million titles (including episodes) and 8 million personalities in its database, as well as 75 million registered users.


In Amazon’s infinite wisdom again, they are looking to stretch both X-Ray and the IMDb property to budding film artists looking to cultivate and mature their following.

Approach to Adoption of X-Ray IMDb  / Amazon Video

Amazon must empower artists and their representatives to update IMDb.  IMDbPro seems to enable just such capabilities:

“Showcase yourself on IMDb & Amazon.

Manage your photos and the credits you are Known For on IMDbPro, IMDb, and Amazon Video.”
  1. How, then, is new media content, such as an actor’s photos and filmography, [approved] and updated by IMDb?
  2. Furthermore, what is the selection process to get indie content [approved] and posted to Amazon Video?  Is there a curation process whereby not every indie artist is hosted, e.g. a creative selection process driven by the Amazon Video business?
  3. To expand the use of X-Ray powered by IMDb, what are the options for alternate media players and streamers?  E.g. is YouTube a possibility, hosting and streaming content embedded with X-Ray capabilities?  Do Amazon X-Ray-enabled capabilities require the Amazon Video player?
X-Ray Current Support: Amazon Hosted and Streaming

“X-Ray is available on the Amazon Video app in the US, UK, Germany, and Austria for thousands of titles on compatible devices including Amazon Fire Tablets and TV/Stick, iOS and Android mobile devices, and the web.  To access X-Ray, tap the screen or click on the Fire TV remote while the video is playing.”

Amazon X-Ray Studios, Video Editing/Integration Desktop Application

Indie producers may leverage X-Ray Studios to integrate IMDb overlay content to enhance their audience’s experience.   Timecodes are leveraged to sync up X-Ray content with the video content.

“In video production and filmmaking, SMPTE timecode is used extensively for synchronization, and for logging and identifying material in recorded media. During a filmmaking or video production shoot, the camera assistant will typically log the start and end timecodes of shots, and the data generated will be sent on to the editorial department for use in referencing those shots.”

All metadata regarding an indie video may be integrated into the video source / target file, and transcoding may be required to output the Amazon-required media standard.
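Amazon hasn’t published how X-Ray’s overlay data is packaged, so here is only a hypothetical sketch of the core idea: overlay metadata keyed to timecodes, looked up at playback time (data and field names invented).

```python
# Hypothetical sketch: overlay metadata keyed by start time (seconds), looked
# up as the video plays. Real X-Ray packaging is not public; names invented.
import bisect

overlays = [  # sorted by start time, one entry per scene
    (0.0,   {"cast": ["Lead Actor"], "trivia": "Opening scene shot in one take."}),
    (754.2, {"cast": ["Lead Actor", "Villain"], "soundtrack": "Main theme"}),
]

def overlay_at(t_seconds: float) -> dict:
    """Return the overlay active at playback time t (binary search on starts)."""
    starts = [start for start, _ in overlays]
    i = bisect.bisect_right(starts, t_seconds) - 1
    return overlays[i][1] if i >= 0 else {}

print(overlay_at(800.0))  # the scene-2 payload: cast list plus soundtrack info
```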

Amazon has slightly complicated the situation by creating an Amazon Web Service (AWS) called X-Ray, which has no relation whatsoever to the X-Ray service powered by IMDb.

Amazon could not be reached for comment.

Accelerating Availability of EV Charging Stations through Legislation

Are you planning on traveling this week for your vacation?  Bought a gas guzzler instead of an EV because of the time it takes to charge, and because chargers were not readily available on your commute or a road trip?  Don’t want to get stuck without a charge, or kill a half day recharging your wheels?  It looks like the release of the Tesla Model 3 will address most of these items, such as:

  • Range Per Charge – 215 miles
  • Supercharging
    • Tesla Supercharger provides up to 170 miles of range in as little as 30 minutes.
  • Relatively Affordable – 35k

But how widespread are the charging stations, and what could be done as a catalyst to rapidly improve EV charging coverage?  We can look at EV charging coverage on a US map the way we do for cellular coverage (and range).  One possibility is to tack on a rider to federal funding, requiring states that use federal funding for their highways to implement EV Charging Stations at EVERY rest stop along the road.

However, there is a vast array of EV Charging Stations on the market.  Although adoption across the US is currently sporadic, there are several brands/standards of EV charging mechanisms:

  • A heterogeneous mix of N EV chargers per rest stop / station may allow for a diverse set of solutions.
  • Each state may declare N companies to install, maintain, and periodically upgrade each EV Charging Station.  Each rest stop / station can be awarded to separate contractors, or clustered.
  • The ‘bid-winning’ companies in each state responsible for rest-station EV chargers must perform ongoing evaluations of EV charging station adoption, and use quantifiable data to upgrade solutions after N period of time.
Stakeholders to Gain from Expansion of EV Charging Stations across the United States of America
  • The Consumer
  • EV Manufacturing Car Companies (Ford, General Motors, Nissan, Tesla, Toyota, etc.) Click here for the complete list.
  • Car Rental Firms
    • Is the cost of maintaining an EV vehicle (e.g. batteries) more expensive than the combustible engine?
  • Retail / shops across the US; i.e. charging takes at least 1/2 hour, time the driver can spend shopping
Who is in a position to accelerate / fast track this program?

Finally, is there really a “lack” of EV Charging Stations across the United States?   Here is a map from the U.S. Department of Energy National Renewable Energy Laboratory’s (NREL) list of charging stations.

I found several EV Charging Stations in my area, such as the Molly Pitcher Service Area, Mile Marker 71.7, New Jersey Turnpike, Cranbury, NJ 08512.

Uncommon Opportunity? R&D Conversational AI Engineer

I had to share this opportunity.  The Conversational AI Engineer role will continue to be in demand for some time.


Title: R&D Conversational AI Engineer
Location: Englewood Cliffs, NJ
Duration: 6+ months Contract (with possible extension)

Responsibilities:

  • Create Alexa Skills, Google Home Actions, and chatbots for various of the direct Client’s brands and initiatives.
  • Work with the Digital Enterprises group to create production-ready conversational agents to help Client emerge in the connected life space.
  • Create additional add-ons to the conversational agents
  • Work with new technologies that may not be fully documented yet
  • Work with startups and their technology emerging in the connected life space.

Quals–
Client is looking for a developer in conversational AI and bot development.

What is Media Labs?  Media Labs is dedicated to driving a collaborative culture of innovation across all of the Client’s businesses. We serve as an internal incubator and accelerator for emerging technology and are leading the way with fresh ideas to ignite the future of media and storytelling. We are committed to partnering with another telecom giant, startups, research and academic groups, content creators, and brands to further innovation at the Client. One of our main themes is connected life, and we are looking for an engineer to lead this development.

Requirements for R&D Engineer:

  • Bachelor in Computer Science, Engineering, or other related field
  • Experience working with new technologies that may not be fully documented yet
  • Experience communicating technology to non-technical people
  • Experience with AWS (Lambda, CloudWatch, S3, API Gateway, etc)
  • Experience with JavaScript, Node.js
  • Some experience creating Alexa Skills, Google Home Actions, or chatbots

Optional Requirements:

  • Experience creating iOS or Android applications (native or non-native)
  •  Experience with API.AI or another NLP engine (Lex, Watson Conversation)
