This morning my kids woke up to the new “snow day”: an internet outage of SaaS education products, e.g. Google Meet. Since education is primarily virtual these days, the term “snow day”, at least for the winter 2020-2021 season, will not mean dangerous roads for buses to navigate; it has been replaced by tech outages. Although few in number and of limited duration so far, kids may still be happy to miss their first period in school.
First there was spell check, then thesaurus, synonyms, and contextual grammar suggestions, and now Persona “Point of View” reviews. Between the immensely accurate and omnipresent #Grammarly and #Google’s #Gmail predictive text, I started thinking about the next step in the AI and human partnership on crafting communications.
Google Gmail Predictive Text
Google Gmail’s predictive text had me thinking about AI possibilities within email, and it occurred to me: I understand what I’m trying to communicate to my email recipients, but do I really know how my message is being interpreted?
Google Gmail has an eerily accurate auto-suggest capability: as you type out a sentence, Gmail suggests the next word or words you are likely to type, with auto-suggested sentence fragments appearing to the right of the cursor. It’s like reading your mind, predicting the most common word or words to come next in the composer’s email.
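Gmail’s real model is far more sophisticated (a neural language model), but the core idea of next-word suggestion can be illustrated with a toy bigram frequency model trained on prior text. Everything below, including the sample corpus, is an illustrative assumption:

```python
from collections import Counter, defaultdict

def build_bigram_model(corpus):
    """Map each word to a frequency count of the words that follow it."""
    model = defaultdict(Counter)
    words = corpus.lower().split()
    for current_word, next_word in zip(words, words[1:]):
        model[current_word][next_word] += 1
    return model

def suggest_next(model, word, n=3):
    """Return up to n of the most common continuations of `word`."""
    return [w for w, _ in model[word.lower()].most_common(n)]

# "Train" on a tiny corpus standing in for the composer's prior emails.
corpus = ("thank you for your time thank you for your help "
          "looking forward to your reply thank you for the update")
model = build_bigram_model(corpus)
print(suggest_next(model, "thank"))  # ['you']
```

A production system would condition on far more context than one preceding word, but the suggestion mechanism, ranking likely continuations and surfacing the top candidates at the cursor, is the same shape.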
In the software development world, a persona is a categorization or grouping of people who play a similar role or behave in a consistent fashion. For example, in the lifecycle of parking meters, where the primary goal is the collection of parking fees, personas may include the “meter attendant” and “the consumer”. These two personas have different goals, and their behavior can be categorized. There are many such roles within and outside a business context.
In many software development tools that enable people to collect and track user stories or requirements, the tools also allow you to define and correlate personas with user stories.
In the case of email composition, once the email has been written, the composer may choose a category of people whose perspective they would like to view it from. Can the email application define categories of recipients, and then preview the email from their respective viewpoints?
What will the selected persona derive from the words arranged in a particular order? What meaning will they attribute to the email?
Use Personas in the formulation of user stories/requirements; understand how Personas will react to “the system”, and changes to the system.
Finally, the use of the [email composer] solution based on “actors” or “personas”. What personas are available “out of the box”? What personas will need to be defined through the email composer’s setup of these categories of people? Wizard-based persona definitions?
There are already software development tools like Azure DevOps (ADO) which empower teams to manage product backlogs and correlate “User Stories”, or “Product Backlog Items”, with personas. These are static, completely user-defined personas, with no intelligence to correlate user stories with personas; users of ADO must create these links themselves.
Now, technology can assist us to consider the intended audience: a systematic perspective using Artificial Intelligence to inspect your email based on a selected “point of view” (a Persona) of the intended recipient. Maybe your email would otherwise be misconstrued as abrasive and not elicit the intended response.
Many applications that enable users to create their own content from word processing to graphics/image creation have typically relied upon 3rd party Content Management Solutions (CMS) / Digital Asset Management (DAM) platforms to collect metadata describing the assets upon ingestion into their platforms. Many of these platforms have been “stood up” to support projects/teams either for collaboration on an existing project, or reuse of assets for “other” projects. As a person constantly creating content, where do you “park” your digital resources for archiving and reuse? Your local drive, cloud storage, or not archived?
Average “Jane” / “Joe” Digital Authors
If I were asked for all the content I’ve created around a particular topic or group of topics from all my collected/ingested digital assets, it may be a herculean search effort spanning multiple platforms. As an independent creator of content, I may have digital assets ranging from Microsoft Word documents, Google Sheets spreadsheets, Twitter tweets, Paint.Net (.pdn) Graphics, Blog Posts, etc.
Capturing Content from Microsoft Office Suite Products
Many of the MS Office content creation products, such as Microsoft Word, have minimal capacity to capture metadata, and where the ability exists, it’s buried in the application. In MS Word, for example, if a user selects “Save As”, they are able to add/insert “Authors” and Tags. In the latest version of Microsoft Excel, the author of a workbook can add properties such as Tags and Categories. It’s not clear how this data is utilized outside the application, e.g. whether the tag data is searchable after being uploaded/ingested by OneDrive.
Blog Posts: High Visibility into Categorization and Tagging
A “blogging platform” such as WordPress places the Category and Tag selection fields directly to the right of the content being posted. This UI/UX encourages a specific mentality toward the creation, categorization, and tagging of content: the structure constantly reminds the author to label the content so others may identify and consume it. Blog post content is created to be consumed by a wide audience of interested viewers based on those selected tags and categories.
Proactive Categorization and Tagging
Perpetuate content classification through drill-down navigation of a derived Information Architecture taxonomy. As a lightweight example, in the Tags field when editing a WordPress post, once a user types a few characters, an auto-complete dropdown list appears, letting them select one or more previously used tags. This is an excellent starting point for other content creation apps.
Users creating blog posts can define a parent/child hierarchy of categories, and the author may select one or more relevant categories to be associated with the post.
Artificial Intelligence (AI) Derived Tags
It wouldn’t be a post without mentioning AI. Integrated into applications that enable user content creation could be a tool that, at a minimum, automatically derives an “index” of words, or tags. The way in which this “intelligent index” is derived may be based upon:
the number of times a word occurs
mention of words in a particular context
references to the same word(s) or phrases in other content, by the same author and/or across the platform
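The first heuristic, word-occurrence counts, can be sketched in a few lines. The stopword list and cutoffs below are illustrative assumptions, not any vendor’s implementation:

```python
import re
from collections import Counter

# A tiny illustrative stopword list; a real system would use a fuller one.
STOPWORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is", "it",
             "for", "on", "with", "that", "this", "be", "are", "as", "without"}

def derive_tags(text, top_n=5):
    """Derive candidate tags from word-occurrence counts, ignoring stopwords."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(w for w in words if w not in STOPWORDS and len(w) > 2)
    return [word for word, _ in counts.most_common(top_n)]

doc = ("Serverless computing lets teams run code without managing servers. "
       "Serverless billing is based on usage, and serverless functions scale on demand.")
print(derive_tags(doc, 3))
```

The contextual heuristics (words in context, cross-document references) would require NLP models rather than simple counts, but the output, a ranked tag index attached to the asset, is the same.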
This intelligently derived index should be made available to any platform that ingests content from OneDrive, SharePoint, Google Docs, etc. These DAMs (or intelligent cloud storage platforms) can leverage this information for searches across the platforms.
Easy to Retrieve the Desired Content, and Repurpose It
Many content creation applications heavily rely on “Recently Accessed Files” within the app. If the Information Architecture/Taxonomy hierarchy were presented in the “File Open” section, and a user could drill down on select Categories/Subcategories (and/or tags), it might be easier to find the desired content.
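The drill-down idea can be sketched as navigation over a nested taxonomy. The category names and the `_docs` convention below are purely illustrative:

```python
# A taxonomy as nested categories, with documents filed at each node.
taxonomy = {
    "Marketing": {
        "_docs": ["brand-guide.docx"],
        "Campaigns": {"_docs": ["q4-launch.pptx", "q4-budget.xlsx"]},
    },
    "Engineering": {"_docs": ["api-spec.md"]},
}

def drill_down(node, path):
    """Follow a Category/Subcategory path and return the documents filed there."""
    for category in path:
        node = node[category]
    return node.get("_docs", [])

print(drill_down(taxonomy, ["Marketing", "Campaigns"]))  # ['q4-launch.pptx', 'q4-budget.xlsx']
```

A “File Open” dialog built on this structure would render each level of the hierarchy as a clickable list instead of a flat recent-files view.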
All Eyes on Content Curation: Creation to Archive
Content creation products should all focus on the collection of metadata at the time of content creation.
Following the blog posting methodology, content should be created alongside its metadata tagging.
Taxonomy (categories and tags, with hierarchy) searches should be available from within the content creation applications, and from the operating system level, the “original” Digital Asset Management solution (DAM), e.g. MS Windows, Mac.
Coupling Content Distribution (i.e. ISPs) with Content Producers
Verizon FiOS offers Netflix as another channel in their already expansive lineup of content. Is this a deal of convenience for the consumer, keeping consumers going through one medium, or is it something more? Amazon Video iOS application offers HBO, STARZ, and others as long as Amazon Prime customers have a subscription to the Content Producers. Convenience or more? The Netflix Content and Distribution via Set-top box (STB) channel should be mimicked by Google YouTube and Amazon Video despite their competing hardware offerings. Consumers should be empowered to decide how they want to consume Amazon Video; e.g. through their Set-top box (STB). However, there may be more than just a convenience benefit.
As Net Neutrality fades into the sunset of congressional debates and lobbyists, the new FCC ruling indicates the prevailing winds of change. We question how content providers, large and small, navigate the path to survival/sustainability. Some business models from content distribution invoke Bandwidth Throttling, which may inhibit the consumers of some content, either by content types (e.g. Video formats) or content providers (e.g. Verizon FiOS providing priority bandwidth to Netflix).
Content creators/producers without a deal with ISPs for “priority bandwidth” may find their customers flock to larger content creators who may be able to get better deals for content throughput.
Akamai and Amazon CloudFront – Content Delivery Networks (CDNs)
Amazon CloudFront is a global content delivery network (CDN) service that securely delivers data, videos, applications, and APIs to viewers with low latency and high transfer speeds. CloudFront, like Akamai, may significantly benefit from the FCC’s decision to repeal Net Neutrality.
Akamai’s industry-leading scale and resiliency mean delivering critical content with consistency, quality, and security across every device, every time. Great web and mobile experiences are key to engaging users, yet difficult to achieve. To drive engagement and online revenue, it’s critical to optimize performance for consumer audiences and employees alike to meet or exceed their expectations for consistent, fast, secure experiences.
Integrating into Content/Internet Service Provider’s Bundle of Channels
By elevating Content Producers into the ISP (distribution channel) Set-top box (STB), does this ‘packaging’ go beyond bundling of content for convenience? For example, when Netflix uses Verizon FiOS’ CDN for content delivery to their clients, will the consumer benefit from this bundled partnership beyond convenience (i.e. performance)? When Netflix is invoked by a Verizon FiOS customer from their laptop (direct from Netflix), is there a performance improvement if Netflix is invoked from the Verizon FiOS Set-top Box (STB) instead? Would these two separate use cases for invoking Netflix movies utilize two alternate Content delivery network (CDN) paths, one more optimized than the other?
As of this post update (12/26), there has been no comment from Verizon.
Advice is integrated within the application, both proactive and reactive: when searching in Microsoft Edge, a blinking circle representing Cortana is illuminated. Cortana says “I’ve collected similar articles on this topic.” If selected, it presents ten similar results in a right panel to help you find what you need.
Personal Data Access and Management
The user can vocally access their personal data, and make modifications to that data; E.g. Add entries to their Calendar, and retrieve the current day’s agenda.
Platform Capabilities: Mobile Phone Advantage
Strengthen core telephonic capabilities where competitors, Amazon and Microsoft, are relatively weak.
Ability to record conversations, and push/store content in the cloud, e.g. iCloud. A cloud serverless recording mechanism dynamically tags a conversation with “keywords”, creating an index into the conversation. Users may search recordings and play back audio clips from 10 seconds before to 10 seconds after each tagged occurrence.
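Assuming a speech-to-text step has already produced timestamped words, the keyword index and the plus/minus 10-second playback window can be sketched like this (all data here is invented for illustration):

```python
def build_keyword_index(transcript):
    """transcript: list of (timestamp_seconds, word). Returns word -> [timestamps]."""
    index = {}
    for ts, word in transcript:
        index.setdefault(word.lower(), []).append(ts)
    return index

def clip_windows(index, keyword, pad=10):
    """Return (start, end) playback windows +/- `pad` seconds around each hit."""
    return [(max(0, ts - pad), ts + pad) for ts in index.get(keyword.lower(), [])]

# Hypothetical timestamped transcript of a recorded call.
transcript = [(4, "budget"), (95, "deadline"), (130, "Budget")]
index = build_keyword_index(transcript)
print(clip_windows(index, "budget"))  # [(0, 14), (120, 140)]
```

In a serverless deployment, the indexing function would run per uploaded recording, writing the index to cloud storage alongside the audio.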
Calls into the User’s Smartphones May Interact Directly with the Digital Assistant
Call Screening – The digital assistant asks for the name of the caller, purpose of the call, and if the matter is “Urgent”
A generic “purpose” response, or a list of caller purpose items can be supplied to the caller, e.g. 1) Schedule an Appointment
The smartphone’s user would receive the caller’s name and purpose as a message in the UI while the call is in a ‘hold’ state.
The smartphone user may decide to accept the call, or reject the call and send the caller to voice mail.
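The screening flow above can be sketched as a small routing function, where the accept/reject decision is supplied by the user (or by a policy the user configures). All names here are hypothetical:

```python
def screen_call(caller_name, purpose, urgent, user_decision):
    """Summarize a screened call and route it by the user's decision.

    user_decision is a callable that receives the screening summary and
    returns True to accept the call or False to send it to voice mail.
    """
    summary = {"caller": caller_name, "purpose": purpose, "urgent": urgent}
    return "connect" if user_decision(summary) else "voicemail"

# Example policy: while in a meeting, accept only urgent calls.
in_meeting_policy = lambda s: s["urgent"]
print(screen_call("Pat", "Schedule an Appointment", False, in_meeting_policy))
# voicemail
```

A real assistant would gather `caller_name`, `purpose`, and `urgent` via voice prompts before reaching this routing step.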
A caller may ask to schedule a meeting with the user, and the digital assistant may access the user’s calendar to determine availability. The digital assistant may schedule a ‘tentative’ appointment within the user’s calendar.
If the calendar indicates availability, a ‘tentative’ meeting will be entered. The smartphone user would have a list of tasks from the assistant, one of which is to ‘affirm’ the availability of the scheduled meetings.
If a caller would like to know the address of the smartphone user’s office, the Digital Assistant may access a database of “generally available” information, and provide it. The Smartphone user may use applications like Google Keep, and any note tagged with a label “Open Access” may be accessible to any caller.
Custom business workflows may be triggered through the smartphone, such as “Pay by Phone”. When a caller calls a business user’s smartphone, the call goes to “voice mail” or the “digital assistant” based on the smartphone user’s configuration. If the caller reaches the “digital assistant”, there may be a list of options they may perform, such as a “Request for Service” appointment. The caller would navigate through a voice-recognition menu, one of many workflows defined by the smartphone user.
Platform Capabilities: Mobile Multimedia
Either through your mobile Smartphone, or through a portable speaker with voice recognition (VR).
Streaming media / music to portable device based on interactions with Digital Assistant.
Menu to navigate relevant (to you) news, and Digital Assistant to read articles through your portable media device (without UI)
Third Party Partnerships: Adding User Base, and Expanding Capabilities
In the form of platform apps (abstraction), or 3rd party APIs which integrate into the Digital Assistant, allowing users to directly execute application commands, e.g. Play Spotify song, My Way by Frank Sinatra.
Any “Skill Set” with specialized knowledge: direct Q&A or instructional guidance, e.g. Home Improvement, Cooking
eCommerce Personalized Experience – Amazon
Home Automation – doors, thermostats
Music – Spotify
Navigate Set Top Box (STB) – e.g. find a program to watch
Video on Demand (VOD) – e.g. set to record entertainment
Serverless computing is a cloud computing code execution model in which the cloud provider fully manages starting and stopping virtual machines as necessary to serve requests, and requests are billed by an abstract measure of the resources required to satisfy the request, rather than per virtual machine, per hour. Despite the name, it does not actually involve running code without servers. Serverless computing is so named because the business or person that owns the system does not have to purchase, rent or provision servers or virtual machines for the back-end code to run.
Based on your application Use Case(s), Cloud Serverless Computing architecture may reduce ongoing costs for application usage, and provide scalability on demand without the Cloud Server Instance management overhead, i.e. costs and effort.
Note: Cloud Serverless Computing is used interchangeably with Functions as a Service (FaaS), which makes sense from a developer’s standpoint, as they are coding Functions (or Methods), and that’s the level of abstraction.
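In the FaaS model, the unit of deployment is a single stateless handler function invoked per request, in the style of an AWS Lambda Python handler. This is a local sketch; the event shape is an assumption for illustration:

```python
import json

def handler(event, context=None):
    """A Lambda-style function: one stateless handler invoked per request.

    The provider starts and stops the underlying compute; you are billed
    per invocation and duration rather than per server per hour.
    """
    name = event.get("name", "world")
    return {"statusCode": 200, "body": json.dumps({"message": f"hello, {name}"})}

# Locally, invoking the function is just a call; in the cloud, the platform
# invokes it in response to an HTTP request, queue message, file upload, etc.
print(handler({"name": "serverless"}))
```

Note there is no server setup, port binding, or process lifecycle in the code; that is exactly the overhead the platform absorbs.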
Create automated workflows between apps and services to get notifications, synchronize files, collect data, and more. Although not the traditional serverless computing implementation, it’s the quickest way to perform application services without having to procure application servers. Depending on your microservices (connectors + templates) definitions, you may not need to write a single line of code; everything could be done through the Flow console.
Connectors are “enablers” to connect to [data] sources in order to extract or insert data, typically one Connector per service, such as Twitter.
Templates utilize Connectors, and enable workflow designers to build business process workflows. Execution of the manufactured workflows performs the activities either Event trigger driven, or ADHOC / manual execution through the portal or through the Microsoft Flow mobile apps.
154 service connectors exist. Several “Premium” connectors require a nominal monthly fee (5 USD). For example, the Oracle Database Connector empowers the workflow designer to insert, update, select, and delete rows in a table.
Automating business processes by designing workflows to turn repetitive tasks into multi-step workflows
Microsoft Flow Pricing
As listed below, there are three tiers, including a free tier for personal use or for exploring the platform for your business. The paid Flow plans seem ridiculously inexpensive based on what business workflow designers receive for 5 USD or 15 USD per month. Microsoft Flow has abstracted building workflows so almost anyone can build application workflows, or automate manual business workflows, leveraging almost any of the popular applications on the market.
It doesn’t seem like 3rd party [data] Connectors and Template creators receive any direct monetary value from the Microsoft Flow platform. Although workflow designers and business owners may be swayed to purchase 3rd party product licenses for the use of their core technology.
Properly designed microservices have a single responsibility and can independently scale. With traditional applications being broken up into hundreds of microservices, traditional platform technologies can lead to a significant increase in management and infrastructure costs. Google Cloud Platform’s serverless products mitigate these challenges and help you create cost-effective microservices.
AWS provides a set of fully managed services that you can use to build and run serverless applications. You use these services to build serverless applications that don’t require provisioning, maintaining, and administering servers for backend components such as compute, databases, storage, stream processing, message queueing, and more. You also no longer need to worry about ensuring application fault tolerance and availability. Instead, AWS handles all of these capabilities for you, allowing you to focus on product innovation and get faster time-to-market. It’s important to note that Amazon was the first contender in this space with a 2014 product launch.
Execute code on demand in a highly scalable serverless environment. Create and run event-driven apps that scale on demand.
Focus on essential event-driven logic, not on maintaining servers
Integrate with a catalog of services
Pay for actual usage rather than projected peaks
The OpenWhisk serverless architecture accelerates development as a set of small, distinct, and independent actions. By abstracting away infrastructure, OpenWhisk frees members of small teams to rapidly work on different pieces of code simultaneously, keeping the overall focus on creating user experiences customers want.
Serverless Computing is a decision that needs to be made based on the usage profile of your application. For the right use case, serverless computing is an excellent choice that is ready for prime time and can provide significant cost savings.
The ultimate goal, in my mind, is to have the capability within a Search Engine to be able to upload an image, then the search engine analyzes the image, and finds comparable images within some degree of variation, as dictated in the search properties. The search engine may also derive metadata from the uploaded image such as attributes specific to the image object(s) types. For example, determine if a person [object] is “Joyful” or “Angry”.
As of the writing of this article, search engines Yahoo and Microsoft Bing do not have the capability to upload an image and perform image/pattern recognition, and return results. Behold, Google’s search engine has the ability to use some type of pattern matching, and find instances of your image across the world wide web. From the Google Search “home page”, select “Images”, or after a text search, select the “Images” menu item. From there, an additional icon appears, a camera with the hint text “Search by Image”. Select the Camera icon, and you are presented with options on how Google can acquire your image, e.g. upload, or an image URL.
Select the “Upload an Image” tab, choose a file, and upload. I used a fictional character, Max Headroom. The search results were very good (see below). I also attempted an uncommon shape, and it did not meet my expectations. The poor performance in matching this possibly “unique” shape is most likely due to how the Google Image Classifier Model was defined, and the training data that tested the classifier model. If the shape truly is “unique”, the Google Search Image Engine did its job.
Google Image Search Results – Max Headroom
Google Image Search Results – Odd Shaped Metal Object
The Google Search Image Engine was able to “classify” the image as “metal”, so that’s good. However, I would have liked to see better matches under the “Visually Similar Images” section. Again, this is probably due to the image classification process, and potentially the diversity of image samples.
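Google’s matching relies on learned classifier models, as discussed. Purely to illustrate the idea of finding “visually similar” images within some degree of variation, here is a toy average-hash comparison (my own illustrative assumption, not Google’s method):

```python
def average_hash(pixels):
    """pixels: 2D list of grayscale values. Bit = 1 where pixel > image mean."""
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    return [1 if p > mean else 0 for p in flat]

def hamming_distance(h1, h2):
    """Number of differing bits; lower means more visually similar."""
    return sum(b1 != b2 for b1, b2 in zip(h1, h2))

# Tiny 2x2 "images" standing in for downscaled grayscale thumbnails.
img_a = [[10, 200], [220, 15]]
img_b = [[12, 190], [210, 20]]   # slightly altered version of img_a
img_c = [[200, 10], [15, 220]]   # inverted layout

ha, hb, hc = (average_hash(i) for i in (img_a, img_b, img_c))
print(hamming_distance(ha, hb), hamming_distance(ha, hc))  # 0 4
```

A search engine built on this idea would hash every indexed image once, then rank candidates by Hamming distance to the uploaded image’s hash; classifier models replace this with learned feature embeddings.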
A Few Questions for Google
How often is the Classifier Modeling process executed (i.e. training the classifier), and the model tested? How are new images incorporated into the Classifier model? Are the user uploaded images now included in the Model (after model training is run again)? Is Google Search Image incorporating ALL Internet images into Classifier Model(s)? Is an alternate AI Image Recognition process used beyond Classifier Models?
I’m not sure if the Cloud Vision API uses the same technology as Google’s Search Image Engine, but it’s worth noting. After reaching the Cloud Vision API starting page, go to the “Try the API” section and upload your image. I tried a number of samples, including my odd-shaped metal object, and I think it performed fairly well on the “labels” (i.e. image attributes).
Using the Google Cloud Vision API, to determine if there were any WEB matches with my odd shaped metal object, the search came up with no results. In contrast, using Google’s Search Image Engine produced some “similar” web results.
Finally, I tested the Google Cloud Vision API with a self-portrait image. THIS was so cool.
The API brought back several image attributes specific to “Faces”. It attempts to identify certain complex facial attributes, things like emotions, e.g. Joy, and Sorrow.
The API brought back the “Standard” set of Labels which show how the Classifier identified this image as a “Person”, such as Forehead and Chin.
Finally, the Google Cloud Vision API brought back the Web references, things like it identified me as a Project Manager, and an obscure reference to Zurg in my Twitter Bio.
The Google Cloud Vision API, and Google’s own baked-in Search Image Engine, are extremely enticing, but they still have a way to go in terms of accuracy. Of course, I tried using my face in the Google Search Image Engine, and the “Visually Similar Images” didn’t retrieve any images of me, or even a distant cousin (maybe?).
Amazon’s Echo and Google’s Home are the two most compelling products in the new smart-speaker market. It’s a fascinating space to watch, for it is of substantial strategic importance to both companies as well as several more that will enter the fray soon. Why is this? Whatever device you outfit your home with will influence many downstream purchasing decisions, from automation hardware to digital media and even to where you order dog food. Because of this strategic importance, the leading players are investing vast amounts of money to make their product the market leader.
These devices have a broad range of functionality, most of which is not discussed in this article. As such, it is a review not of the devices overall, but rather simply their function as answer engines. You can, on a whim, ask them almost any question and they will try to answer it. I have both devices on my desk, and almost immediately I noticed something very puzzling: They often give different answers to the same questions. Not opinion questions, you understand, but factual questions, the kinds of things you would expect them to be in full agreement on, such as the number of seconds in a year.
As someone who has worked with Artificial Intelligence in some shape or form for the last 20 years, I’d like to throw in my commentary on the article.
Human Utterances and their Correlation to Goal / Intent Recognition. There are innumerable ways to ask for something you want. The ‘ask’ is a ‘human utterance’ which should trigger the ‘goal/intent’ of the knowledge the person is requesting. AI chat bots, digital agents, have a table of these utterances which all roll up to a single goal; hundreds of utterances may be supplied per goal. In fact, Amazon has a service, Mechanical Turk, the Artificial Artificial Intelligence, through which you may “Ask workers to complete HITs – Human Intelligence Tasks – and get results using Mechanical Turk”. They boast access to a global, on-demand, 24 x 7 workforce to get thousands of HITs completed in minutes. There are also ways in which the AI digital agent may ‘rephrase’ what it considers closely related utterances. Companies like IBM regard human-level speech recognition as accurate comprehension of about 95% of the words in a given conversation. On March 7, IBM announced it had become the first to home in on that benchmark, having achieved a 5.5% error rate.
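The utterance table described above can be sketched as a simple bag-of-words intent matcher. Production assistants use trained classifiers over far richer features; the sample intents and utterances here are invented for illustration:

```python
def tokenize(text):
    return set(text.lower().replace("?", "").split())

# A table of sample utterances per goal/intent, as described in the article.
INTENTS = {
    "get_weather": ["what is the weather", "will it rain today", "weather forecast"],
    "set_alarm":   ["wake me up", "set an alarm", "alarm for tomorrow morning"],
}

def recognize_intent(utterance):
    """Pick the goal whose sample utterances share the most words with the input."""
    words = tokenize(utterance)
    scores = {
        intent: max(len(words & tokenize(sample)) for sample in samples)
        for intent, samples in INTENTS.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None

print(recognize_intent("will it rain this morning?"))  # get_weather
```

Supplying hundreds of utterances per goal, as the article notes, widens the overlap a matcher like this can exploit; crowdsourcing services such as Mechanical Turk are one way to collect them.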
Algorithmic ‘weighted’ selection versus curated content. It makes sense, based on how these two companies ‘grew up’, that Amazon relies on curated content acquisitions such as Evi, a technology company which specialises in knowledge base and semantic search engine software. Its first product was an answer engine that aimed to directly answer questions on any subject posed in plain English text, accomplished using a database of discrete facts. “Google, on the other hand, pulls many of its answers straight from the web. In fact, you know how sometimes you do a search in Google and the answer comes up in snippet form at the top of the results? Well, often Google Assistant simply reads those answers.” Truncated answers can equate to incorrect answers.
Instead of a direct Q&A-style approach, where a human utterance (question) triggers an intent/goal [answer], ‘clarifying questions’ may be asked by the AI digital agent. A dialog workflow may disambiguate the goal by narrowing down what the user is looking for. This disambiguation process is a common technique in human interaction, and is represented in a workflow diagram with logic decision paths. It seems this technique may require human guidance, and may be prone to bias, error, and additional overhead for content curation.
Who are the content curators for knowledge, providing ‘factual’ answers, and/or opinions? Are curators ‘self proclaimed’ Subject Matter Experts (SMEs), people entitled with degrees in History? or IT / business analysts making the content decisions?
Questions requesting opinionated information may vary greatly between AI platforms, and between questions within the same AI knowledge base. Opinions may offend, may be intentionally biased, and may sour the AI/human experience.