Animals could provide a new source of training data for AI systems.
To train AI to think like a dog, the researchers first needed data. They collected this in the form of videos and motion information captured from a single dog, a Malamute named Kelp. A total of 380 short videos were taken from a GoPro camera mounted to the dog’s head, along with movement data from sensors on its legs and body.
They captured a dog going about its daily life — walking, playing fetch, and going to the park.
Researchers analyzed Kelp’s behavior using deep learning, an AI technique that can be used to sift patterns from data, matching the motion data of Kelp’s limbs and the visual data from the GoPro with various doggy activities.
The resulting neural network trained on this information could predict what a dog would do in certain situations. If it saw someone throwing a ball, for example, it would know that the reaction of a dog would be to turn and chase it.
The predictive capacity of their AI system was very accurate, but only in short bursts. In other words, if the video shows a set of stairs, then you can guess the dog is going to climb them. But beyond that, life is simply too varied to predict.
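The researchers' actual model is a deep network trained end to end on the synchronized video and sensor streams; purely to illustrate the supervised setup described above (sensory features in, next action out), here is a toy nearest-neighbour sketch, with every feature vector and action label invented for demonstration:

```python
def predict_next_action(history, window):
    """Return the action whose recorded feature vector is closest to `window`.

    `history` is a list of (feature_vector, action) pairs harvested from
    synchronized camera frames and leg/body motion sensors.
    """
    def distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    best = min(history, key=lambda pair: distance(pair[0], window))
    return best[1]

# Hypothetical training pairs: [head_turn_rate, leg_speed] -> observed next action
history = [
    ([0.9, 0.1], "turn_head"),
    ([0.1, 0.9], "run"),
    ([0.5, 0.5], "walk"),
]

print(predict_next_action(history, [0.2, 0.8]))  # a fast-moving context -> "run"
```

The real system learns far richer representations, but the shape of the problem is the same: observe paired sensory context and behavior, then predict the behavior from new context.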
Dogs “clearly demonstrate visual intelligence, recognizing food, obstacles, other humans, and animals,” so does a neural network trained to act like a dog show the same cleverness?
It turns out yes.
Researchers applied two tests to the neural network, asking it to identify different scenes (e.g., indoors, outdoors, on stairs, on a balcony) and “walkable surfaces” (which are exactly what they sound like: places a dog can walk). In both cases, the neural network was able to complete these tasks with decent accuracy using just the basic data it had of a dog’s movements and whereabouts.
“This is a very hard task for a computer because it requires a lot of prior knowledge.” This knowledge might be whether a surface is too steep to walk on or if it’s spiky and uncomfortable. It would be time-consuming to program a robot with all these rules, but a dog already knows them all. So by watching Kelp’s behavior, the neural network learned these rules without having to be taught them. In other words, it learned from the dog.
There are, of course, 3rd party platforms that perform very well, are feature rich, and agnostic to all file types. For example, within a very short period of time, at low cost, and possibly with a few plugins, a WordPress site can be configured and deployed to suit your Digital Asset Management (DAM) needs. The long-term goal is to incorporate techniques such as Auto Curation for any/all files, leveraging an ever-growing intelligent taxonomy, a taxonomy built on user-defined labels/tags, as well as an AI rules engine with ML techniques. OneDrive, as a cloud storage platform, may bridge the gap between JUST cloud storage and a DAM.
Content Creation Apps and Auto Curation
Content Creation applications, such as Microsoft Word, should capture not only the user-defined tags but also the context of the tags relating to the content.
When ingesting a Microsoft PowerPoint presentation, after consuming the file, an Auto Curation process can extract “reusable components” of the file, such as the slide header/name, and the correlated content such as a table, chart, or graphics.
Ingesting Microsoft Excel and Auto Curation of Workbooks may yield “reusable components” stored as metadata tags, and their correlated content, such as chart and table names.
Ingesting and Auto Curation of Microsoft Word documents may build a classic Index for all the most frequently occurring words, and augment the manually user-defined tags in the file.
Ingestion of Photos [and Videos] into an Intelligent Cloud Storage Platform, during the Auto Curation process, may identify commonly identifiable objects, such as trees or people. These objects would be automatically tagged through the Auto Curation process after Ingestion.
Ability to extract the content file metadata, objects and text tags, to be stored in a standard format to be extracted by DAMs, or Intelligent Cloud Storage Platforms with file and metadata search capabilities. Could OneDrive be that intelligent platform?
A user can search by a file’s title or across the manually and Auto Curated metadata associated with the file. The DAM or Intelligent Cloud Storage Platform provides both kinds of search results. “Reusable components” of files are also searchable.
For “Reusable Components” to be parsed out of files as separate entities, a process needs to occur after Ingestion and Auto Curation.
In Content Creation applications, user-entry tag/text fields should have “drop-down” access to the search index populated with auto/manually created tags.
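As an illustration of the Word-document idea above — a classic frequency index that augments the user's manual tags — the following sketch uses an invented length-based stop-word heuristic and a hypothetical tag-merging rule:

```python
import re
from collections import Counter

def build_index(text, user_tags, top_n=3, min_len=4):
    """Build a classic frequency index and merge it with manual tags.

    Words shorter than `min_len` are treated as stop words for simplicity;
    a real Auto Curation pipeline would use a proper stop-word list.
    """
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(w for w in words if len(w) >= min_len)
    auto_tags = [w for w, _ in counts.most_common(top_n)]
    # User-defined tags take precedence and are never overwritten.
    return list(dict.fromkeys(user_tags + auto_tags))

doc = ("Budget review: the budget draft and budget summary "
       "cover revenue, revenue forecasts.")
print(build_index(doc, user_tags=["finance"]))
```

The key point is the last line of the function: Auto Curation augments the manual tags rather than replacing them, matching the rule above that user-created tags stay untouched.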
Auto Curation and Intelligent Cloud Storage
The intelligence of Auto Curation should be built into the Cloud Storage Platform, e.g. potentially OneDrive.
At a minimum, auto curation should update the cloud storage platform indexing engine to correlate files and metadata.
Auto Curation is the ‘secret sauce’ that “digests” the content to automatically build the search engine index, which contains identified objects (e.g. tag and text or coordinates).
Auto Curation may leverage a rules engine (AI) and apply user configurable rules such as “keyword density” thresholds
Artificial Intelligence, Machine Learning rules may be applied to the content to derive additional labels/tags.
If leveraging version control of the intelligent cloud storage platform, each iteration should “re-index” the content, and update the Auto Curation metadata tags. User-created tags are untouched.
If no user-defined labels/tags exist upon ingestion, the user may be prompted for tags.
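The user-configurable “keyword density” rule mentioned above might be sketched like this; the threshold value and rule shape are assumptions for illustration, not any real product's API:

```python
def keyword_density(text, keyword):
    """Fraction of words in `text` equal to `keyword` (case-insensitive)."""
    words = text.lower().split()
    return words.count(keyword.lower()) / len(words) if words else 0.0

def apply_density_rule(text, keyword, threshold):
    """A user-configurable rule: emit the tag only if the keyword's
    density meets the threshold. The threshold value is illustrative."""
    return keyword if keyword_density(text, keyword) >= threshold else None

# "sales" is 2 of 4 words (density 0.5), so it clears a 0.25 threshold.
print(apply_density_rule("sales report sales forecast", "sales", threshold=0.25))
```

A rules engine would evaluate many such rules per file during Auto Curation and feed the surviving tags into the indexing engine.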
Auto Curation and “3rd Party” Sources
In the context of sources such as a Twitter feed, there is currently no incorporation of feeds into an Intelligent Cloud Storage platform. OneDrive, as Cloud Intelligent Storage, may import feeds from 3rd party sources, with each Tweet defined as an object that is searchable along with its metadata (e.g. likes; tags).
Operating System, Intelligent Cloud Storage/DAM
The Intelligent Cloud Storage and DAM solutions should have integrated search capabilities, so on the OS (mobile or desktop) level, the discovery of content through the OS search of tagged metadata is possible.
OneDrive has no ability to search Microsoft Word tags
The UI for all Productivity Tools must have a comprehensive and simple design for leveraging an existing taxonomy for manual tagging, and the ability to add hints for auto curation
Currently, Microsoft Word has two fields to collect metadata about the file, tucked away in the “Save As” dialog.
The “Save As” dialog box allows a user to add tags and authors, but only in the MS Word desktop version. The Online (Cloud) version of Word has no such option when saving to Microsoft OneDrive Cloud Storage.
Auto Curation (Artificial Intelligence, AI) must inspect the MS Productivity suite tools and extract tags automatically, a capability which does not exist today.
No manual tagging or Auto Curation/Facial Recognition exists.
This is not about deconstructing the existing functionality of entire Photo Archive and Sharing platforms. Rather, the aim is to bring an awareness to the masses about corporate decisions to omit the advanced capabilities of cataloguing photos, object recognition, and advanced metadata tagging.
Backstory: The Asks / Needs
Every day my family takes tons of pictures, and the pictures are bulk loaded up to The Cloud using Cloud Storage Services, such as DropBox, OneDrive, Google Photos, or iCloud. A selected set of photos are uploaded to our favourite Social Networking platform (e.g. Facebook, Instagram, Snapchat, and/or Twitter).
Every so often, I will take pause, and create either a Photobook or print out pictures from the last several months. The kids may have a project for school to print out e.g. Family Portrait or just a picture of Mom and the kids. In order to find these photos, I have to manually go through our collection of photographs from our Cloud Storage Services, or identify the photos from our Social Network libraries.
Social Networking Platform Facebook
For as long as I can remember, the Social Networking platform Facebook has had the ability to tag faces in photos uploaded to the platform. There are restrictions, such as whom you can tag from the privacy side, but the capability still exists. The Facebook platform also automatically identifies faces within photos, i.e. places a box around faces in a photo to make the person tagging capability easier. So, in essence, there is an “intelligent capability” to identify faces in a photo. The Facebook platform allows you to see “Photos of You”, but what seems to be missing is the ability to search for all photos of Fred Smith, a friend of yours, even if all his photos are public. By design, it sounds fit for the purpose of the networking platform.
Automatically upload new images in bulk or one at a time to a Cloud Storage Service (with or without Online Printing Capabilities, e.g. Photobooks), and an automated curation process begins.
The Auto Curation process scans photos for:
“Commonly Identifiable Objects”, such as #Car, #Clock, #Fireworks, and #People
Auto Curation of new photos: based on previously tagged objects and faces, newly uploaded photos will be automatically tagged.
Once auto curation runs several times, and people are manually #tagged, the auto curation process will “learn” faces. Any new auto curation process executed should be able to recognize tagged people in new pictures.
Auto Curation process emails / notifies the library owners of the ingestion process results, e.g. Jane Doe and John Smith photographed at Disney World on Date / Time stamp. i.e. Report of executed ingestion, and auto curation process.
After the upload and auto curation process, optionally, it’s time to manually tag people’s faces, and any ‘objects’ you would like to track; e.g. a car aficionado may #tag vehicle make/model with additional descriptive tags. Using the photo curator function on the Cloud Storage Service, the user can tag any “objects” in the photo using Rectangle or Lasso Select.
Curation to Take Action
Once photo libraries are curated, the library owner(s) can:
Automatically build albums based on one or more #tags
Smart Albums automatically update, e.g. after ingestion and Auto Curation. Albums are tag sensitive and update with new pics that contain certain people or objects. The user/librarian may dictate logic for tags.
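The Smart Album logic above amounts to a tag filter that is re-run after every ingestion pass; the photo schema and tag names in this sketch are invented:

```python
def smart_album(photos, required_tags):
    """Return photos whose tags contain all of `required_tags`.

    `photos` is a list of dicts with a 'tags' set, as an Auto Curation
    pass might produce; the schema here is an assumption for illustration.
    """
    required = set(required_tags)
    return [p for p in photos if required <= p["tags"]]

library = [
    {"file": "img1.jpg", "tags": {"beach", "jane"}},
    {"file": "img2.jpg", "tags": {"jane", "john"}},
    {"file": "img3.jpg", "tags": {"beach"}},
]

# The album "updates itself" simply by re-running the filter after each
# ingestion/Auto Curation pass, picking up newly tagged photos.
print([p["file"] for p in smart_album(library, ["jane"])])
```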
Where is this Functionality??
Why are major companies not implementing facial (and object) recognition? Google and Microsoft certainly have the capability and scale to produce the technology.
Is it possible Google and Microsoft are subject to more scrutiny than a Shutterfly? Do privacy concerns at the moment leave others to become trailblazers in this area?
Advice is integrated within the application, proactive and reactive: When searching in Microsoft Edge, a blinking circle representing Cortana is illuminated. Cortana says “I’ve collected similar articles on this topic.” If selected, Cortana presents 10 similar results in a right panel to help you find what you need.
Personal Data Access and Management
The user can vocally access their personal data, and make modifications to that data; e.g. add entries to their Calendar, and retrieve the current day’s agenda.
Platform Capabilities: Mobile Phone Advantage
Strengthen core telephonic capabilities where the competition, Amazon and Microsoft, is relatively weak.
Ability to record conversations, and push/store content in the Cloud, e.g. iCloud. A Cloud Serverless recording mechanism dynamically tags a conversation with “Keywords”, creating an Index to the conversation. Users may search recordings, and play back audio clips +/- 10 seconds before and after a tagged occurrence.
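The keyword index and +/- 10-second playback window described above might be sketched as follows, assuming a hypothetical speech-to-text output of (timestamp, word) pairs; none of this reflects any vendor's actual API:

```python
def build_keyword_index(transcript, keywords):
    """Map each keyword to the timestamps (in seconds) where it occurs.

    `transcript` is a list of (timestamp, word) pairs, as a serverless
    speech-to-text pass might emit; the format is an assumption.
    """
    wanted = {k.lower() for k in keywords}
    index = {}
    for ts, word in transcript:
        if word.lower() in wanted:
            index.setdefault(word.lower(), []).append(ts)
    return index

def clip_window(ts, pad=10):
    """Playback window of +/- `pad` seconds around a tagged occurrence."""
    return (max(0, ts - pad), ts + pad)

transcript = [(3, "hello"), (42, "invoice"), (95, "invoice"), (120, "goodbye")]
index = build_keyword_index(transcript, ["invoice"])
print(index["invoice"])              # timestamps where the keyword occurs
print(clip_window(index["invoice"][0]))
```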
Calls into the User’s Smartphones May Interact Directly with the Digital Assistant
Call Screening – The digital assistant asks for the name of the caller, purpose of the call, and if the matter is “Urgent”
A generic “purpose” response, or a list of caller purpose items can be supplied to the caller, e.g. 1) Schedule an Appointment
The smartphone’s user would receive the caller’s name and purpose as a message back to the UI from the call, which is currently in a ‘hold’ state.
The smartphone user may decide to accept the call, or reject the call and send the caller to voice mail.
A caller may ask to schedule a meeting with the user, and the digital assistant may access the user’s calendar to determine availability. The digital assistant may schedule a ‘tentative’ appointment within the user’s calendar.
If calendar indicates availability, a ‘tentative’ meeting will be entered. The smartphone user would have a list of tasks from the assistant, and one of the tasks is to ‘affirm’ availability of the meetings scheduled.
If a caller would like to know the address of the smartphone user’s office, the Digital Assistant may access a database of “generally available” information, and provide it. The Smartphone user may use applications like Google Keep, and any note tagged with a label “Open Access” may be accessible to any caller.
Custom business workflows may be triggered through the smartphone, such as “Pay by Phone”. When a caller is calling a business user’s smartphone, the call goes to “voice mail” or the “digital assistant” based on the smartphone user’s configuration. If the caller reaches the “Digital Assistant”, there may be a list of options the caller may perform, such as a “Request for Service” appointment. The caller would navigate through a voice-recognition menu, one of many workflows defined by the smartphone user.
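A minimal sketch of the call-screening flow described in this section; the prompts, the "urgent" escalation, and the accept/reject callback are all assumptions made for illustration:

```python
def screen_call(caller_name, purpose, urgent, user_accepts):
    """Collect caller details while the call is on hold, then route it.

    `user_accepts` stands in for the smartphone user's decision after the
    summary is pushed to the UI; the escalation rule for "urgent" calls
    is an invented example.
    """
    summary = {"caller": caller_name, "purpose": purpose, "urgent": urgent}
    if urgent or user_accepts(summary):
        return "connected"
    return "voicemail"

# The user sees the summary and decides; here, accept only known callers.
decision = lambda s: s["caller"] in {"Jane Doe"}
print(screen_call("Jane Doe", "Schedule an Appointment", False, decision))
print(screen_call("Unknown", "Sales", False, decision))
```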
Platform Capabilities: Mobile Multimedia
Either through your mobile Smartphone, or through a portable speaker with voice recognition (VR).
Streaming media / music to portable device based on interactions with Digital Assistant.
Menu to navigate relevant (to you) news, and Digital Assistant to read articles through your portable media device (without UI)
Third Party Partnerships: Adding User Base, and Expanding Capabilities
In the form of platform apps (abstraction), or 3rd party APIs which integrate into the Digital Assistant, allowing users to directly execute application commands, e.g. Play Spotify song, My Way by Frank Sinatra.
Any “Skill Set” with specialized knowledge: direct Q&A or instructional guidance – e.g Home Improvement, Cooking
eCommerce Personalized Experience – Amazon
Home Automation – doors, thermostats
Music – Spotify
Navigate Set Top Box (STB) – e.g. find a program to watch
Video on Demand (VOD) – e.g. set to record entertainment
Protecting the Data Warehouse with Artificial Intelligence
Teleran is a middleware company whose software monitors and governs OLAP activity between the Data Warehouse and Business Intelligence tools, like Business Objects and Cognos. Teleran’s suite of tools encompasses a comprehensive analytical and monitoring solution called iSight. In addition, Teleran has a product, iGuard, that leverages artificial intelligence and machine learning to impose real-time query and data access controls. The architecture also allows Teleran’s agent to run on a host separate from the database, for additional security and to avoid consuming resources on the database host.
Key Features of iGuard:
Policy engine prevents “bad” queries before reaching database
Patented rule engine resides in-memory to evaluate queries at database protocol layer on TCP/IP network
Patented rule engine prevents inappropriate or long-running queries from reaching the data
70 Customizable Policy Templates
SQL Query Policies
Create policies using policy templates based on SQL Syntax:
Require JOIN to Security Table
Column Combination Restriction – Ex. Prevents combining customer name and social security #
Table JOIN restriction – Ex. Prevents joining two different tables in same query
Equi-literal Compare requirement – Tightly Constrains Query Ex. Prevents hunting for sensitive data by requiring ‘=‘ condition
By user or user groups and time of day (shift) (e.g. ETL)
Blocks connections to the database
White list or black list by
DB User Logins
OS User Logins
Applications (BI, Query Apps)
Rule Templates Contain Customizable Messages
Each of the “Policy Templates” has the ability to send the user querying the database a customized message based on the defined policy. The message back to the user from Teleran should be seamless to the application user’s experience.
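To make the policy-engine idea concrete — inspect SQL before it reaches the database and return the policy's customized message — here is a toy sketch; the rules, the naive SQL parsing, and the messages are invented and are not Teleran's actual implementation:

```python
# Hypothetical policies: a restricted column combination, plus the
# "require JOIN to security table" template from the list above.
RESTRICTED_PAIRS = [({"customer_name", "ssn"},
                     "Policy: customer name and SSN may not be combined.")]

def check_query(sql):
    """Return (allowed, message) for a query before it reaches the database."""
    # Naive column extraction: text between SELECT and FROM.
    cols = set(sql.lower().split("select")[1].split("from")[0]
               .replace(",", " ").split())
    for pair, message in RESTRICTED_PAIRS:
        if pair <= cols:
            return (False, message)
    if "join security_table" not in sql.lower():
        return (False, "Policy: query must JOIN to the security table.")
    return (True, "OK")

ok, msg = check_query("SELECT customer_name, ssn FROM accounts")
print(ok, msg)  # the customized policy message goes back to the user
```

A production rule engine evaluates queries at the database protocol layer with a real SQL parser; this sketch only shows the decision shape of block-or-pass plus a per-policy message.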
Machine Learning: Curbing Inappropriate, or Long Running Queries
iGuard has the ability to analyze all of the historical SQL passed through to the Data Warehouse, and suggest new, customized policies to cancel queries with certain SQL characteristics. The Teleran administrator sets parameters such as rows or bytes returned, and then runs the induction process. New rules will be suggested which exceed these defined parameters. The induction engine is “smart” enough to look at the repository of queries holistically and not make determinations based on a single query.
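The induction idea — examine the query history holistically and only suggest rules for recurring patterns, never a single outlier — can be sketched as follows, with the thresholds and query-log schema invented for illustration:

```python
from collections import Counter

def suggest_policies(query_log, max_rows, min_occurrences=3):
    """Suggest blocking query shapes that repeatedly exceed `max_rows`.

    `query_log` is a hypothetical history of executed queries; the
    administrator supplies `max_rows`, mirroring the parameter-driven
    induction process described above.
    """
    offenders = Counter(q["shape"] for q in query_log
                        if q["rows_returned"] > max_rows)
    # Holistic: a single outlier query never generates a suggested rule.
    return [shape for shape, n in offenders.items() if n >= min_occurrences]

log = ([{"shape": "full_scan_orders", "rows_returned": 5_000_000}] * 4 +
       [{"shape": "lookup_by_id", "rows_returned": 9_000_000}])
print(suggest_policies(log, max_rows=1_000_000))  # only the recurring shape
```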
The ultimate goal, in my mind, is to have the capability within a Search Engine to be able to upload an image, then the search engine analyzes the image, and finds comparable images within some degree of variation, as dictated in the search properties. The search engine may also derive metadata from the uploaded image such as attributes specific to the image object(s) types. For example, determine if a person [object] is “Joyful” or “Angry”.
As of the writing of this article, search engines Yahoo and Microsoft Bing do not have the capability to upload an image and perform image/pattern recognition, and return results. Behold, Google’s search engine has the ability to use some type of pattern matching, and find instances of your image across the world wide web. From the Google Search “home page”, select “Images”, or after a text search, select the “Images” menu item. From there, an additional icon appears, a camera with the hint text “Search by Image”. Select the Camera icon, and you are presented with options on how Google can acquire your image, e.g. upload, or an image URL.
Select the “Upload an Image” tab, choose a file, and upload. I used a fictional character, Max Headroom. The search results were very good (see below). I also attempted an uncommon shape, and it did not meet my expectations. The poor performance in matching this possibly “unique” shape is most likely due to how the Google Image Classifier Model was defined, and the correlating training data that tested the classifier model. If the shape truly is “unique”, the Google Search Image Engine did its job.
Google Image Search Results – Max Headroom
Google Image Search Results – Odd Shaped Metal Object
The Google Search Image Engine was able to “Classify” the image as “metal”, so that’s good. However, I would have liked to see better matches under the “Visually Similar Image” section. Again, this is probably due to the image classification process, and potentially the diversity of image samples.
A Few Questions for Google
How often is the Classifier Modeling process executed (i.e. training the classifier), and the model tested? How are new images incorporated into the Classifier model? Are the user uploaded images now included in the Model (after model training is run again)? Is Google Search Image incorporating ALL Internet images into Classifier Model(s)? Is an alternate AI Image Recognition process used beyond Classifier Models?
I’m not sure if the Cloud Vision API uses the same technology as Google’s Search Image Engine, but it’s worth noting. After reaching the Cloud Vision API starting page, go to the “Try the API” section, and upload your image. I tried a number of samples, including my odd shaped metal object. I think it performed fairly well on the “labels” (i.e. image attributes).
Using the Google Cloud Vision API, to determine if there were any WEB matches with my odd shaped metal object, the search came up with no results. In contrast, using Google’s Search Image Engine produced some “similar” web results.
Finally, I tested the Google Cloud Vision API with a self portrait image. THIS was so cool.
The API brought back several image attributes specific to “Faces”. It attempts to identify certain complex facial attributes, things like emotions, e.g. Joy, and Sorrow.
The API brought back the “Standard” set of Labels which show how the Classifier identified this image as a “Person”, such as Forehead and Chin.
Finally, the Google Cloud Vision API brought back the Web references, things like it identified me as a Project Manager, and an obscure reference to Zurg in my Twitter Bio.
The Google Cloud Vision API, and Google’s own baked-in Search Image Engine, are extremely enticing, yet still have a way to go in terms of accuracy. Of course, I tried using my face in the Google Search Image Engine, and looking at the “Visually Similar Images” didn’t retrieve any images of me, or even a distant cousin (maybe?)
Amazon’s Echo and Google’s Home are the two most compelling products in the new smart-speaker market. It’s a fascinating space to watch, for it is of substantial strategic importance to both companies as well as several more that will enter the fray soon. Why is this? Whatever device you outfit your home with will influence many downstream purchasing decisions, from automation hardware to digital media and even to where you order dog food. Because of this strategic importance, the leading players are investing vast amounts of money to make their product the market leader.
These devices have a broad range of functionality, most of which is not discussed in this article. As such, it is a review not of the devices overall, but rather simply their function as answer engines. You can, on a whim, ask them almost any question and they will try to answer it. I have both devices on my desk, and almost immediately I noticed something very puzzling: They often give different answers to the same questions. Not opinion questions, you understand, but factual questions, the kinds of things you would expect them to be in full agreement on, such as the number of seconds in a year.
As someone who has worked with Artificial Intelligence in some shape or form for the last 20 years, I’d like to throw in my commentary on the article.
Human Utterances and their Correlation to Goal / Intent Recognition. There are innumerable ways to ask for something you want. The ‘ask’ is a ‘human utterance’ which should trigger the ‘goal / intent’ of what knowledge the person is requesting. AI Chat Bots, digital agents, have a table of these utterances which all roll up to a single goal. Hundreds of utterances may be supplied per goal. In fact, Amazon has a service, Mechanical Turk, the Artificial Artificial Intelligence, with which you may “Ask workers to complete HITs – Human Intelligence Tasks – and get results using Mechanical Turk”. They boast access to a global, on-demand, 24 x 7 workforce to get thousands of HITs completed in minutes. There are also ways in which the AI Digital Agent may ‘rephrase’ what the AI considers closely related utterances. Companies like IBM treat human-level recognition as accurate comprehension of 95% of the words in a given conversation. On March 7, IBM announced it had become the first to home in on that benchmark, having achieved a 5.5% error rate.
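The utterance table described above can be sketched as a simple mapping; real agents use ML-based similarity rather than the exact-match lookup shown here, and every utterance and intent name below is invented:

```python
# Many phrasings roll up to one goal/intent; production systems hold
# hundreds of utterances per goal and match them approximately.
UTTERANCES = {
    "what's my balance": "check_balance",
    "how much money do i have": "check_balance",
    "show my account balance": "check_balance",
    "pay my bill": "pay_bill",
}

def resolve_intent(utterance):
    """Normalize the utterance and look up its goal, else fall back."""
    key = utterance.lower().strip().rstrip("?!.")
    return UTTERANCES.get(key, "fallback_clarify")

print(resolve_intent("How much money do I have?"))
print(resolve_intent("Sing me a song"))  # unknown -> ask a clarifying question
```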
Algorithmic ‘weighted’ Selection versus Curated Content. It makes sense, based on how these two companies ‘grew up’, that Amazon relies on its curated content acquisitions such as Evi, a technology company which specialises in knowledge base and semantic search engine software. Its first product was an answer engine that aimed to directly answer questions on any subject posed in plain English text, which is accomplished using a database of discrete facts. “Google, on the other hand, pulls many of its answers straight from the web. In fact, you know how sometimes you do a search in Google and the answer comes up in snippet form at the top of the results? Well, often Google Assistant simply reads those answers.” Truncated answers equate to incorrect answers.
Instead of a direct Q&A style approach, where a human utterance, the question, triggers an intent/goal, there is a process by which ‘clarifying questions’ may be asked by the AI digital agent. A dialog workflow may disambiguate the goal by narrowing down what the user is looking for. This disambiguation process is a common technique in human interaction, and is represented in a workflow diagram with logic decision paths. It seems this technique may require human guidance, and is prone to bias, error, and additional overhead for content curation.
Who are the content curators for knowledge, providing ‘factual’ answers, and/or opinions? Are curators ‘self proclaimed’ Subject Matter Experts (SMEs), people entitled with degrees in History? or IT / business analysts making the content decisions?
Questions requesting opinionated information may vary greatly between AI platform, and between questions within the same AI knowledge base. Opinions may offend, be intentionally biased, sour the AI / human experience.
A relatively new medium of support for businesses, small to global conglomerates, becomes available based on the exciting yet embryonic [Chatbot] / Digital Agent services. Amazon and Microsoft, among others, are diving into this transforming space. The coat of paint is still wet on Amazon Lex and Microsoft Cortana Skills. The MSFT Cortana Skills Kit is not yet available to any/all developers, but has been opened to a select set of partners, enabling them to expand Cortana’s core knowledge set. Microsoft’s Bot Framework is in “Preview” phase. However, the possibilities are extensive, such as another tier of support for both of these companies, if they turn on their own knowledge repositories using their respective Digital Agent [Chatbot] platforms.
Approach from Inception to Deployment
The curation and creation of knowledge content may occur with the definition of ‘Goals/Intents’ and their correlated human utterances which trigger the Goal Question and Answer (Q&A) dialog format. Classic Use Case. The question may provide an answer with text, images, and video.
Taking Goals/Intents and Utterances to ‘the next level’ involves creating / implementing Process Workflows (PW). A workflow may contain many possibilities for the user to reach their goal with a single utterance triggered. Workflows look very similar to what you might see in a Visio diagram, with multiple logical paths. Instead of presenting users with the answer based upon the single human utterance, the question, the workflow navigates the users through a narrative to:
disambiguate the initial human utterance, and get a better understanding of the specific user goal/intention. The user’s question to the Digital Agent may have a degree of ambiguity, and workflows enable the AI Digital Agent to determine the goal through an interactive dialog/inspection. The larger the volume of knowledge, and the closer the goals/intentions, the more the implementation would require disambiguation.
interactive conversation / dialog with the AI Digital Agent, to walk through a process step by step, including text, images, and video inline with the conversation. The AI chat agent may pause the ‘directions’, waiting for the human counterpart to proceed.
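The process-workflow idea in the two points above — clarifying questions that branch on the answer, then step-by-step instructions — might be sketched as a small graph walk; the node schema and the example dialog are assumptions:

```python
# Each node either asks a clarifying question (branching on the answer)
# or delivers an instruction step; None marks the end of the path.
WORKFLOW = {
    "start": {"ask": "Hardware or software issue?",
              "branches": {"hardware": "hw_step", "software": "sw_step"}},
    "hw_step": {"say": "Power-cycle the device.", "next": None},
    "sw_step": {"say": "Reinstall the application.", "next": None},
}

def run_workflow(workflow, answers):
    """Walk the workflow, consuming one user answer at each branch."""
    node, transcript = workflow["start"], []
    while node:
        if "ask" in node:
            transcript.append(node["ask"])
            node = workflow[node["branches"][answers.pop(0)]]
        else:
            transcript.append(node["say"])
            node = workflow.get(node["next"])
    return transcript

print(run_workflow(WORKFLOW, ["software"]))
```

Each branch point is exactly the disambiguation step described above: the agent does not guess the goal from the first utterance, it narrows it down interactively.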
Amazon to provide billing and implementation / technical support for AWS services through a customized version of their own AWS Lex service? All the code used to provide this Digital Agent / Chatbot may be ‘open source’ for those looking to implement similar [enterprise] services.
Digital Agent may allow the user to share their screen, OCR the current section of code from an IDE, and perform a code review on the functions / methods.
Microsoft has an ‘Online Chat’ capability for MSDN. I'm not sure how extensive the capability is, or if it’s a true 1:1 chat, which they claim is a 24/7 service. Microsoft has libraries of content from Microsoft Docs, MSDN, and TechNet. If the MSFT Bot Framework has the capability to ingest their own articles, users may be able to trigger these goals/intents from utterances, similar to searching for knowledge base articles today.
Abstraction, Abstraction, Abstraction. These AI Chatbot/Digital Agents must float toward Wizards to build and deploy, and attempt to stay away from coding, elevating this technology to be configurable by a business user. Solutions have significant possibilities for small companies, and this technology needs to reach their hands. It seems that Amazon Lex is well on its way to achieving wizard-driven creation / distribution, but has a way to go. I’m not sure if the back end process execution, e.g. Amazon Lambda, will be abstracted any time soon.
Interesting approach to an AI Chatbot implementation. The business process owner creates one or more Google Forms containing questions and answers, and converts/deploys to a chatbot using fobi.io. All the questions for [potential] customers/users are captured in a multitude of forms. Without any code, and within minutes, an interactive chatbot can be produced and deployed for client use.
The trade off for rapid deployment without coding is a rigid approach to triggering the user’s desired “Goals/Intents”. It seems a single goal/intent is mapped to a single Google Form, as opposed to a digital agent, which leverages utterances to trigger the user’s intended goal/intent. Before starting the chat, the user must select the appropriate Google Form, with the guidance of the content curator.
Another trade off is, it seems, no integration on the backend to execute a business process, essential to many chatbot workflows. For example, given an Invoice ID, the chatbot may search in a transactional database, then retrieve and display the full invoice. Actually, I may be incorrect. On the Google Forms side, there is a Script Editor. Seems powerful and scary all at the same time.
Another trade off that seems to exist, more on the Google Forms side, is building not just a Form with a list of Questions, but a Consumer Process Workflow, that allows the business to provide an interactive dialog based on answers users provide. For example, a Yes/No or multichoice answer may lead to alternate sets of questions [and actions]. It doesn’t appear there is any workflow tool provided to structure the Google Forms / fobi.io chatbot Q&A.
However, there are still many business cases for the product, especially for small to mid size organizations.
* Business Estimates – although there is no logic workflow to guide the Q&A sessions with [prospective] customers, the business may still derive the initial information they require to make an initial assessment. A Web form and this fobi.io / Google Forms solution seem very comparable in capability; it’s just a change in the medium through which the user interacts to provide the information.
One additional note, Google Forms is not a free product. Looks like it’s a part of the G Suite. Free two week trial, then the basic plan is $5 per month, which comes with other products as well. Click here for pricing details.
Although this “chatbot” tries to quickly provide a mechanism to turn a form to a chatbot, it seems it’s still just a form at the end of the day. I’m interested to see more products from Zoi.ai soon