Tag Archives: Voice Recognition

Hostess with the Mostest – Apple Siri, Amazon Alexa, Microsoft Cortana, Google Assistant

Application Integration Opportunities:

  • Microsoft Office, Google G Suite, Apple iWork
    • Advice is integrated within the application, both proactive and reactive.  For example, when searching in Microsoft Edge, a blinking circle representing Cortana lights up, and Cortana says, “I’ve collected similar articles on this topic.”  If selected, Cortana presents 10 similar results in a right panel to help you find what you need.
  • Personal Data Access and Management
    • The user can vocally access their personal data and make modifications to that data, e.g. add entries to their calendar and retrieve the current day’s agenda.

Platform Capabilities: Mobile Phone Advantage

Strengthen core telephonic capabilities where the competition, Amazon and Microsoft, is relatively weak.

  • Ability to record conversations, and push/store content in the Cloud, e.g. iCloud.  A serverless Cloud recording mechanism dynamically tags a conversation with “Keywords”, creating an index to the conversation.  Users may search a recording and play back audio clips spanning 10 seconds before and after each tagged occurrence.
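The keyword index and the 10 second playback window could be sketched roughly as follows. This is only an illustration: it assumes the recording has already been transcribed into (seconds, word) pairs by some speech to text service, and the names build_keyword_index and clip_window are hypothetical, not part of any vendor's API.

```python
from collections import defaultdict


def build_keyword_index(transcript, keywords):
    """Map each keyword to the timestamps (in seconds) where it was spoken.

    transcript: list of (seconds_from_start, word) pairs (assumed input format).
    """
    wanted = {k.lower() for k in keywords}
    index = defaultdict(list)
    for seconds, word in transcript:
        token = word.lower().strip(".,!?")  # ignore case and trailing punctuation
        if token in wanted:
            index[token].append(seconds)
    return dict(index)


def clip_window(hit_seconds, duration, padding=10.0):
    """Return (start, end) of the clip around a tagged occurrence,
    clamped to the bounds of the recording."""
    start = max(0.0, hit_seconds - padding)
    end = min(duration, hit_seconds + padding)
    return start, end


transcript = [(2.0, "Hello"), (14.5, "invoice"), (40.0, "Invoice."), (55.0, "bye")]
index = build_keyword_index(transcript, ["invoice"])
clips = [clip_window(t, duration=60.0) for t in index["invoice"]]
```

Searching a recording then becomes a dictionary lookup, and each hit maps to a clip that never runs past the start or end of the call.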
Calls into the User’s Smartphone May Interact Directly with the Digital Assistant
  • Call Screening – The digital assistant asks for the name of the caller, purpose of the call, and if the matter is “Urgent”
    • The caller can give a generic “purpose” response, or a list of purpose items can be supplied to the caller, e.g. 1) Schedule an Appointment
    • The smartphone’s user would receive the caller’s name and purpose as a message in the UI while the call is in a ‘hold’ state.
    • The smartphone user may decide to accept the call, or reject the call and send the caller to voice mail.
  • A caller may ask to schedule a meeting with the user, and the digital assistant may access the user’s calendar to determine availability.  The digital assistant may schedule a ‘tentative’ appointment within the user’s calendar.
    • If the calendar indicates availability, a ‘tentative’ meeting will be entered.  The smartphone user would have a list of tasks from the assistant, one of which is to ‘affirm’ availability for the meetings scheduled.
  • If a caller would like to know the address of the smartphone user’s office, the Digital Assistant may access a database of “generally available” information, and provide it. The Smartphone user may use applications like Google Keep, and any note tagged with a label “Open Access” may be accessible to any caller.
  • Custom business workflows may be triggered through the smartphone, such as “Pay by Phone”.  When a caller is calling a business user’s smartphone, the call goes to “voice mail” or to the “digital assistant” based on the smartphone user’s configuration.  If the caller reaches the “Digital Assistant”, there may be a list of options the caller may perform, such as a “Request for Service” appointment.  The caller would navigate through a voice recognition menu, one of many defined by the smartphone user’s workflows.
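To make the call screening and tentative scheduling ideas above a little more concrete, here is a minimal sketch. Everything in it is an assumption for illustration: ScreenedCall, Calendar, and the function names are hypothetical, and a real assistant would sit on top of the phone's telephony and calendar services rather than an in-memory list.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass
class ScreenedCall:
    caller_name: str
    purpose: str
    urgent: bool


def screening_message(call):
    """Text shown to the smartphone user while the caller waits on hold."""
    flag = "URGENT: " if call.urgent else ""
    return f"{flag}{call.caller_name} is on hold ({call.purpose})"


def route_call(user_accepts):
    """Connect the call if the user accepts, otherwise send it to voice mail."""
    return "connect" if user_accepts else "voicemail"


@dataclass
class Calendar:
    # Each meeting is a (start, end, status, title) tuple.
    meetings: list = field(default_factory=list)

    def is_free(self, start, end):
        # Free when the requested slot overlaps no existing meeting.
        return all(end <= s or start >= e for s, e, _, _ in self.meetings)

    def book_tentative(self, start, minutes, title):
        """Enter a 'tentative' meeting only if the slot is available."""
        end = start + timedelta(minutes=minutes)
        if not self.is_free(start, end):
            return False
        self.meetings.append((start, end, "tentative", title))
        return True

    def pending_confirmations(self):
        """Task list for the user: meetings still waiting to be affirmed."""
        return [title for _, _, status, title in self.meetings if status == "tentative"]


call = ScreenedCall("Dana", "Schedule an Appointment", urgent=True)
message = screening_message(call)

calendar = Calendar()
ten_am = datetime(2024, 1, 8, 10, 0)
first = calendar.book_tentative(ten_am, 30, "Meeting with Dana")
clash = calendar.book_tentative(ten_am + timedelta(minutes=15), 30, "Overlapping call")
```

The overlapping request is refused rather than double booked, and the user's task list only shows the meeting that still needs to be affirmed.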

Platform Capabilities: Mobile Multimedia

Available either through your smartphone or through a portable speaker with voice recognition (VR).

  • Streaming media / music to a portable device based on interactions with the Digital Assistant.
  • A menu to navigate news relevant to you, with the Digital Assistant reading articles aloud through your portable media device (without a UI)

Third Party Partnerships: Adding User Base, and Expanding Capabilities

These take the form of platform apps (abstraction) or 3rd party APIs that integrate into the Digital Assistant, allowing users to directly execute application commands, e.g. “Play Spotify song, My Way by Frank Sinatra.”

  • Any “Skill Set” with specialized knowledge: direct Q&A or instructional guidance – e.g. Home Improvement, Cooking
  • eCommerce Personalized Experience – Amazon
  • Home Automation – doors, thermostats
  • Music – Spotify
  • Navigate Set Top Box (STB) – e.g. find a program to watch
  • Video on Demand (VOD) – e.g. set to record entertainment
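One way to picture how third party skills might plug into a digital assistant is a small registry that maps spoken phrases to handlers. The sketch below is purely hypothetical: the skill decorator, the phrase patterns, and the returned strings are invented for illustration and do not reflect how Alexa, Siri, Cortana, or Google Assistant actually register integrations.

```python
import re

# Registry of (compiled phrase pattern, handler) pairs; hypothetical design.
SKILLS = []


def skill(pattern):
    """Decorator that registers a handler for utterances matching `pattern`."""
    def register(handler):
        SKILLS.append((re.compile(pattern, re.IGNORECASE), handler))
        return handler
    return register


@skill(r"^play (?P<song>.+) by (?P<artist>.+)$")
def play_music(song, artist):
    # A real skill would call the music partner's API here.
    return f"music: playing '{song}' by {artist}"


@skill(r"^set (the )?thermostat to (?P<degrees>\d+)$")
def set_thermostat(degrees):
    # A real skill would call the home automation partner's API here.
    return f"home: thermostat set to {degrees}"


def dispatch(utterance):
    """Route a recognized utterance to the first skill whose pattern matches."""
    for pattern, handler in SKILLS:
        match = pattern.match(utterance.strip())
        if match:
            return handler(**match.groupdict())
    return "Sorry, no installed skill can handle that."


result = dispatch("Play My Way by Frank Sinatra")
```

Adding a partner then amounts to registering one more pattern and handler, which is roughly the “adding user base, and expanding capabilities” trade described above.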


Bubble Head Bob Takes Dictation: Human Factors to Voice Recognition

I continue to ponder why I am so averse to speech to text and voice recognition.  There are the issues I have with its inconsistent accuracy: needing to speak slowly at times, and to articulate every word crystal clearly, without deviation.  Barring those issues, I am still having trouble relating to a machine.  Hold the jokes from the peanut gallery.  Yes, even with my Android phone, I find it difficult to talk to an app showing a microphone on a screen, or an image of a piece of paper on a technical device.  Frankly, if the technical issues went away, maybe, just maybe, I might talk to my phone.  I may be xenophobic toward androids, or a computer or robot taking dictation.  I’ll have to add that to the list to talk about with my therapist.  Does anyone know of an application that has a bobble head, with a choice of faces, that will ask the questions, take dictation, and repeat your last phrase to confirm, such as “Did you say ‘that’ or ‘fat’?  I don’t understand the context of the sentence.”

Add a familiar, animated face to whom you would speak, as well as a periodic word and sentence validation feature, to voice recognition and speech to text dictation applications, and there might be more acceptance of these applications.  Too soon, too quick?
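The word validation idea could look something like the sketch below.  It assumes the recognizer hands back, for each word, its best guess, a confidence score between 0 and 1, and a runner-up alternative; that input format, the 0.6 threshold, and the function name are all my own assumptions rather than any real dictation API.

```python
def confirmation_prompts(words, threshold=0.6):
    """Build 'did you say X or Y?' questions for low-confidence words.

    words: list of (best_guess, confidence, alternative) tuples (assumed format);
    alternative may be None when the recognizer has no runner-up.
    """
    prompts = []
    for best, confidence, alternative in words:
        if confidence < threshold and alternative:
            prompts.append(f"Did you say '{best}' or '{alternative}'?")
    return prompts


recognized = [("the", 0.95, None), ("that", 0.48, "fat"), ("cat", 0.90, "cap")]
prompts = confirmation_prompts(recognized)
```

Only the shaky word triggers a question, so the bobble head would interrupt occasionally rather than after every phrase.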

China’s Baidu Digital Eyewear Targeted Solely for Government

China’s Baidu developing digital eyewear similar to Google Glass | Reuters.

Three paragraphs are extremely interesting, and imply military applications as well as policing their own people.

Kuo said the device will be mounted on a headset with a small LCD screen and will allow users to make image and voice searches as well as conduct facial recognition matches.


“What you are doing with your camera, for example, taking a picture of a celebrity and then checking on our database to see if we have a facial image match, you could do the same thing with a wearable visual device,” Kuo said.


“We haven’t decided whether it is going to be released in any commercial form right now, but we experiment with every kind of technology that is related to search,” Kuo said. Kuo declined to comment on the other functions of the Baidu Eye or whether Baidu is working on other forms of wearable technology.

It implies that people who are targeted for ‘crimes’, such as civil disobedience, may be tracked in a database.  The last paragraph implies that the technology may be targeted for ‘public’ / government sector use.  In addition, all governments may use this technology at their borders for easier recognition of targeted individuals.  I could also visualize other highly policed states, where terrorism is very active, providing these glasses to transportation gatekeepers, such as bus drivers or train conductors.  At the point of collecting tickets, they may be able to perform retinal recognition, which would allow the collection of fares, depending on the accuracy of the technology, as well as the identification of passengers with any outstanding warrants for arrest.  A person may board a bus and be identified through facial, retinal, and/or voice recognition.  If the person clears a security check, the bus driver may be prompted automatically to ask: would you like this fare deducted from your linked checking account, or from which credit card, ending in these last four digits?

This technology might eventually be mandated by the states within the EU.  That’s a thought, as is a requirement to connect each border check to cross reference with Interpol, with the World Health Organization (WHO) for the control of possible infectious disease spread, and with local government warrants.

Brave New World.

Samsung and Cambridge to Produce Interactive Avatar

BBC News – Is this interactive avatar the face of the future?.

I read this article, and instantly saw a logical progression: taking the eye and facial tracking software, such as that built into the Samsung S4, and integrating that feature with a cost effective version of the Cambridge project.  There are many applications:

  • The S Voice Drive, or another voice recognition component driving smartphone features, may display a ‘friendly’ avatar instead of the typical microphone, chosen from several options, e.g. a famous star, a comedian, an actress, or an athlete.  Then the avatar, backed by the eye and facial tracking software, may ask which smartphone functions you want to perform.
  • An AI induction engine, i.e. a learning rules engine, may record your facial gestures, eye movements, and sounds, even inflection, as data points to correlate, so the responses can be proactive, not reactive, e.g. the avatar would say, “Should I call your wife?  You seem tense, and you may want to call her to relax you.”
  • This is a slippery slope with respect to an AI providing advice on how to react to human output, such as eye movements and facial gestures.  It seems people are, at present, more comfortable with integrating mechanical AI induction engines, such as an eye movement to turn a page, read mail, or make a phone call.  These are very mechanical processes, and they allow people to feel more comfortable with the technology.

Actor Voice Overs for Apple’s Siri and Google’s AI Voice

There should be add-on modification packs, available for purchase, for the smartphone’s Artificial Intelligence (AI) voice command module, e.g. Siri or Google’s AI voice recognition.  This way, any retired or semi-retired actor gets easy money for going into a studio and recording their well-known voice, and royalties go to the companies that sell the AI voice-over modification package.  Instead of the standard woman’s voice, you can get the voice of anyone who would sell theirs.  It should be extremely easy to implement, as the voice is probably a vocabulary indexed by certain words.  The actors or actresses need to record their voices saying a specific set of words as defined by the library of words, and I hope the Android and Apple operating system companies thought ahead to make these libraries plug-and-play adaptable.  If Apple or Google made the AI voice API open, so that anyone could record their voice and swap it in for the default voice, then actors, actresses, musicians, or anyone else could sell their voice in the mobile vendor’s marketplace as an add-on module.
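If the voice really is a vocabulary indexed by words, a swappable pack might be sketched as below.  This is a guess at the shape of the idea, not how Siri or Google’s voice actually works: the VoicePack class, the file paths, and the fall back to the default voice for unrecorded words are all assumptions made for illustration.

```python
class VoicePack:
    """A word-indexed library of recorded clips with an optional fallback pack."""

    def __init__(self, name, clips, fallback=None):
        self.name = name
        # Normalize keys so lookups are case-insensitive.
        self.clips = {word.lower(): path for word, path in clips.items()}
        self.fallback = fallback

    def clip_for(self, word):
        """Return this pack's clip for a word, else defer to the fallback pack."""
        key = word.lower()
        if key in self.clips:
            return self.clips[key]
        if self.fallback is not None:
            return self.fallback.clip_for(word)
        raise KeyError(word)

    def missing_words(self, required):
        """Words from the required vocabulary the actor has not recorded yet."""
        return sorted(w.lower() for w in required if w.lower() not in self.clips)


def render(sentence, pack):
    """Turn a sentence into the ordered list of clips to play."""
    return [pack.clip_for(word) for word in sentence.split()]


default = VoicePack("default", {"call": "default/call.wav", "home": "default/home.wav"})
actor = VoicePack("actor", {"call": "actor/call.wav"}, fallback=default)
playlist = render("Call home", actor)
```

The missing_words check is the part a marketplace would likely care about: it tells the actor which words from the required library still need a studio session.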