The holiday season brings lots of people to your front door. If you have a front door camera, you may be getting many alerts from your front door that let you know there is motion at the door. It would be great if the front doorbell cameras could take the next step and incorporate #AI facial/image recognition and notify you through #iOS notifications WHO is at the front door and, in some cases, which “uniformed” person is at the door, e.g. FedEx/UPS delivery person.
This facial recognition technology is already baked into Microsoft #OneDrive Photos and Apple #iCloud Photos. It wouldn’t be a huge leap to apply facial and object recognition to catalog the people who come to your front door as well as image recognition for uniforms that they are wearing, e.g., UPS delivery person.
iCloud/OneDrive Photos identify faces in your images, group by likeness, so the owner of the photo gallery can identify this group of faces as Grandma, for example. It may take one extra step for the camera owner to login into the image/video storage service and classify a group of videos converted to stills containing the face of Grandma. Facebook Meta also can tag the faces within pictures you upload and share. The Facebook app also can “guess” faces based on previously uploaded images.
No need to launch the Ring app and see who’s at the front door. Facial recognition can remove the step required to find out what is the motion at the front door and just post the iOS notification with the “who’s there”.
One less step to launching the Ring app and see who is at the front door.
I’ve been enamored with Bose products for well over a decade. However, we’ve seen quality brands enter the hi-fidelity audio market over that time. Beyond quality design in their classic audio products, can Bose Augmented Reality (Bose AR) be the market differentiator?
Bose: Using a Bose-AR-equipped wearable, a smartphone, and an app-enabled with Bose AR, the new platform lets you hear what you see.
It sounds like Bose may come up with an initial design, sunglasses, but turn to 3rd party hardware manufacturers of all sorts to integrate Bose AR into other wearable products.
Bose Augmented Reality isn’t just about audio. The devices will use sensors to track head motions for gesture controls and work with GPS from a paired smartphone to track location. The company also aspires to combine visual information with the Bose AR platform.
Bose AR Use Cases
Bose Augmented Reality device reenact historical events or speeches from landmarks and statues as you visit them.
The Bose and NFL partnership could be leveraged to get these AR units into the football player’s helmets. Audio queues from the on-field lead, quarterback, and dynamically replayed/relayed at the appropriate time of required action by the receiver.
Audio directions to your gate when your GPS detects that you’ve arrived at the airport, or any other destination from your calendar. Audio queues would be richer the more inclusive you are to the access to Calendars, To Do lists, etc.
Combine visual information with the Bose AR platform, too, so you could hear a translation of a sign you’re looking at.
Hear the history of a painting in a museum.
Time until it’s in consumer’s hands? TBD. Bose objective is to have the developer kit, including a pair of glasses, available later this year.
When I was on vacation in Athens, Greece, I created a post which had Greek actors running tours in their ancient, native garb. The Bose AR could be a complementary offering to the tour, which includes live, greek local actors portraying out scenes in ancient ruins. Record the scenes, and interact with them while walking through the Greek ruins in your Bose AR (Augmented Reality) glasses.
I’m a cheerleader for Bose, among several others in this space, but I question a Bose AR headset that produces a high fidelity sound. Most of the use cases listed should be able to “get along OK” with an average quality sound. Maybe high definition AR games with a high level of realism might benefit from the high-quality sound. However, their site reads like Bose is positioning themselves as a component to be integrated into other AR headsets, i.e. “Bose-AR-equipped wearable“
The ultimate goal, in my mind, is to have the capability within a Search Engine to be able to upload an image, then the search engine analyzes the image, and finds comparable images within some degree of variation, as dictated in the search properties. The search engine may also derive metadata from the uploaded image such as attributes specific to the image object(s) types. For example, determine if a person [object] is “Joyful” or “Angry”.
As of the writing of this article, search engines Yahoo and Microsoft Bing do not have the capability to upload an image and perform image/pattern recognition, and return results. Behold, Google’s search engine has the ability to use some type of pattern matching, and find instances of your image across the world wide web. From the Google Search “home page”, select “Images”, or after a text search, select the “Images” menu item. From there, an additional icon appears, a camera with the hint text “Search by Image”. Select the Camera icon, and you are presented with options on how Google can acquire your image, e.g. upload, or an image URL.
Select the “Upload an Image” tab, choose a file, and upload. I used a fictional character, Max Headroom. The search results were very good (see below). I also attempted an uncommon shape, and it did not meet my expectations. The poor performance of matching this possibly “unique” shape is mostly likely due to how the Google Image Classifier Model was defined, and correlating training data that tested the classifier model. If the shape is “Unique” the Google Search Image Engine did it’s job.
Google Image Search Results – Max Headroom
Google Image Search Results – Odd Shaped Metal Object
The Google Search Image Engine was able to “Classify” the image as “metal”, so that’s good. However I would have liked to see better matches under the “Visually Similar Image” section. Again, this is probably due to the image classification process, and potentially the diversity of image samples.
A Few Questions for Google
How often is the Classifier Modeling process executed (i.e. training the classifier), and the model tested? How are new images incorporated into the Classifier model? Are the user uploaded images now included in the Model (after model training is run again)? Is Google Search Image incorporating ALL Internet images into Classifier Model(s)? Is an alternate AI Image Recognition process used beyond Classifier Models?
I’m not sure if the Cloud Vision API uses the same technology as Google’s Search Image Engine, but it’s worth noting. After reaching the Cloud Vision API starting page, go to the “Try the API” section, and upload your image. I tried a number of samples, including my odd shaped metal, and I uploaded the image. I think it performed fairly well on the “labels” (i.e. image attributes)
Using the Google Cloud Vision API, to determine if there were any WEB matches with my odd shaped metal object, the search came up with no results. In contrast, using Google’s Search Image Engine produced some “similar” web results.
Finally, I tested the Google Cloud Vision API with a self portrait image. THIS was so cool.
The API brought back several image attributes specific to “Faces”. It attempts to identify certain complex facial attributes, things like emotions, e.g. Joy, and Sorrow.
The API brought back the “Standard” set of Labels which show how the Classifier identified this image as a “Person”, such as Forehead and Chin.
Finally, the Google Cloud Vision API brought back the Web references, things like it identified me as a Project Manager, and an obscure reference to Zurg in my Twitter Bio.
The Google Cloud Vision API, and their own baked in Google Search Image Engine are extremely enticing, but yet have a ways to go in terms of accuracy %. Of course, I tried using my face in the Google Search Image Engine, and looking at the “Visually Similar Images” didn’t retrieve any images of me, or even a distant cousin (maybe?)
Review of Microsoft OneDrive Cloud Repository. It may be an easy tool and service(s) to save files. If you know what you roughly want to find, most Cloud repositories are easy and straight forward to use. Over time, if not managed appropriately, the cloud repository becomes burdensome to manage, e.g. access and find files. If stuck in the “file folder organization storage” mentality of organizing our content, our Cloud storage solution will become quickly unyielding. Getting into habits like tagging your content should help us to access files beyond the “Folder Borders”. To the contrary, there are huge opportunities to leverage and grow existing platforms, specifically around the process service of [file] Ingestion.
Bulk file loading, e.g. photos from our smartphones, maybe the entire family uploads to the same storage repository
If performed by the “Ingestion Service”, manual user “tagging” of a group of photos, or individual images may be available.
Geotagging may be available either at the time of image capture , or upon the start of the “Ingestion Service”
Facial Recognition, compared to the likes of services such as Facebook, based on my experience, are not readily available to personal Cloud Storage repositories.
Auto tagging pictures upon ingestion, if performed, may leverage “Extracted Text” from images. Images become searchable with little human intervention.
Cloud File Repository: Storing Content
I created modified existing Microsoft Office files”tags”, in this case MS Word and PowerPoint file types were used. I opened the Word file, and selected “File” menu, “Save As” menu, then “More Options” under the list of file types. I was then presented with the classic “Save As” form. Just below the “Save as type” list box, there were 3 “metadata” fields to describe the file:
The first two fields are semi colon ; delimited and multiple values are allowed. In this test case, I added to the “Tags” field “CV;resume;career”. I then used the MS Windows Snipping Tool that comes with the OS to document the step. I called the file MSWordTags.PNG and saved this screen capture to my OneDrive. Then I saved the document itself on my OneDrive.
Cloud File Repository: Finding Content
I then started up Internet Explorer, and went to the https://onedrive.live.com site to access my cloud content. On the top left corner of the screen, there is a field called “Search Everything”, and I typed in CV.
The search results included ONLY the image screenshot file that contained the letters CV, and not the MS Word file that explicitly had the Tag field with the text value CV.
Looking at the file properties as defined by OneDrive, there was ALSO a field called “Tags” with no values populated. For example, the Cloud “Ingestion” service did not read the file for metadata, and abstract it to the Cloud level. just two separate sets of metadata describing the same file. To view the Cloud file data, select the file, and there is an i with a circle around it. Too many ways to store the same data, and may lead to inconsistent data.
For the Cloud file information / properties, the image file had a field called “Extracted Text”, and this is how the search picked up the CV value in the Cloud Search for my files with the “CV” tag.
Oddly, the MS Word file attributes in OneDrive did not offer “tags” as a field to store meta data in the cloud. The “tags” field was available when looking at the PNG file. However, the user may add a “Description” in a multiline text field. Tags metadata on images and not MS Word files? Odd.
Future State (?): If the Cloud Ingestion process can perform an “Extracted Text” process, it may also have other “Ingestion services”, such as “Facial Recognition” from “known good” faces already tagged. e.g. I tag a face from within the OneDrive browser UI, and now when other images are ingested, there can be a correlation between the files.
As a business model, are we going to add a tier just after Cloud File ingestion, maybe exercise a third party suite of cognitive APIs, such as facial recognition? For example, Microsoft OneDrive Ingests a file, and if it’s an image file, routes through to the appropriate IBM Watson API, processes the file, and returns [updated] metadata, and a modified file? Maybe.
Update: Auto Tagging Objects Upon Ingestion
On an image with no tags, I selected the “Edit tags” menu from the Properties pane on the right side of the screen. As a scrolling menu, the option to “Add existing tag” appeared. There were dozens of tags already created with a word, thumbnail image, and the number of times used. Wow. Awesome. The current implementation seems to automatically, upon ingestion, identify objects in the image, and tag the images with those objects, e.g. Building, Beach, Horse, etc.
Presumption that Microsoft OneDrive performs object recognition on images upon file ingestion into the cloud (as opposed to in the Photos app).
“Extracted Text ” Metadata Field from within Microsoft OneDrive Image PNG File Properties:
Presumption that Microsoft OneDrive performs OCR on images upon file ingestion into the cloud (as opposed to the Photos app).
If Facebook uses facial recognition, why not expand to cover vendor / partner library catalogs, use the AI Image Recognition to identify objects, and ‘read’ and recognize simple phrases from the captions or comments of pictures. If the caption says ‘nice dress’, you can use the AI image recognition rules engine to suggest N list of vendors, local and web, with the lowest price, and ‘Buy Now’ if you ‘Like’ the picture.