Tag Archives: Image Recognition

Platform Independent AI Model for Images: AI Builder, Easily Utilized by 3rd Party Apps

With all the discourse on OpenAI’s ChatGPT and natural language processing (NLP), I’d like to steer the conversation toward images/video and object recognition. This is another area of artificial intelligence primed for growth, with many use cases. Arguably, it’s not as shocking as a technology that bends our society at its core by creating college papers from limited input, but object recognition can seem “magical.” AI object recognition may turn art into science, as easy as AI reading your palm to tell your future. It will also give consumers more data points, which Augmented Reality (AR) can use to overlay digital images on an analog world of tangible objects.

Microsoft’s AI Builder – Platform Independent

Microsoft’s Power Automate AI [model] Builder has the functionality to get us started on the journey of utilizing images, tagging them with objects we recognize, and then training the AI model to recognize objects in our “production” images. Microsoft provides tools to build AI [image] models (a library of images with human-tagged objects) quickly and easily. How you leverage these AI models is the foundation of “future” applications. Some applications are already here, but not in mass production. The necessary ingredient: moving away from the proprietary building of AI models, as happens in social media applications.

In many social media applications, users can tag faces in their images for various reasons, mostly to control who they share their content/images with. In most cases, images can also be tagged with a specific location. Each AI image/object model is proprietary and not shared between social media applications. If there were a standards body, an AI model could be created and maintained outside of the social media applications. The result would be portable AI object recognition models, with a wide array of applications, such as social media applications, supporting their use. Later on, we’ll discuss Microsoft’s AI Model builder, externalized from any one application, and because it’s Microsoft, it’s intuitive. 🙂

An industry standards body could collaborate and define what AI models look like, their features, and, most importantly, their portability formats. Then the industry, such as social media apps, can elect which features their applications do and do not support.

Use Cases for Detecting Objects in Images

Why doesn’t everyone have an AI model containing tagged objects within images and videos of the user’s design? Why indeed.

1 – Brands / Product Placement from Content Creators

Just about everyone today is a content creator, producing images and videos for their own personal and business social media feeds: Twitter, Instagram, Snap, Meta, YouTube, and TikTok, to name a few. AI models should be portable enough to integrate with social media applications where tags could be used to identify branded apparel, jewelry, appliances, etc. Tags could also contain metadata, allowing content consumers to follow tagged objects to a specified URL, driving clicks and the promotion of products and services.
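
To make the idea of a portable tag that carries metadata a little more concrete, here is a minimal sketch in Python. The field names (`label`, `box`, `url`) and the JSON layout are my own assumptions for illustration only; no standards body has defined such a format.

```python
import json
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass
class ObjectTag:
    """One tagged object in an image (hypothetical, illustrative schema)."""
    label: str                  # e.g. a brand or product name
    box: tuple                  # (left, top, width, height) as fractions of the image size
    url: Optional[str] = None   # optional link a content consumer can follow


def export_tags(image_name, tags):
    """Serialize the tags to JSON so any application could read them."""
    return json.dumps({"image": image_name, "tags": [asdict(t) for t in tags]})


def import_tags(payload):
    """Rebuild the image name and the ObjectTag list from exported JSON."""
    data = json.loads(payload)
    tags = [ObjectTag(t["label"], tuple(t["box"]), t.get("url")) for t in data["tags"]]
    return data["image"], tags
```

One application could call `export_tags` when a creator tags a product, and another could call `import_tags` to show the same tag and its link, without either one owning the model.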

2 – Object Recognition for Face Detection

Has it all been done? Facebook/Meta, OneDrive, iCloud, and other services have already tried or are implementing some form of object detection in the photos you post. Each of these existing services implements object detection at some level:

  • Identifying the faces in your photos, but needing you to tag those faces, after which some “metadata” is associated with the photos.
  • Dynamically grouping/tagging all “Portrait” pictures of a specific individual, or events from a specific day and location, like a family vacation.
  • Some image types (JPEG, PNG, GIF, etc.) allow you to add metadata to the files on your own, e.g., so you can search for pictures at the OS level.

3 – Operational Assistance through object recognition using AR

  • Constructing “complex” components on an assembly line, where Augmented Reality (AR) can overlay the next assembly step on the existing object to help move it to that step.
  • Assistance putting together IKEA furniture, like the assembly line use case, but for home use.
  • Gaming, everything from Mario Kart Live to lightsaber duels against the infamous Darth Vader.

4 – Palm Reading and other Visual Analytics

  • Predictive weather patterns

5 – Visual Search through Search Engines and Proprietary Applications with Specific Knowledge Base Alignment

  • The CoinSnap iPhone app scans both sides of a coin and then identifies the coin, building a user’s collection.
  • Microsoft Bing’s Visual Search and integration with MSFT Edge
  • Medical applications leveraging AI, e.g., image models in radiology

Radiology – Reading the Tea Leaves

Radiology builds a model of possible issues throughout the body. A library of images tagged with specific types of fractures, for example, can empower AI to detect such issues automatically. If it were a non-proprietary model, radiologists worldwide could contribute to that AI model. However, the potential displacement of radiology jobs may inhibit the open, non-proprietary nature of the use case, and the AI model may need to be built independently of open input from all radiologists.

Microsoft’s AI Builder – Detect Objects in Images

Microsoft’s AI model builder can help the user build models in minutes. “Object Detection, Custom Model, Detect custom objects in images” is the “template” you want to use to build a model that detects objects (e.g., people, cars, anything) rather quickly. Users can keep adding images (i.e., training the model) so that it becomes a better model over time.

Many other AI Model types exist, such as Text Recognition within images. I suggest exploring the Azure AI Models list to fit your needs.

Current, Available Data Sources for Image Input

  • Current Device
  • SharePoint
  • Azure BLOB

Wish List for Data Sources w/Trigger Notifications

When a new image is uploaded into one of these data sources, a “trigger” can be activated to process the image with the AI Model and apply tags to the images.

  • ADT – video cam
  • DropBox
  • Google Drive
  • Instagram
  • Kodak (yeah, still around)
  • Meta/Facebook
  • OneDrive
  • Ring – video cam
  • Shutterfly
  • Twitter
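
The trigger idea above can be sketched in a few lines of Python. This is only an illustration under my own assumptions: `detect` stands in for a call to a trained model, and `tag_store` stands in for wherever the data source keeps its tags. Neither is a real Power Automate or AI Builder API.

```python
from typing import Callable, Dict, List

# A detector takes an image path and returns the labels it found.
# It is a hypothetical stand-in for a call to a trained AI model.
Detector = Callable[[str], List[str]]


def make_upload_trigger(detect: Detector, tag_store: Dict[str, List[str]]):
    """Return a handler to call whenever a data source reports a new image."""
    def on_image_uploaded(image_path: str) -> List[str]:
        labels = sorted(set(detect(image_path)))  # drop repeated detections
        tag_store[image_path] = labels            # "apply tags" to the image
        return labels
    return on_image_uploaded
```

Each data source on the wish list would only need to call the returned handler when a new image arrives; the model and the tag storage stay the same.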

Get Started: Power Automate, Premium Account

Log in to Power Automate with your premium account, select the “AI Builder” menu, then the “Models” menu item. In the top left part of the screen, select “New AI Model.” From the list of model types, select “Custom Model, Object Detection: Detect Custom Objects in Images.”

AI Builder – Custom Model

It’s a “Premium” feature of Power Automate, so you must have the Premium license. Select “Get Started.” The first step is to “Select your model’s domain.” There are three choices; I selected “Common Objects” to give me the broadest opportunity. Then select “Next.”

AI Builder – Custom Model – Domain

Next, you need to select all of the objects you want to identify in your images. For demonstration purposes, I added my family’s first names as my objects to train my model to identify in images.

AI Builder – Custom Model – Objects for Model

Next, you need to “Add example images for your objects.” Microsoft’s guidance is “You need to add at least 15 images for each object you want to detect.” Current data sources include:

AI Model – Add Images

I added the minimum recommended number of images: 15 per object for two objects, so 30 images of my family, picked from random pics over the last year.

Once uploaded, you need to go through each image, draw a box around the image’s objects you want to tag, and then select the object tag.
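
If you keep your own record of what you have tagged, a small script can tell you which objects still fall short of the 15-image guidance quoted earlier. This is a rough sketch: the tuple layout for an annotation (image name, object name, box) is my own assumption and not an AI Builder export format.

```python
from collections import Counter

MIN_IMAGES_PER_OBJECT = 15  # the minimum from the guidance quoted earlier


def images_per_object(annotations):
    """Count how many distinct images contain each object tag.

    `annotations` is a list of (image_name, object_name, box) tuples, where
    box is (left, top, width, height). This layout is an assumption.
    """
    seen = {(image, obj) for image, obj, _box in annotations}
    return Counter(obj for _image, obj in seen)


def objects_needing_more_images(annotations, object_names):
    """Return {object_name: images_still_needed} for under-represented objects."""
    counts = images_per_object(annotations)
    return {
        name: MIN_IMAGES_PER_OBJECT - counts[name]
        for name in object_names
        if counts[name] < MIN_IMAGES_PER_OBJECT
    }
```

Two boxes for the same object in one image count as a single image here, which is how I read the “15 images for each object” wording.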

Part 2 – Completing the Model and its App usage.

Who’s at the Front Door…Again?

Busy Time of Year, Happy Holidays

The holiday season brings lots of people to your front door. If you have a front door camera, you may be getting many alerts from your front door that let you know there is motion at the door. It would be great if the front doorbell cameras could take the next step and incorporate #AI facial/image recognition and notify you through #iOS notifications WHO is at the front door and, in some cases, which “uniformed” person is at the door, e.g. FedEx/UPS delivery person.

Ring iOS Notification

This facial recognition technology is already baked into Microsoft #OneDrive Photos and Apple #iCloud Photos. It wouldn’t be a huge leap to apply facial and object recognition to catalog the people who come to your front door as well as image recognition for uniforms that they are wearing, e.g., UPS delivery person.

iCloud/OneDrive Photos identify faces in your images and group them by likeness, so the owner of the photo gallery can identify a group of faces as Grandma, for example. It may take one extra step for the camera owner to log in to the image/video storage service and classify a group of videos, converted to stills, containing the face of Grandma. Facebook/Meta can also tag the faces within pictures you upload and share. The Facebook app can also “guess” faces based on previously uploaded images.

There would be no need to launch the Ring app to see who’s at the front door. Facial recognition can remove the step required to find out what caused the motion at the front door and simply post the iOS notification with the “who’s there.” That is one less step between the alert and the answer.
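
As a rough illustration of that last step, here is how the notification text could be built once a recognition model has returned its guesses. The (label, confidence) pairs and the 0.8 cutoff are assumptions of mine; this is not how Ring or any other doorbell vendor actually works.

```python
def notification_text(detections, min_confidence=0.8):
    """Build the alert text from (label, confidence) pairs.

    Falls back to a plain motion alert when nobody is recognized with
    enough confidence. The default cutoff is an arbitrary assumption.
    """
    names = []
    for label, confidence in detections:
        if confidence >= min_confidence and label not in names:
            names.append(label)
    if not names:
        return "Motion at the front door"
    return "At the front door: " + ", ".join(names)
```

For example, `notification_text([("Grandma", 0.93)])` gives “At the front door: Grandma”, while a low-confidence guess still produces the generic motion alert.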

Bose AR, Audio Augmented Reality – Use Cases

I’ve been enamored with Bose products for well over a decade. However, we’ve seen other quality brands enter the hi-fidelity audio market over that time. Beyond quality design in their classic audio products, can Bose Augmented Reality (Bose AR) be the market differentiator?

Bose: Using a Bose-AR-equipped wearable, a smartphone, and an app enabled with Bose AR, the new platform lets you hear what you see.

It sounds like Bose may come up with an initial design, sunglasses, but turn to 3rd party hardware manufacturers of all sorts to integrate Bose AR into other wearable products.

Bose Augmented Reality isn’t just about audio. The devices will use sensors to track head motions for gesture controls and work with GPS from a paired smartphone to track location.  The company also aspires to combine visual information with the Bose AR platform.

Bose AR Use Cases

  • A Bose Augmented Reality device could reenact historical events or speeches from landmarks and statues as you visit them.
  • The Bose and NFL partnership could be leveraged to get these AR units into football players’ helmets. Audio cues from the on-field lead, the quarterback, could be dynamically replayed/relayed to the receiver at the moment action is required.
  • Audio directions to your gate when your GPS detects that you’ve arrived at the airport, or at any other destination from your calendar. Audio cues would be richer the more access you grant to your calendars, to-do lists, etc.
  • Combine visual information with the Bose AR platform, too, so you could hear a translation of a sign you’re looking at.
  • Hear the history of a painting in a museum.

Time until it’s in consumers’ hands? TBD. Bose’s objective is to have the developer kit, including a pair of glasses, available later this year.

When I was on vacation in Athens, Greece, I created a post about Greek actors running tours in their ancient, native garb. Bose AR could be a complementary offering to the tour, which includes live, local Greek actors acting out scenes in ancient ruins. Record the scenes, and interact with them while walking through the Greek ruins in your Bose AR (Augmented Reality) glasses.

Greece, Prosperity, and Taxes: The World Will Come See You in AR

Please take a moment to prioritize the use cases, or add your own.

Takeaway

I’m a cheerleader for Bose, among several others in this space, but I question the need for a Bose AR headset that produces high-fidelity sound. Most of the use cases listed should be able to “get along OK” with average-quality sound. Maybe high-definition AR games with a high level of realism might benefit from the high-quality sound. However, their site reads like Bose is positioning itself as a component to be integrated into other AR headsets, i.e., a “Bose-AR-equipped wearable.”

Google Search Enables Users to Upload Images for Searching with Visual Recognition. Yahoo and Bing…Not Yet

The ultimate goal, in my mind, is for a search engine to let you upload an image, analyze it, and find comparable images within some degree of variation, as dictated in the search properties. The search engine may also derive metadata from the uploaded image, such as attributes specific to the types of object(s) in the image. For example, it could determine if a person [object] is “Joyful” or “Angry”.
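
To show what “comparable images within some degree of variation” could mean in practice, here is a small sketch using average hashing, one simple and well-known similarity technique. I am not claiming any search engine works this way; the function names and the tiny grayscale grids are purely illustrative.

```python
def average_hash(pixels):
    """Return a tuple of bits: 1 where a pixel is brighter than the mean.

    `pixels` is a small grayscale image as a list of rows of 0-255 values.
    A real pipeline would first shrink the image, e.g. to 8x8.
    """
    flat = [value for row in pixels for value in row]
    mean = sum(flat) / len(flat)
    return tuple(1 if value > mean else 0 for value in flat)


def hamming_distance(hash_a, hash_b):
    """Number of positions at which two equal-length hashes differ."""
    return sum(a != b for a, b in zip(hash_a, hash_b))


def find_similar(query, catalog, max_distance):
    """Names of catalog images within `max_distance` bits of the query, closest first."""
    query_hash = average_hash(query)
    scored = [(hamming_distance(query_hash, average_hash(img)), name)
              for name, img in catalog.items()]
    return [name for dist, name in sorted(scored) if dist <= max_distance]
```

Here `max_distance` plays the role of the “degree of variation” search property: a small value returns only near-duplicates, and a larger one lets in looser matches.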

As of the writing of this article, the search engines Yahoo and Microsoft Bing do not have the capability to upload an image, perform image/pattern recognition, and return results. Behold, Google’s search engine has the ability to use some type of pattern matching and find instances of your image across the World Wide Web. From the Google Search “home page”, select “Images”, or after a text search, select the “Images” menu item. From there, an additional icon appears: a camera with the hint text “Search by Image”. Select the camera icon, and you are presented with options on how Google can acquire your image, e.g., an upload or an image URL.

Google Search Upload Images

Select the “Upload an Image” tab, choose a file, and upload. I used a fictional character, Max Headroom. The search results were very good (see below). I also attempted an uncommon shape, and the results did not meet my expectations. The poor performance in matching this possibly “unique” shape is most likely due to how the Google Image Classifier Model was defined, and to the training data used to build and test it. If the shape truly is “unique,” the Google Search Image Engine did its job.

Google Image Search Results – Max Headroom

Max Headroom Google Search Results

Google Image Search Results – Odd Shaped Metal Object

Google Search Results – Odd Shaped Metal Object

The Google Search Image Engine was able to “classify” the image as “metal”, so that’s good. However, I would have liked to see better matches under the “Visually Similar Image” section. Again, this is probably due to the image classification process, and potentially the diversity of image samples.

A Few Questions for Google

How often is the Classifier Modeling process executed (i.e. training the classifier), and the model tested?  How are new images incorporated into the Classifier model?  Are the user uploaded images now included in the Model (after model training is run again)?    Is Google Search Image incorporating ALL Internet images into Classifier Model(s)?  Is an alternate AI Image Recognition process used beyond Classifier Models?

Behind the Scenes

In addition, Google has provided a Cloud Vision API as part of their Google Cloud Platform.

I’m not sure if the Cloud Vision API uses the same technology as Google’s Search Image Engine, but it’s worth noting. After reaching the Cloud Vision API starting page, go to the “Try the API” section, and upload your image. I tried a number of samples, including my odd-shaped metal object. I think it performed fairly well on the “labels” (i.e., image attributes).

Odd Shaped Metal Sample Image

When I used the Google Cloud Vision API to determine if there were any web matches for my odd-shaped metal object, the search came up with no results. In contrast, Google’s Search Image Engine produced some “similar” web results.

Odd Shaped Metal Sample Image Web Results

Finally, I tested the Google Cloud Vision API with a self portrait image.  THIS was so cool.

Google Vision API – Face Attributes

The API brought back several image attributes specific to “Faces”.  It attempts to identify certain complex facial attributes, things like emotions, e.g. Joy, and Sorrow.
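
If you save the JSON that the API returns, the emotion attributes are easy to summarize. The sketch below assumes each face annotation carries fields such as `joyLikelihood` and `sorrowLikelihood`, with values ranging from `VERY_UNLIKELY` to `VERY_LIKELY`, which is how I understand the REST response. The function itself and the sample values are my own illustration.

```python
# Likelihood values in increasing order, as I understand the API's enum.
LIKELIHOOD_ORDER = ["UNKNOWN", "VERY_UNLIKELY", "UNLIKELY",
                    "POSSIBLE", "LIKELY", "VERY_LIKELY"]

# Emotion name -> field name in one face annotation (assumed REST field names).
EMOTION_FIELDS = {
    "joy": "joyLikelihood",
    "sorrow": "sorrowLikelihood",
    "anger": "angerLikelihood",
    "surprise": "surpriseLikelihood",
}


def likely_emotions(face_annotation, at_least="LIKELY"):
    """Return the emotions whose likelihood is `at_least` or higher."""
    cutoff = LIKELIHOOD_ORDER.index(at_least)
    result = []
    for emotion, field in EMOTION_FIELDS.items():
        value = face_annotation.get(field, "UNKNOWN")
        if LIKELIHOOD_ORDER.index(value) >= cutoff:
            result.append(emotion)
    return result
```

Lowering `at_least` to `"POSSIBLE"` trades accuracy for coverage, which is worth keeping in mind given how subjective “Joy” and “Sorrow” are.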

Google Vision API – Labels

The API brought back the “Standard” set of labels, such as Forehead and Chin, which show how the classifier identified this image as a “Person”.

Google Vision API – Web

Finally, the Google Cloud Vision API brought back the web references: for example, it identified me as a Project Manager, and it picked up an obscure reference to Zurg in my Twitter bio.

The Google Cloud Vision API and Google’s own baked-in Search Image Engine are extremely enticing, but they still have a ways to go in terms of accuracy. Of course, I tried using my face in the Google Search Image Engine, and the “Visually Similar Images” didn’t include any images of me, or even of a distant cousin (maybe?).

Google Image Search Engine: Ian Face Image

Facebook Gifts Modified: ‘Like’ a Pic with caption ‘Nice Dress’, AI suggests, Buy Now, and presents vendors.

If Facebook uses facial recognition, why not expand it to cover vendor/partner library catalogs, use AI image recognition to identify objects, and ‘read’ and recognize simple phrases from the captions or comments of pictures? If the caption says ‘nice dress’, the AI image recognition rules engine could suggest a list of N vendors, local and web, with the lowest prices, and offer ‘Buy Now’ if you ‘Like’ the picture.
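
A very small version of that rules engine might look like the sketch below. Everything in it is hypothetical: the catalog layout, the function name, and the rule that an object must be both mentioned in the caption and detected in the picture. It is not a Facebook feature or API.

```python
def suggest_vendors(caption, detected_objects, catalog, n=3):
    """Suggest up to `n` lowest-priced offers for objects that are both
    mentioned in the caption and detected in the picture.

    `catalog` maps an object name to a list of (vendor, price) offers;
    this layout is an assumption made for illustration.
    """
    words = {w.strip(".,!?'\"").lower() for w in caption.split()}
    matches = [obj for obj in detected_objects if obj.lower() in words]
    offers = [(price, vendor, obj)
              for obj in matches
              for vendor, price in catalog.get(obj, [])]
    return [(vendor, obj, price) for price, vendor, obj in sorted(offers)[:n]]
```

With a caption of “Nice dress!” and a detected “dress”, the function returns the cheapest dress offers from the catalog, which is the list a ‘Buy Now’ prompt could present.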