Tag Archives: Object Recognition

Platform Independent AI Model for Images: AI Builder, Easily Utilized by 3rd Party Apps

With all the discourse on OpenAI’s ChatGPT and natural language processing (NLP), I’d like to steer the conversation toward images/video and object recognition. This is another area of artificial intelligence primed for growth, with many use cases. Arguably, it’s not as shocking as a chatbot bending our society at its core by creating college papers from limited input, but object recognition can still seem “magical.” AI object recognition may turn art into science, as easily as AI reading your palm to tell your fortune, and it will bring consumers more data points from which Augmented Reality (AR) can overlay digital images onto an analog world of tangible objects.

Microsoft’s AI Builder – Platform Independent

Microsoft’s Power Automate AI Builder has the functionality to get us started on the journey of utilizing images, tagging the objects we recognize in them, and then training the AI model to recognize those objects in our “production” images. Microsoft provides tools to build AI image models (a library of images with human-tagged objects) quickly and easily. How you leverage these AI models is the foundation of “future” applications. Some applications are already here, just not in mass production. The necessary ingredient: taking the building of AI models out of proprietary silos, such as social media applications.

In many social media applications, users can tag faces in their images for various reasons, mostly to control whom they share their content/images with. In most cases, images can also be tagged with a specific location. Yet each AI image/object model is proprietary and not shared between social media applications. If there were a standards body, an AI model could be created and maintained outside of the social media applications: portable object recognition models usable by a wide array of applications that support the standard, social media apps among them. Later on, we’ll discuss Microsoft’s AI model builder, externalized from any one application, and because it’s Microsoft, it’s intuitive. 🙂

An industry standards body could collaborate and define what these AI models look like: their features and, most importantly, their portability formats. Then the industry, such as social media apps, could elect to adopt the features that are and are not supported by their applications.
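
No such standard exists today, so purely as a thought experiment, here is a minimal Python sketch of what a portable model manifest might look like. Every field name is hypothetical, not drawn from any published specification.

```python
# Hypothetical sketch of a portable object-recognition model manifest.
# All field names are illustrative, not from any published standard.
import json
from dataclasses import dataclass, field, asdict

@dataclass
class TaggedRegion:
    label: str          # e.g. "face:jane_doe" or "object:car"
    x: float            # bounding box, normalized 0..1
    y: float
    width: float
    height: float

@dataclass
class TrainingImage:
    uri: str                                 # where the image lives
    regions: list = field(default_factory=list)

@dataclass
class PortableModelManifest:
    name: str
    version: str
    labels: list
    training_images: list = field(default_factory=list)

manifest = PortableModelManifest(
    name="family-photos",
    version="1.0",
    labels=["face:jane_doe", "object:car"],
    training_images=[
        TrainingImage(
            uri="https://example.com/photos/img001.jpg",
            regions=[TaggedRegion("face:jane_doe", 0.41, 0.22, 0.18, 0.25)],
        )
    ],
)
# Serialize to JSON — the kind of format apps could import/export.
print(json.dumps(asdict(manifest), indent=2))
```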

Use Cases for Detecting Objects in Images

Why doesn’t everyone have an AI model containing tagged objects within images and videos of the user’s design? Why indeed.

1 – Brands / Product Placement from Content Creators

Just about everyone today is a content creator, producing images and videos for their own personal and business social media feeds: Twitter, Instagram, Snap, Meta, YouTube, and TikTok, to name a few. AI models should be portable enough to integrate with social media applications, where tags could identify branded apparel, jewelry, appliances, etc. Tags could also contain metadata, allowing content consumers to follow tagged objects to a specified URL, driving clicks and the promotion of products and services.
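
To make the click-through idea concrete, here is a hedged sketch of the per-tag metadata a platform might attach to a recognized object; the schema, brand, and URLs are purely illustrative.

```python
# Hypothetical per-tag metadata a content platform might attach to a
# recognized object; every field and value here is illustrative only.
product_tag = {
    "label": "object:sneakers",
    "brand": "ExampleBrand",            # assumed advertiser name
    "click_url": "https://example.com/shop/sneakers",
    "bounding_box": {"x": 0.55, "y": 0.70, "width": 0.12, "height": 0.10},
}

def render_overlay_link(tag: dict) -> str:
    """Turn a product tag into the link a viewer would follow on click."""
    return f"{tag['label']} -> {tag['click_url']}"

print(render_overlay_link(product_tag))
```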

2 – Object Recognition for Face Detection

Has it all been done? Facebook/Meta, OneDrive, iCloud, and other services have already tried or are implementing some form of object detection in the photos you post. Each of these existing services implements object detection at some level:

  • Identifying the faces in your photos, though they need you to tag those faces so that some “metadata” can be associated with the photos
  • Dynamically grouping/tagging all “Portrait” pictures of a specific individual, or events from a specific day and location, like a family vacation.
  • Some image file types (JPEG, PNG, GIF, etc.) allow you to add metadata to the files on your own, e.g. so you can search for pictures at the OS level (see the sketch after this list).
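
For the last bullet, here is a minimal sketch of writing OS-searchable metadata into a JPEG using the Pillow library’s EXIF support; the file names are placeholders, and 0x010E is the standard EXIF ImageDescription tag.

```python
# A minimal sketch of adding searchable metadata to a JPEG with Pillow
# (pip install Pillow). File names are placeholders; 0x010E is the
# standard EXIF ImageDescription field.
from PIL import Image

SRC, DST = "vacation_001.jpg", "vacation_001_tagged.jpg"

img = Image.open(SRC)
exif = img.getexif()
exif[0x010E] = "family vacation; people: Jane, John; place: Disney World"
img.save(DST, exif=exif)

# Reading it back, e.g. to feed a local search index:
print(Image.open(DST).getexif().get(0x010E))
```
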
3 – Operational Assistance through Object Recognition Using AR

  • Constructing “complex” components on an assembly line, where Augmented Reality (AR) can overlay the next assembly step onto the existing object, helping transition it to that next step.
  • Assistance putting together IKEA furniture, like the assembly line use case, but for home use.
  • Gaming, everything from Mario Kart Live to Light Saber duels against the infamous Darth Vader.

4 – Palm Reading and Other Visual Analytics

  • Predictive weather patterns

5 – Visual Search through Search Engines and Proprietary Applications with Specific Knowledge Base Alignment

  • The CoinSnap iPhone app scans both sides of a coin, identifies it, and builds the user’s collection.
  • Microsoft Bing’s Visual Search and integration with Microsoft Edge
  • Medical applications leveraging AI image models, e.g. radiology

Radiology – Reading the Tea Leaves

Radiology builds a model of possible issues throughout the body. Curating images of specific types of fractures can empower the auto-detection of issues with the use of AI. If it were a non-proprietary model, radiologists worldwide could contribute to that AI model. However, the potential displacement of radiology jobs may inhibit the open, non-proprietary nature of the use case, and the AI model may need to be built independently of open input from all radiologists.

Microsoft’s AI Builder – Detect Objects in Images

Microsoft’s AI model builder can help the user build models in minutes. “Object Detection, Custom Model, Detect custom objects in images” is the “template” you want to use to build a model that detects objects, e.g. people, cars, anything, rather quickly, and it enables users to keep adding images (i.e. training the model) so it becomes a better model over time.
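
AI Builder models are normally consumed from inside Power Apps and Power Automate rather than from raw code, so as a rough illustration of the same detect-objects pattern, here is a sketch against Azure’s closely related Custom Vision prediction REST API. The endpoint, project GUID, iteration name, key, and file name are all placeholders you would replace.

```python
# Illustrative call to the Azure Custom Vision object-detection
# prediction endpoint (a close cousin of AI Builder's custom model).
# Endpoint, GUIDs, key, and file name are placeholders.
import requests

ENDPOINT = "https://YOUR-RESOURCE.cognitiveservices.azure.com"
PROJECT_ID = "YOUR-PROJECT-GUID"
ITERATION = "YOUR-PUBLISHED-ITERATION"
URL = f"{ENDPOINT}/customvision/v3.0/Prediction/{PROJECT_ID}/detect/iterations/{ITERATION}/image"

with open("test_photo.jpg", "rb") as f:
    resp = requests.post(
        URL,
        headers={
            "Prediction-Key": "YOUR-PREDICTION-KEY",
            "Content-Type": "application/octet-stream",
        },
        data=f.read(),
    )
resp.raise_for_status()
# Each prediction carries a tag name, a confidence, and a bounding box.
for pred in resp.json().get("predictions", []):
    print(pred["tagName"], round(pred["probability"], 3), pred["boundingBox"])
```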

Many other AI Model types exist, such as Text Recognition within images. I suggest exploring the Azure AI Models list to fit your needs.

Currently Available Data Sources for Image Input

  • Current Device
  • SharePoint
  • Azure BLOB

Wish List for Data Sources w/Trigger Notifications

When a new image is uploaded into one of these data sources, a “trigger” could be activated to process the image with the AI model and apply tags to it (a polling sketch follows the list).

  • ADT – video cam
  • DropBox
  • Google Drive
  • Instagram
  • Kodak (yeah, still around)
  • Meta/Facebook
  • OneDrive
  • Ring – video cam
  • Shutterfly
  • Twitter
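
Most of these sources expose webhooks or platform connectors; as a lowest-common-denominator sketch, the Python below simply polls a locally synced folder (e.g. a DropBox or OneDrive client directory) and hands new images to a tagging function. The directory path and detect_and_tag() are placeholders.

```python
# A crude folder-watch "trigger": poll a synced directory and hand any
# new image to a tagging function. Path and detect_and_tag() are stand-ins.
import time
from pathlib import Path

WATCH_DIR = Path("~/Pictures/inbox").expanduser()
SEEN: set[Path] = set()

def detect_and_tag(image_path: Path) -> None:
    # Placeholder: call your object-detection model here and store tags.
    print(f"tagging {image_path.name}")

while True:
    for p in WATCH_DIR.glob("*.jpg"):
        if p not in SEEN:
            SEEN.add(p)
            detect_and_tag(p)
    time.sleep(5)  # crude polling interval; real triggers would push events
```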

Get Started: Power Automate, Premium Account

Log in to Power Automate with your premium account, select the “AI Builder” menu, then the “Models” menu item. At the top left of the screen, select “New AI Model.” From the list of model types, select “Custom Model > Object Detection > Detect Custom Objects in Images.”

AI Builder – Custom Model

It’s a “Premium” feature of Power Automate, so you must have the Premium license. Select “Get Started.” The first step is to “Select your model’s domain”; there are three choices, and I selected “Common Objects” to give me the broadest opportunity. Then select “Next.”

AI Builder – Custom Model – Domain

Next, you need to select all of the objects you want to identify in your images. For demonstration purposes, I added my family’s first names as my objects to train my model to identify in images.

AI Builder – Custom Model – Objects for Model

Next, you need to “Add example images for your objects.” Microsoft’s guidance is: “You need to add at least 15 images for each object you want to detect.” Images can be added from the currently available data sources listed above:

AI Model – Add Images

I added the minimum recommended number of images: 15 per object across two objects, 30 images of my family in all, drawn from random pics over the last year.

Once uploaded, you need to go through each image, draw a box around each object you want to tag, and then select the corresponding object tag.
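
Under the hood, “draw a box, pick a tag” reduces to storing a normalized bounding box per tag. A small sketch, with the pixel values assumed purely for illustration:

```python
# Convert a drawn pixel-space box into the normalized 0..1 coordinates
# detection models typically train on. All numbers below are made up.
def normalize_box(px_left, px_top, px_w, px_h, img_w, img_h):
    """Map a pixel-space rectangle to normalized image coordinates."""
    return {
        "left": px_left / img_w,
        "top": px_top / img_h,
        "width": px_w / img_w,
        "height": px_h / img_h,
    }

# e.g. a 300x400 box drawn at (250, 120) on a 3000x2000 photo, tagged "Jane"
annotation = {"tag": "Jane", "box": normalize_box(250, 120, 300, 400, 3000, 2000)}
print(annotation)
```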

Part 2 – Completing the Model and Its Usage in an App.

Politics around Privacy: Implementing Facial and Object Recognition

This Article is Not…

about deconstructing existing functionality of entire Photo Archive and Sharing platforms.

It is…

to bring awareness to the masses about corporate decisions to omit advanced capabilities for cataloguing photos: object recognition and advanced metadata tagging.

Backstory: The Asks / Needs

Every day my family takes tons of pictures, and the pictures are bulk uploaded to The Cloud using Cloud Storage Services such as DropBox, OneDrive, Google Photos, or iCloud. A selected set of photos is uploaded to our favourite Social Networking platforms (e.g. Facebook, Instagram, Snapchat, and/or Twitter).

Every so often, I will take pause and create either a Photobook or prints of pictures from the last several months. The kids may have a project for school that needs a printout, e.g. a family portrait or just a picture of Mom and the kids. In order to find these photos, I have to manually go through our collection of photographs in our Cloud Storage Services, or identify the photos from our Social Network libraries.

Social Networking Platform Facebook

As far back as I can remember, the Social Networking platform Facebook has had the ability to tag faces in photos uploaded to the platform. There are restrictions on the privacy side, such as whom you can tag, but the capability still exists. The Facebook platform also automatically identifies faces within photos, i.e. places a box around faces in a photo, to make the person-tagging capability easier. So, in essence, there is an “intelligent capability” to identify faces in a photo. The Facebook platform allows you to see “Photos of You,” but what seems to be missing is the ability to search for all photos of Fred Smith, a friend of yours, even if all his photos are public. By design, it sounds fit for the purpose of the networking platform.

Auto Curation

  1. Automatically upload new images, in bulk or one at a time, to a Cloud Storage Service (with or without Online Printing Capabilities, e.g. Photobooks), and an automated curation process begins.
  2. The Auto Curation process scans photos for:
    1. “Commonly Identifiable Objects”, such as #Car, #Clock, #Fireworks, and #People
    2. Previously tagged objects and faces, so that newly uploaded photos are automatically tagged based on them.
    3. Once auto curation runs several times, and people are manually #tagged, the auto curation process will “learn” faces; any subsequent auto curation run should be able to recognize the tagged people in new pictures.
  3. The Auto Curation process emails/notifies the library owners of the ingestion results, e.g. “Jane Doe and John Smith photographed at Disney World on [date/time stamp]”, i.e. a report of the executed ingestion and auto curation process (a pipeline sketch follows this list).
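
Here is a toy Python sketch of that loop, assuming a detect() placeholder where a real face/object model would run; the names, email, and photo records are illustrative.

```python
# A toy sketch of the auto-curation loop described above. detect() stands
# in for a real face/object model; everything else is plain bookkeeping.
from datetime import datetime

def detect(photo: dict) -> list:
    # Placeholder model: in reality this would run face/object recognition.
    return photo.get("known_tags", [])

def auto_curate(new_photos: list, owner_email: str) -> None:
    report = []
    for photo in new_photos:
        tags = detect(photo)
        photo["tags"] = tags
        report.append(f"{photo['name']}: {', '.join(tags) or 'nothing recognized'}")
    # Step 3: notify the library owner of the ingestion results.
    print(f"To: {owner_email} ({datetime.now():%Y-%m-%d %H:%M})")
    print("\n".join(report))

auto_curate(
    [{"name": "disney_042.jpg", "known_tags": ["Jane Doe", "John Smith", "#Fireworks"]}],
    "library.owner@example.com",
)
```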

Manual Curation

After the upload and auto curation process, optionally, it’s time to manually tag people’s faces and any ‘objects’ you would like to track; e.g. a car aficionado might #tag a vehicle’s make/model with additional descriptive tags. Using the photo curator function on the Cloud Storage Service, you can tag any “objects” in the photo using Rectangle or Lasso Select.

Curation to Take Action

Once photo libraries are curated, the library owner(s) can:

  • Automatically build albums based on one or more #tags
  • Smart Albums automatically update, e.g. after ingestion and Auto Curation. Albums are tag-sensitive and update with new pics that contain certain people or objects. The user/librarian may dictate the logic for tags (see the filter sketch below).
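
A smart album is essentially a saved predicate re-run after each ingestion; a minimal sketch, with illustrative photo data:

```python
# A smart album as a saved predicate: re-run it after each ingestion.
# Photo records below are purely illustrative.
photos = [
    {"name": "beach_01.jpg", "tags": {"Jane", "#Beach"}},
    {"name": "car_07.jpg", "tags": {"#Car"}},
    {"name": "beach_02.jpg", "tags": {"Jane", "John", "#Beach"}},
]

def smart_album(photos, required_tags):
    """Return photos carrying every tag the librarian specified."""
    return [p for p in photos if required_tags <= p["tags"]]

print([p["name"] for p in smart_album(photos, {"Jane", "#Beach"})])
```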

Where is this Functionality??

Why are many major companies not implementing facial (and object) recognition? Google and Microsoft certainly seem to have the capability, and the size, to produce the technology.

Is it possible Google and Microsoft are subject to more scrutiny than a Shutterfly? Do privacy concerns, at the moment, leave others to become trailblazers in this area?

Cloud Storage: Ingestion, Management, and Sharing

Cloud Storage Solutions need differentiation that matters, a tipping point to select one platform over the other.

Common Platforms Used: DropBox, OneDrive, Google Photos, iCloud, Box, and AWS S3.

Differentiation may come in the form of:

  • Collaborative Content Creation Software, such as DropBox Paper, which enables individuals or teams to produce content while leveraging the storage platform for, e.g., version control.
  • Embedded integration in a suite of content creation applications, such as Microsoft Office and OneDrive.
  • Making the storage solution available to developers, as with AWS S3 and Box. Developers may create apps powered by the Box Platform or custom integrations with Box.
  • iCloud enables users to back up their smartphones, as well as tightly integrating with the capture and sharing of content, e.g. Photos.

Cloud Content Lifecycle Categories:

  • Content Creation
    • 3rd Party (e.g. Camera) or Integrated Platform Products
  • Content Ingestion
    • Capture Content and Associated Metadata
  • Content Collaboration
    • Share, Update and Distribution
  • Content Discovery
    • Surface Content; Searching and Drill Down
  • Retention Rules
    • Auto expire pointer to content, or underlying content

Cloud Content Ingestion Services:

Cloud Ingestion Services

The Race Is On to Control Artificial Intelligence, and Tech’s Future

Amazon, Google, IBM and Microsoft are using high salaries and games pitting humans against computers to try to claim the standard on which all companies will build their A.I. technology.

In this fight — no doubt in its early stages — the big tech companies are engaged in tit-for-tat publicity stunts, circling the same start-ups that could provide the technology pieces they are missing and, perhaps most important, trying to hire the same brains.

For years, tech companies have used man-versus-machine competitions to show they are making progress on A.I. In 1997, an IBM computer beat the chess champion Garry Kasparov. Five years ago, IBM went even further when its Watson system won a three-day match on the television trivia show “Jeopardy!” Today, Watson is the centerpiece of IBM’s A.I. efforts.

Today, only about 1 percent of all software apps have A.I. features, IDC estimates. By 2018, IDC predicts, at least 50 percent of developers will include A.I. features in what they create.

Source: The Race Is On to Control Artificial Intelligence, and Tech’s Future – The New York Times

The next “tit-for-tat” publicity stunt should most definitely be a battle with robots, exactly like BattleBots, except…

  1. Use A.I. to consume vast amounts of video footage from previous bot battles, while identifying key elements of bot design that gave a bot the ‘upper hand’.  From a human cognition perspective, this exercise may be subjective. The BattleBot scoring process can play a factor in 1) conceiving designs, and 2) defining ‘rules’ of engagement.
  2. Use A.I. to produce BattleBot designs for humans to assemble.
  3. Autonomous battles, bot on bot, based on Artificial Intelligence battle ‘rules’ acquired from the input and analysis of video footage.

Google Glasses for Dynamic Language, and Local Gesture Translation, and Let the Deaf be ‘Heard’

If you have Google Project Glass / Glasses with WiFi connectivity to a smartphone, plus a pair of headphones or a Bluetooth earpiece, you can have local language dialect and gesture translation delivered instantly to your ears. If you are looking at someone and they are articulating in any way, whether signing, using local gestures, or speaking in any dialect, an instant, fluent translation program reads the real-time video at its frame rate, e.g. 50/60/120 frames per second (FPS), applies object recognition to read lips or human movement, and then plays a voice in your own local language dialect through your headphones or Bluetooth earpiece.

Travel the world and experience the cultures truly as the locals do, empathize, or use them in the workplace and truly eliminate discrimination against the deaf.

Object recognition may need to be applied to each video frame, or to a sampling of the video stream, from the real-time video feed of the Google Glass / Glasses.
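
A rough sketch of that sampling approach using OpenCV (pip install opencv-python), assuming camera device 0 stands in for the Glass feed and recognize() is a placeholder for the actual lip-reading/gesture model:

```python
# Sample a live feed for recognition instead of processing every frame.
# Device index 0 is a stand-in for the Glass camera; recognize() is a stub.
import cv2

SAMPLE_EVERY_N = 10  # e.g. analyze 6 frames/sec from a 60 FPS feed

def recognize(frame) -> None:
    # Placeholder: the lip-reading / gesture model would run here.
    pass

cap = cv2.VideoCapture(0)
frame_idx = 0
while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    if frame_idx % SAMPLE_EVERY_N == 0:
        recognize(frame)
    frame_idx += 1
cap.release()
```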

Also, managers who employ people who are deaf may apply for a tax deduction, or even get the glasses for free with a tax write-off for their company. In part, thank the people who produced Mr. Holland’s Opus; I watched the movie last night, and it’s a great movie! And in part, thanks to my mom, who is somewhere in the Middle East on a cruise.

Also, I can see a new wave of popular kids coming up with gestures that are instantly recognized using the Glasses. Sorry, kids. Supposedly, these glasses will ship to Android Google Developers for $1,500 USD by the end of the year; however, that is just rumor. If you know Java, a relatively easy programming language to pick up, the Android OS Java extensions are relatively easy as well. A small price to pay for a huge market, and maybe even a tax deduction for a small business under Research and Development costs. See your local small business government affairs office for more details.


Freelance Streaming Video, Affiliate Advertising Innovation

As described in a previous post, Streaming Video Freelance: Video Affiliate Network channels like Google Plus / YouTube, Facebook, Twitter, and Viveo (everything from impromptu bar jams to concert events), I did not mention all the little bits needed to avoid issues. All faces in the audience either need real-time face-blurring technology, built on facial recognition, or the people at the event must sign a waiver, which might not be likely. So the solution will need delayed streaming to give the face-blurring software time to work and, for some televised events, to allow compliance with FCC regulations: edits introducing artificial intelligence (AI) word-bleep insertion, and object recognition to blur exposed body parts. 🙂

The ridiculously amazing advertising introduction: if a person signs up to allow themselves to be seen, they could be picked out if they are wearing a hat, sneakers, or a certain brand of shirt. If the AI object recognition picks up the object, a hue can accentuate the item; the viewer can pause the stream, tap the advertiser’s object on a video streaming tablet touch screen, and get a list of local and web distributors, prioritized by advertising, popularity, and rating. The same goes for red carpet events: although dresses, suits, and accessories wouldn’t carry advertising because of cost, the experience could be enhanced by, again, pausing to get a blurb about the designer, the object, and a link to the catalog or portfolio of the designer’s work. This introduces advertising revenue for products seen on screen. Even the videographer’s smartphone can have an addressable link to the product used to stream, as well as a small logo overlay which, if tapped, provides a profile of the videographer and a brief portfolio of their work.

All of this lives in a tablet application that gives the licensed distributor(s) a main channel in the center of the screen and small square boxes along the outer edge of the main window, like picture-in-picture; a person just taps one of the bordered streams, and that box becomes the stream that takes up the majority of the screen. The main window can run at the absolute maximum frames per second (FPS) for the device, while the border windows may run at half or a quarter of the allowed FPS; this way you still get to preview the alternate-perspective streams while watching the main stream. You can also auto-hide the border preview streams to focus the user on the main stream while giving them the flexibility of alternate vantage points. The system may allow videographers to bid for border placement of their streams in a particular corner, and for frequency, and the user can mark a ‘favorite’ freelance videographer’s stream, with quick links to videographer streams the user knows will be at the event.
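
A tiny sketch of that frame-rate budgeting, with the divisor chosen arbitrarily for illustration:

```python
# Frame-rate budgeting sketch: the main stream renders every frame while
# border previews drop frames. The quarter-rate divisor is illustrative.
MAIN_FPS = 60

def preview_should_render(frame_idx: int, divisor: int = 4) -> bool:
    """Render only every Nth frame for a border preview (quarter rate here)."""
    return frame_idx % divisor == 0

for i in range(8):
    print(i, "main: render", "| preview:",
          "render" if preview_should_render(i) else "skip")
```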

There is more in the predicted trends article for 2013.