
Platform Independent AI Model for Images: AI Builder, Easily Utilized by 3rd Party Apps

With all the discourse on OpenAI’s ChatGPT and natural language processing (NLP), I’d like to steer the conversation toward images/video and object recognition. This is another area of artificial intelligence primed for growth, with many use cases. Arguably, it’s not as shocking as a system that bends society at its core by writing college papers from a short prompt, but object recognition can still seem “magical.” AI object recognition may turn art into science, as easily as an AI reading your palm to tell your future, and it will give consumers more data points from which Augmented Reality (AR) can overlay digital images onto an analog world of tangible objects.

Microsoft’s AI Builder – Platform Independent

Microsoft’s Power Automate AI Builder gets us started on the journey: collect images, tag the objects we recognize in them, and then train an AI model to recognize those objects in our “production” images. Microsoft provides tools to build these AI image models (a library of images with human-tagged objects) quickly and easily. How you leverage such models is the foundation of “future” applications. Some applications are already here, but not in mass production. The missing ingredient: taking the building of AI models out of proprietary silos, such as social media applications.

In many social media applications, users can tag faces in their images for various reasons, mostly to decide who to share their content/images with. In most cases, images can also be tagged with a specific location. Yet each AI image/object model is proprietary and not shared between social media applications. If there were a standards body, an AI model could be created and maintained outside of any one social media application: a portable AI object recognition model usable by a wide array of applications that support it. Later on, we’ll discuss Microsoft’s AI Model Builder, which is externalized from any one application, and because it’s Microsoft, it’s intuitive. 🙂

An industry standards body could collaborate to define what these AI models look like: their features and, most importantly, their portability formats. The industry, such as social media apps, could then elect which features their applications do and do not support.
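No such standard exists today. ONNX already gives the industry a portable format for the model weights themselves, but there is no equivalent for the human/AI tag layer around images. Purely as an illustration, here is a minimal Python sketch of what a portable tag-model manifest might carry; every field name is invented:

```python
# Purely illustrative: no such interchange standard exists today.
# ONNX standardizes portable model weights; the tag layer below is hypothetical.
import json
from dataclasses import dataclass, field, asdict
from typing import List

@dataclass
class TaggedObject:
    label: str                        # tag name, e.g. "bass_fish"
    box: List[float]                  # [left, top, width, height], normalized 0-1
    confidence: float = 1.0           # 1.0 for human tags, lower for AI tags
    metadata: dict = field(default_factory=dict)  # e.g. {"url": "https://brand.example"}

@dataclass
class PortableTagModel:
    schema_version: str               # version of the (hypothetical) standard
    domain: str                       # e.g. "common_objects"
    labels: List[str]                 # every tag the model can emit
    weights_uri: str                  # portable weights, e.g. an ONNX file
    examples: List[TaggedObject] = field(default_factory=list)

model = PortableTagModel(
    schema_version="0.1-draft",
    domain="common_objects",
    labels=["person", "bass_fish"],
    weights_uri="https://example.com/models/family.onnx",
    examples=[TaggedObject("bass_fish", [0.2, 0.3, 0.4, 0.4])],
)
print(json.dumps(asdict(model), indent=2))  # the portable artifact apps would exchange
```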

Use Cases for Detecting Objects in Images

Why doesn’t everyone have an AI model, of their own design, containing the tagged objects from their images and videos? Why indeed.

1 – Brands / Product Placement from Content Creators

Just about everyone today is a content creator, producing images and videos for their personal and business social media feeds: Twitter, Instagram, Snap, Meta, YouTube, and TikTok, to name a few. AI models should be portable enough to integrate with social media applications, where tags could identify branded apparel, jewelry, appliances, etc. Tags could also contain metadata, allowing content consumers to follow tagged objects to a specified URL, driving clicks and the promotion of products and services.

2 – Object Recognition for Face Detection

Has it all been done? Facebook/Meta, OneDrive, iCloud, and other services have already tried or are implementing some form of object detection for the photos you post, each at some level:

  • They identify the faces in your photos but need you to tag them, after which some “metadata” is associated with those photos.
  • They dynamically group/tag all “Portrait” pictures of a specific individual, or events from a specific day and location, like a family vacation.
  • Some image formats, e.g. JPEG, PNG, and GIF, let you add metadata to the files yourself, so you can search for pictures at the OS level (see the sketch after this list).
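As a concrete example of that last bullet, here is a small sketch using the third-party piexif library (pip install piexif) to write searchable keywords into a JPEG’s EXIF block; Windows Explorer indexes the XPKeywords field as “Tags,” which is what makes OS-level photo search work:

```python
# Minimal sketch using the piexif library to embed searchable keywords
# into a JPEG's EXIF data. Windows stores tags in XPKeywords as UTF-16LE.
import piexif

def tag_jpeg(path: str, keywords: list) -> None:
    exif_dict = piexif.load(path)                 # read the file's existing EXIF
    joined = ";".join(keywords)                   # Windows separates tags with ';'
    exif_dict["0th"][piexif.ImageIFD.XPKeywords] = joined.encode("utf-16-le")
    piexif.insert(piexif.dump(exif_dict), path)   # write EXIF back into the file

tag_jpeg("vacation.jpg", ["family", "lake", "august"])
```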
3 – Operational Assistance through Object Recognition Using AR

  • Constructing “complex” components on an assembly line, where Augmented Reality (AR) can overlay the next assembly step onto the existing object, helping transition it to the next step.
  • Assistance putting together IKEA furniture; like the assembly-line use case, but for home use.
  • Gaming, everything from Mario Kart Live to lightsaber duels against the infamous Darth Vader.

4 – Palm Reading and Other Visual Analytics

  • Predictive weather patterns

5 – Visual Search through Search Engines and Proprietary Applications with Specific Knowledge Base Alignment

  • The CoinSnap iPhone app scans both sides of a coin, identifies it, and builds out a user’s collection.
  • Microsoft Bing’s Visual Search and its integration with Microsoft Edge
  • Medical applications leveraging AI image models, e.g., radiology

Radiology – Reading the Tea Leaves

Radiology could build a model of possible issues throughout the body. Tagging images with specific types of fractures can empower the auto-detection of those issues with AI. If the model were non-proprietary, radiologists worldwide could contribute to it. The threat of displacing radiology jobs may inhibit the open, non-proprietary nature of this use case, so the AI model may need to be built independently, without open input from all radiologists.

Microsoft’s AI Builder – Detect Objects in Images

Microsoft’s AI Model Builder can help the user build models in minutes. “Object Detection, Custom Model: Detect custom objects in images” is the template you want to use to build a model that detects objects, e.g. people, cars, anything, rather quickly, and it lets users keep adding images (i.e., training the model) so the model improves over time.

Many other AI Model types exist, such as text recognition within images. I suggest exploring the Azure AI Models list to find one that fits your needs.

Current, Available Data Sources for Image Input

  • Current Device
  • SharePoint
  • Azure BLOB

Wish List for Data Sources w/Trigger Notifications

When a new image is uploaded into one of these data sources, a “trigger” could fire to process the image with the AI model and apply tags to it (see the sketch after this list).

  • ADT – video cam
  • DropBox
  • Google Drive
  • Instagram
  • Kodak (yeah, still around)
  • Meta/Facebook
  • OneDrive
  • Ring – video cam
  • Shutterfly
  • Twitter
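None of these wish-list sources expose such a trigger to AI Builder today, but the pattern is easy to see with a source that does exist, Azure Blob storage. Here is a sketch using the Azure Functions Python v2 programming model; the detect_and_tag() helper is a hypothetical placeholder for whatever model you actually call:

```python
# Sketch of the "new image uploaded -> tag it" trigger using an Azure
# Functions Blob Storage trigger (Python v2 programming model).
import logging
import azure.functions as func

app = func.FunctionApp()

@app.blob_trigger(arg_name="blob",
                  path="photos/{name}",            # container/path to watch
                  connection="AzureWebJobsStorage") # storage connection setting
def on_new_image(blob: func.InputStream):
    image_bytes = blob.read()
    tags = detect_and_tag(image_bytes)             # hypothetical: run your AI model
    logging.info("Tagged %s with %s", blob.name, tags)

def detect_and_tag(image_bytes: bytes) -> list:
    # Placeholder: call an object-detection model and return its tag names.
    return []
```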

Get Started: Power Automate, Premium Account

Log in to Power Automate with your premium account and select the “AI Builder” menu, then the “Models” menu item. At the top left of the screen, select “New AI Model,” then, from the list of model types, select “Custom Model: Detect Custom Objects in Images.”

AI Builder – Custom Model

This is a “Premium” feature of Power Automate, so you must have the Premium license. Select “Get Started.” The first step is to “Select your model’s domain”; there are three choices, and I selected “Common Objects” for the broadest opportunity. Then select “Next.”

AI Builder – Custom Model – Domain

Next, you need to select all of the objects you want to identify in your images. For demonstration purposes, I added my family’s first names as the objects my model will be trained to identify in images.

AI Builder – Custom Model – Objects for Model

Next, you need to “Add example images for your objects.” Microsoft’s guidance is: “You need to add at least 15 images for each object you want to detect.” The current data sources are the ones listed above (current device, SharePoint, Azure Blob):

AI Model – Add Images

I added the minimum recommended number of images: 15 per object, two objects, so 30 images of my family pulled from random pics over the last year.

Once uploaded, you need to go through each image, draw a box around each object you want to tag, and then select the object’s tag.
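AI Builder’s tagging step is point-and-click, so there’s no code to show for it. The closest code-first analogue in the Microsoft stack is the Azure Custom Vision training SDK, where the same “draw a box, pick a tag” step is expressed as a Region with normalized coordinates. A hedged sketch, assuming an Azure Custom Vision resource; the endpoint, key, project name, and file name are placeholders:

```python
# Code-first analogue of AI Builder's tagging step, using the Azure Custom
# Vision training SDK (pip install azure-cognitiveservices-vision-customvision).
from azure.cognitiveservices.vision.customvision.training import CustomVisionTrainingClient
from azure.cognitiveservices.vision.customvision.training.models import (
    ImageFileCreateBatch, ImageFileCreateEntry, Region)
from msrest.authentication import ApiKeyCredentials

credentials = ApiKeyCredentials(in_headers={"Training-key": "<training-key>"})
trainer = CustomVisionTrainingClient(
    "https://<resource>.cognitiveservices.azure.com/", credentials)

# Pick an object-detection domain, then create the project and one tag.
domain = next(d for d in trainer.get_domains() if d.type == "ObjectDetection")
project = trainer.create_project("family-detector", domain_id=domain.id)
tag = trainer.create_tag(project.id, "person")

# The hand-drawn box becomes a Region: left/top/width/height are fractions
# of the image dimensions (0-1), just like the box you draw in the UI.
with open("photo1.jpg", "rb") as f:
    entry = ImageFileCreateEntry(
        name="photo1.jpg",
        contents=f.read(),
        regions=[Region(tag_id=tag.id, left=0.25, top=0.10, width=0.30, height=0.60)],
    )

trainer.create_images_from_files(project.id, ImageFileCreateBatch(images=[entry]))
trainer.train_project(project.id)  # kick off a training iteration
```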

Part 2 – Completing the Model and its App usage.

Beyond Google Search of Personal Data – Proactive, AI Digital Assistant 

As per a previous post, “Google Searches Your Personal Data (Calendar, Gmail, Photos), and Produces Consolidated Results,” why can’t the Google Assistant take advantage of the same data sources?

Google may attempt to leapfrog their Digital Assistant competition by taking advantage of their ability to search against all Google products.  The more personal data a Digital Assistant may access, the greater the potential for increased value per conversation.

As a first step,  Google’s “Personal”  Search tab in their Search UI has access to Google Calendar, Photos, and your Gmail data.  No doubt other Google products are coming soon.

Big benefits come not just from letting the consumer search through their personal Google data, but from providing that consolidated view to the AI Assistant.  Does the Google [Digital] Assistant already have access to Google Keep data, for example?  Is providing Google’s “Personal” search results a dependency for broadening the Digital Assistant’s access and usage?  If so, these interactions are most likely based on a reactive model, rather than proactive dialogs, i.e. the Assistant initiating the conversation with the human.

Note: The “Google App” for mobile platforms does:

“What you need, before you ask. Stay a step ahead with Now cards about traffic for your commute, news, birthdays, scores and more.”

I’m not sure how proactive the Google AI is built to be, but most likely it’s barely scratching the surface of what’s possible.

Modeling Personal, AI + Human Interactions

Start from N accessible data sources, search for actionable data points, correlate those data points with others, and then escalate to the human via a dynamic or predefined Assistant Consumer Workflow (ACW).  The proactive AI Digital Assistant initiates human contact to engage in commerce without otherwise being triggered by the consumer.

Actionable data point correlations can trigger multiple goals in parallel.  However, the execution of goal-based rules would need to be managed: the consumer doesn’t want to be bombarded with AI Assistant suggestions, but at the same time, “choice” opportunities may be appropriate, as the Google [mobile] App has implemented with ‘Cards’ of bite-size data, consumable from the UI at the user’s discretion.

As an ongoing ‘background’ AI/ML process, the Digital Assistant’s server-side agent may derive correlations between one or more data source records to gain a deeper perspective on the person’s life, and potentially be proactive about providing input to the consumer’s decision-making process.
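As a thought experiment, here is a minimal Python sketch of what one such background correlation pass might look like: scan email-derived records for topics that recur across years, and surface a suggestion card when this year’s occurrence approaches. The record schema and the 30-day threshold are invented for illustration:

```python
# Toy background-correlation pass: find annual events in email-derived
# records and propose proactive suggestion cards. All schemas are invented.
from datetime import date, timedelta

def find_annual_events(emails: list) -> list:
    """Group records by topic and flag topics seen in multiple years."""
    by_topic = {}
    for msg in emails:                        # msg: {"topic": str, "date": date}
        by_topic.setdefault(msg["topic"], []).append(msg["date"])
    annual = []
    for topic, dates in by_topic.items():
        if len({d.year for d in dates}) >= 2: # recurs across years -> likely annual
            annual.append((topic, max(dates)))
    return annual

def propose_cards(today: date, emails: list) -> list:
    cards = []
    for topic, last_seen in find_annual_events(emails):
        anniversary = last_seen.replace(year=today.year)
        if timedelta(0) <= anniversary - today <= timedelta(days=30):
            cards.append(f"Your annual '{topic}' is coming up around {anniversary}. Book now?")
    return cards
```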

Bass Fishing Trip

For example,

  • The proactive Google Assistant may suggest booking your annual fishing trip soon. Elevated interaction to the consumer/user.
  • The Assistant may search Gmail records referring to an annual fishing trip ‘last year’ in August. AI background, server-side parameter/profile search.  Predefined Assistant Consumer Workflow (ACW), “Annual Events” category: building workflows that are ‘predefined’ for a core set of goals/rules.
  • The AI Assistant may search the user’s photo archive on the server side.  Any photo metadata could be garnered from the search, including date/time stamps, abstracted to include ‘Season’ of year, and other synonym tags.
  • Photos from around ‘August’ may be earmarked for Assistant use.
  • Photos may be geotagged, e.g. Lake Champlain, which is known for its fishing.
  • All objects in an image may be stored as image metadata. Using image object recognition against all photos in the consumer’s repository, goal/rule execution may occur against pictures from last August; the Assistant may identify the “fishing buddies” posing with a huge bass (a toy version of this filter is sketched below).
  • In addition to suggesting the trip booking, Google’s Assistant may bring up ‘highlighted’ photos from the last fishing trip to ‘encourage’ the person to take it.
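To make the photo-side steps concrete, here is a toy filter over hypothetical photo metadata (all field names invented) that earmarks last August’s object-tagged fishing shots as “encouragement” candidates:

```python
# Toy version of the photo-side steps above: filter the consumer's photo
# metadata for last August's fishing shots. The schema is hypothetical.
from datetime import date

photos = [
    {"taken": date(2023, 8, 12), "place": "Lake Champlain", "tags": ["bass_fish", "friends"]},
    {"taken": date(2023, 12, 25), "place": "Home", "tags": ["family"]},
]

highlights = [
    p for p in photos
    if p["taken"].month == 8                    # the 'August / Season' abstraction
    and any("fish" in t for t in p["tags"])     # object-recognition tags
]
print(highlights)  # candidates for the Assistant's 'encouragement' card
```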

In this type of interaction, the Assistant has the ability to proactively ‘coerce’ and influence the human decision-making process.  Building these interactive models of communication, and the ‘management’ process to govern the AI Assistant, is within reach.

Predefined Assistant Consumer/User Workflows (ACW) may be created by third parties, such as travel agencies, or by industry groups, such as food: the “low-hanging fruit,” easy to implement, like “time to get more milk.”  Then again, food may not be the best place to start, i.e. Amazon Dash.

Cloud Storage: Ingestion, Management, and Sharing

Cloud Storage Solutions need differentiation that matters, a tipping point to select one platform over the other.

Common platforms used include Dropbox, Microsoft OneDrive, AWS S3, Box, and Apple iCloud.

Differentiation may come in the form of:

  • Collaborative content creation software, such as Dropbox Paper, enables individuals or teams to produce content while leveraging the storage platform for, e.g., version control.
  • Embedded integration in a suite of content creation applications, such as Microsoft Office and OneDrive.
  • Making the storage solution available to developers, as with AWS S3 and Box.  Developers may create apps powered by the Box Platform or custom integrations with Box.
  • iCloud enables users to back up their smartphones, as well as tightly integrating with the capture and sharing of content, e.g. Photos.

Cloud Content Lifecycle Categories:

  • Content Creation
    • 3rd Party (e.g. Camera) or Integrated Platform Products
  • Content Ingestion
    • Capture Content and Associated Metadata
  • Content Collaboration
    • Share, Update and Distribution
  • Content Discovery
    • Surface Content; Searching and Drill Down
  • Retention Rules
    • Auto-expire the pointer to the content, or the underlying content itself
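Illustrative only: one way those lifecycle categories might map onto the fields of a stored content record, with every name invented for the sketch:

```python
# Hypothetical content record: each lifecycle category above gets a field.
from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional

@dataclass
class ContentItem:
    uri: str                                          # where the blob lives
    created_by: str                                   # Content Creation
    ingested_at: datetime                             # Content Ingestion
    metadata: dict = field(default_factory=dict)      # captured at ingestion
    shared_with: list = field(default_factory=list)   # Content Collaboration
    search_tags: list = field(default_factory=list)   # Content Discovery
    expires_at: Optional[datetime] = None             # Retention Rules
```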

Cloud Content Ingestion Services:

Cloud Ingestion Services

Time Lock Access: Seal Files in Cloud Storage

Is there value in providing users the ability to apply “Time Lock Access” to files in cloud storage?  Files are securely uploaded by their owner.  After upload, no one, including the owner, may access/open the file(s).  Only after the time lock’s date and time pass do the files become available for access, and an action may be taken, e.g. automatically emailing a link to the files.  More complex actions may be attached to the time-lock release, such as script execution using a simple set of rules defined by the file owner.
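I’m not aware of a confirmed off-the-shelf product, so here is a minimal sketch of the gating rule only, assuming a storage layer that routes every read through this check. Real “sealing” would also need encryption or key escrow so even the storage operator can’t peek; that is out of scope here:

```python
# Minimal time-lock gate: refuse access before unlock_at, then run the
# owner-defined release action once the seal expires. Names are invented.
from datetime import datetime, timezone

class TimeLockError(PermissionError):
    pass

def open_sealed(path: str, unlock_at: datetime):
    now = datetime.now(timezone.utc)
    if now < unlock_at:
        raise TimeLockError(f"{path} is sealed until {unlock_at.isoformat()}")
    on_release(path)              # owner-defined rule, e.g. email a link
    return open(path, "rb")

def on_release(path: str) -> None:
    # Placeholder for the owner's post-release action (e.g. send an email).
    print(f"Releasing {path}")
```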

Does this solution already exist?  If so, please send me a link to the cloud integration product/plug-in.