Tag Archives: Instagram

Platform Independent AI Model for Images: AI Builder, Easily Utilized by 3rd Party Apps

With all the discourse on OpenAI’s ChatGPT and natural language processing (NLP), I’d like to steer the conversation toward images/video and object recognition. This is another area of artificial intelligence primed for growth, with many use cases. Arguably, it’s not as shocking as a technology that bends our society at its core by creating college papers from limited input, but object recognition can seem “magical.” AI object recognition may turn art into science, as easily as AI reading your palm to tell your future. It will also bring consumers more data points, from which Augmented Reality (AR) can overlay digital images on an analog world of tangible objects.

Microsoft’s AI Builder – Platform Independent

Microsoft’s Power Automate AI [model] Builder has the functionality to get us started on the journey of utilizing images, tagging them with objects we recognize, and then training the AI model to recognize objects in our “production” images. Microsoft provides tools to build AI [image] models (a library of images with human-tagged objects) quickly and easily. How you leverage these AI models is the foundation of “future” applications. Some applications are already here, but not in mass production. The necessary ingredient: taking away the proprietary building of AI models, such as in social media applications.

In many social media applications, users can tag faces in their images for various reasons, mostly to decide who to share their content/images with. In most cases, images can also be tagged with a specific location. Each AI image/object model is proprietary and not shared between social media applications. If there were a standards body, an AI model could be created and maintained outside of the social media applications. The result would be portable AI object recognition models, with a wide array of applications, such as social media applications, that support their use. Later on, we’ll discuss Microsoft’s AI Model builder, externalized from any one application, and because it’s Microsoft, it’s intuitive. 🙂

An industry standards body could collaborate and define what AI models look like, their features, and most importantly, the portability formats. Then the industry, such as social media apps, can elect which features are and are not supported by their applications.

Use Cases for Detecting Objects in Images

Why doesn’t everyone have an AI model containing tagged objects within images and videos of the user’s design? Why indeed.

1 – Brands / Product Placement from Content Creators

Just about everyone today is a content creator, producing images and videos for their own personal and business social media feeds: Twitter, Instagram, Snap, Meta, YouTube, and TikTok, to name a few. AI models should be portable enough to integrate with social media applications, where tags could be used to identify branded apparel, jewelry, appliances, etc. Tags could also contain metadata, allowing content consumers to follow tagged objects to a specified URL. The result: clicks and the promotion of products and services.

2 – Object Recognition for Face Detection

Has it all been done? Facebook/Meta, OneDrive, iCloud, and other services have already tried or are implementing some form of object detection in the photos you post. Each of these existing services implements object detection at some level:

  • Identifying the faces in your photos, but needing you to tag those faces; some “metadata” is then associated with these photos.
  • Dynamically grouping/tagging all “Portrait” pictures of a specific individual or events from a specific day and location, like a family vacation.
  • Some image types (JPEG, PNG, GIF, etc.) allow you to add metadata to the files on your own, e.g. so you can search for pictures at the OS level.

3 – Operational Assistance through object recognition using AR

  • Constructing “complex” components in an assembly line where Augmented Reality (AR) can overlay the next step in assembly with the existing object to help transition the object to the next step in assembly.
  • Assistance putting together IKEA furniture, like the assembly line use case, but for home use.
  • Gaming, everything from Mario Kart Live to Light Saber duels against the infamous Darth Vader.

4 – Palm Reading and other Visual Analytics

  • Predictive weather patterns

5 – Visual Search through Search Engines and Proprietary Applications with Specific Knowledge Base Alignment

  • CoinSnap iPhone App scans both sides of the coin and then goes on to identify the coin, building a user’s collection.
  • Microsoft Bing’s Visual Search and Integration with MSFT Edge
  • Medical Applications, Leveraging AI, e.g., Image Models – Radiology

Radiology – Reading the Tea Leaves

In radiology, an AI model can be built from images of possible issues throughout the body. Tagging images that show specific types of fractures, for example, can enable AI to detect those issues automatically. If it were a non-proprietary model, radiologists worldwide could contribute to that AI model. However, concern over the displacement of radiology jobs may inhibit the open, non-proprietary nature of the use case, and the AI model may need to be built without open input from all radiologists.

Microsoft’s AI Builder – Detect Objects in Images

Microsoft’s AI model builder can help the user build models in minutes. “Object Detection, Custom Model, Detect custom objects in images” is the “template” you want to use to build a model that detects objects, e.g. people, cars, anything, rather quickly. Users can keep adding images (i.e. training the model) so that it becomes a better model over time.

Many other AI Model types exist, such as Text Recognition within images. I suggest exploring the Azure AI Models list to fit your needs.

Current, Available Data Sources for Image Input

  • Current Device
  • SharePoint
  • Azure BLOB

Wish List for Data Sources w/Trigger Notifications

When a new image is uploaded into one of these data sources, a “trigger” can be activated to process the image with the AI Model and apply tags to the images.

  • ADT – video cam
  • Dropbox
  • Google Drive
  • Instagram
  • Kodak (yeah, still around)
  • Meta/Facebook
  • OneDrive
  • Ring – video cam
  • Shutterfly
  • Twitter
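
The trigger idea on this wish list can be sketched roughly as follows. This is only an illustration in Python, not how Power Automate or AI Builder is implemented: `UploadTrigger`, `make_tagger`, and `fake_detector` are hypothetical names, and the "detector" simply matches names in the file name, where a real model would inspect the pixels.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Image:
    name: str
    source: str                       # label only, e.g. "OneDrive" or "Dropbox"
    tags: List[str] = field(default_factory=list)


# Hypothetical stand-in for a trained object detection model:
# it takes an image and returns the tags it recognizes.
Detector = Callable[[Image], List[str]]


class UploadTrigger:
    """Runs every subscribed handler when a data source reports a new image."""

    def __init__(self) -> None:
        self._handlers: List[Callable[[Image], None]] = []

    def subscribe(self, handler: Callable[[Image], None]) -> None:
        self._handlers.append(handler)

    def fire(self, image: Image) -> None:
        for handler in self._handlers:
            handler(image)


def make_tagger(detect_objects: Detector) -> Callable[[Image], None]:
    """Build a handler that applies the model's tags without duplicating them."""
    def tag_image(image: Image) -> None:
        for tag in detect_objects(image):
            if tag not in image.tags:
                image.tags.append(tag)
    return tag_image


def fake_detector(image: Image) -> List[str]:
    # Placeholder logic (assumption): match known names in the file name.
    known = ["alice", "bob"]
    return [name for name in known if name in image.name.lower()]


trigger = UploadTrigger()
trigger.subscribe(make_tagger(fake_detector))

photo = Image(name="Alice_and_Bob_beach.jpg", source="OneDrive")
trigger.fire(photo)
print(photo.tags)  # ['alice', 'bob']
```

Keeping the trigger separate from the tagger means each data source on the list would only need to call `fire` when a new image arrives.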

Get Started: Power Automate, Premium Account

Log in to Power Automate with your premium account, select the “AI Builder” menu, then the “Models” menu item. At the top left of the screen, select “New AI Model.” From the list of model types, select “Custom Model, Object Detection, Detect Custom Objects in Images.”

AI Builder – Custom Model

It’s a “Premium” feature of Power Automate, so you must have the Premium license. Select “Get Started”. The first step is to “Select your model’s domain”. There are three choices, and I selected “Common Objects” to give me the broadest opportunity. Then select “Next”.

AI Builder – Custom Model – Domain

Next, you need to select all of the objects you want to identify in your images. For demonstration purposes, I added my family’s first names as the objects I want my model to learn to identify in images.

AI Builder – Custom Model – Objects for Model

Next, you need to “Add example images for your objects.” Microsoft’s guidance is “You need to add at least 15 images for each object you want to detect.” Current data sources include:

AI Model – Add Images

I added the minimum recommended number of images: 15 per object for two objects, so 30 images in total, drawn from family photos and random pics over the last year.

Once uploaded, you need to go through each image, draw a box around the image’s objects you want to tag, and then select the object tag.
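
To make the tagging step concrete, here is a small sketch of what the tagged data might look like, plus a check against the 15-images-per-object guidance quoted above. The `BoxTag` layout (box edges as fractions of the image size) and the names "Ann" and "Ben" are my own assumptions for illustration; this is not AI Builder's internal format.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, Iterable, List

MIN_IMAGES_PER_OBJECT = 15  # from the guidance quoted above


@dataclass(frozen=True)
class BoxTag:
    image: str     # file name of the uploaded image
    label: str     # object tag selected for the box
    left: float    # box position and size as fractions of the image (assumed)
    top: float
    width: float
    height: float


def images_per_label(tags: Iterable[BoxTag]) -> Counter:
    """Count the distinct images in which each label is tagged at least once."""
    seen = {(t.label, t.image) for t in tags}
    return Counter(label for label, _ in seen)


def labels_needing_more_images(tags: List[BoxTag], labels: List[str]) -> Dict[str, int]:
    """Return how many more images each label needs to reach the minimum."""
    counts = images_per_label(tags)
    return {
        label: MIN_IMAGES_PER_OBJECT - counts[label]
        for label in labels
        if counts[label] < MIN_IMAGES_PER_OBJECT
    }


# Hypothetical data: 15 tagged images for "Ann", only 12 for "Ben".
tags = [BoxTag(f"img_{i}.jpg", "Ann", 0.1, 0.1, 0.3, 0.5) for i in range(15)]
tags += [BoxTag(f"img_{i}.jpg", "Ben", 0.5, 0.1, 0.3, 0.5) for i in range(12)]

shortfall = labels_needing_more_images(tags, ["Ann", "Ben"])
print(shortfall)  # {'Ben': 3}
```

Counting distinct images rather than boxes matters here, since two boxes for the same person in one photo should still count as a single example image.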

Part 2 – Completing the Model and its App usage.

Microsoft Flow – Platform Review

It looks like Microsoft created a generic workflow platform, product independent.

Microsoft has software solutions like MS Outlook, with its built-in [email] rules engine. SharePoint has a workflow solution within the SharePoint platform, typically governing the content flowing through its system.

Microsoft Flow is a different animal.  It seems like Microsoft has built a ‘generic’ rules engine for processing almost any event.  The Flow product:

  1. Start using the product from one of two areas: a) “My Flows”, where I may view existing and create new [work]flows; b) “Activity”, which shows “Notifications” and “Failures”.
  2. Select “My Flows”, and the user may “Create [a workflow] from Blank” or “Browse Templates”. The existing set of templates was created by Microsoft and also by 3rd parties, implying a marketplace.
  3. Select “Create from Blank” and the user has a single drop down list of events, a compilation of events across Internet products. The implication is that any product and event could be “made compatible” with MSFT Flow.
    1. The drop down list of events has a format of “Product – Event”. As the list of products and events grows, we should see at least two separate drop down lists, one for products, and a sub list for the product specific events.
    2. Several Example Events Include:
      1. “Dropbox – When a file is created”
      2. “Facebook – When there is a new post to my timeline”
      3. “Project Online – When a new task is created”
      4. “RSS – When a feed item is published”
      5. “Salesforce – When an object is created”
    3. The list of products as well as their events may need a business analyst to rationalize the use cases.
  4. Once an Event is selected, event specific details may be required, e.g. Twitter account details, or OneDrive “watch” folder
  5. Next, a Condition may be added to this [work]flow,  and may be specific to the Event type, e.g. OneDrive File Type properties [contains] XYZ value.  There is also an “advanced mode” using a conditional scripting language.
  6. There is “IF YES” and “IF NO” logic, which then allows the user to select one [or more] actions to perform
    1. Several Action Examples Include:
      1. “Excel – Insert Rows”
      2. “FTP – Create File”
      3. “Google Drive – List files in folder”
      4. “Mail – Send email”
      5. “Push Notification – Send a push notification”
    2. Again, it seems like an eclectic bunch of Products, Actions, and Events strung together to have a system for a proof of concept (POC).
  7. The Templates list is a predefined set of workflows that may be of interest to anyone who does not want to start from scratch. The UI provides several ways to filter, list, and search through templates.
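
The steps above amount to an Event, Condition, Action rule. A minimal sketch of that pattern in Python, assuming a hypothetical `Flow` class and made-up action functions (this is my own illustration of the concept, not Microsoft Flow's API):

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List

Event = Dict[str, Any]
Action = Callable[[Event], str]

# Trigger names follow the "Product – Event" format from the drop down list.
DROPBOX_FILE_CREATED = "Dropbox – When a file is created"


@dataclass
class Flow:
    trigger: str
    condition: Callable[[Event], bool]
    if_yes: List[Action] = field(default_factory=list)
    if_no: List[Action] = field(default_factory=list)

    def run(self, event_name: str, event: Event) -> List[str]:
        """Run the matching branch; ignore events this flow does not watch."""
        if event_name != self.trigger:
            return []
        actions = self.if_yes if self.condition(event) else self.if_no
        return [action(event) for action in actions]


# Hypothetical actions, named after the examples in the list above.
def send_email(event: Event) -> str:
    return f"Mail – Send email about {event['file']}"


def push_notification(event: Event) -> str:
    return f"Push Notification – skipped {event['file']}"


flow = Flow(
    trigger=DROPBOX_FILE_CREATED,
    condition=lambda e: e["file"].endswith(".jpg"),
    if_yes=[send_email],
    if_no=[push_notification],
)

results = flow.run(DROPBOX_FILE_CREATED, {"file": "beach.jpg"})
print(results)  # ['Mail – Send email about beach.jpg']
```

Because the trigger is just a “Product – Event” string, splitting it on the dash would give the two separate drop down lists suggested in step 3.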

Flow is applicable to everyday life, from the individual home user and small business to the enterprise. At this stage the product seems in Beta at best, or more accurately, just past a clickable prototype. I ran into several errors trying to go through basic use cases, e.g. adding rules.

Despite the “Preview” launch, Microsoft has shown us the power in [work]flow processing regardless of the service platform provider, e.g. Box, Dropbox, Facebook, GitHub, Instagram, Salesforce, Twitter, Google, MailChimp, …

Microsoft may be the glue that combines service providers who expose their services to MSFT Flow functionality.

Create from Blank – Select Condition

 

Create Rule from Template

Create from Blank Rule Building UI

 

Update June 28th, 2016:

Opportunities for Event, Condition, Action Rules

  • Transcoding [cloud] Services
  • [IBM Watson] Cognitive APIs
    • e.g. Language: Translation; Visual Recognition
  • WordPress – Create a Post
    • New text file dropped in a specific folder on Box, Dropbox, etc. being ‘monitored’ by MSFT Flow [?] Additional code may be required by the user for ‘polling’ capabilities
    • OR new text file attached, and emailed to specific email account folder ‘watched’ by MSFT Flow.
    • Event triggers – Automatic read of new text file
      • stylizing may occur if HTML coding used
    • Action – Post to a Blog
  • ‘ANY’ Event occurs, and a custom message is sent via Skype to a single Skype account or a group of accounts;
    • On several ‘eligible’ events, such as “File Creation” into Box,  the file (or file shared URL) may be sent to the Skype account.
  • ‘ANY’ Event occurs, and a custom mobile text message is sent to a single phone number or a group of phone numbers.
  • Event occurs for “File Creation” e.g. into Box; after passing a “Condition”, actions occur:
    • IBM Watson Cognitive API, Text to Speech, occurs, and the product of the action is placed in the same Box folder.
  • Action: Using Microsoft Edge (powered by MSN), in the “My news feed” tab, enable action to publish “Cards”, such as app notifications
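
For the ‘polling’ idea in the WordPress item above, here is a rough sketch of what that user-side code could look like: poll a watched folder for new text files and hand each one to an action. The local folder stands in for a Box or Dropbox folder, and `post_to_blog` is a hypothetical placeholder that only records what it would post; no real Flow, Box, or WordPress API is used.

```python
import tempfile
import time
from pathlib import Path
from typing import Callable, List, Set, Tuple


def poll_once(folder: Path, seen: Set[str], on_new_file: Callable[[Path], None]) -> int:
    """Check the folder once and run the action for each unseen .txt file."""
    new_count = 0
    for path in sorted(folder.glob("*.txt")):
        if path.name not in seen:
            seen.add(path.name)
            on_new_file(path)
            new_count += 1
    return new_count


def watch(folder: Path, on_new_file: Callable[[Path], None],
          interval_seconds: float = 5.0, max_polls: int = 3) -> None:
    """Poll repeatedly; files already present on the first poll count as new."""
    seen: Set[str] = set()
    for _ in range(max_polls):
        poll_once(folder, seen, on_new_file)
        time.sleep(interval_seconds)


posts: List[Tuple[str, str]] = []


def post_to_blog(path: Path) -> None:
    # Placeholder action (assumption): record title and body instead of posting.
    posts.append((path.stem, path.read_text()))


with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    seen: Set[str] = set()
    (folder / "hello.txt").write_text("<p>First post</p>")
    poll_once(folder, seen, post_to_blog)  # picks up hello.txt
    poll_once(folder, seen, post_to_blog)  # nothing new the second time

print(posts)  # [('hello', '<p>First post</p>')]
```

The `seen` set is what keeps the same file from being posted twice; a real integration would more likely rely on the provider's own change notifications if they are available.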

Challenges \ Opportunities \ Unknowns

  • 3rd party companies’ existing, published [cloud; web service] APIs may not even need any modification to integrate with Microsoft Flow; however, business approval may be required to use the API in this manner.
  • It is unclear whether Flow Templates need to be created by the product owner, e.g. Telestream, or by a knowledgeable third party, following the Android, iOS, and/or MSFT Mobile Apps model.
  • It is unclear whether the MSFT Flow app may be licensed individually in the cloud, within the 365 cloud suite, or offered for Home and/or Business.

Our Mind’s Eye Stream for Sale: Who will Own those Portals?

As we approach a brave new world of our Mind’s Eye for Sale, who will own that portal, or jump page to other views or other perspectives? It gets more and more expensive to see the world, and harder to travel. Sounds like Total Recall, the movie, and it’s not far off from the path we’re already on without even realizing it. Portals to other people’s perspectives, such as Instagram, let us see life through other people’s interpretations of the world, which is fascinating and alluring to us.

Once the genie is out of the bottle, it’s hard to turn back. In all sincerity, a lightweight version of Google’s Android OS for Glass may even be downloadable, and free, as it is based on open source. The Glass is super stylish, but super expensive. If you’re in the mainstream, you can afford the Glass; if not, you can build your own. That’s not that difficult, relatively speaking: perhaps with a kit from Texas Instruments, such as we’ve seen in the PC world, where they now offer small computer kits for building small computers with Android and Linux. And if you can build your own Google Glass, how fast will there be imitations? I imagine faster than you can blink an eye, pun intended.

Google will make it popular and sexy; after that, there could be a flood of imitations. After all, today we can all build a knock off Google Glass: a tiny web cam, a lightweight OS, and Bluetooth integrated with your smartphone for two way interaction, streaming, and communications. The lightweight OS could today be Linux, but who is the champion for this effort, Red Hat? No, they are a support and solutions group for a blend of Unix. There are a few hurdles that Google must take, and has taken, in some cases partnering with Verizon, who had their own blend of HUD at the 2013 CES conference. Today, we might mock and jeer people who wear glasses with a mini cam on them. It might be clunky; the idea is to make it alluring to the masses, as well as going through iterations to make it an acceptable medium to the public. Once Google, the trailblazer in this endeavor, burns through the problems, it will pave the way for a massive wave of alternate choices, and the product will become a commodity. It’s not just the issue with the UI; there are legal battles to be fought, privacy for example, whether it is safe to drive with them on, and so on. There need to be mainstream platforms, so people take advantage and are then lured to independent platforms. Many other companies might follow, such as Amazon or other cloud based companies. Maybe even independent sites, web sites, mobile apps, and others will join and integrate with APIs.

Facebook Trying to Kill Off Instagram?

So, the obvious thing here is Facebook wants to kill off Instagram. You think core users won’t care? Possibly. Is it even legal? Their new terms of service are questionable, with the added bit regarding minors being included. So why buy Instagram and put outrageous terms on the very popular service? One reason might be a common tale, where a suitor company buys a company it projects has high share in a market the suitor is already in, or plans on getting into, to allow it to grab mass market share. The suitor company may already be in the market and can simply capitalize on the acquired resources, e.g. staff and technology, and then try to run the ship aground, i.e. sabotage: demoting the acquired company by putting a poor taste in the customers’ path while the suitor company offers an alternate path, which attracts the customer base to convert. Some of the articles, such as the New York Times piece What Instagram’s New Terms of Service Mean for You and a Mashable op-ed piece, Instagram Will Basically Sign Your Life Away, imply picking up your pitchforks, rallying around Instagram, and applying a crowd mentality to trample your way out of Instagram.

If this is the Facebook / Instagram business model, as these folk are interpreting the requirements, I am personally not so keen on my daughter using Instagram and showing up in an advertisement, as I think I read in the interpreted TOS. I don’t think the kid would be too keen either, probably for a different reason than her father. Advertisements can be taken out of context, or you may lose control of how your face is integrated with a product or service, and might not necessarily agree with its use. Talk about type-casting. A teen shows up in an advertisement for acne, and she doesn’t know about the advertisement until it’s posted on her locker, and this is a relatively innocent example. In addition, a capitalistic kid would say, “I am not particularly keen on my face or pictures showing up somewhere without my permission, but hey, where is my cut?”

There are already established platforms that sell photographers’ photos through established licensing models, and sure, that may be another more viable model for the Facebook / Instagram folks, but hey, I am just a man with a keyboard, and half a brain.

– Zaphod Beeblebrox

Google Plus…Beta in Progress

Last night I was driving my daughter and her guy friend home from an after school program, and I really like this kid; he’s a little tech geek like me. I spoke to this computer savvy twelve year old and asked him all these questions, from a kid’s perspective, about all these new technologies. He was articulate, rattling off thought provoking and meaningful information. It was like I was talking to an industry analyst: a bright, fun, good kid. He and I talked non-stop, and after we dropped him off, I realized we had monopolized the conversation and my daughter might have wanted to get in a word, so I apologized. I forgot what it was like to be a teen, a girl no less.

Anyway, in the midst of our discussion the kid said he uses a Gmail account instead of his default ISP account. I asked him what he thought of Google Plus. He said he did some exploring of it. “Yeah, Google + was created to compete with Facebook, but it’s not really that great.” I asked him if he knew about a few features I thought were cool, and his response was “No, not really. Instagram,” he said, “was ‘killin it’ though.” We went on to more market analysis of the space. I was amazed. Kids.

It was then I realized why the kid didn’t get past the first page: appeal and usability. These are concepts in User Interface design and are essential in attracting users, yet these types of features are usually added later on. The standard technology mantra is “Make it work, then make it work [faster, refine UI, etc.].” If I were trying to be really unbiased, I would say Google Plus is tantamount to a Beta product. As an example, the “Your Circle” buttons truncate the Circle name, are square shaped, and don’t have much appeal. In fact, many of the user interface features feel canned. The user interface is not the initial focus when putting out a product, especially when you are in a rapid mode of delivery and are certain your product may change drastically, e.g. based on your road map, user feedback, and so on. Although I really like the baseline platform, and am trying not to be, I am a bit biased in favor of Google. Google Plus looks like it is being built using the Agile methodology, with Scrum Sprints and constant releases. To be clear, I am using Google’s own latest browser, Chrome, on Windows 7 with a powerful computer.

So, what does this teach us? Well, in Project Management, sometimes you can add all the resources in the world to a project, but at some point you get diminishing returns, and there is a limit to delivery capability even using Agile and Scrum methodologies, especially if Social Networking is high on Google’s agenda. Agile requires user feedback, hence the cyclical loop of release and user response.