Tag Archives: Artificial Intelligence

Platform Independent AI Model for Images: AI Builder, Easily Utilized by 3rd Party Apps

With all the discourse on OpenAI's ChatGPT and natural language processing (NLP), I'd like to steer the conversation toward images/video and object recognition. This is another area of artificial intelligence primed for growth, with many use cases. Arguably, it's not as shocking; it isn't bending our society at its core or writing college papers from limited input, but object recognition can still seem "magical." AI object recognition may turn art into science, as if AI were reading your palm to tell your future. It will also bring consumers more data points with which Augmented Reality (AR) can overlay digital images on an analog world of tangible objects.

Microsoft’s AI Builder – Platform Independent

Microsoft's Power Automate AI [model] Builder has the functionality to get us started on the journey of utilizing images, tagging them with objects we recognize, and then training the AI model to recognize those objects in our "production" images. Microsoft provides tools to build AI [image] models (a library of images with human-tagged objects) quickly and easily. How you leverage these AI models is the foundation of "future" applications. Some applications are already here, but not in mass production. The necessary ingredient: taking the building of AI models out of proprietary silos, such as social media applications.

In many social media applications, users can tag faces in their images for various reasons, mostly to control who they share their content/images with. In most cases, images can also be tagged with a specific location. Each AI image/object model is proprietary and not shared between social media applications. If there were a standards body, an AI model could be created and maintained outside of any one social media application: portable AI object recognition models with a wide array of applications that support their use, such as social media apps. Later on, we'll discuss Microsoft's AI Model Builder, externalized from any one application, and because it's Microsoft, it's intuitive. 🙂

An industry standards body could collaborate and define what AI models look like, their features, and, most importantly, their portability formats. Then the industry, such as social media apps, could elect to adopt the features that are and are not supported by their applications.
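To make the portability idea concrete, here's a rough sketch of what a hypothetical model manifest might look like; every field name below is invented for illustration, since no such standard exists today.

```python
# Hypothetical "portable AI model" manifest: the kind of descriptor a standards
# body might define so any social media app could import the same tagged-object
# model. All field names are invented for illustration.
import json

manifest = {
    "model_name": "family-faces-v1",
    "model_type": "object-detection",
    "format": "onnx",                       # a widely supported interchange format
    "labels": ["Grandma", "UPS uniform", "tennis racket"],
    "training_images": 30,
    "supported_features": ["face-tagging", "location-tagging"],
}

print(json.dumps(manifest, indent=2))
```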

Use Cases for Detecting Objects in Images

Why doesn’t everyone have an AI model containing tagged objects within images and videos of the user’s design? Why indeed.

1 – Brands / Product Placement from Content Creators

Just about everyone today is a content creator, producing images and videos for their own personal and business social media feeds: Twitter, Instagram, Snap, Meta, YouTube, and TikTok, to name a few. AI models should be portable enough to integrate with social media applications, where tags could be used to identify branded apparel, jewelry, appliances, etc. Tags could also contain metadata, allowing content consumers to follow tagged objects to a specified URL, driving clicks and the promotion of products and services.
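As a sketch of what such a tag might carry, consider a minimal per-object record with a label, a brand, and a click-through URL; the field names and values are illustrative only.

```python
# Illustrative per-object tag metadata for product placement: a recognized
# object in a creator's image/video plus a click-through URL for the brand.
from dataclasses import dataclass

@dataclass
class ObjectTag:
    label: str    # e.g., "running shoes"
    brand: str    # e.g., "Acme Footwear" (made-up brand)
    url: str      # where a viewer lands when they tap the tag
    box: tuple    # (x, y, width, height) within the frame, normalized 0-1

tag = ObjectTag("running shoes", "Acme Footwear",
                "https://example.com/shop/running-shoes", (0.42, 0.61, 0.18, 0.20))
print(f"{tag.label} -> {tag.url}")
```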

2 – Object Recognition for Face Detection

Has it all been done? Facebook/Meta, OneDrive, iCloud, and other services have already tried or are implementing some form of object detection in the photos you post. Each of these existing services implements object detection at some level:

  • Identify the faces in your photos, but require you to tag those faces so some "metadata" can be associated with them.
  • Dynamically group/tag all "Portrait" pictures of a specific individual, or events from a specific day and location, like a family vacation.
  • Some image types (JPEG, PNG, GIF, etc.) allow you to add metadata to the files on your own, e.g., so you can search for pictures at the OS level.

3 – Operational Assistance through Object Recognition Using AR

  • Constructing "complex" components on an assembly line, where Augmented Reality (AR) can overlay the next step in assembly on the existing object.
  • Assistance putting together IKEA furniture, like the assembly-line use case, but for home use.
  • Gaming, everything from Mario Kart Live to Light Saber duels against the infamous Darth Vader.

4 – Palm Reading and Other Visual Analytics

  • Predictive weather patterns

5 – Visual Search through Search Engines and Proprietary Applications with Specific Knowledge Base Alignment

  • CoinSnap iPhone app scans both sides of a coin, identifies it, and builds the user's collection.
  • Microsoft Bing's Visual Search and integration with MSFT Edge
  • Medical applications leveraging AI image models, e.g., Radiology

Radiology – Reading the Tea Leaves

Radiology could build a model of possible issues throughout the body. A library of images tagged with specific types of fractures could power the auto-detection of issues with the use of AI. If it were a non-proprietary model, radiologists worldwide could contribute to that AI model. The potential displacement of radiology jobs may inhibit the open, non-proprietary nature of this use case, and the AI model may need to be built without open input from all radiologists.

Microsoft’s AI Builder – Detect Objects in Images

Microsoft's AI model builder can help the user build models in minutes. "Object Detection, Custom Model – Detect custom objects in images" is the template you want to use to build a model that detects objects (people, cars, anything) rather quickly, and it lets users keep adding images (i.e., training the model) so it becomes a better model over time.
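AI Builder models are normally consumed from Power Apps or Power Automate rather than called directly, but purely to illustrate the general "send an image, get back tagged objects" pattern, here is a hedged sketch; the endpoint URL, headers, and response shape are assumptions, not AI Builder's actual API.

```python
# Illustrative only: a generic object-detection prediction call. The endpoint,
# header names, and JSON shape are hypothetical and do NOT describe AI Builder's
# real API; in practice the model is used through Power Automate/Power Apps.
import requests

ENDPOINT = "https://example.invalid/object-detection/predict"  # placeholder
API_KEY = "YOUR-KEY-HERE"                                      # placeholder

with open("family_photo.jpg", "rb") as f:
    resp = requests.post(ENDPOINT,
                         headers={"Authorization": f"Bearer {API_KEY}"},
                         files={"image": f})

for obj in resp.json().get("predictions", []):  # hypothetical response shape
    print(obj.get("tag"), obj.get("confidence"), obj.get("boundingBox"))
```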

Many other AI model types exist, such as text recognition within images. I suggest exploring the Azure AI Models list to find one that fits your needs.

Currently Available Data Sources for Image Input

  • Current Device
  • SharePoint
  • Azure BLOB

Wish List for Data Sources w/Trigger Notifications

When a new image is uploaded into one of these data sources, a "trigger" could be activated to process the image with the AI model and apply tags to it (a rough sketch of such a trigger handler follows the list below).

  • ADT – video cam
  • DropBox
  • Google Drive
  • Instagram
  • Kodak (yeah, still around)
  • Meta/Facebook
  • OneDrive
  • Ring – video cam
  • Shutterfly
  • Twitter
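Here is a minimal sketch of that trigger idea, assuming nothing more than a watched folder standing in for any of the sources above; detect_objects() is a placeholder for the AI model call, not a real connector.

```python
# Watch a folder (a stand-in for OneDrive, Dropbox, a camera feed, etc.); when
# a new image appears, run it through a detection step and record the tags.
import time
from pathlib import Path

WATCH_DIR = Path("incoming_images")   # placeholder for the real data source

def detect_objects(image_path):
    """Placeholder for the AI model call; might return tags like ['person', 'package']."""
    return ["person"]

WATCH_DIR.mkdir(exist_ok=True)
seen = set()
while True:
    for img in WATCH_DIR.glob("*.jpg"):
        if img not in seen:
            seen.add(img)
            tags = detect_objects(img)
            print(f"{img.name}: tagged {tags}")   # stand-in for storing tags or notifying
    time.sleep(10)   # poll every 10 seconds
```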

Get Started: Power Automate, Premium Account

Log in to Power Automate with your premium account, select the "AI Builder" menu, then the "Models" menu item. In the top left part of the screen, select "New AI model." From the list of model types, select "Object detection – Detect custom objects in images."

AI Builder – Custom Model

It's a "Premium" feature of Power Automate, so you must have the Premium license. Select "Get started." The first step is to "Select your model's domain"; there are three choices, and I selected "Common objects" to give me the broadest opportunity. Then select "Next."

AI Builder – Custom Model – Domain

Next, you need to select all of the objects you want to identify in your images. For demonstration purposes, I added my family's first names as the objects I want my model to identify in images.

AI Builder – Custom Model – Objects for Model

Next, you need to "Add example images for your objects." Microsoft's guidance is: "You need to add at least 15 images for each object you want to detect." The available data sources are the ones listed above (current device, SharePoint, Azure Blob storage):

AI Model – Add Images

I added the minimum recommended number of images: 15 per object, two objects, 30 images of my family pulled from random pics over the last year.

Once uploaded, you need to go through each image, draw a box around the objects in the image you want to tag, and then select the object tag.
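Roughly speaking, each tagged example boils down to an image reference, a label, and a bounding box; the exact format AI Builder stores internally isn't documented here, so treat this as the general shape only.

```python
# The general shape of one tagged training example (illustrative field names).
annotation = {
    "image": "IMG_0142.jpg",
    "tag": "FamilyMember1",   # illustrative object name
    "bounding_box": {"left": 0.31, "top": 0.22, "width": 0.18, "height": 0.27},
}
print(annotation)
```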

Part 2 – Completing the Model and Its App Usage

Who’s at the Front Door…Again?

Busy Time of Year, Happy Holidays

The holiday season brings lots of people to your front door. If you have a front door camera, you may be getting many alerts letting you know there is motion at the door. It would be great if front doorbell cameras could take the next step and incorporate #AI facial/image recognition, notifying you through #iOS notifications WHO is at the front door and, in some cases, which "uniformed" person is at the door, e.g., a FedEx/UPS delivery person.

Ring iOS Notification

This facial recognition technology is already baked into Microsoft #OneDrive Photos and Apple #iCloud Photos. It wouldn't be a huge leap to apply facial and object recognition to catalog the people who come to your front door, as well as image recognition for the uniforms they are wearing, e.g., a UPS delivery person.

iCloud/OneDrive Photos identify faces in your images and group them by likeness, so the owner of the photo gallery can identify a group of faces as, for example, Grandma. It may take one extra step for the camera owner to log in to the image/video storage service and classify a group of stills (converted from video) containing Grandma's face. Facebook/Meta can also tag the faces within pictures you upload and share, and the Facebook app can "guess" faces based on previously uploaded images.

No need to launch the Ring app to see who's at the front door. Facial recognition can remove the step of finding out what caused the motion at the front door and just post the iOS notification with the "who's there." That's one less step than launching the Ring app and looking for yourself.
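A minimal sketch of the matching step such a notification would need, assuming some face-embedding model has already turned the doorbell frame and the known household faces into vectors (the vectors below are dummy values):

```python
# Compare a face embedding from the doorbell frame against known household
# embeddings with cosine similarity, then build the notification text.
import numpy as np

known_faces = {
    "Grandma": np.array([0.9, 0.1, 0.3]),
    "UPS delivery (uniform)": np.array([0.2, 0.8, 0.5]),
}

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

frame_embedding = np.array([0.85, 0.15, 0.35])   # dummy embedding from the camera frame

name, score = max(((n, cosine(frame_embedding, v)) for n, v in known_faces.items()),
                  key=lambda t: t[1])
message = f"Motion at the front door: {name}" if score > 0.9 else "Motion at the front door"
print(message)
```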

Microsoft’s Plethora of Portals

As I was looking through Microsoft's catalog of applications, it occurred to me just how many of their platforms are information-centric and seem to overlap in functionality. Where should I go when I want to get stuff done, find information, or produce it? Since the early days of AOL and AltaVista, we've seen the awesome power of a "Jump Page" as the starting point for our information journey.

Microsoft, which one do I choose?

From one software vendor's perspective, we've got many options. What's the best option for me? It seems like there should be opportunities to gain synergies between the available Microsoft platforms.

Bing.com

Searching for information on the internet? News, images, encyclopedias, Wikipedia, whatever you need, and more is on the web. Microsoft Bing helps you find what you need, regardless of whether you're using text or an image to search for like-for-like information. It also serves up "relevant" information on the jump page, news mixed with advertisements. There is also a feature enabling you to add carousel "boxes," for example, containing the latest MS Word files used, a synergy with Office.com.

Office.com

Word, PowerPoint, Excel, Visio, Power BI… If you've created content or want to create content using Microsoft applications, Office.com is the one-stop shop for all your Office apps and the content created with them.

SharePoint

Another portal to a universe of information organized around a central theme, such as collaboration/interaction with product/project team members: an intranet, a SharePoint site with one or multiple teams. At the most fundamental level, it provides the capability to collaborate/interact with teams, potentially leveraging Microsoft collaboration tools. Just one of its many "out of the box" capabilities is a document management solution with version control.

SharePoint can also be used for any type of Internet/web platform, i.e., a public-facing portal. At its heart, SharePoint is a sharing tool in which the authors of the website can share video presentations, shared calendars of public events, and a plethora of customized lists.

Yammer

Engaging your people is more critical than ever. Yammer connects leaders, communicators, and employees to build communities, share knowledge, and engage everyone. Think of it as synonymous with a bulletin board; the implementation of Yammer looks like Facebook for the enterprise.

  • Use the Home feed to stay on top of what matters, tap into the knowledge of others, and build on existing work.
  • Search for experts, conversations, and files.
  • Join communities to stay informed, connect with your coworkers, and gather ideas.
  • Join in the conversation, react, reply to, and share posts.
  • @ mention someone to loop them in.
  • Attach a file, gif, photo, or video to enhance your post.
  • Praise someone in your network to celebrate a success, or just to say thanks.
  • Create a virtual event where your community can ask questions and participate live, or watch the recording afterwards.
  • Use polls to crowdsource feedback and get answers fast.
  • Stay connected outside the office with the Yammer mobile app.
  • Use Yammer in Microsoft Teams, SharePoint, or Outlook.

“Yammer helps you connect and engage across your organization so that you can discuss ideas, share updates, and network with others.”

Microsoft Teams

For any team, there is a wealth of information, ranging from group or one-on-one Chats, Teams, Calls, and Files to integrations with practically all Microsoft applications and beyond. The extensibility of MS Teams seems relatively boundless, with integrations for Wikis, SharePoint document folders, etc. From what I can tell, the many organizations that just use Teams for group or individual chat channels are barely grazing the surface of MS Teams' capabilities.

After setting up MS Teams, the Teams "landing" page is a great place to start constructing your "living space" within MS Teams. From there, you can carve out space for all things related to the team. For example, in the "Team ABC" Team channel, you can add any number of "tabs" relating to everything from an embedded Wiki to specific SharePoint folders for the team's product specifications. A team could even create an embedded Azure DevOps [Kanban] board to show progress and essentially "live in" its MS Teams channel.

Another portal overlap: Microsoft Teams Communities seems to equate to Yammer.

Delve

What is Delve – Microsoft 365?

Use Delve to manage your Microsoft 365 profile and to discover and organize the information that’s likely to be most interesting to you right now – across Microsoft 365.

Delve never changes any permissions, so you’ll only see documents that you already have access to. Other people will not see your private documents. Learn more about privacy.

Delve is a content curation platform for the person it's most relevant to: you. Its user experience resembles the carousels of video streaming apps. There are "Popular Documents" carousels and other carousels based on the most recently accessed content. Because surfacing is driven by how files are saved and who can access them, the platform gives you a treasure trove of documents you never knew you had access to, or even knew existed. It actually paints a potential compliance nightmare if people leave the default document access as "…anyone within my organization…".

Outlook.com / Best of MSN

Another portal of information focused around you: your email, your calendar, your To-Dos, and your contacts/people. It’s not just your communication with anyone, e.g., your project team members; it’s organizing your life on a smaller scale, e.g., To-Dos. You can also access other shared calendars, such as a team release schedule or a PTO schedule.

The Best of MSN is information, i.e., news around your interests, a digest of information relevant to you, delivered in an email format. Other digests from other sources may be curated and sent if you subscribe.

Mediums to Traverse Information: AR, VR…

The visual paradigms used to query and access information may drastically influence the user's capacity to digest the relevant information. For example, in an Augmented Reality (AR) experience, querying and identifying information, then applying it and serving up the content in the way most conducive to the user's experience, is vital.

Users can't just "Google It" and have the results served up like magic. The next evolution of querying information and serving up content in a medium that maximizes its usability is key, and it is most evident when using Augmented Reality (AR). If you're building something, instructions may be overlaid on the physical elements/parts in front of the user. Even the context of the step number would allow the virtual images to be overlaid on the right parts.

Automated and Manual Content Curation is a MUST for all Portals

Categories, Tags, Images, and all other associations from object A to everything else, the Meta of Existence, are essential for proper information dissemination and digestion. If you can tag any object with metadata, you can teach an AI/search engine to identify it in a relevant query. Implementing an Induction Engine, a type of artificial intelligence that proposes rules based on historic patterns, is a must to improve query accuracy over time.
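As a toy version of that induction idea, the sketch below counts historic tag co-occurrence and proposes simple "tagged A, usually also tagged B" rules above a confidence threshold; real rule-induction engines are far richer.

```python
# Toy induction engine: propose association rules from historic tag patterns.
from collections import defaultdict
from itertools import permutations

tagged_items = [
    {"beach", "family", "summer"},
    {"beach", "summer"},
    {"family", "birthday"},
    {"beach", "summer", "vacation"},
]

pair_counts = defaultdict(int)
tag_counts = defaultdict(int)
for tags in tagged_items:
    for t in tags:
        tag_counts[t] += 1
    for a, b in permutations(tags, 2):
        pair_counts[(a, b)] += 1

for (a, b), n in pair_counts.items():
    confidence = n / tag_counts[a]
    if confidence >= 0.8:
        print(f"Proposed rule: '{a}' -> '{b}' (confidence {confidence:.0%})")
```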

Next level, “Information applications” – Improved Living with Alzheimer’s

Next Ecosystem: Google..?

AR Sudoku Solver Uses Machine Learning To Solve Puzzles Instantly

A very novel concept: applying Augmented Reality and Artificial Intelligence (i.e., Machine Learning) to solving puzzles, such as Sudoku. Maybe not so novel considering AR's uses in manufacturing.
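For the curious, the "solve" half of such an app is plain computer science once the digits have been recognized from the camera frame; here is a minimal backtracking sketch (the recognition/ML half is not shown).

```python
# Minimal Sudoku backtracking solver; 0 marks an empty cell.
def valid(grid, r, c, v):
    """Check whether value v can be placed at row r, column c."""
    if any(grid[r][j] == v for j in range(9)) or any(grid[i][c] == v for i in range(9)):
        return False
    br, bc = 3 * (r // 3), 3 * (c // 3)
    return all(grid[br + i][bc + j] != v for i in range(3) for j in range(3))

def solve(grid):
    """Fill the 9x9 grid in place; return True if a solution exists."""
    for r in range(9):
        for c in range(9):
            if grid[r][c] == 0:
                for v in range(1, 10):
                    if valid(grid, r, c, v):
                        grid[r][c] = v
                        if solve(grid):
                            return True
                        grid[r][c] = 0
                return False
    return True
```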

Next, we'll be using similar technology for human-to-human negotiations: reading body language, understanding logical arguments, reading human emotion, and rebutting remarks in a debate.

Litigators watch out… Or, co-counsel?   Maybe a hand of Poker?

Source: AR Sudoku Solver Uses Machine Learning To Solve Puzzles Instantly

Follow the Breadcrumbs: Identify and Transform

Trends – High Occurrence, Word Associations

Over the last two decades, I've been involved in several solutions that incorporated artificial intelligence and, in some cases, machine learning. I've understood them at the architectural level, and in some cases taken a deeper dive.

I've had the urge to perform a data trending exercise where we not only identify existing trends, similar to "out of the box" Twitter capabilities, but also augment "the message" as trends unfold. This is probably AI 101, but I wanted to submerge myself in understanding this data science project. My solution statement: given a list of my interests, we can derive sentence fragments from Twitter, traverse each tweet, and parse each word off as a possible "breadcrumb". Then remove the stop words, and voila: words that can identify trends and can be used to create/modify trends.

Finally, to give the breadcrumbs and those "words of interest" greater depth, we can use the Oxford Dictionaries API to enrich the data with things like thesaurus entries and synonyms.
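A rough sketch of that breadcrumb step, assuming a tiny hand-rolled stop-word list (a fuller list, e.g. NLTK's, would be used in practice); the Oxford Dictionaries enrichment call is left out rather than guessed at.

```python
# Split a tweet into words, strip punctuation, and drop stop words.
import re

STOP_WORDS = {"the", "a", "an", "and", "or", "to", "of", "in", "on", "for",
              "is", "are", "with", "this", "that", "not", "just", "them"}

def breadcrumbs(tweet_text):
    words = re.findall(r"[a-z']+", tweet_text.lower())
    return [w for w in words if w not in STOP_WORDS]

print(breadcrumbs("Learning to drive trends, not just observing them"))
# ['learning', 'drive', 'trends', 'observing']
# Each remaining word could then be enriched (synonyms, thesaurus entries)
# via the Oxford Dictionaries API; that call is omitted here.
```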

Gotta Have a Hobby

It's been a while now that I've been hooked on Microsoft Power Automate, formerly known as Microsoft Flow. It's relatively inexpensive and has the capability to be a tremendous resource for almost ANY project. There is a FREE version, and the paid version is $15 per month. It's a no-brainer to pick the $15 tier with the bonus data connectors.

I've had the opportunity to explore the platform and create workflows. Some fun examples: initially, using MS Flow, I parsed RSS feeds, and if a criterion was met, I'd get an email. I did the same with a Twitter feed. I then kicked it up a notch and inserted these records of interest into a database. The library of templates and connectors is staggering, and I suggest you take a look if you're in a position where you need to collect and transform data, followed by a load and a notification process.
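For comparison, here is roughly the same RSS idea done in Python instead of a Flow; the feed URL and keyword are placeholders, and the print is a stand-in for the email/database step.

```python
# Pull a feed, keep entries matching a keyword, hand them to a notify/load step.
import feedparser  # pip install feedparser

FEED_URL = "https://example.com/rss"   # placeholder feed
KEYWORD = "machine learning"

feed = feedparser.parse(FEED_URL)
matches = [e for e in feed.entries if KEYWORD.lower() in e.get("title", "").lower()]

for entry in matches:
    print(entry.title, entry.link)   # stand-in for "send me an email" / insert into DB
```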

What Problem are we Trying to Solve?

How are trends formed, how are they influenced, and what factors influence them? Who are the most influential people providing input to a trend? Are they influential based on location? Does language play a factor in how trends develop? End goal: driving trends, not just observing them.

Witches' Brew – Experiment Ingredients

Obtaining and Scrubbing Data

Articles I've read regarding data science projects revolve around five steps:

  1. Obtain Data
  2. Scrub Data
  3. Explore Data
  4. Model Data
  5. Interpret Data

The rest of this post will mostly revolve around steps 1 and 2. Here is a great article that goes through each of the steps in more detail: 5 Steps of a Data Science Project Lifecycle

Capturing and Preparing the Data

The data set is arguably the most important aspect of machine learning. A data set that doesn't conform to the bell curve, or that consists mostly of outliers, will produce an inaccurate reflection of the present and a poor prediction of the future.

First, I created a table of search criteria based on topics that interest me.

Search Criteria List

Then I created a Microsoft Flow for each of the search criteria to capture tweets with the search text, and insert the results into a database table.
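In code terms, the ingestion step amounts to something like the sketch below; fetch_tweets() is a placeholder for whatever Twitter client or connector is used, and the table schema just mirrors the idea of (search criterion, tweet text, timestamp).

```python
# Fetch matching tweets per search criterion and insert them into a table.
import sqlite3

def fetch_tweets(search_text):
    """Placeholder: return a list of (tweet_text, created_at) for the search text."""
    return [("Learning new things about AI Builder", "2020-01-01T00:00:00Z")]

conn = sqlite3.connect("trends.db")
conn.execute("""CREATE TABLE IF NOT EXISTS tweets
                (search_criterion TEXT, tweet_text TEXT, created_at TEXT)""")

for criterion in ["Learning", "Augmented Reality", "Machine Learning"]:
    for text, created in fetch_tweets(criterion):
        conn.execute("INSERT INTO tweets VALUES (?, ?, ?)", (criterion, text, created))

conn.commit()
conn.close()
```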

MS Flow – Twitter: Ingestion of Learning Tweets

Out of the total of 7,450 tweets collected from all the search criteria, 548 tweets were from the search criterion "Learning" (22).

Data Ingestion – Twitter

After you've obtained the data, you will need to parse the tweet text into "breadcrumbs", which "lead a path" back to the search criteria.

Machine Learning and Structured Query Language (SQL)

This entire predictive trend analysis could be much easier with a language that has a more restrictive syntax, like SQL, instead of English tweets. Parsing SQL statements would make it easier to draw correlations. For example, the SQL structure can be represented as: SELECT Col1, Col2 FROM TableA WHERE Col2 = 'ABC'. Based on the data set size, we may be able to extrapolate and correlate the rows returned to provide valuable insights, e.g., the projected performance impact of the query on the data warehouse.
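A naive illustration of why the restricted grammar helps: the columns and table can be pulled out with a simple pattern rather than NLP. Real SQL parsing would use a proper parser; this regex only handles the simple form shown above.

```python
# Extract columns and table from a simple SELECT statement.
import re

sql = "SELECT Col1, Col2 FROM TableA WHERE Col2 = 'ABC'"
m = re.match(r"SELECT\s+(?P<cols>.+?)\s+FROM\s+(?P<table>\w+)", sql, re.IGNORECASE)
if m:
    columns = [c.strip() for c in m.group("cols").split(",")]
    table = m.group("table")
    print(columns, table)   # ['Col1', 'Col2'] TableA
```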

R language and R Studio

Preparing Data Sets Using Tools Designed to Perform Data Science.

The R language and R Studio seem to be very powerful when dealing with large data sets, and the syntax makes it easy to "clean" the data set. However, I still prefer SQL Server and a decent query tool. Maybe my opinion will change over time. The most helpful things I've seen in R Studio are the ability to create new data frames and the ability to roll back to a point in time, i.e., a previous version of the data set.

Changing a column's data type on the fly in R Studio is also immensely valuable. For example, the data in the column are integers, but the table/column definition is a string or varchar. In a SQL database, the user would have to drop the table, recreate it with the new data type, and then reload the data. Not so with R.
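For comparison only, the same "change the column type in place" idea looks like this in Python/pandas, with no dropping and reloading of a table; this is an analogue, not the R syntax itself.

```python
# A column ingested as strings is converted to integers in place.
import pandas as pd

df = pd.DataFrame({"reading": ["1", "2", "3"]})   # values arrived as strings
df["reading"] = df["reading"].astype(int)          # type changed on the fly
print(df.dtypes)
```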

Email Composer: Persona Point of View (POV) Reviews

First, there was spell check, then the thesaurus, synonyms, and contextual grammar suggestions, and now Persona Point of View reviews. Between the immensely accurate and omnipresent #Grammarly and #Google's #Gmail predictive text, I started thinking about the next step in the AI and human partnership for crafting communications.

Google Gmail Predictive Text

Google Gmail predictive text had me thinking about AI possibilities within email, and it occurred to me: I understand what I'm trying to communicate to my email recipients, but do I really know how my message is being interpreted?

Google Gmail has this eerily accurate auto-suggestion capability: as you type out your email sentence, Gmail suggests the next word or words that you plan on typing. As you type, suggested sentence fragments appear to the right of the cursor. It's like reading your mind: the most common word or words predicted to come next in the composer's email.

Personas

In the software development world, it's a categorization or grouping of people who play a similar role and behave in a consistent fashion. For example, we may have a lifecycle of parking meters, where the primary goal is the collection of parking fees. In this case, personas may include the "meter attendant" and "the consumer". These two personas have different goals, and how they behave can be categorized. There are many such roles within and outside a business context.

In many software development tools that enable people to collect and track user stories or requirements, the tools also allow you to define and correlate personas with user stories.
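A minimal sketch of how such a tool might represent the persona-to-user-story link; the class and field names are illustrative, not any particular product's schema.

```python
# Personas correlated with user stories (illustrative data structures).
from dataclasses import dataclass, field

@dataclass
class Persona:
    name: str    # e.g., "Meter Attendant"
    goals: list  # e.g., ["collect parking fees"]

@dataclass
class UserStory:
    title: str
    personas: list = field(default_factory=list)  # personas this story serves

attendant = Persona("Meter Attendant", ["collect parking fees", "report broken meters"])
story = UserStory("As a meter attendant, I can see which meters are overdue", [attendant])
print(story.personas[0].name)
```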

As in the case of email composition, once the email has been written, the composer may choose to select a category of people whose perspective they would like to view it from. Can the email application define categories of recipients, and then preview these emails from their respective viewpoints?

What will the selected persona derive from the words arranged in a particular order? What meaning will they attribute to the email?

Use Personas in the formulation of user stories/requirements; understand how Personas will react to “the system”, and changes to the system.

Finally, there is the use of the [email composer] solution based on "actors" or "personas". What personas are available "out of the box"? What personas will need to be defined through the email composer's setup of these categories of people? Wizard-based persona definitions?

There are already software development tools like Azure DevOps (ADO) that empower teams to manage product backlogs and correlate "User Stories" or "Product Backlog Items" with personas. These are static personas that are completely user-defined, with no intelligence to correlate user stories with personas; users of ADO must create these links themselves.

Now, technology can assist us in considering the intended audience: a systematic, biased perspective using artificial intelligence to inspect your email based on a selected "point of view" (a persona) of the intended recipient. Maybe your email would otherwise be misconstrued as abrasive and not get the intended response.

Deep Learning vs Machine Learning – Overview & Differences – Morioh

Machine learning and deep learning are two subsets of artificial intelligence which have garnered a lot of attention over the past two years. If you're here looking to understand both the terms in the simplest way possible, there's no better place to be.
— Read on morioh.com/p/78e1357f65b0

Help Wanted: Civil War Reenactment Soldiers to Improve AI Models

I just read an article in PC Magazine, "Human Help Wanted: Why AI Is Terrible at Content Moderation," which got my neurons firing.

Problem Statement

Every day, Facebook’s artificial intelligence algorithms tackle the enormous task of finding and removing millions of posts containing spam, hate speech, nudity, violence, and terrorist propaganda. And though the company has access to some of the world’s most coveted talent and technology, it’s struggling to find and remove toxic content fast enough.

Ben Dickson
July 10, 2019 1:36PM EST

I've worked at several software companies that leveraged artificial intelligence and machine learning to recognize patterns and correlations. In general, the larger the data sets, the higher the accuracy of the predictions; the outliers in the data, the noise, "fall out" of the data set. Without large, quality training data, artificial intelligence makes more mistakes.

In terms of speech recognition, image classification, and natural language processing (NLP), programs like chatbots and digital assistants are generally becoming more accurate because their training data sets are large, and there is no shortage of these data types. For example, there are many ways I can ask my digital assistant for something, like "Get the movie times." Training a digital assistant, at a high level, means cataloging how many ways I can ask for "something" and still achieve my goal. I could go and create that list myself, writing a few dozen questions, but my sample data set would still be too small. Amazon has a crowdsourcing platform, Amazon Mechanical Turk, through which I can request that others build me the data sets: thousands of questions and correlated goals.
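As a toy version of "catalog how many ways I can ask for something," the sketch below maps a handful of utterances to goals and does a crude similarity match on a new phrasing; real assistants train on thousands of crowd-sourced examples, so this only shows the shape of the data.

```python
# Tiny utterance-to-goal catalog with a naive similarity match.
from difflib import SequenceMatcher

UTTERANCES = {
    "get the movie times": "movie_times",
    "what movies are playing tonight": "movie_times",
    "show me showtimes near me": "movie_times",
    "what's the weather today": "weather",
}

def best_intent(text):
    return max(UTTERANCES.items(),
               key=lambda kv: SequenceMatcher(None, text.lower(), kv[0]).ratio())[1]

print(best_intent("When are the movies playing?"))  # most likely 'movie_times'
```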

MTurk enables companies to harness the collective intelligence, skills, and insights from a global workforce to streamline business processes, augment data collection and analysis, and accelerate machine learning development.

Amazon Mechanical Turk: Access a global, on-demand, 24×7 workforce

Video “Scene” Recognition – Annotated Data Sets for a Wide Variety of Scene Themes

In silent films, the plot was conveyed by the use of title cards: written indications of the plot and key dialogue lines. Unfortunately, silent films are not making a comeback. In order to achieve a high rate of successful identification of activities within a given video clip, video metadata libraries need to be created that capture the following (a rough sketch of such a record follows the list):

  • Media / Video Asset, Unique Identifier
  • Scene Clip IN and OUT timecodes
  • Scene Theme(s), similar to Natural language processing (NLP), Goals = Utterances / Sentences
    • E.g. Man drinking water; Woman playing Tennis
  • Image recognition, in the context of machine vision, is the ability of software to identify objects, places, people, writing and actions in images. Image recognition is used to perform a large number of machine-based visual tasks, such as labeling the content of images with meta-tags
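Here is a rough sketch of what one such scene-metadata record might look like; the field names are illustrative, and the timecodes follow a simple HH:MM:SS:FF convention.

```python
# Illustrative scene-annotation record for a video asset.
from dataclasses import dataclass, field

@dataclass
class SceneAnnotation:
    asset_id: str    # unique media/video asset identifier
    tc_in: str       # scene clip IN timecode
    tc_out: str      # scene clip OUT timecode
    themes: list = field(default_factory=list)   # e.g., "Man drinking water"
    objects: list = field(default_factory=list)  # meta-tags from image recognition

clip = SceneAnnotation("ASSET-0001", "00:01:10:00", "00:01:23:12",
                       themes=["Woman playing Tennis"],
                       objects=["tennis racket", "court"])
print(clip)
```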

Not Enough Data

Here is an example of how Social Media, such as Facebook, attempts to deal with video deemed inappropriate for their platform:

In March, a shooter in New Zealand live-streamed the brutal killing of 51 people in two mosques on Facebook. But the social-media giant’s algorithms failed to detect the gruesome video. It took Facebook an hour to take the video down, and even then, the company was hard-pressed to deal with users who reposted the video.

Ben Dickson
July 10, 2019 1:36PM EST

…in many cases, such as violent content, there aren’t enough examples to train a reliable AI model. “Thankfully, we don’t have a lot of examples of real people shooting other people,” Yann LeCun, Facebook’s chief artificial-intelligence scientist, told Bloomberg.

Ben Dickson
July 10, 2019 1:36PM EST

Opportunities for Actors and Curators of Video Content: Dramatizations

Think of all those thousands of people who perform, creating videos of content that runs the gamut from playing video games to "unboxing" collectible items. The actors who perform dramatizations could add tags to their videos, as described above, documenting the themes of a given skit. If actors posted their videos on YouTube or proprietary crowdsourcing platforms, they would be entitled to some revenue for the use of their licensed video.

Disclosure Regarding Flag Controversy

I now realize there are politics around Nike "tipping their hat" toward the Betsy Ross flag. However, when I referenced the flag in this blog post, I was thinking of the American Revolution and the 13-colonies flag. I didn't think the title "Help Wanted: American Revolutionary War Reenactment Soldiers to Improve AI Models" would resonate with readers, so I took some creative liberty.