Tag Archives: YouTube

Platform Independent AI Model for Images: AI Builder, Easily Utilized by 3rd Party Apps

With all the discourse on OpenAI’s ChatGPT and Natural Language Processing (NLP), I’d like to steer the conversation toward images/video and object recognition. This is another area of artificial intelligence primed for growth, with many use cases. Arguably, it’s not as shocking as NLP, which bends our society at its core by creating college papers from limited input, but object recognition can still seem “magical.” AI object recognition may turn art into science, as easily as AI reading your palm to tell your future. It will also bring consumers more data points, from which Augmented Reality (AR) can overlay digital images onto an analog world of tangible objects.

Microsoft’s AI Builder – Platform Independent

Microsoft’s Power Automate AI [model] Builder has the functionality to get us started on the journey of utilizing images, tagging them with objects we recognize, and then training the AI model to recognize objects in our “production” images. Microsoft provides tools to build AI [image] models (libraries of images with human-tagged objects) quickly and easily. How you leverage these AI models is the foundation of “future” applications. Some applications are already here, but not in mass production. The necessary ingredient: decoupling the building of AI models from proprietary platforms, such as social media applications.

In many social media applications, users can tag faces in their images for various reasons, mostly to control who they share their content/images with. In most cases, images can also be tagged with a specific location. Each AI image/object model is proprietary and not shared between social media applications. If there were a standards body, an AI model could be created and maintained outside of the social media applications: portable AI object recognition models usable by the wide array of applications that support them, such as social media apps. Later on, we’ll discuss Microsoft’s AI Model Builder, externalized from any one application, and because it’s Microsoft, it’s intuitive. 🙂

An industry standards body could collaborate and define what AI models look like, their features, and, most importantly, their portability formats. Then the industry, such as social media apps, could elect to adopt the features that are and are not supported by their applications (a sketch of one possible portable format follows).
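
To make this concrete, here is a sketch of what a portable tag/model manifest might look like, expressed from Python as JSON. The schema, field names, and tags are hypothetical; an actual standards body would define the real format.

```python
import json

# Hypothetical portable object-tag manifest; a sketch of what a
# standards body might define, not an existing format.
manifest = {
    "schema_version": "0.1",
    "model_id": "family-faces-2023",
    "objects": [
        {"tag": "person:alice", "type": "face"},
        {"tag": "brand:acme-jacket", "type": "apparel",
         "metadata": {"url": "https://example.com/acme-jacket"}},
    ],
    "training_images": [
        {"file": "img_0001.jpg",
         "regions": [  # bounding boxes, normalized to 0..1
             {"tag": "person:alice",
              "box": {"x": 0.31, "y": 0.12, "w": 0.18, "h": 0.24}},
         ]},
    ],
}

# Any application that adopts the standard could ingest this file.
print(json.dumps(manifest, indent=2))
```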

Use Cases for Detecting Objects in Images

Why doesn’t everyone have an AI model containing tagged objects within images and videos of the user’s design? Why indeed.

1 – Brands / Product Placement from Content Creators

Just about everyone today is a content creator, producing images and videos for their own personal and business social media feeds: Twitter, Instagram, Snap, Meta, YouTube, and TikTok, to name a few. AI models should be portable enough to integrate with social media applications, where tags could identify branded apparel, jewelry, appliances, etc. Tags could also contain metadata, allowing content consumers to follow tagged objects to a specified URL. Clicks drive the promotion of products and services.

2 – Object Recognition for Face Detection

Has it all been done? Facebook/Meta, OneDrive, iCloud, and other services have already tried or are implementing some form of object detection in the photos you post. Each of these existing services implements object detection at some level:

  • Identify the faces in your photos, but require you to tag those faces, so some “metadata” will be associated with these photos
  • Dynamically grouping/tagging all “Portrait” pictures of a specific individual or events from a specific day and location, like a family vacation.
  • Some image formats, JPEG, PNG, GIF, etc., allow you to add metadata to the files on your own, e.g., so you can search for pictures at the OS level (see the sketch below).
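
For example, a minimal sketch using the Pillow library to write a searchable description into a JPEG’s EXIF ImageDescription tag; PNG and GIF use different metadata mechanisms, and the file names and description here are hypothetical.

```python
from PIL import Image  # pip install Pillow

IMAGE_DESCRIPTION = 0x010E  # standard EXIF tag id for a text description

def tag_image(src: str, dst: str, description: str) -> None:
    img = Image.open(src)
    exif = img.getexif()
    exif[IMAGE_DESCRIPTION] = description  # searchable at the OS level
    img.save(dst, exif=exif)

tag_image("IMG_0001.jpg", "IMG_0001_tagged.jpg", "family vacation, beach, 2023")
```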

3 – Operational Assistance through Object Recognition Using AR

  • Constructing “complex” components on an assembly line, where Augmented Reality (AR) can overlay the next assembly step onto the existing object, helping transition it to that step.
  • Assistance putting together IKEA furniture, like the assembly-line use case, but for home use.
  • Gaming, everything from Mario Kart Live to Light Saber duels against the infamous Darth Vader.

4 – Palm Reading and Other Visual Analytics

  • Predictive weather patterns

5 – Visual Search through Search Engines and Proprietary Applications with Specific Knowledge Base Alignment

  • The CoinSnap iPhone app scans both sides of a coin, identifies it, and builds the user’s collection.
  • Microsoft Bing’s Visual Search and integration with MSFT Edge
  • Medical applications leveraging AI image models, e.g., radiology

Radiology – Reading the Tea Leaves

Radiology builds a model of possible issues throughout the body. Curating images with specific types of fractures can empower the auto-detection of such issues with the use of AI. If it were a non-proprietary model, radiologists worldwide could contribute to that AI model. The potential displacement of radiology jobs may inhibit the open, non-proprietary nature of this use case, and the AI model may need to be built independently of open input from all radiologists.

Microsoft’s AI Builder – Detect Objects in Images

Microsoft’s AI model builder can help the user build models in minutes. “Object Detection, Custom Model: Detect custom objects in images” is the template you want to use to build a model that detects objects, e.g., people, cars, anything, rather quickly, and it enables users to keep adding images (i.e., retraining) so the model improves over time.

Many other AI Model types exist, such as Text Recognition within images. I suggest exploring the Azure AI Models list to fit your needs.
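
AI Builder models are normally invoked from inside Power Automate flows or Power Apps rather than from code. As a rough stand-in for script-level access, here is a sketch against Azure’s Custom Vision prediction REST endpoint, which offers comparable object detection; the endpoint, project ID, iteration name, key, and file name are placeholders you would supply from your own resource.

```python
import requests

# Placeholders; substitute values from your own Custom Vision resource.
ENDPOINT = "https://YOUR-RESOURCE.cognitiveservices.azure.com"
PROJECT_ID = "YOUR-PROJECT-GUID"
ITERATION = "Iteration1"
PREDICTION_KEY = "YOUR-PREDICTION-KEY"

url = (f"{ENDPOINT}/customvision/v3.0/Prediction/"
       f"{PROJECT_ID}/detect/iterations/{ITERATION}/image")

with open("family_photo.jpg", "rb") as f:
    resp = requests.post(
        url,
        headers={"Prediction-Key": PREDICTION_KEY,
                 "Content-Type": "application/octet-stream"},
        data=f.read(),
    )
resp.raise_for_status()

# Keep only confident detections; each has a tag name and bounding box.
for pred in resp.json()["predictions"]:
    if pred["probability"] > 0.5:
        print(pred["tagName"], pred["boundingBox"])
```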

Currently Available Data Sources for Image Input

  • Current Device
  • SharePoint
  • Azure BLOB

Wish List for Data Sources w/Trigger Notifications

When a new image is uploaded into one of these data sources, a “trigger” could be activated to process the image with the AI model and apply tags to it (a minimal polling sketch follows the list below).

  • ADT – video cam
  • DropBox
  • Google Drive
  • Instagram
  • Kodak (yeah, still around)
  • Meta/Facebook
  • OneDrive
  • Ring – video cam
  • Shutterfly
  • Twitter
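
A minimal sketch of the trigger idea, assuming a plain local drop folder stands in for any of the sources above; real integrations would use each service’s own change-notification API rather than polling.

```python
import time
from pathlib import Path

WATCH_DIR = Path("incoming_images")  # hypothetical drop folder (must exist)
seen: set[str] = set()

def process_image(path: Path) -> None:
    # Placeholder: run the AI model here (e.g., the detection sketch
    # above) and write the returned tags back as image metadata.
    print(f"tagging {path.name}")

while True:
    for img in WATCH_DIR.glob("*.jpg"):
        if img.name not in seen:
            seen.add(img.name)
            process_image(img)
    time.sleep(30)  # poll every 30 seconds
```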

Get Started: Power Automate, Premium Account

Log in to Power Automate with your premium account and select the “AI Builder” menu, then the “Models” menu item. At the top left of the screen, select “New AI Model.” From the list of model types, select “Custom Model > Object Detection > Detect Custom Objects in Images.”

[Image: AI Builder – Custom Model]

It’s a “Premium” feature of Power Automate, so you must have the Premium license. Select “Get Started.” The first step is to “Select your model’s domain”; there are three choices, so I selected “Common Objects” to give me the broadest opportunity. Then select “Next.”

[Image: AI Builder – Custom Model – Domain]

Next, you need to select all of the objects you want to identify in your images. For demonstration purposes, I added my family’s first names as my objects to train my model to identify in images.

[Image: AI Builder – Custom Model – Objects for Model]

Next, you need to “Add example images for your objects.” Microsoft’s guidance is, “You need to add at least 15 images for each object you want to detect.” The current data sources for adding images are those listed above (current device, SharePoint, Azure BLOB).

[Image: AI Model – Add Images]

I added the minimum recommended number of images: 15 per object, two objects, 30 images of my family in total, drawn from random pics taken over the last year.

Once uploaded, you need to go through each image, draw a box around each object you want to tag, and then select the object tag.

Part 2 – Completing the Model and its App usage.

Amazon X-Ray Studios for Indie Movie Producers

I remember building a companion app for the Windows desktop that pulled music data from iTunes and Gracenote.   Gracenote boasts:

“Gracenote technology is at the heart of every great entertainment experience, and is supported by the largest source of music metadata on the planet.”

Gracenote, in conjunction with the iTunes API/data, allowed me to personalize the user experience beyond what iTunes provided out of the box. X-Ray powered by IMDb similarly enriches the experience of watching movies and television hosted on Amazon Video.

While watching a movie using Amazon Video, you can tap the screen and get details about the specific scene, shown in the foreground as the media continues to play.

“Go behind the scenes of your favorite movies and TV shows with X-Ray, powered by IMDb. Get instant access to cast photos, bios, and filmographies, soundtrack info, and trivia.”

IMDb is an Amazon company; in his infinite foresight, Jeff Bezos, founder, owner, and CEO of Amazon.com, struck a deal in 1998 to buy IMDb outright for approximately $55 million and attach it to Amazon as a private subsidiary.

The Internet Movie Database (abbreviated IMDb) is an online database of information related to films, television programs and video games, including cast, production crew, fictional characters, biographies, plot summaries, trivia and reviews, operated by IMDb.com, Inc., a subsidiary of Amazon. As of June 2017, IMDb has approximately 4.4 million titles (including episodes) and 8 million personalities in its database, as well as 75 million registered users.


In Amazon’s infinite wisdom again, they are looking to stretch both X-Ray and the IMDb property to budding film artists seeking to cultivate and mature their followings.

Approach to Adoption of X-Ray IMDb  / Amazon Video

Amazon must empower artists and their representatives to update IMDb. IMDbPro seems to enable just such capabilities:

“Showcase yourself on IMDb & Amazon. Manage your photos and the credits you are Known For on IMDbPro, IMDb, and Amazon Video”
  1. How, then, is new media content, such as actors’ photos and filmographies, [approved] and updated by IMDb?
  2. Furthermore, what is the selection process to get indie content [approved] and posted to Amazon Video? Is there a curation process whereby not every indie artist is hosted, e.g., a creative selection process driven by the Amazon Video business?
  3. To expand the use of X-Ray powered by IMDb, what are the options for alternate media players and streamers? E.g., is YouTube a possibility, hosting and streaming content embedded with X-Ray capabilities? Do Amazon X-Ray capabilities require the Amazon Video player?

X-Ray Current Support: Amazon Hosted and Streaming

“X-Ray is available on the Amazon Video app in the US, UK, Germany, and Austria for thousands of titles on compatible devices including Amazon Fire Tablets and TV/Stick, iOS and Android mobile devices, and the web. To access X-Ray, tap the screen or click on the Fire TV remote while the video is playing.”

Amazon X-Ray Studios, Video Editing/Integration Desktop Application

Indie producers may leverage X-Ray Studios to integrate IMDb overlay content, enhancing their audience’s experience. Timecodes are leveraged to sync X-Ray content with the video content.

“In video production and filmmaking, SMPTE timecode is used extensively for synchronization, and for logging and identifying material in recorded media. During a filmmaking or video production shoot, the camera assistant will typically log the start and end timecodes of shots, and the data generated will be sent on to the editorial department for use in referencing those shots.”
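
To illustrate, a small sketch converting a non-drop-frame SMPTE timecode (HH:MM:SS:FF) into an absolute frame count, the arithmetic an X-Ray cue sheet would rely on; drop-frame timecode (e.g., 29.97 fps) complicates this and is ignored here.

```python
def timecode_to_frames(tc: str, fps: int = 24) -> int:
    """Convert a non-drop-frame SMPTE 'HH:MM:SS:FF' to a frame count."""
    hh, mm, ss, ff = (int(part) for part in tc.split(":"))
    return ((hh * 60 + mm) * 60 + ss) * fps + ff

cue_in = timecode_to_frames("01:02:03:12")   # hypothetical X-Ray cue point
print(cue_in, cue_in / 24.0)                 # frame count, then seconds
```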

All metadata regarding an indie video may be integrated into the video source/target file, and transcoding may be required to output the Amazon-required media standard.

Amazon has slightly complicated the situation by creating an Amazon Web Services (AWS) offering called X-Ray, which has no relation at all to the X-Ray service powered by IMDb.

Amazon could not be reached for comment.

Streaming Companies Provide their Platform to Content Creators

Streaming Platforms / Content Creators

Streaming companies enable content creators to use their well-known, branded platforms to grow the creators’ followings. The reciprocal nature of the relationship creates an even broader customer base for the streaming content platforms.

  • Direct competition with Google’s YouTube.
  • If Microsoft were to stretch SharePoint’s abilities re: video streaming, Video on Demand, and Broadcast Live, as well as its user licensing model, it could be another tool for entrepreneurs to offer any content creator a “Digital Entertainment Portal.”

Any content provider of digital media entertainment:

  • Broadcast Television Channels – e.g. CBS, NBC, SyFy
  • Independent digital media producers, e.g., those currently using channels such as YouTube to reach a large audience

The streaming company can create a portal wizard that builds a copy of a streaming portal template. The digital media producer then uses web app widgets, similar to Microsoft SharePoint sites, to customize the portal around their digital media video/assets. The streaming “Portal” provider, as part of its service, handles the monetary transactions for customer subscriptions or other supported business models. In addition, the bandwidth load from streaming would be handled by the streaming “Portal Provider,” a major benefit, leveraging the company’s Content Delivery Network (CDN).

Anyone could apply for a partnership with the streaming company, and once approved, may use the tools provided by the streaming partner to spawn a new platform site around the customer/producer’s content.

This new revenue stream for streaming companies’ platforms, such as Netflix and Amazon Instant, may be vastly multiplied using a “Partner Portal” model.

Update 2/5/18

It seems that this path of content providers leveraging existing Portal Streaming companies has already begun:

  • Verizon FiOS embedding Netflix as a “Channel”
  • Amazon Prime (Prime Video) embedding CBS ALL ACCESS, HBO, STARZ, Showtime, Cinemax, etc. branded as “Amazon Channels”
    • Amazon has the capability to leverage their Amazon CloudFront (Highly secure global content delivery network (CDN))

At this juncture, no content at the “Indie” level is being embedded in the Portal Streaming companies. It looks like Google’s YouTube still monopolizes this space.

Since the original post date, CBS ALL ACCESS has been released, showing that content providers, in addition to their own direct-to-client distribution channels, will offer their content through 3rd-party streaming portals as well…for now. Maybe it’s just for convenience, because these streaming portals require a subscription to the content provider in order for the content to be served up.

Another post projecting the renaissance of streaming and content creation.

This post was from Dec 2014, but still very relevant today.

Media Companies (and Execs) in the Driver’s Seat for a Prosperous New Year

Google Introduces their Cloud, Digital Asset Management (DAM) solution

Although this is a saturated space, with many products, some highly recommended, I thought this idea might interest those involved in the Digital Asset Management space. Based on the maturity and cost of existing products, it’s up to you: build or buy. The following may provide an opportunity to augment existing Google products and overlay a custom solution.

Google products can be integrated across their suite of solutions to produce a cloud-based, secure Digital Asset Management (DAM) solution. In this use case, the digital assets are media (e.g., videos, still images).

A Google DAM may be created by leveraging existing features of Google Plus, Google Drive, YouTube, and other Google products, as well as building/extending additional functionality, e.g., via the Google Plus API, to create a DAM solution. An overarching custom framework weaves these products together to act as the DAM.

Google Digital Asset Management (New)

  1. A dashboard for Digital Asset Management should be created, which articulates, at a glance, where project media assets are in their life cycle, e.g., ingestion, transcoding, media editing, metadata additions, inclusion/editing of closed captions, workflow approvals, etc.
  2. Creation and maintenance of a project asset folder structure within storage, such as Google Drive for active projects and Google Cloud Storage for archived content. Ingested content arrives in the project folders.
  3. Ability to use [Google YouTube] default encoding/transcoding functionality, or optionally leverage alternate cloud-accessible transcoding solutions.
  4. A basic DAM UI may provide user interaction with the project and asset metadata.
  5. Components of the DAM should allow plug-in integration with other components on the market today, such as an ingestion solution.

Google Drive and Google Cloud Storage: cloud storage offers large quantities of storage economically, e.g., for media (video, audio).

  1. Google Drive ingestion of assets may occur through an automated process, such as a drop folder within an FTP site. The folder may be polled every N seconds by the Google DAM orchestration, or another 3rd-party orchestration product, and ingested into Google Drive (see the sketch after this list). The ingested files are placed into a project folder designated by the accompanying XML meta file.
  2. Version control of assets is implemented by Google Drive and the DAM to facilitate collaboration and approval.
  3. Distribution and publishing of media to designated people and locations, such as social media channels, may be automatically triggered by the DAM orchestration polling Google Drive custom metadata changes. On-demand publishing is also achievable through the DAM.
  4. Archiving project assets to custom locations, such as the Google Cloud solution, may be triggered by a project metadata status modification, or on demand through the DAM.
  5. Assets may be spawned into other assets, such as clips. Derived child assets are correlated with the master, or parent, asset within the DAM asset metadata to trace back to origin. This eliminates asset redundancy, enabling users to easily find related files and reuse all or a portion of an asset.
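
A minimal ingestion sketch, assuming the Google Drive v3 API via google-api-python-client and a hypothetical XML sidecar schema; credential setup is omitted.

```python
import os
import xml.etree.ElementTree as ET

from googleapiclient.discovery import build
from googleapiclient.http import MediaFileUpload

def ingest(video_path: str, sidecar_xml: str, creds) -> str:
    # The sidecar names the destination project folder, e.g.
    # <asset><project_folder_id>...</project_folder_id></asset>
    folder_id = ET.parse(sidecar_xml).getroot().findtext("project_folder_id")

    drive = build("drive", "v3", credentials=creds)
    media = MediaFileUpload(video_path, mimetype="video/mp4", resumable=True)
    created = drive.files().create(
        body={"name": os.path.basename(video_path), "parents": [folder_id]},
        media_body=media,
        fields="id",
    ).execute()
    return created["id"]  # Drive file id, recorded by the DAM
```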

Google Docs

  1. Documents required to accompany each media project, such as production guidelines, may go through several iterations before they are complete. Many components of a document may be static. Google Docs may incorporate “document assembly” technology to automate document construction.

Google’s YouTube

  1. Media may be edited using default YouTube functionality or third-party software, e.g., the Adobe suite.
  2. Caption creation and editing may use YouTube or third-party software.
  3. Metadata may be added or modified according to the corporate taxonomy through [custom] YouTube fields, or directly through the Google DAM DB where the project data resides (see the sketch below).
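
A sketch of pushing taxonomy tags to a video through the YouTube Data API v3 via google-api-python-client; the credentials object and video ID are assumed to exist, and the DAM would supply the tag list.

```python
from googleapiclient.discovery import build

def apply_tags(creds, video_id: str, new_tags: list[str]) -> None:
    youtube = build("youtube", "v3", credentials=creds)

    # Fetch the current snippet so required fields (title, categoryId)
    # are preserved on update.
    video = youtube.videos().list(part="snippet", id=video_id).execute()
    snippet = video["items"][0]["snippet"]

    # Merge in the DAM's taxonomy tags, de-duplicated.
    snippet["tags"] = sorted(set(snippet.get("tags", [])) | set(new_tags))
    youtube.videos().update(
        part="snippet",
        body={"id": video_id, "snippet": snippet},
    ).execute()
```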

Google’s Google Plus (G+)

  1. G+ project page may be used for project and asset collaboration
  2. Project team members may subscribe to the project page to receive notifications on changes, such as new sub clips
  3. Asset workflow notifications,  human and automated:
    1. Asset modification approvals (i.e., G+ API ↔ DAM DB) through custom fields in a G+ page
    2. Notifications of changes to assets (i.e., collaboration)
    3. [Automated] e.g. ingestion in progress, or completed updates.
    4. [Automated] Process notifications: e.g. ‘distribution to XYZ’ and ‘transcoding N workflow’.  G + may include links to assets.
  4. Google Plus for in-house, and outside org. team(s) collaboration
  5. The G+ UI may trigger actions, such as ingestion, e.g., by specifying a specific Google Drive link and a configured workflow.

Google Custom Search

  1. Allows for the search of assets within a project, within all projects in a silo of business, and across the entire organization’s assets.
  2. Ability to find and share DAM motion pictures, still images, and text assets with individuals, groups, and project teams in or outside the organization. Google Plus facilitates sharing.
  3. Asset metadata will, e.g., describe how the assets may be used for distribution, i.e., digital distribution rights. Users and groups are implemented within G+; control of asset distribution may be implemented in Google Plus and/or custom Google Search.

Here is a list of DAM vendors.

Audio Hashtags Automatically Created When a User Uploads Podcasts or Videos

As mentioned in the post BI Applied to YouTube Yields Value for Advertising, Marketing, & Sales, YouTube could dynamically scan audio for words and/or phrases when it processes videos, and the most frequently used words, or the words said in the video at least N times, would (see the word-frequency sketch after this list):

  1. Be recommended automatically as the comma-delimited list of words describing the video.
  2. Feed a hashtag cloud on YouTube’s front page, with each word growing or shrinking based on the current trends of words used in uploaded videos. People may then click or “lasso” one or several words and drill down to specific videos within a specific genre.
  3. Support business intelligence in a search screen, allowing users, advertisers, or marketers to find the most frequently used words or phrases, the hip words, or trending phrases, so they can use them in advertising or commercials. For example, a search can be performed on comedy-genre videos between X and Y dates, and the most-used words and phrases can appear with counts in a table.
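
A word-frequency sketch for item 1, assuming a speech-to-text transcript is already available; the stop-word list and the sample transcript are illustrative.

```python
import re
from collections import Counter

STOPWORDS = {"the", "a", "an", "and", "or", "to", "of", "in", "is", "it"}

def recommend_tags(transcript: str, n: int = 10) -> str:
    words = re.findall(r"[a-z']+", transcript.lower())
    counts = Counter(w for w in words if w not in STOPWORDS)
    # Comma-delimited list, ready to suggest as the video's tag words
    return ", ".join(word for word, _ in counts.most_common(n))

print(recommend_tags("talk about cars, fast cars, and more car reviews"))
```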

The above intelligence, user interface, and technology may be applied to audio podcasts as well to help users find the type of podcasts they like to hear, or advertisers and marketers to target their audiences more specifically.


BI Applied to YouTube Yields Value for Advertising, Marketing, & Sales

Google has scanned and indexed books, shown the most commonly used words, and added metadata based on book publishing year, genre, and so on. It would be great to see that functionality come to YouTube. Business Intelligence (BI) applied to video libraries yields profit for advertising, sales, and marketing.

Every video, in a batch process, gets analyzed for the words used, with a count per word, alongside metadata about the video, such as genre and any other user-provided information. Then, as videos get processed, a tag bubble cloud serves as the high-level view: the most commonly used words dynamically get bigger, and the words appearing or said relatively less often in the videos get smaller. Someone can then click on a word and drill down to other information about it, in a sense business intelligence about the word (see the sketch below). Advertisers, for example, may target certain classes, or word tags, for their advertisements to appear against. Another profitable model may be to use this business intelligence for business presentations: understanding the most frequently used words in business speeches or presentations within a given time period, between X and Y dates, to track the buzz words, and, even more granularly, the current buzz words of a specific sector and global region.
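
A sketch of that drill-down, assuming per-video word counts have already been extracted in the batch process; the sample records are hypothetical.

```python
from collections import Counter
from datetime import date

# Per-video metadata plus word counts, as described above (sample data).
videos = [
    {"genre": "comedy", "uploaded": date(2013, 1, 5),
     "word_counts": {"awesome": 7, "literally": 4}},
    {"genre": "comedy", "uploaded": date(2013, 2, 9),
     "word_counts": {"awesome": 3, "epic": 6}},
]

def top_words(genre: str, start: date, end: date, n: int = 5):
    total: Counter = Counter()
    for v in videos:
        if v["genre"] == genre and start <= v["uploaded"] <= end:
            total.update(v["word_counts"])
    return total.most_common(n)  # word/count table for the search screen

print(top_words("comedy", date(2013, 1, 1), date(2013, 3, 1)))
```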

Even the music industry can get into the act by mining “current phrases” and incorporating them into their songs.

YouTube Video Streams from Google Glasses with Advertising? Behind the Scenes on DVDs?

The phenomenon of reality T.V. makes me realize that this idea may be of interest to a wide audience, although I am personally not interested in reality television. Someone was headed to the Caribbean yesterday, and I was a bit jealous for a moment, but then I became relieved: although I like the destination, the journey annoys me, but to each his or her own. I can imagine celebrities, musicians, anyone putting on their Google Glasses (with image/video stabilization) and streaming their video feed to, e.g., YouTube. A singer or drummer could sell the stream, or even an audience member at a concert could promote their Google Glasses stream, and even sell advertising on the bottom, as YouTube currently does with YouTube videos. A model walking down the catwalk, and the list goes on. A celebrity could even auction their stream for charity. Man, reality T.V. It could also give a whole new meaning to “behind the scenes” on movie DVDs, from the director to whoever is in a movie, cast or crew.

Tablet Developers Make Business Intelligence Tools using Google as a Data Warehouse: Competing with Oracle, IBM, and Microsoft SQL Server

And, he shoots, and scores. I called it, sort of. Google came out of the closet today as a data warehouse vendor; at the least, they need a community of developers to connect the dots and build an amazing Business Intelligence suite.

Google came out with a Google Docs API today, usable from languages ranging from Objective-C (iOS) and C# to Java, so you can use Google as your data warehouse for any size business. All you need to do is write an ETL program that uploads and downloads tables between your local database and Google Docs, then create your own Business Intelligence user interface for creating and viewing charts & graphs. It looks like they’ve changed strategies, or this was the plan all along.

Initially, I thought Google Fusion was going to be the table-editing tool for manipulating data transferred from your transactional database via the Google Docs API. Today they released the Google Docs API, and developers can create their own ETL drivers and a Business Intelligence user interface that can run on any platform, from an Android tablet to an iPad or a Windows tablet.

A few days ago, I wrote an article suggesting they were going to use a tool called Google Fusion, in beta at the time, to manipulate tabular data and eventually extend it to create common BI components, such as graphs, charts, and editable tables.

A few gotchas: Google Docs on the Apple iPad is version 1.1.1, released 9/28/12, so we are talking very early days, and the Google Docs API was released today. I would imagine that since you can also use C#, someone can make a Windows desktop application to manipulate the data tables and create and view graphs, so a Windows tablet can be used. The API also has Java compatibility, so from any Unix box, or any platform (Java is write once, run anywhere), wherever your transactional database lives, a developer can write a driver to transfer the data to Google Docs dynamically and then use the Google Docs API for Business Intelligence. You could even write an ETL driver that does nothing but rapidly transfer data, like an ODBC or JDBC driver, and use whatever business intelligence tools you have on your desktop, or run a nightly ETL (a minimal ETL sketch follows). Beyond that, I can see developers creating business intelligence tools on Android, iPad, or Windows tablets to modify tables and create and view charts using custom BI tool sets, with Google Docs as their data warehouse.
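
In that spirit, a rough ETL sketch. The 2012-era Google Docs API is long gone, so this uses the modern Drive v3 upload via google-api-python-client as a stand-in: dump a local SQLite table to CSV, then push it to Drive converted into a Google Sheet; credential setup is omitted and the table name is assumed trusted.

```python
import csv
import sqlite3

from googleapiclient.discovery import build
from googleapiclient.http import MediaFileUpload

def etl_table_to_drive(db_path: str, table: str, creds) -> str:
    # Extract: read the whole table from the local transactional database.
    conn = sqlite3.connect(db_path)
    cur = conn.execute(f"SELECT * FROM {table}")
    headers = [col[0] for col in cur.description]
    rows = cur.fetchall()

    # Transform: dump to CSV, the simplest interchange format.
    csv_path = f"{table}.csv"
    with open(csv_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(headers)
        writer.writerows(rows)

    # Load: upload to Drive, converting the CSV into a Google Sheet.
    drive = build("drive", "v3", credentials=creds)
    created = drive.files().create(
        body={"name": table,
              "mimeType": "application/vnd.google-apps.spreadsheet"},
        media_body=MediaFileUpload(csv_path, mimetype="text/csv"),
        fields="id",
    ).execute()
    return created["id"]
```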

Please reference an article I wrote a few days back, “Google is Going to be the Next Public and Private Data Warehouse“.

At that time, Google Fusion was marked as beta (on 10/13/2012). Google has since stripped off the word “Beta,” but it doesn’t matter. It’s even better with the Google API to Google Docs. Google Fusion could be your starter user interface; however, if Android, iOS (Apple iPad), and Windows developers really embrace this API, the big database companies like IBM, Oracle, and Microsoft may have their market share eroded to some extent, if not a great extent.

Update 10/19:

Hey Gs (Guys and Gals), I forgot to mention: you could perhaps also make your own video or music streaming applications, using the basic get/send file calls that other companies, such as AWS and Box, already offer. It’s a simple get/send API, so I’m not sure it’s applicable to “streaming” at this stage; it’s just another storage location in the “cloud,” which would be quite boring. Although, thinking of it now, aren’t all the put/send cloud solutions potential data warehouses, using ETL and the APIs discussed and published above? Also, it’s ironic that Google would be competing with itself if it were a file share that “streams” videos, and YouTube?