13 Colonies Flag

Help Wanted: Civil War Reenactment Soldiers to Improve AI Models

I just read an article on Digital PC Magazine, “Human Help Wanted: Why AI Is Terrible at Content Moderation,” which got my neurons firing.

Problem Statement

Every day, Facebook’s artificial intelligence algorithms tackle the enormous task of finding and removing millions of posts containing spam, hate speech, nudity, violence, and terrorist propaganda. And though the company has access to some of the world’s most coveted talent and technology, it’s struggling to find and remove toxic content fast enough.

Ben Dickson
July 10, 2019 1:36PM EST

I’ve worked at several software companies that leveraged artificial intelligence (AI) and machine learning (ML) to recognize patterns and correlations. In general, the larger the data set, the higher the accuracy of the predictions: the outliers in the data, the noise, “fall out” of the data set. Without large, high-quality training data, AI makes more mistakes.
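As a toy analogy (not a real ML model), here is a minimal Python sketch of noise “falling out” as the sample grows. It estimates a known value from noisy samples and reports the average error for several sample sizes. The true value, noise level, trial count, and the `mean_abs_error` helper are all illustrative assumptions.

```python
import random
import statistics

def mean_abs_error(sample_size, trials=200, true_value=10.0, noise=1.0, seed=0):
    """Average |estimate - true_value| when estimating true_value from noisy samples."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    errors = []
    for _ in range(trials):
        # Each sample is the true value plus Gaussian noise.
        samples = [true_value + rng.gauss(0.0, noise) for _ in range(sample_size)]
        errors.append(abs(statistics.fmean(samples) - true_value))
    return statistics.fmean(errors)

for n in (10, 100, 1_000):
    print(n, round(mean_abs_error(n), 4))
```

The error shrinks roughly in proportion to 1/√n, so each tenfold increase in data cuts the typical error by about a factor of three.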

In speech recognition, image classification, and natural language processing (NLP), programs such as chatbots and digital assistants are generally becoming more accurate because their training data sets are large and there is no shortage of these data types. For example, there are many ways I can ask my digital assistant for something, like “Get the movie times.” At a high level, training a digital assistant means cataloging the many ways I can ask for “something” and mapping each one to the goal I want to achieve. I could create that list myself by writing a few dozen questions, but my sample data set would still be too small. Amazon has a crowdsourcing platform, Amazon Mechanical Turk (MTurk), where I can request that workers build the data sets for me: thousands of questions and their correlated goals.
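To make the “catalog of ways to ask” idea concrete, here is a minimal Python sketch, assuming a hypothetical catalog that maps each goal to example utterances and a simple word-overlap (Jaccard) match in place of a trained model. The goal names, utterances, and `classify` helper are illustrative, not any real assistant’s API.

```python
import re

# Hypothetical training catalog: each goal maps to example utterances,
# the kind of list a crowdsourced task might produce at much larger scale.
CATALOG = {
    "get_movie_times": [
        "get the movie times",
        "what movies are playing tonight",
        "when is the next showing",
    ],
    "get_weather": [
        "what is the weather today",
        "will it rain tomorrow",
        "how hot is it outside",
    ],
}

def tokens(text):
    """Lowercase the text and split it into a set of words."""
    return set(re.findall(r"[a-z']+", text.lower()))

def jaccard(a, b):
    """Word overlap between two token sets, from 0.0 to 1.0."""
    return len(a & b) / len(a | b) if a | b else 0.0

def classify(utterance, catalog=CATALOG):
    """Return the goal whose closest example overlaps most with the input, or None."""
    query = tokens(utterance)
    best_goal, best_score = None, 0.0
    for goal, examples in catalog.items():
        for example in examples:
            score = jaccard(query, tokens(example))
            if score > best_score:
                best_goal, best_score = goal, score
    return best_goal

print(classify("What times is the movie playing?"))
print(classify("Will it rain today?"))
```

With only three examples per goal, phrasings that share few words with the catalog are easily misrouted; that gap is exactly what thousands of crowdsourced variations help close.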

MTurk enables companies to harness the collective intelligence, skills, and insights from a global workforce to streamline business processes, augment data collection and analysis, and accelerate machine learning development.

Amazon Mechanical Turk: Access a global, on-demand, 24×7 workforce

Video “Scene” Recognition – Annotated Data Sets for a Wide Variety of Scene Themes

In silent films, the plot was conveyed through title cards: written indications of the plot and key lines of dialogue. Unfortunately, silent films are not making a comeback. To reliably identify activities within a given video clip, we need libraries of video metadata that capture:

  • Media / video asset unique identifier
  • Scene clip IN and OUT timecodes
  • Scene theme(s), analogous to natural language processing (NLP) goals mapped to utterances / sentences
    • E.g., man drinking water; woman playing tennis
  • Image recognition tags: in the context of machine vision, image recognition is the ability of software to identify objects, places, people, writing, and actions in images. It is used to perform a large number of machine-based visual tasks, such as labeling the content of images with meta-tags.
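The metadata fields above could be captured in a record like the following minimal Python sketch. The field names, the `SceneClip` class, and the assumption of non-drop-frame `HH:MM:SS:FF` timecodes at 30 frames per second are all hypothetical choices for illustration, not an established schema.

```python
from dataclasses import dataclass, field

FPS = 30  # assumption: non-drop-frame timecode at 30 frames per second

def timecode_to_frames(tc: str, fps: int = FPS) -> int:
    """Convert an 'HH:MM:SS:FF' timecode into an absolute frame count."""
    hours, minutes, seconds, frames = (int(part) for part in tc.split(":"))
    if frames >= fps:
        raise ValueError(f"frame field {frames} out of range for {fps} fps")
    return ((hours * 60 + minutes) * 60 + seconds) * fps + frames

@dataclass
class SceneClip:
    asset_id: str                                   # media / video asset unique identifier
    tc_in: str                                      # scene IN timecode
    tc_out: str                                     # scene OUT timecode
    themes: list[str] = field(default_factory=list) # e.g. "woman playing tennis"
    tags: list[str] = field(default_factory=list)   # image-recognition meta-tags

    def duration_seconds(self, fps: int = FPS) -> float:
        """Length of the scene between the IN and OUT timecodes."""
        start = timecode_to_frames(self.tc_in, fps)
        end = timecode_to_frames(self.tc_out, fps)
        if end <= start:
            raise ValueError("OUT timecode must come after IN timecode")
        return (end - start) / fps

clip = SceneClip(
    asset_id="asset-0001",
    tc_in="00:01:10:00",
    tc_out="00:01:25:15",
    themes=["woman playing tennis"],
    tags=["person", "tennis racket", "tennis court"],
)
print(clip.duration_seconds())  # 15.5
```

Keeping the theme (the “utterance” describing the scene) separate from the lower-level image tags mirrors the split in the list above between scene themes and image recognition.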

Not Enough Data

Here is an example of how social media platforms such as Facebook attempt to deal with video deemed inappropriate for their platform:

In March, a shooter in New Zealand live-streamed the brutal killing of 51 people in two mosques on Facebook. But the social-media giant’s algorithms failed to detect the gruesome video. It took Facebook an hour to take the video down, and even then, the company was hard-pressed to deal with users who reposted the video.


…in many cases, such as violent content, there aren’t enough examples to train a reliable AI model. “Thankfully, we don’t have a lot of examples of real people shooting other people,” Yann LeCun, Facebook’s chief artificial-intelligence scientist, told Bloomberg.

Ben Dickson
July 10, 2019 1:36PM EST
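One way to see why scarce examples hurt is class imbalance. The minimal Python sketch below uses the common “balanced” weighting heuristic, n_samples / (n_classes × class_count), to show how heavily a model must lean on a handful of rare examples, and how that changes if dramatized clips were added. The label counts are hypothetical, chosen only for illustration.

```python
from collections import Counter

def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}

# Hypothetical label counts, for illustration only.
real_world = ["benign"] * 9_990 + ["violent"] * 10
with_reenactments = real_world + ["violent"] * 990  # add 990 dramatized clips

print(balanced_class_weights(real_world))
print(balanced_class_weights(with_reenactments))
```

With only 10 real examples, each one carries a weight of 500, so the model’s notion of “violent” rests on very little evidence. With 1,000 examples, the weight drops to about 5.5 and the class is represented by far more variety.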

Opportunities for Actors and Curators of Video Content: Dramatizations

Think of the thousands of people who perform in and create videos with content that runs the gamut from playing video games to “unboxing” collectible items. Actors who perform dramatizations could tag their videos with the metadata described above, documenting the themes of a given skit. If actors post their videos on YouTube or on proprietary crowdsourcing platforms, they would be entitled to some revenue when their licensed videos are used.
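How that revenue might be shared is open. Here is a minimal Python sketch assuming one hypothetical scheme: a revenue pool (in cents) is split in proportion to how often each licensed video is used for training, with largest-remainder rounding so every cent is paid out. The `split_revenue` helper and video names are illustrative only.

```python
def split_revenue(pool_cents, uses_by_video):
    """Split pool_cents across videos in proportion to uses (largest-remainder rounding)."""
    total_uses = sum(uses_by_video.values())
    if total_uses == 0:
        return {video: 0 for video in uses_by_video}
    shares = {}
    remainders = []
    for video, uses in uses_by_video.items():
        # Integer division keeps amounts in whole cents.
        shares[video], rem = divmod(pool_cents * uses, total_uses)
        remainders.append((rem, video))
    # Hand out leftover cents to the videos with the largest remainders.
    leftover = pool_cents - sum(shares.values())
    for _, video in sorted(remainders, reverse=True)[:leftover]:
        shares[video] += 1
    return shares

print(split_revenue(1000, {"skit-a": 3, "skit-b": 1}))
print(split_revenue(1000, {"skit-a": 1, "skit-b": 1, "skit-c": 1}))
```

Working in integer cents avoids floating-point rounding drift, so the payouts always add up exactly to the pool.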

Disclosure Regarding Flag Controversy

I now realize there are politics around Nike “tipping its hat” toward the Betsy Ross flag. However, when I referenced the flag in this blog post, I was thinking of the American Revolution and the 13 colonies flag. I didn’t think the title “Help Wanted: American Revolutionary War Reenactment Soldiers to Improve AI Models” would resonate with readers, so I took some creative liberty.
