The problem is more widespread than the article suggests. It’s not just these high-profile companies annotating “public domain” images with facial recognition notes and using them to train machine learning (ML) models. Anyone can scan the Internet for images of people and build a vast library of faces, which can then be used to train ML models. In fact, drawing public domain images from “the Internet” cuts across multiple data sources, not just Flickr, which increases the sample size and may improve the model.
The rules around “Public Domain” image licensing may need to be updated. One possibly simple solution: add a watermark to any image that does not have permission to be used for facial recognition model training. All image processors could then be required to include a preprocessor that detects the watermark and, if it is found, excludes the image from model training.
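A minimal sketch of that preprocessing step, in Python. It assumes a hypothetical opt-out marker stored as a metadata field named `no_face_training`; no such standard exists today, and a real watermark would more likely be embedded in the pixel data itself. The `Image` class and function names are made up for illustration.

```python
from dataclasses import dataclass, field

# Hypothetical metadata key; no such standard exists today.
OPT_OUT_KEY = "no_face_training"


@dataclass
class Image:
    name: str
    metadata: dict = field(default_factory=dict)


def has_opt_out_watermark(image: Image) -> bool:
    """Return True when the image carries the (hypothetical) opt-out marker."""
    return bool(image.metadata.get(OPT_OUT_KEY, False))


def filter_training_set(images: list[Image]) -> list[Image]:
    """Keep only images that may be used for facial recognition training."""
    return [img for img in images if not has_opt_out_watermark(img)]


candidates = [
    Image("beach.jpg"),
    Image("portrait.jpg", {OPT_OUT_KEY: True}),
    Image("party.jpg", {OPT_OUT_KEY: False}),
]
allowed = filter_training_set(candidates)
print([img.name for img in allowed])
```

Only the images without the marker survive the filter, so the check would have to run before any image reaches the training pipeline.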
about deconstructing existing functionality of entire Photo Archive and Sharing platforms.
It is…
to bring an awareness to the masses about corporate decisions to omit the advanced capabilities of cataloguing photos, object recognition, and advanced metadata tagging.
Backstory: The Asks / Needs
Every day my family takes tons of pictures, and the pictures are bulk loaded up to The Cloud using Cloud Storage Services, such as DropBox, OneDrive, Google Photos, or iCloud. A selected set of photos are uploaded to our favourite Social Networking platform (e.g. Facebook, Instagram, Snapchat, and/or Twitter).
Every so often, I will pause and either create a Photobook or print out pictures from the last several months. The kids may have a school project that calls for printing, e.g., a family portrait or just a picture of Mom and the kids. In order to find these photos, I have to manually go through our collection of photographs from our Cloud Storage Services, or identify the photos from our Social Network libraries.
Social Networking Platform Facebook
As far as I can remember, the Social Networking platform Facebook has had the ability to tag faces in photos uploaded to the platform. There are restrictions, such as whom you can tag from the privacy side, but the capability still exists. The Facebook platform also automatically identifies faces within photos, i.e. places a box around faces in a photo to make tagging people easier. So, in essence, there is an “intelligent capability” to identify faces in a photo. It seems like the Facebook platform allows you to see “Photos of You”, but what seems to be missing is the ability to search for all photos of Fred Smith, a friend of yours, even if all his photos are public. By design, that sounds fit for the purpose of the networking platform.
Auto Curation
Automatically upload new images, in bulk or one at a time, to a Cloud Storage Service (with or without Online Printing Capabilities, e.g. Photobooks), and an automated curation process begins.
The Auto Curation process scans photos for:
“Commonly Identifiable Objects”, such as #Car, #Clock, #Fireworks, and #People
Newly uploaded photos are automatically tagged, based on previously tagged objects and faces.
Once auto curation runs several times, and people are manually #tagged, the auto curation process will “Learn” faces. Any subsequent auto curation run should be able to recognize tagged people in new pictures.
The Auto Curation process emails / notifies the library owners with a report of the executed ingestion and auto curation process, e.g. Jane Doe and John Smith photographed at Disney World on Date / Time stamp.
Manual Curation
After the upload and the auto curation process, it’s optionally time to manually tag people’s faces, and any ‘objects’ which you would like to track, e.g. a car aficionado may #tag the vehicle make/model with additional descriptive tags. Using the photo curator function on the Cloud Storage Service, you can tag any “objects” in the photo using Rectangle or Lasso Select.
Curation to Take Action
Once photo libraries are curated, the library owner(s) can:
Automatically build albums based on one or more #tags
Smart Albums automatically update, e.g. after ingestion and Auto Curation. Albums are tag sensitive and update with new pics that contain certain people or objects. The user/ librarian may dictate logic for tags.
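One way the tag logic for a Smart Album might look, sketched in Python. The `Photo` and `SmartAlbum` classes and the `all_of` / `any_of` / `none_of` rule names are assumptions made for illustration, not features of any existing Cloud Storage Service.

```python
from dataclasses import dataclass, field


@dataclass
class Photo:
    filename: str
    tags: set[str] = field(default_factory=set)


@dataclass
class SmartAlbum:
    title: str
    all_of: set[str] = field(default_factory=set)   # every tag must be present
    any_of: set[str] = field(default_factory=set)   # at least one must be present
    none_of: set[str] = field(default_factory=set)  # none of these may be present

    def matches(self, photo: Photo) -> bool:
        if not self.all_of <= photo.tags:
            return False
        if self.any_of and not (self.any_of & photo.tags):
            return False
        return not (self.none_of & photo.tags)

    def build(self, library: list[Photo]) -> list[str]:
        """Re-evaluate the rules against the whole library."""
        return [p.filename for p in library if self.matches(p)]


library = [
    Photo("img_001.jpg", {"#JaneDoe", "#Fireworks"}),
    Photo("img_002.jpg", {"#JaneDoe", "#JohnSmith", "#Car"}),
    Photo("img_003.jpg", {"#Clock"}),
]

album = SmartAlbum("Jane without cars", all_of={"#JaneDoe"}, none_of={"#Car"})
print(album.build(library))

# A newly ingested and tagged photo shows up the next time the album is built.
library.append(Photo("img_004.jpg", {"#JaneDoe", "#People"}))
print(album.build(library))
```

Because the album stores rules rather than a fixed list of photos, rebuilding it after each ingestion and Auto Curation run is what keeps it up to date.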
Where is this Functionality??
Why are many major companies not implementing facial (and object) recognition? Google and Microsoft seem to have the capability and the size to produce the technology.
Is it possible Google and Microsoft are subject to more scrutiny than a Shutterfly? Do privacy concerns, at the moment, leave it to others to become the trailblazers in this area?
The ultimate goal, in my mind, is to have the capability within a Search Engine to be able to upload an image, then the search engine analyzes the image, and finds comparable images within some degree of variation, as dictated in the search properties. The search engine may also derive metadata from the uploaded image such as attributes specific to the image object(s) types. For example, determine if a person [object] is “Joyful” or “Angry”.
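To make “comparable images within some degree of variation” concrete, here is a small Python sketch using an average hash compared by Hamming distance. This is a stand-in technique chosen for illustration, not how any search engine is known to work. It assumes the images have already been reduced to tiny grayscale grids, and `max_distance` plays the role of the variation setting in the search properties.

```python
def average_hash(pixels: list[list[int]]) -> int:
    """Pack one bit per pixel: 1 when the pixel is brighter than the mean."""
    flat = [value for row in pixels for value in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for value in flat:
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits


def hamming_distance(a: int, b: int) -> int:
    """Count the bit positions where the two hashes differ."""
    return bin(a ^ b).count("1")


def is_similar(a: list[list[int]], b: list[list[int]], max_distance: int) -> bool:
    return hamming_distance(average_hash(a), average_hash(b)) <= max_distance


# Made-up 4x4 grayscale thumbnails (0 = black, 255 = white).
original = [
    [200, 200, 10, 10],
    [200, 200, 10, 10],
    [10, 10, 200, 200],
    [10, 10, 200, 200],
]
brighter = [[value + 20 for value in row] for row in original]
inverted = [[255 - value for value in row] for row in original]

print(is_similar(original, brighter, max_distance=2))
print(is_similar(original, inverted, max_distance=2))
```

The uniformly brightened copy hashes identically to the original, while the inverted copy differs in every bit, so a looser or tighter `max_distance` decides how much variation still counts as a match.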
As of the writing of this article, search engines Yahoo and Microsoft Bing do not have the capability to upload an image and perform image/pattern recognition, and return results. Behold, Google’s search engine has the ability to use some type of pattern matching, and find instances of your image across the world wide web. From the Google Search “home page”, select “Images”, or after a text search, select the “Images” menu item. From there, an additional icon appears, a camera with the hint text “Search by Image”. Select the Camera icon, and you are presented with options on how Google can acquire your image, e.g. upload, or an image URL.
Select the “Upload an Image” tab, choose a file, and upload. I used a fictional character, Max Headroom. The search results were very good (see below). I also attempted an uncommon shape, and it did not meet my expectations. The poor performance in matching this possibly “unique” shape is most likely due to how the Google Image Classifier Model was defined, and the corresponding training data used to test the classifier model. If the shape is truly “Unique”, the Google Search Image Engine did its job.
Google Image Search Results – Max Headroom
Google Image Search Results – Odd Shaped Metal Object
The Google Search Image Engine was able to “Classify” the image as “metal”, so that’s good. However, I would have liked to see better matches under the “Visually Similar Image” section. Again, this is probably due to the image classification process, and potentially the diversity of image samples.
A Few Questions for Google
How often is the Classifier Modeling process executed (i.e. training the classifier), and the model tested? How are new images incorporated into the Classifier model? Are the user uploaded images now included in the Model (after model training is run again)? Is Google Search Image incorporating ALL Internet images into Classifier Model(s)? Is an alternate AI Image Recognition process used beyond Classifier Models?
I’m not sure if the Cloud Vision API uses the same technology as Google’s Search Image Engine, but it’s worth noting. After reaching the Cloud Vision API starting page, go to the “Try the API” section, and upload your image. I tried a number of samples, including my odd shaped metal object. I think it performed fairly well on the “labels” (i.e. image attributes).
When I used the Google Cloud Vision API to determine whether there were any WEB matches for my odd shaped metal object, the search came up with no results. In contrast, using Google’s Search Image Engine produced some “similar” web results.
Finally, I tested the Google Cloud Vision API with a self portrait image. THIS was so cool.
The API brought back several image attributes specific to “Faces”. It attempts to identify certain complex facial attributes, things like emotions, e.g. Joy, and Sorrow.
The API brought back the “Standard” set of Labels which show how the Classifier identified this image as a “Person”, such as Forehead and Chin.
Lastly, the Google Cloud Vision API brought back the Web references: it identified me as a Project Manager, and picked up an obscure reference to Zurg in my Twitter Bio.
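For readers who want to pull the same three groups of results out of the API programmatically, here is a rough Python sketch that summarizes a response shaped like the Cloud Vision REST JSON (`faceAnnotations`, `labelAnnotations`, `webDetection`). The sample response is hand-written for illustration and its values are made up; it is not real output from my test, and the `summarize` helper is my own naming.

```python
# Hand-written, illustrative response; the values are made up, not real API output.
sample_response = {
    "faceAnnotations": [
        {"joyLikelihood": "VERY_LIKELY", "sorrowLikelihood": "VERY_UNLIKELY"}
    ],
    "labelAnnotations": [
        {"description": "Forehead", "score": 0.97},
        {"description": "Chin", "score": 0.95},
    ],
    "webDetection": {
        "webEntities": [{"description": "Project manager", "score": 0.6}]
    },
}

# Emotion fields on a face annotation, and the likelihood values treated as a match.
EMOTION_FIELDS = {
    "joyLikelihood": "joy",
    "sorrowLikelihood": "sorrow",
    "angerLikelihood": "anger",
    "surpriseLikelihood": "surprise",
}
LIKELY_VALUES = {"LIKELY", "VERY_LIKELY"}


def summarize(response: dict) -> dict:
    """Collect likely emotions, label descriptions, and web entity descriptions."""
    emotions = []
    for face in response.get("faceAnnotations", []):
        for field_name, emotion in EMOTION_FIELDS.items():
            if face.get(field_name) in LIKELY_VALUES:
                emotions.append(emotion)
    labels = [item["description"] for item in response.get("labelAnnotations", [])]
    entities = response.get("webDetection", {}).get("webEntities", [])
    web = [entity["description"] for entity in entities if "description" in entity]
    return {"emotions": emotions, "labels": labels, "web": web}


summary = summarize(sample_response)
print(summary)
```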
The Google Cloud Vision API, and Google’s own baked in Search Image Engine, are extremely enticing, yet have a way to go in terms of accuracy. Of course, I tried using my face in the Google Search Image Engine, and the “Visually Similar Images” section didn’t retrieve any images of me, or even a distant cousin (maybe?).
NFC (Near Field Communication) has significant potential in the transfer of information, and has already proven to be a lightweight technology to transfer and store data. At this year’s CES conference, we have already seen NFC enabled business cards transfer songs to a car radio. Samsung has enabled this technology in their smartphones to transfer data such as videos and pictures.
There will come a day soon where we will have built in storage in a device, such as a picture frame, or television, and the NFC card will allow the transfer of information to this temporary buffer in the device for playing music, watching videos, or looking at pictures. This day is not far off. Yes, those LCD picture frames in your home that take SD memory are outdated.
Apple made an acquisition of a company that has the ability to enable an LCD touch screen to raise a keyboard through the touch screen, so the user has the tactile contact of the keyboard. We may go back to typing on the keyboard without looking, like we did with smartphones that had keyboards. I envision an art gallery that has huge LCD screens all around the room, where switching the artist on display would be as easy as walking over to each LCD picture frame and tapping the frame enabled with this raised, tactile LCD technology. In the artist’s creation, the paint of the brushstrokes may appear raised from the LCD canvas, with a three dimensional effect on the picture frame. An artist creating a piece would make brush strokes with a digital brush, pressing as you would on a canvas and choosing the appropriate paint, and the device may record the additional information required to display a three dimensional painting.
Picture that.
Addendum:
After additional research, I found one inhibitor which may pose a significant barrier: transfer speed. NFC is best suited to transferring smaller data packets.
The maximum data transfer rate of NFC (424 kbit/s) is slower than that of Bluetooth V2.1 (2.1 Mbit/s), as noted in Wikipedia.
The speed of MicroSD Speed Class 10 is 10 MB/s, which is significantly greater. The more advanced UHS, or Ultra High Speed, classes go further: UHS-I has 50 MB/s, and UHS-II has a theoretical maximum transfer rate of 312 MB/s.
Although the idea of NFC, or Bluetooth for that matter, suggests tap and transfer of large data at high rates to internal memory buffers in devices, the reality is that WiFi connectivity speeds outpace both NFC and Bluetooth, and MicroSD, a physical medium, also outpaces NFC / Bluetooth. For this idea to have merit today, you would need a WiFi connected device to get the maximum throughput without physical media, such as Secure Digital, or continue to leverage physical media for transfer and still use the memory buffer as temporary storage in devices, as noted in the article.
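To put the rates quoted above in perspective, this short Python calculation estimates how long a single photo would take to transfer at each rate. The 5 MB photo size is an assumption picked for illustration, and the figures ignore protocol overhead.

```python
PHOTO_BYTES = 5_000_000  # assumed 5 MB photo, for illustration only

# Rates quoted in the addendum, converted to bytes per second.
RATES_BYTES_PER_SEC = {
    "NFC (424 kbit/s)": 424_000 / 8,
    "Bluetooth V2.1 (2.1 Mbit/s)": 2_100_000 / 8,
    "MicroSD Class 10 (10 MB/s)": 10_000_000,
    "UHS-I (50 MB/s)": 50_000_000,
    "UHS-II (312 MB/s)": 312_000_000,
}


def transfer_seconds(size_bytes: float, bytes_per_sec: float) -> float:
    """Ideal transfer time, ignoring handshakes and protocol overhead."""
    return size_bytes / bytes_per_sec


for name, rate in RATES_BYTES_PER_SEC.items():
    print(f"{name}: {transfer_seconds(PHOTO_BYTES, rate):.2f} s")
```

Under these assumptions the photo needs roughly 94 seconds over NFC and about 19 seconds over Bluetooth, versus half a second on a Class 10 MicroSD card.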
Google Plus mobile should have the stylistic capabilities of Instagram, so users can apply styles to their mobile photos, on Android, for example. At this point, you cannot apply styles to photos, as you can on Facebook / Instagram, and then post the photos to Google Plus.
This post applies to any digital media platform that distributes news articles, books, music, movies, and more.
As I was looking online at a New York Times article, I scrolled to the bottom of the screen, and a popup appeared telling me I had 9 of 10 free articles left for the month. I thought that was brilliant. As digital media becomes more competitive, and the content on each platform varies, all companies need to allow their clients or potential clients to see how they are using the digital media platform’s products, regardless of whether the lure is a pay as you go model; a trial, with unlimited access after the trial; or free access up to a maximum per month or week.
As an example, I would like to see what percentage of Technology articles I am viewing per day, week, or month versus Business articles for a certain periodical, so I can make an informed decision regarding which periodicals I subscribe to for Business and which for Technology. Maybe digital media companies will evolve to have mixed business models, such as a pay per consumption option for all articles after the free maximum, and then, for select sections such as Business or Technology, an unlimited option, eventually even for a particular editor of Op-Ed pieces. It could be a price that is significantly less than getting the whole periodical, but at least you are able to attract consumers who have been less willing to go for the full paper, and don’t want the hassle of a pay as you go, or monthly chargeback per use model.
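The kind of usage breakdown I have in mind could be computed with something as simple as the Python sketch below. The reading history is made-up sample data, and `section_percentages` is a hypothetical helper, not something any publisher exposes today.

```python
from collections import Counter

# Hypothetical reading history for one month: one entry per article viewed.
reading_history = [
    "Technology", "Business", "Technology", "Technology", "Op-Ed",
    "Business", "Technology", "Technology", "Business", "Technology",
]


def section_percentages(history: list[str]) -> dict[str, float]:
    """Share of articles read per section, most-read section first."""
    total = len(history)
    if total == 0:
        return {}
    counts = Counter(history)
    return {section: 100 * count / total for section, count in counts.most_common()}


for section, percent in section_percentages(reading_history).items():
    print(f"{section}: {percent:.0f}%")
```

The same counts, grouped by day or week instead of by month, would give the per-period view.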
If I want to choose a magazine for photography, and I am into archeology from a specific region, then as a prospective buyer I might want to see, across the publisher’s entire content and not just what I have read, a drill down pie chart of subject matters for all photos; then, after I select Archeology, what percentage of those articles are from a particular region, on a particular subject, and by a particular photographer. This is also a powerful business intelligence tool for existing consumers, and may give you a competitive edge. Alliances could also form to partner on other content, index and transform that content, say using NewsML–G2, and then perform margin sharing and chargeback. The lure of their portal would drive the competition, as would the breadth of content and the partnerships.
A Note for Advertisers
There are other forms of Business Intelligence for digital media consumption that can be offered, such as indexed content: text, images, and video. You can capture not only image descriptions but also objects within a video to be indexed, which advertisers can use to see the demographics of the consumers watching the videos with the most sneakers, or smartphones, or descriptions that may include dancing clowns. This may assist the small to mid size startup digital advertiser in understanding the consumers in their target markets, and in abstracting the data.
So, the obvious thing here is that Facebook wants to kill off Instagram. You think core users won’t care? Possibly. Are the new terms of service even legal? Questionable, given the added bit about minors being included. So why buy Instagram, and then impose outrageous terms on a very popular service? One reason might be a common tale: a suitor company buys a company it projects to have high share in a market the suitor is already in, or plans on getting into, to allow it to grab mass market share. The suitor company may already be in the market, and can simply capitalize on the acquired resources, e.g. staff and technology, and then try to run the ship aground, i.e. sabotage: it demotes the acquired company by leaving a poor taste in the customers’ mouths, and then offers an alternate path, which attracts the customer base to convert. Some of the articles, such as the New York Times piece, What Instagram’s New Terms of Service Mean for You, and a Mashable Op-Ed piece, Instagram Will Basically Sign Your Life Away, all but tell us to pick up our pitchforks, rally around Instagram, and let a crowd mentality trample us away from Instagram.
If this is the Facebook / Instagram business model, as these folks are interpreting the requirements, I am personally not so keen on my daughter using Instagram and showing up in an advertisement, as I think I read in the interpreted TOS. I don’t think the kid would be too keen either, probably for a different reason than her father. Advertisements can be taken out of context, or you may lose control of how your face is integrated with a product or service, and might not necessarily agree with its use. Talk about your type-casting. A teen shows up in an advertisement for acne, and she doesn’t know about the advertisement until it’s posted on her locker, and this is a relatively innocent example. In addition, a capitalistic kid would say, “I am not particularly keen on my face, or pictures, showing up somewhere without my permission, but hey, where is my cut?”
There are already established platforms that sell photographers’ photos through established licensing models, and sure, that may be another, more viable model for the Facebook / Instagram folks, but hey, I am just a man with a keyboard, and half a brain.
The Product Owner (PO) is a member of the Agile Team responsible for defining Stories and prioritizing the Team Backlog to streamline the execution of program priorities while maintaining the conceptual and technical integrity of the Features or components for the team.