Review of Microsoft OneDrive Cloud Repository
A cloud repository may be an easy tool and service for saving files. If you roughly know what you want to find, most Cloud repositories are easy and straightforward to use. Over time, if not managed appropriately, the cloud repository becomes burdensome to manage, e.g. to access and find files. If we stay stuck in the “file folder” mentality of organizing our content, our Cloud storage solution will quickly become unwieldy. Getting into habits like tagging our content should help us access files beyond the “Folder Borders”. Beyond that, there are huge opportunities to leverage and grow existing platforms, specifically around the process service of [file] Ingestion.
Bulk file loading, e.g. photos from our smartphones; perhaps the entire family uploads to the same storage repository.
If performed by the “Ingestion Service”, manual user “tagging” of a group of photos, or of individual images, may be available.
Geotagging may be available either at the time of image capture, or upon the start of the “Ingestion Service”.
Facial Recognition, compared to the likes of services such as Facebook, is, based on my experience, not readily available in personal Cloud Storage repositories.
Auto tagging pictures upon ingestion, if performed, may leverage “Extracted Text” from images. Images become searchable with little human intervention.
Cloud File Repository: Storing Content
I modified the “tags” of existing Microsoft Office files; in this case, MS Word and PowerPoint file types were used. I opened the Word file and selected the “File” menu, then “Save As”, then “More Options” under the list of file types. I was then presented with the classic “Save As” form. Just below the “Save as type” list box, there were 3 “metadata” fields to describe the file:
The first two fields are semicolon (;) delimited, and multiple values are allowed. In this test case, I added “CV;resume;career” to the “Tags” field. I then used the Snipping Tool that comes with MS Windows to document the step. I named the screen capture MSWordTags.PNG and saved it to my OneDrive. Then I saved the document itself to my OneDrive.
Cloud File Repository: Finding Content
I then started up Internet Explorer and went to the https://onedrive.live.com site to access my cloud content. In the top left corner of the screen, there is a field called “Search Everything”, and I typed in CV.
The search results included ONLY the image screenshot file that contained the letters CV, and not the MS Word file that explicitly had the Tag field with the text value CV.
Looking at the file properties as defined by OneDrive, there was ALSO a field called “Tags”, with no values populated. In other words, the Cloud “Ingestion” service did not read the file for metadata and abstract it to the Cloud level; there are just two separate sets of metadata describing the same file. To view the Cloud file data, select the file and click the icon of an i with a circle around it. Too many ways to store the same data may lead to inconsistent data.
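To illustrate what “reading the file for metadata” could look like, here is a minimal sketch of how an ingestion step might lift embedded tags out of a Word file. It relies on the fact that a .docx file is a ZIP package whose core properties live in docProps/core.xml; to my understanding, the Word “Tags” field is stored there as the keywords element. The function names (parse_tags, read_embedded_tags, make_sample_docx) are my own, and this is not how OneDrive works internally. The sample package is a stand-in holding only the properties part, not a full Word document.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Namespaces used by the core-properties part of an Office Open XML package.
NS = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def parse_tags(raw):
    """Split a semicolon-delimited tag string into clean, de-duplicated tags."""
    seen, tags = set(), []
    for part in (raw or "").split(";"):
        tag = part.strip()
        if tag and tag.lower() not in seen:
            seen.add(tag.lower())
            tags.append(tag)
    return tags


def read_embedded_tags(docx_bytes):
    """Return the tags stored inside the package, or [] if there are none."""
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as archive:
        if "docProps/core.xml" not in archive.namelist():
            return []
        root = ET.fromstring(archive.read("docProps/core.xml"))
    keywords = root.find("cp:keywords", NS)
    return parse_tags(keywords.text if keywords is not None else "")


def make_sample_docx(tags):
    """Build a stand-in package containing only the core-properties part."""
    core = (
        '<?xml version="1.0" encoding="UTF-8"?>'
        '<cp:coreProperties xmlns:cp="%s" xmlns:dc="%s">'
        "<cp:keywords>%s</cp:keywords></cp:coreProperties>"
    ) % (NS["cp"], NS["dc"], tags)
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("docProps/core.xml", core)
    return buffer.getvalue()


sample = make_sample_docx("CV;resume;career")
print(read_embedded_tags(sample))
```

An ingestion service could copy the returned list into the Cloud-level “Tags” field, so that the file and the cloud describe the asset the same way.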
In the Cloud file information / properties, the image file had a field called “Extracted Text”, and this is how my Cloud Search for files tagged “CV” picked up the image: the text CV appears in the screenshot itself.
Oddly, the MS Word file attributes in OneDrive did not offer “tags” as a field to store metadata in the cloud. The “tags” field was available when looking at the PNG file. For the Word file, however, the user may add a “Description” in a multiline text field. Tags metadata on images and not MS Word files? Odd.
Future State (?): If the Cloud Ingestion process can perform an “Extracted Text” process, it may also offer other “Ingestion services”, such as “Facial Recognition” based on “known good” faces already tagged. For example, I tag a face from within the OneDrive browser UI, and when other images are ingested, the service can correlate the files.
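The search behaviour I observed can be reproduced with a toy model. The sketch below assumes the cloud search only looks at the file name, the Cloud-level tags, and the “Extracted Text” field, and then shows what changes if embedded file tags were indexed too. The CloudFile class, the search function, the include_embedded switch, and the Word file name Resume.docx are all hypothetical; I do not know how the real search index is built.

```python
from dataclasses import dataclass, field


@dataclass
class CloudFile:
    name: str
    cloud_tags: list = field(default_factory=list)     # tags held by the cloud service
    extracted_text: str = ""                            # text found inside an image
    embedded_tags: list = field(default_factory=list)   # tags saved inside the file


def search(files, term, include_embedded=False):
    """Return the names of files whose indexed fields contain the term."""
    term = term.lower()
    hits = []
    for f in files:
        haystack = [f.name, f.extracted_text] + f.cloud_tags
        if include_embedded:
            haystack += f.embedded_tags
        if any(term in text.lower() for text in haystack):
            hits.append(f.name)
    return hits


files = [
    CloudFile("MSWordTags.PNG", extracted_text="Tags: CV;resume;career"),
    CloudFile("Resume.docx", embedded_tags=["CV", "resume", "career"]),
]

print(search(files, "CV"))                         # image only, as observed
print(search(files, "CV", include_embedded=True))  # image and Word file
```

The point of the second call is that the Word file only becomes findable once the tags stored inside it are part of what the search looks at.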
As a business model, are we going to add a tier just after Cloud File ingestion, maybe exercising a third party suite of cognitive APIs, such as facial recognition? For example, Microsoft OneDrive ingests a file and, if it’s an image file, routes it through to the appropriate IBM Watson API, which processes the file and returns [updated] metadata and a modified file? Maybe.
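Such a tier could be as simple as a routing table keyed on file type. The sketch below is only an illustration of that idea: the service functions are stubs that return placeholder values, and none of them calls a real OneDrive or IBM Watson API. The names ROUTES, ingest, and the three stub functions are assumptions of mine.

```python
from pathlib import PurePath


def ocr_stub(name, data):
    """Placeholder for a text-extraction service call."""
    return {"extracted_text": "<text found in image>"}


def object_tag_stub(name, data):
    """Placeholder for an object- or face-recognition service call."""
    return {"tags": ["<object label>"]}


def embedded_metadata_stub(name, data):
    """Placeholder for reading the tags stored inside an Office file."""
    return {"tags": ["<embedded tag>"]}


# Which enrichment services run for which file type (hypothetical routing).
ROUTES = {
    ".png": [ocr_stub, object_tag_stub],
    ".jpg": [ocr_stub, object_tag_stub],
    ".docx": [embedded_metadata_stub],
}


def ingest(name, data):
    """Route an uploaded file through its services and collect the metadata."""
    metadata = {"name": name, "size": len(data), "tags": []}
    suffix = PurePath(name).suffix.lower()
    for service in ROUTES.get(suffix, []):
        result = service(name, data)
        metadata["tags"].extend(result.pop("tags", []))
        metadata.update(result)
    return metadata


print(ingest("MSWordTags.PNG", b"fake image bytes"))
print(ingest("notes.txt", b"plain text"))
```

Swapping a stub for a third party service would then be a change to the routing table rather than to the ingestion code itself.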
Update: Auto Tagging Objects Upon Ingestion
On an image with no tags, I selected the “Edit tags” menu from the Properties pane on the right side of the screen. As a scrolling menu, the option to “Add existing tag” appeared. There were dozens of tags already created with a word, thumbnail image, and the number of times used. Wow. Awesome. The current implementation seems to automatically, upon ingestion, identify objects in the image, and tag the images with those objects, e.g. Building, Beach, Horse, etc.
My presumption is that Microsoft OneDrive performs object recognition on images upon file ingestion into the cloud (as opposed to in the Photos app).
“Extracted Text” Metadata Field from within Microsoft OneDrive Image PNG File Properties:
My presumption is that Microsoft OneDrive performs OCR on images upon file ingestion into the cloud (as opposed to in the Photos app).
Is there value in providing users the ability to apply “Time Lock Access” to files in cloud storage? Files are securely uploaded by their Owner. After upload, no one, including the Owner, may access / open the file(s). Only after the date and time provided for the time lock passes will the files be available for access, and action may then be taken, e.g. automatically emailing a link to the files. More complex actions may be attached to the time lock release, such as script execution using a simple set of rules as defined by the file Owner.
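As a rough illustration of the rule described above, here is a minimal in-memory sketch. The TimeLockedFile class, its method names, and the on_release callback are hypothetical. A real service would have to enforce the lock on the server side, most likely with encryption, and would run the release check from a scheduler; none of that is shown here.

```python
from datetime import datetime, timezone


class TimeLockedFile:
    """A file nobody can open, the Owner included, until release_at passes."""

    def __init__(self, name, content, release_at, on_release=None):
        if release_at.tzinfo is None:
            raise ValueError("release_at must be timezone-aware")
        self.name = name
        self._content = content
        self.release_at = release_at
        self._on_release = on_release  # e.g. email a link, run the Owner's rules
        self._released = False

    def is_unlocked(self, now=None):
        if now is None:
            now = datetime.now(timezone.utc)
        return now >= self.release_at

    def release_if_due(self, now=None):
        """Run the Owner's release action once, after the lock has expired."""
        if self.is_unlocked(now) and not self._released:
            self._released = True
            if self._on_release is not None:
                self._on_release(self)
            return True
        return False

    def open(self, now=None):
        if not self.is_unlocked(now):
            raise PermissionError(
                f"{self.name} is locked until {self.release_at.isoformat()}"
            )
        return self._content


release = datetime(2030, 1, 1, tzinfo=timezone.utc)
notified = []
letter = TimeLockedFile(
    "letter.txt", b"hello", release, on_release=lambda f: notified.append(f.name)
)
print(letter.is_unlocked(now=datetime(2029, 12, 31, tzinfo=timezone.utc)))
```

The now parameter exists only so the behaviour can be tried with fixed dates; in normal use the current UTC time is taken.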
Solution already exists? Please send me a link to the cloud integration product / plug in.
Are you trying to apply metadata on individual files or en masse, attempting to make the vast growth of cloud storage usage manageable, meaningful storage?
Best practices leverage a consistent hierarchy, an Information Architecture in which to store and retrieve information. Excellent.
Beyond that, there are capabilities computer science has documented and used time and time again: checksum algorithms. They are used frequently after a file transfer to verify that the file you requested is the file you received. Most, if not all, Enterprise DAM (Digital Asset Management) solutions use some type of technology to ‘allow’ the enforcement of unique assets [upon upload]. In cloud storage and photo solutions targeted toward the individual consumer, the feature does not appear to be ‘up close and personal’ in the user experience, thus building a huge expanse of duplicate data (documents, photos, music, etc.). Another feature, the database [primary] key, has been used for decades to identify that a record of data is unique.
Our family sharing alone has thousands and thousands of photos and music files. The names of the files could be different for many of the same digital assets. Sometimes file names are the same, but the metadata between the copies is not identical, and each version provides value. Tools for ‘merging’ metadata, such as DAM tools, have value in helping manage digital assets.
Cloud storage usage is growing exponentially, and metadata alone won’t help rope in the beast. Maybe ad hoc or periodic indexing of files [e.g. by checksum algorithm] could take on the task of identifying duplicate assets? Duplicate assets could be shown to the user in an exception report? Less boring: upon upload, ‘on the fly’, let the user know the asset is already in storage, and show a two-column diff of the metadata.
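Here is a minimal sketch of the ‘on the fly’ idea, assuming a SHA-256 checksum of the file bytes acts like a primary key for the asset. The AssetStore class, the metadata_diff function, and the sample file names and metadata are made up for illustration. Note that a checksum only catches byte-identical copies; a resized or re-encoded photo would not match.

```python
import hashlib


def checksum(data):
    """Content fingerprint: identical bytes always give the same value."""
    return hashlib.sha256(data).hexdigest()


def metadata_diff(stored, incoming):
    """Rows of (field, stored value, incoming value) where the two differ."""
    rows = []
    for name in sorted(set(stored) | set(incoming)):
        left, right = stored.get(name), incoming.get(name)
        if left != right:
            rows.append((name, left, right))
    return rows


class AssetStore:
    def __init__(self):
        self._by_checksum = {}  # checksum -> {"name": ..., "metadata": ...}

    def upload(self, name, data, metadata):
        key = checksum(data)
        existing = self._by_checksum.get(key)
        if existing is not None:
            # Same bytes already stored: warn and show the metadata side by side.
            return {
                "status": "duplicate",
                "existing": existing["name"],
                "diff": metadata_diff(existing["metadata"], metadata),
            }
        self._by_checksum[key] = {"name": name, "metadata": dict(metadata)}
        return {"status": "stored", "checksum": key}


store = AssetStore()
store.upload("IMG_001.jpg", b"same pixels", {"tags": "beach", "taken": "2016"})
result = store.upload(
    "beach-day.jpg", b"same pixels", {"tags": "beach;summer", "taken": "2016"}
)
for row in result["diff"]:
    print("%-8s | %-16s | %s" % row)
```

The same index could feed a periodic exception report: any checksum seen under more than one file name is a candidate for [bulk] corrective action.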
It’s a pain for me, and quite possibly many cloud storage users. As more people jump on cloud storage, this feature should be front and center to help users grow into their new virtual warehouse.
The cloud storage industry most likely believes that, for the common consumer, storage is ‘cheap’: just provide more. At some stage, the cloud providers may look to DAM tools as the cost of managing a user’s storage rises. Tools like:
Detecting duplicate digital assets (files). Use exception reporting to identify the duplicates and enable [bulk] corrective action, and/or show a duplicate ‘error/warning’ message upon upload.
Dynamic metadata tagging upon [bulk] upload using object recognition. Correlating and cataloging one or more [type] objects in a picture using defined Information Architecture. In addition, leveraging facial recognition for updates to metadata tagging.
e.g. “beach” objects: sand, ocean; [Ian Roseman] surfing;
Brief questionnaires may enable the user to ‘smartly’ ingest the digital assets; e.g. ‘themes’ of current upload; e.g. a family, or relationship tree to extend facial recognition correlations.
e.g. themes – summer; party; New Year’s Eve
e.g. relationship tree – office / work
Pan Information Architecture (IA) spanning multiple cloud storage [silos]. e.g. for Photos, spanning [shared] ‘albums’
Publicly published / shared components of an IA; e.g. legal documents; standards and reuse
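To make the “beach” example from the list above concrete, here is a small sketch of correlating recognized objects with themes from a defined Information Architecture. It assumes an object recognition step has already produced labels for the picture. The THEMES table and the themes_for function are hypothetical; only the “beach” entry (sand, ocean) comes from the example above, and the “party” objects are invented for illustration.

```python
# A tiny Information Architecture: each theme lists the objects that define it.
THEMES = {
    "beach": {"sand", "ocean"},
    "party": {"balloon", "cake"},  # invented entry, for illustration only
}


def themes_for(labels):
    """Return the themes whose defining objects all appear in the labels."""
    found = {label.lower() for label in labels}
    return sorted(theme for theme, required in THEMES.items() if required <= found)


print(themes_for(["Sand", "Ocean", "Surfboard"]))
print(themes_for(["Sand", "Cake"]))
```

A shared or publicly published IA would amount to several storage silos agreeing on the same table.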