Tag Archives: onedrive

Microsoft Productivity Suite – Content Creation, Ingestion, Curation, Search, and Repurpose

Auto Curation: AI Rules Engine Processing

There are, of course, 3rd party platforms that perform very well, are feature rich, and agnostic to all file types.  For example, within a very short period of time, low cost, and possibly a few plugins, a WordPress site can be configured and deployed to suit your needs of Digital Asset Managment (DAM).  The long-term goal is to incorporate techniques such as Auto Curation to any/all files, leveraging an ever-growing intelligent taxonomy, a taxonomy built on user-defined labels/tags, as well an AI rules engine with ML techniques.   OneDrive, as a cloud storage platform, may bridge the gap between JUST cloud storage and a DAM.

Ingestion and Curation Workflow

Content Creation Apps and Auto Curation

  • The ability for Content Creation applications, such as Microsoft Word, to capture not only the user-defined tags but also the context of the tags relating to the content.
    • When ingesting a Microsoft PowerPoint presentation, after consuming the file, and Auto Curation process can extract “reusable components” of the file, such as slide header/name, and the correlated content such as a table, chart, or graphics.
    • Ingesting Microsoft Excel and Auto Curation of Workbooks may yield “reusable components” stored as metadata tags, and their correlated content, such as chart and table names.
    • Ingesting and Auto Curation of Microsoft Word documents may build a classic Index for all the most frequently occurring words, and augment the manually user-defined tags in the file.
    • Ingestion of Photos [and Videos] into and Intelligent Cloud Storage Platform, during the Auto Curation process, may identify commonly identifiable objects, such as trees or people.  These objects would be automatically tagged through the Auto Curation process after Ingestion.
  • Ability to extract the content file metadata, objects and text tags, to be stored in a standard format to be extracted by DAMs, or Intelligent Cloud Storage Platforms with file and metadata search capabilities.  Could OneDrive be that intelligent platform?
  • A user can search for a file title or throughout the Manual and Auto Curated, defined metadata associated with the file.  The DAM or Intelligent Cloud Storage Platform provides both search results.   “Reusable components” of files are also searchable. 
    • For “Reusable Components” to be parsed out of the files to be separate entities, a process needs to occur after Ingestion Auto Curration.
  • Content Creation application, user-entry tag/text fields should have “drop-down” access to the search index populated with auto/manual created tags.

Auto Curation and Intelligent Cloud Storage

  • The intelligence of Auto Curation should be built into the Cloud Storage Platform, e.g. potentially OneDrive.
  • At a minimum, auto curation should update the cloud storage platform indexing engine to correlate files and metadata.
  • Auto Curation is the ‘secret sauce’ that “digests” the content to build the search engine index, which contains identified objects (e.g. tag and text or coordinates)  automatically
    • Auto Curation may leverage a rules engine (AI) and apply user configurable rules such as “keyword density” thresholds
    • Artificial Intelligence, Machine Learning rules may be applied to the content to derive additional labels/tags.
  • If leveraging version control of the intelligent cloud storage platform, each iteration should “re-index” the content, and update the Auto Curation metadata tags.  User-created tags are untouched.
  • If no user-defined labels/tags exist, upon ingestion, the user may be prompted for tags

Auto Curation and “3rd Party” Sources

In the context of sources such as a Twitter feed, there exists no incorporation of feeds into an Intelligent Cloud Storage.  OneDrive, Cloud Intelligent Storage may import feeds from 3rd party sources, and each Tweet would be defined as an object which is searchable along with its metadata (e.g. likes; tags).

Operating System, Intelligent Cloud Storage/DAM

The Intelligent Cloud Storage and DAM solutions should have integrated search capabilities, so on the OS (mobile or desktop) level, the discovery of content through the OS search of tagged metadata is possible.

Current State

  1. OneDrive has no ability to search Microsoft Word tags
  2. The UI for all Productivity Tools must have a comprehensive and simple design for leveraging an existing taxonomy for manual tagging, and the ability to add hints for auto curation
    1. Currently, Microsoft Word has two fields to collect metadata about the file.  It’s obscurely found at the “Save As” dialog.
      1. The “Save As” dialogue box allows a user to add tags and authors but only when using the MS Word desktop version.  The Online (Cloud) version of Word has no such option when saving to Microsoft OneDrive Cloud Storage
  3. Auto Curation (Artificial Intelligence, AI) must inspect the MS Productivity suite tools, and extract tags automatically which does not exist today.
  4. No manual taging or Auto Curation/Facial Recognition exists.

Cloud Storage: Ingestion, Management, and Sharing

Cloud Storage Solutions need differentiation that matters, a tipping point to select one platform over the other.

Common Platforms Used:

Differentiation may come in the form of:

  • Collaborative Content Creation Software, such as DropBox Paper enables individuals or teams to produce content, all the while leveraging the Storage platform for e.g. version control,
  • Embedded integration in a suite of content creation applications, such as Microsoft Office, and OneDrive.
  • Making the storage solution available to developers, such as with AWS S3, and Box.  Developers may create apps powered by the Box Platform or custom integrations with Box
  • iCloud enables users to backup their smartphone, as well tightly integrating with the capture and sharing of content, e.g. Photos.

Cloud Content Lifecycle Categories:

  • Content Creation
    • 3rd Party (e.g. Camera) or Integrated Platform Products
  • Content Ingestion
    • Capture Content and Associated Metadata
  • Content Collaboration
    • Share, Update and Distribution
  • Content Discovery
    • Surface Content; Searching and Drill Down
  • Retention Rules
    • Auto expire pointer to content, or underlying content

Cloud Content Ingestion Services:

Cloud Ingestion Services
Cloud Ingestion Services

Time Lock Access: Seal Files in Cloud Storage

Is there value in providing users the ability to apply “Time Lock Access” to files in cloud storage?  Files are securely uploaded by their Owner.  After upload no one, including the Owner, may access / open the file(s).   Only after the date and time provided for the time lock passes, files will be available for access, and action may be taken, e.g.  Automatically email a link to the files.  More complex actions may be attached to the time lock release such as script execution using a simple set of rules as defined by the file Owner.

Solution already exists?  Please send me a link to the cloud integration product / plug in.

Cloud Storage and DAM Solutions: Don’t Reign in the Beast

Are you trying to apply metadata on individual files or en masse, attempting to make the vast  growth of cloud storage usage manageable, meaningful storage?

Best practices leverage a consistent hierarchy, an Information Architecture in which to store and retrieve information, excellent.

Beyond that, capabilities computer science has documented and used time and time again, checksum algorithms. Used frequently after a file transfer to verify the file you requested is the file you received.  Most / All Enterprise DAM solutions use some type of technology to ‘allow’ the enforcement of unique assets [upon upload].  In cloud storage and photo solutions targeted toward the individual, consumer side, the feature does not appear to be up ‘close and personal’ to the user experience, thus building a huge expanse of duplicate data (documents, photos, music, etc.).  Another feature, a database [primary] key has been used for decades to identify that a record of data is unique.

Our family sharing alone has thousands and thousands of photos and music. The names of the files could be different for many of the same digital assets.  Sometimes file names are the same, but the metadata between the same files is not unique, but provides value. Tools for ‘merging’ metadata, DAM tools have value to help manage digital assets.

Cloud storage usage is growing exponentially, and metadata alone won’t help rope in the beast. Maybe ADHOC or periodic indexing of files [e.g. by #checksum algorithm] could take on the task of identifying duplicate assets?  Duplicate  assets could be viewed by the user in an exception report?  Less boring, upon upload, ‘on the fly’ let the user know the asset is already in storage, and show a two column diff. of the metadata.

It’s a pain for me, and quite possibly many cloud storage users.  As more people jump on cloud storage, this feature should be front and center to help users grow into their new virtual warehouse.

The industry of cloud storage most likely believes for the common consumer, storage is ‘cheap’, just provide more.  At some stage, the cloud providers may look to DAM tools as the cost of managing a users’ storage rises.  Tools like:

  • duplicate digital assets, files. Use exception reporting to identify the duplicates, and enable [bulk] corrective action, and/or upon upload, duplicate ‘error/warning’ message.
  • Dynamic metadata tagging upon [bulk] upload using object recognition.  Correlating and cataloging one or more [type] objects in a picture using defined Information Architecture.  In addition, leveraging facial recognition for updates to metadata tagging.
    • e.g. “beach” objects: sand, ocean; [Ian Roseman] surfing;
  • Brief questionnaires may enable the user to ‘smartly’ ingest the digital assets; e.g. ‘themes’ of current upload; e.g. a family, or relationship tree to  extend facial recognition correlations.
    • e.g. themes – summer; party; New Year’s Eve
    • e.g. relationship tree – office / work
  • Pan Information Architecture (IA) spanning multiple cloud storage [silos]. e.g. for Photos, spanning [shared] ‘albums’
  • Publically published / shared components of an IA;  e.g. Legal documents;  standards and reuse