Tag Archives: Storage

Microsoft Productivity Suite – Content Creation, Ingestion, Curation, Search, and Repurpose

Auto Curation: AI Rules Engine Processing

There are, of course, 3rd party platforms that perform very well, are feature rich, and agnostic to all file types.  For example, within a very short period of time, low cost, and possibly a few plugins, a WordPress site can be configured and deployed to suit your needs of Digital Asset Managment (DAM).  The long-term goal is to incorporate techniques such as Auto Curation to any/all files, leveraging an ever-growing intelligent taxonomy, a taxonomy built on user-defined labels/tags, as well an AI rules engine with ML techniques.   OneDrive, as a cloud storage platform, may bridge the gap between JUST cloud storage and a DAM.

Ingestion and Curation Workflow

Content Creation Apps and Auto Curation

  • The ability for Content Creation applications, such as Microsoft Word, to capture not only the user-defined tags but also the context of the tags relating to the content.
    • When ingesting a Microsoft PowerPoint presentation, after consuming the file, and Auto Curation process can extract “reusable components” of the file, such as slide header/name, and the correlated content such as a table, chart, or graphics.
    • Ingesting Microsoft Excel and Auto Curation of Workbooks may yield “reusable components” stored as metadata tags, and their correlated content, such as chart and table names.
    • Ingesting and Auto Curation of Microsoft Word documents may build a classic Index for all the most frequently occurring words, and augment the manually user-defined tags in the file.
    • Ingestion of Photos [and Videos] into and Intelligent Cloud Storage Platform, during the Auto Curation process, may identify commonly identifiable objects, such as trees or people.  These objects would be automatically tagged through the Auto Curation process after Ingestion.
  • Ability to extract the content file metadata, objects and text tags, to be stored in a standard format to be extracted by DAMs, or Intelligent Cloud Storage Platforms with file and metadata search capabilities.  Could OneDrive be that intelligent platform?
  • A user can search for a file title or throughout the Manual and Auto Curated, defined metadata associated with the file.  The DAM or Intelligent Cloud Storage Platform provides both search results.   “Reusable components” of files are also searchable. 
    • For “Reusable Components” to be parsed out of the files to be separate entities, a process needs to occur after Ingestion Auto Curration.
  • Content Creation application, user-entry tag/text fields should have “drop-down” access to the search index populated with auto/manual created tags.

Auto Curation and Intelligent Cloud Storage

  • The intelligence of Auto Curation should be built into the Cloud Storage Platform, e.g. potentially OneDrive.
  • At a minimum, auto curation should update the cloud storage platform indexing engine to correlate files and metadata.
  • Auto Curation is the ‘secret sauce’ that “digests” the content to build the search engine index, which contains identified objects (e.g. tag and text or coordinates)  automatically
    • Auto Curation may leverage a rules engine (AI) and apply user configurable rules such as “keyword density” thresholds
    • Artificial Intelligence, Machine Learning rules may be applied to the content to derive additional labels/tags.
  • If leveraging version control of the intelligent cloud storage platform, each iteration should “re-index” the content, and update the Auto Curation metadata tags.  User-created tags are untouched.
  • If no user-defined labels/tags exist, upon ingestion, the user may be prompted for tags

Auto Curation and “3rd Party” Sources

In the context of sources such as a Twitter feed, there exists no incorporation of feeds into an Intelligent Cloud Storage.  OneDrive, Cloud Intelligent Storage may import feeds from 3rd party sources, and each Tweet would be defined as an object which is searchable along with its metadata (e.g. likes; tags).

Operating System, Intelligent Cloud Storage/DAM

The Intelligent Cloud Storage and DAM solutions should have integrated search capabilities, so on the OS (mobile or desktop) level, the discovery of content through the OS search of tagged metadata is possible.

Current State

  1. OneDrive has no ability to search Microsoft Word tags
  2. The UI for all Productivity Tools must have a comprehensive and simple design for leveraging an existing taxonomy for manual tagging, and the ability to add hints for auto curation
    1. Currently, Microsoft Word has two fields to collect metadata about the file.  It’s obscurely found at the “Save As” dialog.
      1. The “Save As” dialogue box allows a user to add tags and authors but only when using the MS Word desktop version.  The Online (Cloud) version of Word has no such option when saving to Microsoft OneDrive Cloud Storage
  3. Auto Curation (Artificial Intelligence, AI) must inspect the MS Productivity suite tools, and extract tags automatically which does not exist today.
  4. No manual taging or Auto Curation/Facial Recognition exists.

Microsoft Flow – Platform Review

It looks like Microsoft created a generic workflow platform, product independent.

Microsoft has software solutions, like MS Outlook with an [email] rules engine built into Outlook.  SharePoint has a workflow solution within the Sharepoint Platform, typically governing the content flowing through it’s system.

Microsoft Flow is a different animal.  It seems like Microsoft has built a ‘generic’ rules engine for processing almost any event.  The Flow product:

  1. Start using the product from one of two areas:  a) “My Flows” where I may view existing and create new [work]flows. b) “Activity”, that shows “Notifications” and “Failures”
  2. Select “My Flows”, and the user may “Create [a workflow] from Blank”,  or “Browse Templates”.  MSFT existing set of templates were created by Microsoft, and also by a 3rd party implying a marketplace.
  3. Select “Create from Blank” and the user has a single drop down list of events, a culmination events across Internet products. There is an implication there could be any product, and event “made compatible” with MSFT Flows.
    1. The drop down list of events has a format of “Product – Event”.  As the list of products and events grow, we should see at least two separate drop down lists, one for products, and a sub list for the product specific events.
    2. Several Example Events Include:
      1. “Dropbox – When a file is created”
      2. “Facebook – When there is a new post to my timeline”
      3. “Project Online – When a new task is created”
      4. “RSS – When a feed item is published”
      5. “Salesforce – When an object is created”
    3. The list of products as well as there events may need a business analyst to rationalize the use cases.
  4. Once an Event is selected, event specific details may be required, e.g. Twitter account details, or OneDrive “watch” folder
  5. Next, a Condition may be added to this [work]flow,  and may be specific to the Event type, e.g. OneDrive File Type properties [contains] XYZ value.  There is also an “advanced mode” using a conditional scripting language.
  6. There is “IF YES” and “IF NO” logic, which then allows the user to select one [or more] actions to perform
    1. Several Action Examples Include:
      1. “Excel – Insert Rows”
      2. “FTP – Create File”
      3. “Google Drive – List files in folder”
      4. “Mail – Send email”
      5. “Push Notification – Send a push notification”
    2. Again, it seems like an eclectic bunch of Products, Actions, and Events strung together to have a system to POC.
  7. The Templates list, predefined set of workflows that may be of interest to anyone who does not want to start from scratch.   The UI provides several ways to filter, list, and search through templates.

Applicable to everyday life, from an individual home user, small business, to the enterprise.  At this stage the product seems in Beta at best, or more accurately, just after clickable prototype.  I ran into several errors trying to go through basic use cases, i.e. adding rules.

Despite the “Preview” launch, Microsoft has showed us the power in [work]flow processing regardless of the service platform provider, e.g.  Box, DropBox, Facebook, GitHub, Instagram, Salesforce, Twitter, Google, MailChimp, …

Microsoft may be the glue to combine service providers who may / expose their services to MSFT Flow functionality.

Create from Blank - Select Condition
Create from Blank – Select Condition

 

Create Rule from Template
Create Rule from Template
Create from Blank Rule Building UI
Create from Blank Rule Building UI

 

Update June 28th, 2016:

Opportunities for Event, Condition, Action Rules

  • Transcoding [cloud] Services
  • [IBM Watson] Cognitive APIs
    • e.g. Language:Translation; E.g.2. Visual Recognition;
  • WordPress – Create a Post
    • New text file dropped in specific folder on Box, DropBox, etc. being ‘monitored’ by MSFT flow [?] Additional code required by user for ‘polling’ capabilities
    • OR new text file attached, and emailed to specific email account folder ‘watched’ by MSFT Flow.
    • Event triggers – Automatic read of new text file
      • stylizing may occur if HTML coding used
    • Action – Post to a Blog
  • ‘ANY’ Event occurs, a custom message is sent using Skype for a single or group of Skype accounts;
    • On several ‘eligible’ events, such as “File Creation” into Box,  the file (or file shared URL) may be sent to the Skype account.
  • ‘ANY’ Event occurs, a custom mobile text message is sent to a single or group of phone numbers.
  • Event occurs for “File Creation” e.g. into Box; after passing a “Condition”, actions occur:
    • IBM Watson Cognitive API, Text to Speech, occurs, and the product of the action is placed in the same Box folder.
  • Action: Using Microsoft Edge (powered by MSN), in the “My news feed” tab, enable action to publish “Cards”, such as app notifications

Challenges \ Opportunities \ Unknowns

  • 3rd party companies existing, published [cloud; web service] APIs may not even need any modification to integrate with Microsoft Flow; however, business approval may be required to use the API in this manner,
  • It is unclear re: Flow Templates need to be created by the product owner, e.g. Telestream, or knowledgeable third party, following the Android, iOS, and/or MSFT Mobile Apps model.
  • It is unclear if the MSFT Flow app may be licensed individually in the cloud, within the 365 cloud suite, or offered for Home and\or Business?

As a Data Deluge Grows, Companies Rethink Storage

At Pure Storage, a device introduced on Monday holds five times as much data as a conventional unit.

  • IBM estimates that by 2020 we will have 44 zettabytes — the thousandfold number next up from exabytes — generated by all those devices. It is so much information that Big Blue is staking its future on so-called machine learning and artificial intelligence, two kinds of pattern-finding software built to cope with all that information.
  • Pure Storage chief executive, Scott Dietzen, “No one can look at all their data anymore; they need algorithms just to decide what to look at,”

Source: As a Data Deluge Grows, Companies Rethink Storage – The New York Times

Additional Editorial:

Pure Storage is looking to “compress” the amount of data that can be stored in a Storage Array using Flash Memory, “Flashblade”.   They are also tuning the capabilities of the solution for higher I/O throughput, and optimized, addressable storage.

Several companies with large and growing storage footprints have already begin to customize their storage solutions to accommodate the void in this space.

Building more storage arrays is a temporary measure while the masses of people, or fleets of cars turn on their IoT enabled devices.

Data is flooding the Internet, and innumerable, duplicate ‘objects’  of information, requiring redundant storage, are prevalent conditions. A registry, or public ‘records’ may be maintained.   Based on security measures, and the public’s appetite determine what “information objects” may be centrally located.  As intermediaries, registrars may build open source repositories, as an example, using Google Drive, or Microsoft Azure based on the data types of ‘Information Objects”

  • Information object registrars may contain all different types of objects, which indicate where data resides on the Internet.
    • vaguely similar to Domain name registrar hierarchy
    • another example, Domain Name System (DNS) is the best example of the registration process I am suggesting to clone and leverage for all types of data ranging from entertainment to medical records.
  • Medical “Records”, or Medical “Information Objects”
    • X-ray images, everything from dental to medical, and correlating to other medical information object(s),
  • Official ‘Education’ records from K-12 and beyond, e.g. degrees and certifications achieved;
  • Secure, easy access to ‘public’ ‘information objects’ by the owner, and creator.  Central portal(s) driving user traffic.  Enables ‘owner’ of records to take ‘ownership’ of their health, for example

Note: there are already ‘open’ platforms being developed and used for several industries including medical; with limed access.  However, the changes I’m proposing imposes a ‘registrar’ process whereby portals of information are registered, and are interwoven, linking to one another.

It’s an issue of excess weight upon the “Internet”, and not just the ‘weight’ of unnecessary storage, the throughput within a weaved set of networks as well.

Think of it in terms of opportunity cost.  First quantify what an ‘information object’, or ‘block of data’ equates to in cost.  It seems there must already be a measurement in existence, a medium amount to charge / cost per “information object”.  Finally, for each information object type, e.g. song, movie, news story, technical specifications, etc. identify how many times this exact object is perpetuated in the Internet.

Steps on reducing  data waste:

  • Without exception, each ‘information object’ contains an (XML) meta data file.
  • Each of the attributes describing information objects are built out as these assets are being used; e.g. proactive autopopulate search, and using an AI Induction engine
  • X out of Y metadata type and values are equivalent
    • the more attributes correlate to one or more objects, the more likely these objects are
      • related on some level, e.g. sibling, cousin
      • or identical objects, and may need meta relationship update
    • the metadata encapsulates the ‘information object’

Another opportunity to organize “Information Asset Objects” would be to leverage the existing DNS platform for managing “Information Asset Repositories”.   This additional Internet DNS structure would enable queries across information asset repositories.   Please see “So Much Streaming Music, Just Not in One Place”  for more details.