Tag Archives: Box.com

Cloud Storage and DAM Solutions: Don’t Rein in the Beast

Are you trying to apply metadata to individual files or en masse, attempting to turn the vast growth of cloud storage usage into manageable, meaningful storage?

Best practice is to use a consistent hierarchy, an Information Architecture, in which to store and retrieve information. Excellent.

Beyond that, there are capabilities computer science has documented and used time and time again, such as checksum algorithms, which are frequently used after a file transfer to verify that the file you received is the file you requested.  Most, if not all, Enterprise DAM solutions use some type of technology to ‘allow’ the enforcement of unique assets [upon upload].  In cloud storage and photo solutions targeted toward the individual consumer, the feature does not appear to be ‘up close and personal’ in the user experience, so a huge expanse of duplicate data (documents, photos, music, etc.) builds up.  Another feature, the database [primary] key, has been used for decades to identify a record of data as unique.

Our family sharing alone has thousands and thousands of photos and music files. The file names could be different for many copies of the same digital asset.  Sometimes the file names are the same, but the metadata differs between copies, and each version provides value. DAM tools for ‘merging’ metadata have value in helping to manage digital assets.
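As a rough illustration of what ‘merging’ metadata between two copies of the same asset could look like, here is a minimal Python sketch. The function name, the dictionary layout, and the rule of keeping the first copy's value on a conflict are all assumptions made for the example, not how any particular DAM tool works.

```python
def merge_metadata(left, right):
    """Merge two metadata dicts for copies of the same asset.

    Returns (merged, conflicts). Keys found on only one side are copied
    over. Keys whose values differ keep the left value (an assumed rule)
    and are listed in `conflicts` as (key, left_value, right_value) rows
    so a user can review them side by side.
    """
    merged = dict(left)
    conflicts = []
    for key, right_value in right.items():
        if key not in merged:
            merged[key] = right_value
        elif merged[key] != right_value:
            conflicts.append((key, merged[key], right_value))
    return merged, conflicts
```

The `conflicts` rows are deliberately shaped as two columns of values per key, so they could feed a simple side by side view.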

Cloud storage usage is growing exponentially, and metadata alone won’t help rope in the beast. Maybe ad hoc or periodic indexing of files [e.g. by checksum algorithm] could take on the task of identifying duplicate assets?  Duplicate assets could be viewed by the user in an exception report?  Less boring: upon upload, ‘on the fly’, let the user know the asset is already in storage, and show a two column diff of the metadata.
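A periodic checksum index of this kind might look like the following minimal Python sketch. It assumes the assets are reachable as files under a local folder (a stand-in for a synced cloud drive), and it uses SHA-256 purely as an example checksum; the function names are made up for illustration.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_checksum(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def duplicate_report(root):
    """Group every file under `root` by checksum.

    Only groups with two or more files are returned, which makes the
    result a simple exception report of duplicate assets.
    """
    groups = defaultdict(list)
    for path in sorted(Path(root).rglob("*")):
        if path.is_file():
            groups[file_checksum(path)].append(path)
    return {digest: paths for digest, paths in groups.items() if len(paths) > 1}
```

Because the grouping key is the file content rather than the file name, two photos with different names but identical bytes land in the same group.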

It’s a pain for me, and quite possibly many cloud storage users.  As more people jump on cloud storage, this feature should be front and center to help users grow into their new virtual warehouse.

The cloud storage industry most likely believes that, for the common consumer, storage is ‘cheap’: just provide more.  At some stage, the cloud providers may look to DAM tools as the cost of managing a user’s storage rises.  Tools like:

  • Duplicate detection for digital assets and files. Use exception reporting to identify the duplicates and enable [bulk] corrective action, and/or show a duplicate ‘error/warning’ message upon upload.
  • Dynamic metadata tagging upon [bulk] upload using object recognition.  Correlating and cataloging one or more [type] objects in a picture using defined Information Architecture.  In addition, leveraging facial recognition for updates to metadata tagging.
    • e.g. “beach” objects: sand, ocean; [Ian Roseman] surfing;
  • Brief questionnaires may enable the user to ‘smartly’ ingest the digital assets; e.g. ‘themes’ of the current upload; e.g. a family or relationship tree to extend facial recognition correlations.
    • e.g. themes – summer; party; New Year’s Eve
    • e.g. relationship tree – office / work
  • Pan Information Architecture (IA) spanning multiple cloud storage [silos]. e.g. for Photos, spanning [shared] ‘albums’
  • Publicly published / shared components of an IA;  e.g. Legal documents;  standards and reuse
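To make the dynamic tagging idea in the list above a little more concrete, here is a small Python sketch. It assumes the object and facial recognition steps have already run elsewhere and simply handed back plain lists of labels and names; the `THEMES` table is a hypothetical slice of an Information Architecture, and the ‘party’ entry in it is invented for the example.

```python
# Hypothetical slice of an Information Architecture:
# theme name -> objects that together imply the theme.
THEMES = {
    "beach": {"sand", "ocean"},
    "party": {"balloons", "cake"},
}


def tag_asset(labels, people=(), themes=THEMES):
    """Build metadata tags from recognizer output.

    `labels` and `people` are assumed to come from separate object and
    facial recognition steps that are not shown here.
    """
    found = set(labels)
    return {
        "objects": sorted(found),
        "people": sorted(people),
        # A theme applies only when every one of its objects was detected.
        "themes": sorted(name for name, needed in themes.items() if needed <= found),
    }
```

A call such as `tag_asset(["ocean", "sand", "surfboard"], ["Ian Roseman"])` would tag the photo with the ‘beach’ theme, since both of that theme's objects were detected.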

WordPress Shortcode API to Cloud Storage to Sell Any Digital Intellectual Property.

So, I was browsing, going through bills, and thinking about my other article on Google Docs and their new API, where you could use them as a data warehouse, when it occurred to me: why can’t we have a public API for all the Cloud Storage systems like Amazon Web Services (AWS) S3 (or Box.com), create a plugin for WordPress, add E-Commerce, and now you have your own place to sell digital music, or any digital intellectual property store, or host your own OLTP or OLAP database?

And my bro, Fat Panda, might have been thinking the same thing.  He’s one step behind, but he will catch on.  I will try to update for ‘the cheap seats’ in a bit.

For the cheap seats: even with static files stored up in the cloud, you can use a model similar to Google Docs <-> Google Fusion, where you add tabular data to storage, then read, overwrite, or update it using a homemade table locking mechanism, and essentially use the cloud as a data warehouse, or even a database.  Microsoft seems to have a lead on transactional and analytical storage with Microsoft Azure, which is relational in nature in the cloud, but it can be much simpler than that with plain cloud storage.  If it is not implemented with ‘row’ locking, there is an issue with OLTP (On Line Transaction Processing), which is row level and high volume.  With OLAP (On Line Analytic Processing), not so much: there you are analyzing the way your business does business, so you can profit more from your consumer data.

There are easy ways to implement row level locking for tabular data stored in cloud storage like AWS or Box.net.  The methods will remind you of the old school alternatives used to supplement AutoNumber columns in MS Access or Identity columns in SQL Server.  At the end of the day, you can either sell digital intellectual property from a WordPress implementation, or run your entire business on a robust cloud database solution for OLTP or OLAP systems, using flat file storage!

Why go through all this when Amazon’s AWS and Microsoft Azure have started, or will yearn to start, building these solutions in parallel?  Cost effective solutions, and the entire database arena, monopolized by Oracle, IBM, Microsoft, and MySQL, just got extended to a whole lot of new database vendors.  It may take a while, but we already know the big gorilla in the room, Google, is the first to strike in this game as a non-traditional database vendor and cloud storage provider, with their updated Google Docs API and, optionally, their Fusion application.
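One way the ‘old school’ locking and AutoNumber idea could be sketched in Python is shown below. This is an illustration under loud assumptions: a local folder stands in for the cloud bucket, the lock is a whole table lock rather than true row level locking, and it relies on atomic file creation, which a real storage service would have to offer through some equivalent conditional write. The names `table_lock` and `insert_row` are made up for the example.

```python
import csv
import os
import time
from contextlib import contextmanager


@contextmanager
def table_lock(lock_path, timeout=5.0, poll=0.05):
    """Hold an exclusive lock by atomically creating a lock file."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # O_EXCL makes creation fail if another writer holds the lock.
            fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
            break
        except FileExistsError:
            if time.monotonic() >= deadline:
                raise TimeoutError(f"could not lock {lock_path}")
            time.sleep(poll)
    try:
        yield
    finally:
        os.close(fd)
        os.remove(lock_path)


def insert_row(table_path, values):
    """Append a row with the next sequential id, AutoNumber style."""
    with table_lock(table_path + ".lock"):
        next_id = 1
        if os.path.exists(table_path):
            with open(table_path, newline="") as handle:
                ids = [int(row[0]) for row in csv.reader(handle) if row]
            next_id = max(ids, default=0) + 1
        with open(table_path, "a", newline="") as handle:
            csv.writer(handle).writerow([next_id, *values])
        return next_id
```

Reading the highest existing id while holding the lock is what keeps two writers from handing out the same number; it is simple, but it serializes every insert, which is why high volume OLTP is the hard case.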

Tablet Developers Make Business Intelligence Tools using Google as a Data Warehouse: Competing with Oracle, IBM, and Microsoft SQL Server

And, he shoots, and scores.  I called it, sort of.  Google came out of the closet today as a data warehouse vendor; at least, they need a community of developers to connect the dots and help build an amazing Business Intelligence suite.

Google came out with a Google Docs API today, which supports languages from Objective-C (iOS) and C# to Java, so you can use Google as your Data Warehouse for any size business. All you need to do is write an ETL program which uploads and downloads tables between your local database and Google Docs, and then create your own Business Intelligence User Interface for the creation and viewing of Charts & Graphs.  It looks like they’ve changed strategies, or this was the plan all along.
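The extract half of such an ETL program can be sketched in a few lines of Python. This is only an illustration: SQLite stands in for the local database, and `upload` is a hypothetical callable you would supply yourself. No real Google Docs API call is shown or implied here.

```python
import csv
import io
import sqlite3


def extract_table(conn, table):
    """Dump one table to CSV text, header row first."""
    # Table names cannot be bound as SQL parameters, so quote the identifier.
    quoted = '"' + table.replace('"', '""') + '"'
    cursor = conn.execute(f"SELECT * FROM {quoted}")
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow([column[0] for column in cursor.description])
    writer.writerows(cursor)
    return buffer.getvalue()


def run_etl(conn, tables, upload):
    """Extract each table and hand it to `upload(name, csv_text)`.

    `upload` is a placeholder for whatever client actually sends the
    file to cloud storage; it is an assumption of this sketch.
    """
    for table in tables:
        upload(f"{table}.csv", extract_table(conn, table))
```

Passing the upload step in as a function keeps the extract logic independent of any one storage provider, so the same driver could target a different cloud store by swapping that one callable.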

Initially I thought that Google Fusion was going to be the table editing tool to manipulate your data that was transferred from your transactional database using the Google Docs API.  Today they released a Google Docs API and developers can create their own ETL drivers and a Business Intelligence User Interface that can run on any platform from an Android Tablet, iPad, or Windows Tablet.

A few days ago I wrote an article in which it looked like Google was going to use a tool called Google Fusion, which was in Beta at the time, to manipulate tabular data, and eventually extend it to create common BI components such as graphs, charts, editable tables, etc.

A few gotchas: Google Docs on the Apple iPad is at version 1.1.1, released 9/28/12, so we are talking very early days, and the Google Docs API was released today.  I would imagine that since you can also use C#, someone can make a Windows desktop application to manipulate the data tables and create and view graphs, so a Windows tablet can be used.  The API also has Java compatibility, and Java is write once, run anywhere, so from any Unix box, or any platform where your transactional database lives, a developer is able to write a driver to transfer the data to Google Docs dynamically, and then use the Google Docs API for Business Intelligence.  You can even write an ETL driver that does nothing but rapidly transfer data, like an ODBC or JDBC driver, and use any business intelligence tools you have on your desktop, or run a nightly ETL.  However, I can see developers creating business intelligence tools on Android, iPad, or Windows tablets to modify tables, create and view charts, etc., using custom BI tool sets, and their data warehouse now becomes Google Docs.

Please reference an article I wrote a few days back, “Google is Going to be the Next Public and Private Data Warehouse“.

At that time, Google Fusion was marked as Beta, as of 10/13/2012.  Google has since stripped off the word Beta, but that doesn’t matter.  It’s even better with the Google API to Google Docs.  Google Fusion could be your starter User Interface; however, if your Android, iOS (Apple iPad), and Windows developers really embrace this API, all of the big database companies like IBM, Oracle, and Microsoft may have their market share eroded to some extent, if not a great extent.

Update 10/19:

Hey Gs (Guys and Gals), I forgot to mention, you can perhaps also make your own video or music streaming applications using the basic get and send file calls, as other companies such as AWS, Box, etc. are already doing. It’s a simple get / send API, so I’m not sure it’s applicable to ‘streaming’ at this stage; it may be just another storage location in the ‘cloud’, which would be quite boring.  Although, thinking about it now, aren’t all the put / send cloud solutions potential data warehouses, using ETL and the APIs discussed and published above?  Also, it’s ironic that Google would be competing with itself if this became a file share that could ‘stream’ videos alongside YouTube.