Tag Archives: ETL

Apache NiFi on Hortonworks HDF Verses … Microsoft Flow?

Attended a technical discussion last night on Apache NiFi and Hortonworks HDF,  a Meetup @ Honeywell, a Hortonworks client.

Excellent presentations from the Hortonworks team for “NiFi on HDF” solutions architecture and best practices. Powerful solution to process and distribute data in real-time, any data, and in large quantities with resiliency.   It’s no wonder why the US NSA originally developed the ability to consume data in real-time, manipulate it, and then send it on it’s way.  However, recognizing the commercial applications (benevolent wisdom?), the NSA released the product as open-source software, via its technology transfer program.

As a tangent,  among other things, I’m currently exploring the capabilities of “Microsoft Flow“, which has recently been promoted to GA from their ‘Preview Release’.  One resonating question came to mind during the presentations last night:

At it’s peak maturity (not yet), can Microsoft Flow successfully compete with Apache NiFi on Hortonworks HDF?

Discussion Points:

  • The NiFi / HDF solution manages data flows in real-time.  The Microsoft Flow architecture seems to fall short in this capacity. Is it on the product road map for Flow?  Is it a capability Microsoft wants to have?
  • There a bit of architecture / infrastructure on the Hortonworks HDF side, which enables the solution as a whole to be able to ingest, process, and push the data in real-time.   Not sure Microsoft Flow is currently engineered on the back end to handle the throughput.
  • The current Microsoft Flow UI may need to be updated to handle this ‘slightly altered’ paradigm of real-time content consumption and distribution.

The comparison between Microsoft Flow and NiFi on HDF may be a huge stretch for comparison.

WordPress Shortcode API to Cloud Storage to Sell Any Digital Intellectual Property.

So, I was a browsing, going through bills, and thinking, hey relating to my other article on Google Docs and their new API where you could use them as a data warehouse, it occurred to me.   Why can’t we have a public API for all the Cloud Storage systems like Amazon Web Services (AWS) S3 (or Box.com), create a plugin to WordPress, add E-Commerce, and you now have your own place to sell digital music, or any Digital intellectual, property store, or host your own database OLTP or OLAP.

And my bro, Fat Panda, might have been thinking the same thing.  He’s one step behind, but he will catch on.  I will try to update for ‘the cheap seats’ in a bit.

For the cheap seats, even those static files stored up in the cloud, you can use a similar model to Google Docs <-> Google Fusion where you add tabular data to storage, read,over-write, or update using home made table locking mechanism, and essentially use the cloud as a data warehouse, or even a database.  Microsoft seems to have a lead on transitional and analytical storage with Microsoft Azure, relational in nature in the cloud, but it is so much simpler than that with cloud storage, although if not implemented with ‘row’ locking,there is an issue with OLTP (On Line Transaction Processing) row level, high volume, but with OLAP, On Line Analytic Processing, not so much, analyzing the way your business does business, and profit more from your consumer data.  There are easy ways to implement row level locking for row level locking of tabular data stored in cloud storage like AWS or Box.Net,  The methods to implement row level locking for OLTP systems using storage in the cloud are easy to implement, and will remind you of old school type alternatives to supplement the AutoNumber columns in MS Access or Identity columns in SQL Server. At the end of the day to either sell digital intellectual property from a WordPress implementation, or run your entire business with a robust cloud database solution for OLTP or OLAP systems using flat file storage!  Why go through all this when the Amazons AWS and Microsoft Azure have or will yearn to start building these solutions in parallel?  Cost effective solutions, and the entire database arena monopolized by Oracle, IBM, Microsoft, and MySQL, just got extended to a whole lot of database vendors.  It may take a while, but we already know the big Gorilla in the room Google is the first to strike in this game, as a non-traditional database vendor, cloud storage provider with their updated Google Docs API, and optionally usage of their Fusion application.