Excellent presentations from the Hortonworks team on “NiFi on HDF” solutions architecture and best practices. It is a powerful solution for processing and distributing any data in real time, in large quantities, and with resiliency. It’s no wonder the US NSA originally developed the ability to consume data in real time, manipulate it, and then send it on its way. However, recognizing the commercial applications (benevolent wisdom?), the NSA released the product as open-source software via its technology transfer program.
As a tangent, among other things, I’m currently exploring the capabilities of “Microsoft Flow”, which has recently been promoted to GA from its ‘Preview Release’. One resonating question came to mind during the presentations last night:
At its peak maturity (not yet), can Microsoft Flow successfully compete with Apache NiFi on Hortonworks HDF?
The NiFi / HDF solution manages data flows in real-time. The Microsoft Flow architecture seems to fall short in this capacity. Is it on the product road map for Flow? Is it a capability Microsoft wants to have?
There is a fair bit of architecture / infrastructure on the Hortonworks HDF side, which enables the solution as a whole to ingest, process, and push the data in real time. I’m not sure Microsoft Flow is currently engineered on the back end to handle that throughput.
The current Microsoft Flow UI may need to be updated to handle this ‘slightly altered’ paradigm of real-time content consumption and distribution.
Comparing Microsoft Flow with NiFi on HDF may be a huge stretch.
Serverless computing is a cloud computing code execution model in which the cloud provider fully manages starting and stopping virtual machines as necessary to serve requests, and requests are billed by an abstract measure of the resources required to satisfy the request, rather than per virtual machine, per hour. Despite the name, it does not actually involve running code without servers. Serverless computing is so named because the business or person that owns the system does not have to purchase, rent or provision servers or virtual machines for the back-end code to run.
Based on your application Use Case(s), Cloud Serverless Computing architecture may reduce ongoing costs for application usage, and provide scalability on demand without the Cloud Server Instance management overhead, i.e. costs and effort.
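To make the billing difference concrete, here is a rough sketch that compares a per-request bill with a per-VM-hour bill. Every price and workload figure below is a made-up placeholder, not a quote from any provider, and the function names are mine; plug in your own numbers.

```python
def serverless_monthly_cost(requests, avg_seconds, memory_gb,
                            price_per_gb_second, price_per_million_requests):
    """Bill by resources consumed per request (hypothetical pricing model)."""
    compute = requests * avg_seconds * memory_gb * price_per_gb_second
    request_fee = requests / 1_000_000 * price_per_million_requests
    return compute + request_fee


def vm_monthly_cost(instances, price_per_hour, hours_per_month=730):
    """Bill per virtual machine, per hour, whether or not it is busy."""
    return instances * price_per_hour * hours_per_month


# Placeholder workload: 2 million short requests per month.
faas = serverless_monthly_cost(
    requests=2_000_000,
    avg_seconds=0.2,
    memory_gb=0.5,
    price_per_gb_second=0.00002,      # assumed price
    price_per_million_requests=0.20,  # assumed price
)
vm = vm_monthly_cost(instances=1, price_per_hour=0.05)  # assumed price

print(f"serverless: {faas:.2f} USD/month, always-on VM: {vm:.2f} USD/month")
```

The comparison flips once the request volume is high and steady enough, which is why the usage profile of the application drives the decision.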
Note: Cloud Serverless Computing is used interchangeably with Functions as a Service (FaaS), which makes sense from a developer’s standpoint, as they are coding Functions (or Methods) and that’s the level of abstraction.
Create automated workflows between apps and services to get notifications, synchronize files, collect data, and more. Although not the traditional Serverless Computing implementation, it’s the quickest way to perform application services without having to procure the application servers. Depending on your microservices (connectors + templates) definitions, you may not need to write a single line of code, and everything could be done through the Flow console.
Connectors are “enablers” to connect to [data] sources in order to extract or insert data, typically one Connector per service, such as Twitter.
Templates utilize Connectors, and enable workflow designers to build business process workflows. The resulting workflows run either when an event trigger fires, or ad hoc / manually through the portal or the Microsoft Flow mobile apps.
154 Service Connectors exist. Several “Premium” connectors require a nominal monthly fee (5 USD). For example, the Oracle Database Connector empowers the workflow designer to insert, update, select, and delete rows in a table.
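As a mental model only, the connector / workflow / trigger relationship can be sketched in a few lines of Python. This is not the Microsoft Flow API; the class names (`Connector`, `Workflow`, `FlowEngine`) and the trigger name are hypothetical and exist just to show how an event can fan out into a multi-step workflow, or be run by hand.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Event = Dict[str, str]


@dataclass
class Connector:
    """One connector per service; wraps a single action on that service."""
    name: str
    action: Callable[[Event], Event]


@dataclass
class Workflow:
    """An ordered chain of connector steps bound to one trigger."""
    trigger: str
    steps: List[Connector] = field(default_factory=list)

    def run(self, event: Event) -> Event:
        # Each step receives the output of the previous step.
        for step in self.steps:
            event = step.action(event)
        return event


class FlowEngine:
    def __init__(self) -> None:
        self.workflows: List[Workflow] = []

    def register(self, workflow: Workflow) -> None:
        self.workflows.append(workflow)

    def emit(self, trigger: str, event: Event) -> List[Event]:
        """Event-driven execution: run every workflow listening on the trigger."""
        return [w.run(dict(event)) for w in self.workflows if w.trigger == trigger]


# Hypothetical two-step workflow: tag a new tweet, then format a notification.
tag = Connector("tagger", lambda e: {**e, "tag": "mention"})
notify = Connector("notifier", lambda e: {**e, "message": f"[{e['tag']}] {e['text']}"})
workflow = Workflow(trigger="new_tweet", steps=[tag, notify])

engine = FlowEngine()
engine.register(workflow)

triggered = engine.emit("new_tweet", {"text": "hello"})  # event trigger
manual = workflow.run({"text": "run by hand"})           # ad hoc / manual run
```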
Automating business processes by designing workflows to turn repetitive tasks into multi-step workflows
Microsoft Flow Pricing
There are three tiers, including a free tier for personal use or for exploring the platform for your business. The paid Flow plans seem ridiculously inexpensive based on what business workflow designers receive for 5 USD or 15 USD per month. Microsoft Flow has abstracted building workflows so almost anyone can build application workflows or automate manual business workflows, leveraging almost any of the popular applications on the market.
It doesn’t seem like 3rd party [data] Connector and Template creators receive any direct monetary value from the Microsoft Flow platform, although workflow designers and business owners may be swayed to purchase 3rd party product licenses for the use of their core technology.
Properly designed microservices have a single responsibility and can independently scale. With traditional applications being broken up into 100s of microservices, traditional platform technologies can lead to a significant increase in management and infrastructure costs. Google Cloud Platform’s serverless products mitigate these challenges and help you create cost-effective microservices.
AWS provides a set of fully managed services that you can use to build and run serverless applications. You use these services to build serverless applications that don’t require provisioning, maintaining, and administering servers for backend components such as compute, databases, storage, stream processing, message queueing, and more. You also no longer need to worry about ensuring application fault tolerance and availability. Instead, AWS handles all of these capabilities for you, allowing you to focus on product innovation and get faster time-to-market. It’s important to note that Amazon was the first contender in this space with a 2014 product launch.
Execute code on demand in a highly scalable serverless environment. Create and run event-driven apps that scale on demand.
Focus on essential event-driven logic, not on maintaining servers
Integrate with a catalog of services
Pay for actual usage rather than projected peaks
The OpenWhisk serverless architecture accelerates development as a set of small, distinct, and independent actions. By abstracting away infrastructure, OpenWhisk frees members of small teams to rapidly work on different pieces of code simultaneously, keeping the overall focus on creating user experiences customers want.
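For a feel of what one of those small, independent actions looks like, here is a minimal sketch in the shape OpenWhisk uses for Python actions: a `main` function that takes a dictionary of parameters and returns a dictionary. The parameter names (`file_name`, `size_kb`) and the 1024 KB threshold are invented for illustration.

```python
def main(args: dict) -> dict:
    """Single-purpose action: flag uploaded files that need a manual review."""
    file_name = args.get("file_name")
    if not file_name:
        # Return an error payload instead of raising, so the caller gets JSON back.
        return {"error": "file_name is required"}

    size_kb = int(args.get("size_kb", 0))
    return {"file_name": file_name, "needs_review": size_kb > 1024}
```

Because the action holds no state and does one thing, it can be tested locally by calling `main({...})` directly before it is ever deployed.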
Serverless Computing is a decision that needs to be made based on the usage profile of your application. For the right use case, serverless computing is an excellent choice that is ready for prime time and can provide significant cost savings.
Protecting the Data Warehouse with Artificial Intelligence
Teleran is a middleware company whose software monitors and governs OLAP activity between the Data Warehouse and Business Intelligence tools, like Business Objects and Cognos. Teleran’s suite of tools encompasses a comprehensive analytical and monitoring solution called iSight. In addition, Teleran has a product, iGuard, that leverages artificial intelligence and machine learning to impose real-time query and data access controls. The architecture also allows Teleran’s agent to run on a different host from the database, for additional security and to avoid consuming resources on the database host.
Key Features of iGuard:
Policy engine prevents “bad” queries before reaching database
Patented rule engine resides in-memory to evaluate queries at database protocol layer on TCP/IP network
Patented rule engine prevents inappropriate or long-running queries from reaching the data
70 Customizable Policy Templates
SQL Query Policies
Create policies using policy templates based on SQL Syntax:
Require JOIN to Security Table
Column Combination Restriction – Ex. Prevents combining customer name and social security #
Table JOIN restriction – Ex. Prevents joining two different tables in same query
Equi-literal Compare requirement – Tightly Constrains Query Ex. Prevents hunting for sensitive data by requiring ‘=‘ condition
By user or user groups and time of day (shift) (e.g. ETL)
Blocks connections to the database
White list or black list by
DB User Logins
OS User Logins
Applications (BI, Query Apps)
Rule Templates Contain Customizable Messages
Each of the “Policy Templates” has the ability to send the user querying the database a customized message based on the defined policy. The message back to the user from Teleran should be seamless to the application user’s experience.
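To illustrate the idea of a policy that blocks a query and returns a customized message, here is a toy sketch. It is not how iGuard works internally (the post does not describe that); it uses simple regular expressions rather than a real SQL parser, and the policy names, column names, and messages are all hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Policy:
    name: str
    violates: Callable[[str], bool]  # True when the SQL breaks the policy
    message: str                     # customized message sent back to the user


def combines_columns(*columns: str) -> Callable[[str], bool]:
    """Column combination restriction: all listed columns appear in one query."""
    def check(sql: str) -> bool:
        lowered = sql.lower()
        return all(re.search(rf"\b{re.escape(c)}\b", lowered) for c in columns)
    return check


def lacks_equality_filter(column: str) -> Callable[[str], bool]:
    """Equi-literal requirement: the column must be compared with '=' to a literal."""
    pattern = re.compile(rf"\b{re.escape(column)}\s*=\s*('[^']*'|\d+)", re.IGNORECASE)
    def check(sql: str) -> bool:
        return pattern.search(sql) is None
    return check


def evaluate(sql: str, policies: List[Policy]) -> Optional[str]:
    """Return the first violated policy's message, or None if the query may pass."""
    for policy in policies:
        if policy.violates(sql):
            return policy.message
    return None


POLICIES = [
    Policy("column-combination", combines_columns("customer_name", "ssn"),
           "Customer name and SSN may not be queried together."),
    Policy("equi-literal", lacks_equality_filter("account_id"),
           "Queries must filter on account_id = <value>."),
]

blocked = evaluate("SELECT customer_name, ssn FROM customers WHERE account_id = 7", POLICIES)
allowed = evaluate("SELECT balance FROM accounts WHERE account_id = 42", POLICIES)
```

A production rule engine would need to work on parsed SQL, since text matching like this is easy to evade with aliases or views.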
Machine Learning: Curbing Inappropriate, or Long Running Queries
iGuard has the ability to analyze all of the historical SQL passed through to the Data Warehouse, and suggest new, customized policies to cancel queries with certain SQL characteristics. The Teleran administrator sets parameters such as rows or bytes returned, and then runs the induction process. The process then suggests new rules targeting queries that exceed these defined parameters. The induction engine is “smart” enough to look at the repository of queries holistically and not make determinations based on a single query.
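The induction step can be pictured with a deliberately simple stand-in. Teleran’s actual algorithm is not described here, so the sketch below substitutes plain threshold counting: group historical queries by a normalized “signature”, and only suggest a rule when enough queries with that signature exceed the administrator’s row limit. The `QueryRecord` type, the signatures, and the `min_occurrences` / `min_share` knobs are all assumptions of mine.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class QueryRecord:
    signature: str      # hypothetical normalized query shape
    rows_returned: int


def suggest_rules(history: List[QueryRecord], max_rows: int,
                  min_occurrences: int = 3, min_share: float = 0.5) -> List[str]:
    """Suggest signatures to block, judged across the whole history."""
    rows_by_signature: Dict[str, List[int]] = defaultdict(list)
    for record in history:
        rows_by_signature[record.signature].append(record.rows_returned)

    suggestions = []
    for signature, rows in sorted(rows_by_signature.items()):
        over_limit = sum(1 for n in rows if n > max_rows)
        # A single outlier is not enough; require a repeated pattern.
        if over_limit >= min_occurrences and over_limit / len(rows) >= min_share:
            suggestions.append(signature)
    return suggestions


history = (
    [QueryRecord("SELECT * FROM sales", 5_000_000)] * 4
    + [QueryRecord("SELECT region, SUM(amount) FROM sales GROUP BY region", 12)] * 4
    + [QueryRecord("SELECT region, SUM(amount) FROM sales GROUP BY region", 2_000_000)]
)
suggested = suggest_rules(history, max_rows=1_000_000)
```

In this sample history the unfiltered `SELECT *` shape is suggested, while the aggregate query with one oversized run is left alone.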