An interesting approach to Data Loss Prevention (DLP)
Data loss prevention (DLP) is one of the most important tools enterprises have to protect themselves from modern security threats such as data exfiltration, data leakage, and other forms of sensitive data and secrets exposure. Many organizations seem to understand this, with the DLP market expected to grow worldwide in the coming years. However, not all approaches to DLP are created equal. DLP solutions vary in the scope of remediation options they provide as well as in the security layers they apply to. Traditionally, data loss prevention has been an on-premises or endpoint solution meant to enforce policies on devices connected over specific networks. As cloud adoption accelerates, though, the utility of these traditional approaches will substantially decrease.
Established data loss prevention vendors have attempted to address these gaps with developments like endpoint DLP and cloud access security brokers (CASBs), which give security teams visibility into devices and programs running outside their walls or sanctioned environments. While both solutions minimize security blind spots, at least relative to network-layer and on-prem solutions, they can result in inconsistent enforcement. Endpoint DLP, for example, does not provide visibility at the application layer, meaning that policy enforcement is limited to managing what programs and data are installed on a device. CASBs can be somewhat more sophisticated in determining which cloud applications are permissible on a device or network, but they still face similar shortfalls around behavior and data within cloud applications.
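To make the application-layer gap concrete, here is a minimal sketch of the kind of content inspection a cloud-native DLP policy could run on text leaving a SaaS application. The pattern names and regexes are illustrative assumptions, not any vendor's actual policy set.

```python
import re

# Illustrative policy patterns (assumptions, not a vendor's real rule set).
POLICIES = {
    "credit_card": re.compile(r"\b(?:\d[ -]?){13,16}\b"),
    "aws_access_key": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

def scan(text: str) -> list[str]:
    """Return the names of any policies the outbound text appears to violate."""
    return [name for name, pattern in POLICIES.items() if pattern.search(text)]

print(scan("please wire to card 4111 1111 1111 1111 today"))  # ['credit_card']
```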
Cloud adoption was expected to grow nearly 17% between 2019 and 2020; however, as more enterprises embrace cloud-first strategies for workforce management and business continuity during the COVID-19 pandemic, we’re likely to see even more aggressive cloud adoption. With more data in the cloud, the need for policy remediation and data visibility at the application layer will only increase and organizations will begin to seek cloud-native approaches to cloud security.
Several relational database software vendors, such as IBM, Oracle, and Teradata, have developed proprietary data warehouse software tightly coupled with server hardware to maximize performance. These offerings have been developed and refined as "on-prem" solutions for many years.
We've seen the rise of "Data Warehouse (DW) as a Service" from companies like Amazon, which sells Redshift.
Amazon Redshift is a fast, fully managed data warehouse that makes it simple and cost-effective to analyze all your data using standard SQL and your existing Business Intelligence (BI) tools. It allows you to run complex analytic queries against petabytes of structured data, using sophisticated query optimization, columnar storage on high-performance local disks, and massively parallel query execution. Most results come back in seconds.
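As a rough illustration of "standard SQL with your existing BI tools," a Redshift cluster can be queried over the PostgreSQL wire protocol, for example with psycopg2 from Python. The endpoint, credentials, and table below are placeholders.

```python
import psycopg2

# Placeholder endpoint and credentials for an assumed Redshift cluster.
conn = psycopg2.connect(
    host="my-cluster.abc123.us-east-1.redshift.amazonaws.com",
    port=5439,
    dbname="analytics",
    user="bi_user",
    password="...",
)
with conn, conn.cursor() as cur:
    # fact_orders is an assumed table; any standard SQL works the same way.
    cur.execute("""
        SELECT order_date, SUM(order_total) AS revenue
        FROM fact_orders
        GROUP BY order_date
        ORDER BY order_date DESC
        LIMIT 30;
    """)
    for row in cur.fetchall():
        print(row)
conn.close()
```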
RDB Complex Software/Hardware Maintenance
More recently, the traditional relational database software vendors have shifted gears to become service providers, offering maximum performance from a solution hosted by the vendor in the Cloud. On the positive side, the complexity of configuring and tuning a blended software/hardware data warehouse shifts from the client's own resources, such as Database Administrators (DBAs), Network Administrators, Unix/Windows Server Admins,… to the database software service provider. The complexity of tuning for scalability, along with other maintenance challenges, moves to the software vendor's expertise, if that's the abstraction you select. There is still some ambiguity in the delineation of responsibilities with the RDBMS vendors' cloud offerings.
Total Cost of Ownership
Quantifying the total cost of ownership of a solution can be tricky, especially if you're trying to compare the hybrid software/hardware RDBMS "on-prem" solution against the same or similar capabilities delivered as "Data Warehouse (DW) as a Service".
“On-Prem”, RDB Client Hosted Solution
Several factors need to be considered when selecting ANY software and/or hardware to be hosted at the client site.
Infrastructure “when in Rome”
Organizations have a quantifiable cost for hosting physical or virtual servers in the data center, which may be boiled down to a number that includes items like HVAC and new rack space.
Resources used to maintain and monitor data center usage; there may be an abstracted/blended figure for these.
Database Administrators maintain and monitor RDB solutions.
Activities may range from RDB patches/upgrades to resizing/scaling the DB storage “containers”.
Application Database Admins/Developers may be required to maintain the data warehouse architecture as new requirements arrive, e.g. creating aggregate tables for BI analysis (see the sketch after this list).
Network Administrators
Firewalls, VPN
Port Scanning
Windows/Unix Server Administrators
Antivirus
OS Patches
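As a toy illustration of the aggregate tables mentioned under the Application Database Admin item (with SQLite standing in for the warehouse and an assumed fact_orders schema):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE fact_orders (order_date TEXT, region TEXT, order_total REAL);
INSERT INTO fact_orders VALUES
  ('2020-05-01', 'east', 120.0),
  ('2020-05-01', 'east',  80.0),
  ('2020-05-01', 'west',  42.5);

-- The aggregate table BI dashboards would query instead of the detail rows.
CREATE TABLE agg_daily_sales AS
SELECT order_date, region, COUNT(*) AS order_count, SUM(order_total) AS revenue
FROM fact_orders
GROUP BY order_date, region;
""")
print(conn.execute("SELECT * FROM agg_daily_sales ORDER BY region").fetchall())
# [('2020-05-01', 'east', 2, 200.0), ('2020-05-01', 'west', 1, 42.5)]
conn.close()
```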
Trying to correlate these costs in some kind of "apples to apples" comparison against "Data Warehouse as a Service" may require accountants and technical folks to do extensive financial modeling. Vendors such as Oracle offer everything from fully managed services to the opposite end of the spectrum, "bare metal", essentially "Infrastructure as a Service". The Oracle Exadata solution can be a significant investment depending on how much is invested in redundancy and scalability leveraging Oracle Real Application Clusters (RAC).
Support and Staffing Models for DW Cloud Vendors
For the traditional RDB software vendors to support a "Data Warehouse as a Service" model, they may need to significantly increase staff across the technical disciplines outlined above for the client "on-prem" model. Given the ramp-up of staff and the organizational challenge of building and implementing a support model, relational database vendors may ask: should they leverage a top-tier consulting agency such as Accenture or Deloitte to define, implement, and refine a managed service? It's certainly a tall order to go from software vendor to large-scale service provider, and with global corporate footprints and positive track records implementing managed services of all types, it's an attractive proposition for both the RDB vendor and the consulting agency that wins the bid. The DW service billing models don't seem entirely sensible at first glance: any consulting agency that implements a DW managed service would be responsible for ensuring ROI both for the RDB vendor and for its clients. This may be opaque to the end client consuming the Data Warehouse as a Service, but the quality of service should certainly be nothing less than if it were implemented by the RDB vendor itself. If the end game for the RDB vendor is to have the consulting agency implement and mature the service, then bring it in-house at some point, that could keep costs down while the managed service matures.
Oracle Exadata
Here are URLs for reference to understand the capabilities that are realized through Oracle’s managed services.
The AI personal assistant with the most usage, spanning connectivity across all smart devices, will be the anchor to which users gravitate to control their 'automated' lives. An Amazon commercial just aired depicting a dad with his daughter; the daughter was crying about her boyfriend, who happened to be in the front yard yelling for her. The dad says to Amazon's Alexa, "sprinklers on," and yes, the boyfriend got soaked.
What is so special about the top spot for the AI Personal Assistant? Controlling the 'funnel' through which all information is accessed and actions are taken means the intelligent ability to:
Serve up content and information, which can then be mixed with advertisements or 'intelligent suggestions' based on historical data, i.e. machine learning.
Proactive, suggestive actions may lead to sales of goods and services. e.g. AI Personal Assistant flags potential ‘buys’ from eBay based on user profiles.
Three main sources of AI Personal Assistant value add:
A portal to the "outside" world. If I need information, I wouldn't "surf the web"; I would ask Cortana to go "research" XYZ. In the Business Intelligence / data warehousing space, a business analyst may need to run a few queries to get the information they want; by the same token, Microsoft Cortana may come back to you several times to ask "for your guidance."
An abstraction layer between the user and their apps. The user need not 'lift a finger' in any app outside the Personal Assistant, with noted exceptions such as playing a game for you.
User profiles derived from the first two points, i.e. data collection on everything from spending habits to other day-to-day rituals.
Proactive and chatty assistants may win "Assistant of Choice" on all platforms. Being proactive means collecting data more often than when it's just you asking questions ad hoc. Proactive AI Personal Assistants that are geo-aware may make "timely, appropriate interruptions" (notifications) based on time and location. E.g. "Don't forget milk," says Siri, as you're passing the grocery store. Around the time I leave work, Google Maps tells me whether I have traffic and my ETA.
It's possible for a [non-native] AI Personal Assistant to become the 'abstraction' layer on top of ANY mobile OS (iOS, Android), the funnel through which all actions / requests are triggered.
Microsoft Cortana has an iOS app and widget, which is wrapped around the OS. Tighter integration may be possible but is not allowed by iOS, the iPhone, and Apple. Note: Google's Allo does not provide an iOS widget at the time of this writing.
An antitrust violation by smartphone maker Apple: iOS must allow for the 'substitution' of a competing AI Personal Assistant, triggered in the same manner as the "press and hold the home button" capability that launches the default packaged iOS assistant, Siri.
Reminiscent of the Microsoft IE Browser / OS antitrust violations in the past.
Holding the iPhone Home button brings up Siri. There should be an OS setting to swap which assistant is used as the mobile OS default. Today, iPhone / iPad iOS only supports "Siri" under the Settings menu.
ANY AI Personal Assistant should be allowed to replace the default OS Personal Assistant, from Amazon's Alexa and Microsoft's Cortana to any startup with the expertise and resources needed to build and deploy a Personal Assistant solution. Has Apple taken steps to tightly couple Siri with its iOS?
AI Personal Assistant 'Wish' list:
Interactive, voice-menu-driven dialog. The AI Personal Assistant should know what [mobile] apps are installed, as well as their actionable, hierarchical taxonomy of features and functions. The assistant should, for example, ask which application the user wants to use, and if the user doesn't know, the assistant should verbally / visually list the apps. After the user selects the app, the assistant should then provide a list of function choices for that application, e.g. "Press 1 for 'Play Song'" (see the sketch after this list).
The interactive voice menu should also provide a level of abstraction when available; e.g. the user need not select the app and can just say "Create Reminder." There may be several applications on the smartphone that do the same thing, such as note taking and reminders. In the OS Settings, under a new 'AI Personal Assistant' menu, the installed applications compatible with this AI Personal Assistant service layer should be listed, grouped into categories defined by the mobile OS.
Capability to interact with IoT using user defined workflows. Hardware and software may exist in the Cloud.
Ever tighter integration with native as well as 3rd party apps, e.g. Google Allo and Google Keep.
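Here is a minimal sketch of the hierarchical taxonomy and menu behavior described in the first wish-list item; the app names, categories, and functions are purely illustrative.

```python
# Illustrative app/function taxonomy the assistant could expose as a voice menu.
APP_TAXONOMY = {
    "Music": {"category": "media", "functions": ["Play Song", "Play Playlist"]},
    "Notes": {"category": "productivity", "functions": ["Create Note", "Create Reminder"]},
    "Calendar": {"category": "productivity", "functions": ["Create Reminder", "List Events"]},
}

def menu_for(app: str) -> str:
    """Render the 'Press 1 for ...' style menu for one installed app."""
    functions = APP_TAXONOMY[app]["functions"]
    return "\n".join(f'Press {i} for "{fn}"' for i, fn in enumerate(functions, start=1))

def apps_handling(intent: str) -> list[str]:
    """The abstraction layer: 'Create Reminder' without naming an app maps to every app offering it."""
    return [app for app, info in APP_TAXONOMY.items() if intent in info["functions"]]

print(menu_for("Music"))
print(apps_handling("Create Reminder"))  # ['Notes', 'Calendar']
```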
Apple could already be making the changes as a natural course of their product evolution. Even if the ‘big boys’ don’t want to stir up a hornet’s nest, all you need is VC and a few good programmers to pick a fight with Apple.
Collaborative content creation software, such as Dropbox Paper, enables individuals or teams to produce content while leveraging the storage platform for, e.g., version control.
Embedded integration in a suite of content creation applications, such as Microsoft Office, and OneDrive.
“…companies like Google and Facebook pay top dollar for some really smart people. Only a few hundred souls on Earth have the talent and the training needed to really push the state-of-the-art [AI] forward, and paying for these top minds is a lot like paying for an NFL quarterback. That’s a bottleneck in the continued progress of artificial intelligence. And it’s not the only one. Even the top researchers can’t build these services without trial and error on an enormous scale. To build a deep neural network that cracks the next big AI problem, researchers must first try countless options that don’t work, running each one across dozens and potentially hundreds of machines.”
This article represents a true picture of where we are today for the average consumer and producer of information, and the companies that repurpose information, e.g. in the form of advertisements.
The current progress of Artificial Intelligence and Machine Learning paints a picture akin to the 1970s, with computers that filled rooms and accepted punch cards as input.
Today's consumers have mobile computing power on par with those whole rooms of the 1970s; however, "more compute power" in a tinier package may not be the path to AI sentience. How AI algorithm models are computed might need to take an alternate approach.
In a classical computation system, a bit would have to be in one state or the other. However quantum mechanics allows the qubit to be in a superposition of both states at the same time, a property which is fundamental to quantum computing.
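For concreteness, a single qubit's state is commonly written as a superposition of the two computational basis states, with the amplitudes constrained to a unit norm:

```latex
|\psi\rangle = \alpha\,|0\rangle + \beta\,|1\rangle, \qquad |\alpha|^2 + |\beta|^2 = 1
```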
The construction and validation of Artificial Intelligence / Machine Learning algorithm models should be engineered on a Quantum Computing framework.
The investment of $500 million annually signals the importance of the so-called Internet of Things to the future of manufacturing.
G.E. expects revenue of $6 billion from software in 2015, a 50 percent increase in one year. Much of this is from a pattern-finding system called Predix. G.E. calls its new service the Predix Cloud, and hopes it will be used by both customers and competitors, along with independent software developers. “We can take sensor data from anybody, though it’s optimized for our own products,” Mr. Ruh said.
[Competitive solutions from IBM, Microsoft, and Google] raises the stakes for G.E. “It’s a whole new competition for them,” said Yefim Natis, a senior analyst with Gartner. “To run businesses in a modern way you have to be analytic and predictive.”
G.E. is running the Predix Cloud on a combination of G.E. computers, the vast computing resources of Amazon Web Services, and a few [local] providers, like China Telecom.
China, along with countries like Germany, [is] sensitive about moving its data offshore, or even holding information on computers in the United States.
The practice of "ring fencing" data exists in dozens of jurisdictions globally. Ring fencing of data may be a legal and/or regulatory issue that could inhibit the global growth of cloud services going forward.
Are you trying to apply metadata to individual files or en masse, in an attempt to make the vast growth of cloud storage manageable and meaningful?
Best practice is to leverage a consistent hierarchy, an Information Architecture in which to store and retrieve information; if you have that, excellent.
Beyond that, there are capabilities computer science has documented and used time and time again: checksum algorithms, used frequently after a file transfer to verify that the file you requested is the file you received. Most, if not all, enterprise DAM solutions use some type of technology to enforce unique assets [upon upload]. In cloud storage and photo solutions targeted at the individual consumer, the feature does not appear to be up close and personal in the user experience, thus building a huge expanse of duplicate data (documents, photos, music, etc.). Another long-standing feature, the database [primary] key, has been used for decades to guarantee that a record of data is unique.
Our family sharing alone has thousands of photos and songs. The file names can differ for many of the same digital assets; sometimes the file names are the same, but the metadata between the copies differs, and that differing metadata still provides value. Tools for 'merging' metadata, DAM tools, have value in helping manage digital assets.
Cloud storage usage is growing exponentially, and metadata alone won't rope in the beast. Maybe ad hoc or periodic indexing of files [e.g. by #checksum algorithm] could take on the task of identifying duplicate assets, which could then be surfaced to the user in an exception report. Better still, upon upload, let the user know 'on the fly' that the asset is already in storage, and show a two-column diff of the metadata.
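A minimal sketch of that periodic checksum index, assuming a locally synced cloud storage folder; the path is a placeholder.

```python
import hashlib
from collections import defaultdict
from pathlib import Path

def file_checksum(path: Path, chunk_size: int = 1 << 20) -> str:
    """SHA-256 a file in chunks so large media assets don't need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def find_duplicates(root: str) -> dict[str, list[Path]]:
    """Group files under `root` by checksum; any group with more than one file is a duplicate set."""
    base = Path(root).expanduser()
    if not base.is_dir():
        return {}
    groups: dict[str, list[Path]] = defaultdict(list)
    for path in base.rglob("*"):
        if path.is_file():
            groups[file_checksum(path)].append(path)
    return {digest: paths for digest, paths in groups.items() if len(paths) > 1}

if __name__ == "__main__":
    # "~/CloudStorage" is a placeholder for wherever the cloud drive is synced locally.
    for digest, paths in find_duplicates("~/CloudStorage").items():
        print(digest[:12], [str(p) for p in paths])
```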
It's a pain for me, and quite possibly for many cloud storage users. As more people jump on cloud storage, this feature should be front and center to help users grow into their new virtual warehouse.
The cloud storage industry most likely believes that, for the common consumer, storage is 'cheap', so just provide more. At some stage, though, cloud providers may look to DAM tools as the cost of managing a user's storage rises. Tools like:
Duplicate detection for digital assets and files: use exception reporting to identify the duplicates and enable [bulk] corrective action, and/or show a duplicate 'error/warning' message upon upload.
Dynamic metadata tagging upon [bulk] upload using object recognition: correlating and cataloging one or more [types of] objects in a picture using a defined Information Architecture, and additionally leveraging facial recognition for updates to metadata tagging (see the sketch after this list).
e.g. “beach” objects: sand, ocean; [Ian Roseman] surfing;
Brief questionnaires may enable the user to ‘smartly’ ingest the digital assets; e.g. ‘themes’ of current upload; e.g. a family, or relationship tree to extend facial recognition correlations.
e.g. themes – summer; party; New Year’s Eve
e.g. relationship tree – office / work
Pan Information Architecture (IA) spanning multiple cloud storage [silos]. e.g. for Photos, spanning [shared] ‘albums’
Publicly published / shared components of an IA, e.g. legal documents; standards and reuse.
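A sketch of the dynamic tagging idea above. `detect_objects` is a stand-in for whatever object-recognition service would actually be called, and the tiny Information Architecture here is only illustrative.

```python
from typing import Dict, List

# A tiny, illustrative slice of a defined Information Architecture: themes and the objects implying them.
THEME_IA: Dict[str, set] = {
    "beach": {"sand", "ocean", "surfboard"},
    "party": {"balloon", "cake", "confetti"},
}

def detect_objects(image_path: str) -> List[str]:
    """Placeholder: swap in a real object-recognition / vision API call here."""
    return ["sand", "ocean", "person"]  # canned result so the sketch runs end to end

def tag_asset(image_path: str) -> Dict[str, List[str]]:
    """Derive metadata tags for an uploaded image from the recognized objects."""
    objects = detect_objects(image_path)
    themes = [theme for theme, members in THEME_IA.items() if set(objects) & members]
    return {"objects": objects, "themes": themes}

print(tag_asset("IMG_0042.jpg"))  # {'objects': ['sand', 'ocean', 'person'], 'themes': ['beach']}
```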
Although this is a saturated space with many products, some highly recommended, I thought this idea might interest those involved in the Digital Asset Management space. Based on the maturity of existing products, and on cost, it's up to you: build or buy. The following may provide an opportunity for augmenting existing Google products and overlaying a custom solution.
Google products can be integrated across their suite of solutions to produce a cloud-based, secure Digital Asset Management (DAM) solution. In this use case, the digital assets are media (e.g. videos, still images).
A Google DAM may be created by leveraging existing features of Google Plus, Google Drive, YouTube, and other Google products, as well as building / extending additional functionality, e.g. via the Google Plus API, to create a DAM solution. An overarching custom framework weaves these products together to act as the DAM.
Google Digital Asset Management (New)
A dashboard for Digital Asset Management should be created which shows, at a glance, where project media assets are in their life cycle, e.g. ingestion, transcoding, editing, adding metadata, inclusion / editing of closed captions, workflow approvals, etc. (see the sketch at the end of this feature list).
Creation and maintenance of project asset folder structure within storage such as Google Drive for active projects as well as Google Cloud Storage for archived content. Ingested content to arrive in the project folders.
Ability to use [Google YouTube] default encoding / transcoding functionality, or optionally leverage alternate cloud accessible transcoding solutions.
A basic DAM UI may provide user interaction with the project and asset meta data.
Components of the DAM should allow plug in integration with other components on the market today, such as an ingestion solution.
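As a sketch of what sits behind such a dashboard, the life-cycle stages listed above could be modeled as an explicit set of states, with each dashboard tile reduced to counting assets per state. The stage names simply follow the list above; the asset records are illustrative.

```python
from collections import Counter
from enum import Enum

class AssetStage(Enum):
    INGESTION = "ingestion"
    TRANSCODING = "transcoding"
    EDITING = "editing"
    METADATA = "metadata"
    CAPTIONS = "closed captions"
    APPROVAL = "workflow approval"
    PUBLISHED = "published"

def stage_counts(assets: list) -> Counter:
    """One dashboard tile per stage: how many project assets are currently in it."""
    return Counter(asset["stage"] for asset in assets)

sample = [{"id": "A1", "stage": AssetStage.INGESTION}, {"id": "A2", "stage": AssetStage.EDITING}]
print(stage_counts(sample))
```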
Google Drive and Google Cloud Storage: cloud storage offers large quantities of storage economically, e.g. for media (video, audio).
Google Drive ingestion of assets may occur through an automated process, such as a drop folder within an FTP site. The folder may be polled every N seconds by the Google DAM orchestration, or another 3rd-party orchestration product, and ingested into Google Drive. The ingested files are placed into a project folder designated by the accompanying XML meta file (see the sketch after this list).
Version control of assets, implemented by Google Drive and the DAM, facilitates collaboration and approval.
Distribution and publishing of media to designated people and locations, such as social media channels, may be triggered automatically by the DAM orchestration polling Google Drive custom metadata changes. On-demand publishing is also achievable through the DAM.
Archiving project assets to custom locations, such as Google Cloud solution, may be triggered by a project meta data status modification, or on demand through the DAM.
Assets may be spawned into other assets, such as clips. Derived child assets are correlated with the master (parent) asset within the DAM asset metadata so they can be traced back to their origin. This eliminates redundant copies of an asset and enables users to easily find related files and reuse all or a portion of the asset.
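A minimal sketch of the drop-folder polling flow described in the ingestion item above. The drop folder path, the sidecar-XML layout (a <project> element), the media naming convention, and the upload stub are all assumptions; a production version would call the Google Drive API instead of staging files locally.

```python
import shutil
import time
import xml.etree.ElementTree as ET
from pathlib import Path

DROP_FOLDER = Path("/ftp/drop")        # assumed FTP-backed drop folder
STAGING_ROOT = Path("/dam/projects")   # assumed staging area standing in for Google Drive
POLL_SECONDS = 30                      # the "every N seconds" polling interval

def upload_to_project(local_path: Path, project: str) -> None:
    """Placeholder for the Google Drive upload; here files are just moved into a project folder."""
    dest = STAGING_ROOT / project
    dest.mkdir(parents=True, exist_ok=True)
    shutil.move(str(local_path), str(dest / local_path.name))

def ingest_once() -> None:
    if not DROP_FOLDER.is_dir():
        return
    for meta_file in DROP_FOLDER.glob("*.xml"):
        asset_file = meta_file.with_suffix(".mov")   # assumed naming convention: clip.mov + clip.xml
        if not asset_file.exists():
            continue                                 # media not fully delivered yet
        project = ET.parse(meta_file).getroot().findtext("project", default="unsorted")
        upload_to_project(asset_file, project)
        upload_to_project(meta_file, project)

if __name__ == "__main__":
    while True:
        ingest_once()
        time.sleep(POLL_SECONDS)
```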
Google Docs
Documents required to accompany each media project, such as production guidelines, may go through several iterations before they are complete, yet many components of a document are static. Google Docs may incorporate 'document assembly' technology to automate document construction.
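A small sketch of the 'document assembly' idea: static boilerplate plus a handful of project-specific fields. The field names are assumptions.

```python
from string import Template

# Static boilerplate with a few substitution points; everything else stays fixed per project.
GUIDELINES = Template("""\
Production Guidelines - $project
Delivery format: $format
Target runtime: $runtime

(The remaining sections are static boilerplate shared by every project.)
""")

print(GUIDELINES.substitute(project="Spring Campaign", format="ProRes 422", runtime="90 seconds"))
```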
Google’s YouTube
Editing media, either using default YouTube functionality or third-party software, e.g. the Adobe suite.
Caption creation and editing may use YouTube or third-party software.
Metadata may be added or modified according to the corporate taxonomy through [custom] YouTube fields, or directly through the Google DAM database where the project data resides.
Google Plus (G+)
A G+ project page may be used for project and asset collaboration.
Project team members may subscribe to the project page to receive notifications on changes, such as new sub clips
Asset workflow notifications, human and automated:
Asset modification approvals (i.e. G+ API <-> DAM Db) through custom fields on the G+ page
Notifications of changes to assets (i.e. collaboration)
[Automated] e.g. ingestion in progress, or completed updates.
[Automated] Process notifications, e.g. 'distribution to XYZ' and 'transcoding N workflow'. G+ posts may include links to assets.
Google Plus for in-house, and outside org. team(s) collaboration
The G+ UI may trigger actions such as ingestion, e.g. by specifying a specific Google Drive link and a configured workflow.
Google Custom Search
Allows for the search of assets within a project, across all projects within a business silo, and across the entire organization's assets.
Ability to find and share DAM motion pictures, still images, and text assets with individuals, groups, project teams in or outside the organization. Google Plus to facilitate sharing.
Asset metadata will, e.g., describe how the assets may be used for distribution, i.e. digital distribution rights. Since users and groups are implemented within G+, control of asset distribution may be implemented in Google Plus and/or custom Google Search.
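A sketch of enforcing those digital distribution rights from asset metadata before a share or publish action; the metadata fields and values are assumptions.

```python
from datetime import date
from typing import Optional

# Assumed shape of the rights portion of an asset's metadata record.
ASSET_META = {
    "A100-01": {"rights": {"channels": ["youtube", "internal"], "expires": "2026-01-01"}},
}

def may_distribute(asset_id: str, channel: str, on: Optional[date] = None) -> bool:
    """Allow distribution only to licensed channels and only before the rights expire."""
    rights = ASSET_META.get(asset_id, {}).get("rights", {})
    on = on or date.today()
    not_expired = on.isoformat() <= rights.get("expires", "9999-12-31")
    return channel in rights.get("channels", []) and not_expired

print(may_distribute("A100-01", "youtube", on=date(2025, 6, 1)))   # True: licensed channel, not expired
print(may_distribute("A100-01", "facebook", on=date(2025, 6, 1)))  # False: channel not licensed
```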
The Product Owner (PO) is a member of the Agile Team responsible for defining Stories and prioritizing the Team Backlog to streamline the execution of program priorities while maintaining the conceptual and technical integrity of the Features or components for the team.