What is Big Data as a Service?

Big Data as a Service (BDaaS) refers to an emerging category of analytical data processing services delivered via the cloud. It replaces the complexity, long implementations, and capital expense of on-premises data infrastructure with the ready availability, pay-as-you-go cost model, and elastic scalability of the cloud. Big Data as a Service incorporates multiple data processing technologies in order to handle any type of data, analytical workload and SLA, from batch processing to interactive data visualization to real-time streaming data analysis, at an optimal price/performance level. It abstracts away the underlying complexities of data technology stacks, cloud provisioning and ongoing system management so that users can focus on their analytical tools and business outcomes.

Drivers and Adoption

Big Data as a Service is an outgrowth of two fundamental IT trends, big data and cloud computing:

  • Big data refers to the rapid growth in the variety, volume and velocity of corporate data.
  • Cloud computing refers to the migration of software and IT infrastructure from corporate data centers to services delivered via the internet.

The complexity, size, and rapid evolution of big data fit naturally with the cloud computing paradigm’s ability to scale and simplify. With Big Data as a Service, analytical data processing becomes available “as a service” just like other enterprise IT categories such as CRM (salesforce.com), HR (Workday), and file sharing and backup (Box).

Big Data as a Service adoption is accelerating as vendors address security, data movement, and other barriers. Additionally, the growing volume of data now generated outside a company’s firewall by mobile, social and IoT applications is shifting the “data gravity” away from the corporate data center and to the cloud.

Applications

Enterprises' initial adoption of Big Data as a Service frequently targets one of several common use cases.

Collecting and processing data from Internet of Things, social, mobile, and other external sources: Big Data as a Service suits external data because that data is already in the cloud, so moving and landing it takes fewer resources. It also avoids the delay and effort that network and data security policies impose before external data sources are allowed into the enterprise.

Some companies create “data pipelines”: they collect raw data via Big Data as a Service, then cleanse, filter, or query it to create a valuable subset, which they move into another analytic environment, either cloud-based or on-premises.
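The collect, cleanse, filter and hand-off steps of such a pipeline can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the Reading record, the field names (device_id, temperature_c), the threshold, and the sample rows are all hypothetical and do not come from any particular BDaaS product.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, List


@dataclass
class Reading:
    """Hypothetical cleansed record; field names are illustrative only."""
    device_id: str
    temperature_c: float


def cleanse(raw_rows: Iterable[dict]) -> Iterator[Reading]:
    """Drop malformed rows and normalize field types."""
    for row in raw_rows:
        try:
            device_id = str(row["device_id"]).strip()
            temperature = float(row["temperature_c"])
        except (KeyError, TypeError, ValueError):
            continue  # missing field or non-numeric value: skip the row
        if device_id:  # skip rows whose device id is blank
            yield Reading(device_id, temperature)


def select_subset(readings: Iterable[Reading], threshold_c: float) -> List[Reading]:
    """Keep only the readings worth moving to the downstream environment."""
    return [r for r in readings if r.temperature_c >= threshold_c]


# Hypothetical raw data as it might land from external sources.
raw = [
    {"device_id": "a1", "temperature_c": "71.5"},
    {"device_id": "a2", "temperature_c": "n/a"},   # not numeric
    {"device_id": " ", "temperature_c": "80"},     # blank device id
    {"device_id": "a3", "temperature_c": 64},      # below threshold
    {"temperature_c": "90"},                       # missing device id
]

subset = select_subset(cleanse(raw), threshold_c=70.0)
```

In a real deployment the final list would be written to the downstream analytic environment rather than held in memory; that step depends on the target system and is left out here.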


Data warehousing: Offloading workloads to Big Data as a Service frees up data warehouse capacity, extending its life in the face of growing demands and optimizing its performance.

Since it can be added incrementally and as needed, offloading workloads to the cloud is a faster and cheaper alternative to purchasing additional data warehouse appliances or more data warehousing capacity.


Analytical sandboxes for data scientists: Data science requires testing numerous hypotheses, discarding the failures and pursuing the promising ones. Such workloads can be difficult for traditional data infrastructure to handle, but fit well with the ability of Big Data as a Service to rapidly provision “sandboxes” with all data in one place, where it can be combined, queried and analyzed to discover new patterns or insights. Analysts can explore data with no impact to other production processes or systems.

Disaster recovery (DR): For data warehouse environments that require DR, the traditional options are buying two appliances located in two different datacenters, or maintaining two environments, one for production and one for DR.

Big Data as a Service can be used as a DR platform for data warehouses without requiring large capital expenditure on a secondary enterprise data warehouse (EDW) appliance.

Sharing or monetizing data: Big Data as a Service can provide a secure environment for sharing data with partners or customers without having to provide them direct access to the corporate network. This facilitates doing business with partners, for example by sharing metrics (retail, advertising) or research (pharma, manufacturing). BDaaS also facilitates monetizing information assets as analytical/data products.

Attributes

The most important attributes of Big Data as a Service are:

Cloud-based: Anything delivered “as a Service” must, by definition, be cloud-based. Big Data as a Service should utilize multiple cloud infrastructure providers, such as Azure and Amazon Web Services, to align with existing enterprise standards and to avoid risk of cloud provider lock-in.

Workload engine flexibility: The ability to utilize multiple, best-of-breed data processing engines such as MPP SQL, Hadoop, Spark, and search enables Big Data as a Service to handle any type of analytical workload (batch or real-time, structured or unstructured data) with the best price/performance. Workload engine flexibility differentiates Big Data as a Service from narrower cloud-based data processing services like Data Warehouse as a Service or Hadoop as a Service.

Automated provisioning and optimization: Big Data as a Service should include the ability to intelligently configure the cloud infrastructure to support the data volume and customer workload response time (or SLA) at an appropriate price/performance level, and continuously optimize the infrastructure over time.
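One way to picture this selection step is as a search for the cheapest configuration whose estimated response time still meets the SLA. The sketch below assumes a hypothetical catalog of cluster options with made-up prices and response-time estimates; a real service would derive such estimates from the data volume and workload profile, and the names used here are not taken from any vendor's API.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class ClusterOption:
    """Hypothetical candidate configuration; all figures are invented."""
    name: str
    nodes: int
    hourly_cost: float           # assumed price per hour
    est_response_seconds: float  # assumed response-time estimate for the workload


def cheapest_meeting_sla(
    options: Sequence[ClusterOption], sla_seconds: float
) -> Optional[ClusterOption]:
    """Return the lowest-cost option that satisfies the SLA, or None if none does."""
    eligible = [o for o in options if o.est_response_seconds <= sla_seconds]
    return min(eligible, key=lambda o: o.hourly_cost, default=None)


# Illustrative catalog only.
OPTIONS = [
    ClusterOption("small", nodes=2, hourly_cost=1.0, est_response_seconds=40.0),
    ClusterOption("medium", nodes=4, hourly_cost=2.0, est_response_seconds=12.0),
    ClusterOption("large", nodes=8, hourly_cost=4.0, est_response_seconds=5.0),
]

choice = cheapest_meeting_sla(OPTIONS, sla_seconds=15.0)
```

Continuous optimization would amount to re-running a selection like this as data volumes and observed response times change.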

Managed service: Unlike PaaS or IaaS, Big Data as a Service is an end-to-end service that includes all of the system monitoring, managing and maintenance tasks that otherwise require ongoing customer resources. Solutions should be able to deliver the required SLA (i.e. response time) for each workload without customer monitoring or intervention.

Security and Privacy: A secure BDaaS system will isolate each customer’s environment in a single-tenant architecture, and ensure encryption of data in motion and at rest (preferably with encryption keys under the control of the customer). Some organizations will also need strong auditing and security monitoring features to meet compliance requirements.

Integration and data movement: Big Data as a Service will integrate with existing data infrastructure, enterprise IT tools, and analytical tools so that IT and users aren’t negatively affected. Use of cloud resources for data processing should be seamless, including any required data movement between enterprise and cloud.

Benefits

Big Data as a Service brings the well-known cost and agility benefits of cloud to the realm of data analytics infrastructure.

Speed and Agility: Big Data as a Service automates the hundreds of tasks and decisions required to select, size and provision big data technology and cloud resources. As a result, data infrastructure can spin up nearly instantaneously, instead of taking the months needed to procure, set up and integrate on-premises data infrastructure.

Production Ready: Big Data as a Service addresses the enterprise monitoring, management, governance and security requirements that IT would otherwise have to implement on top of public cloud database services to move beyond testing to production-ready infrastructure.

Simplicity: Complete Big Data as a Service offerings are “end-to-end” so that enterprises can deploy production-grade data processes quickly for any analytic workload. Secure data movement to and from the cloud, individual technology licenses, and easy integration with existing on-premises systems, tools and processes are included. Because these offerings utilize multiple data processing technologies, IT can handle any type of big data processing through a single service, rather than having to select, onboard, and manage distinct solutions for each type of workload.

Cost-effectiveness: Cloud processing can cost as much as 80% less than traditional on-premises alternatives and replaces large, up-front capital costs with a pay-as-you-go model. Enterprises buy only what they need, and scale as workloads grow. Additional savings come when a Big Data as a Service solution can utilize the most cost-effective technology stack for a given workload based on the required SLA.
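The difference between the two cost models can be made concrete with simple arithmetic. The sketch below compares an up-front purchase plus monthly operations against usage-based billing over the same period. Every figure in it is invented for illustration and reflects no real pricing; actual savings depend entirely on utilization and the rates involved.

```python
def on_prem_cost(upfront: int, monthly_ops: int, months: int) -> int:
    """Total cost of an up-front purchase plus fixed monthly operations."""
    return upfront + monthly_ops * months


def pay_as_you_go_cost(hourly_rate: int, hours_per_month: int, months: int) -> int:
    """Total cost when billed only for the hours actually used."""
    return hourly_rate * hours_per_month * months


MONTHS = 36  # hypothetical comparison period

# Hypothetical inputs: a 500,000 appliance with 10,000/month in operations,
# versus a service billed at 25/hour and used 400 hours per month.
on_prem_total = on_prem_cost(upfront=500_000, monthly_ops=10_000, months=MONTHS)
cloud_total = pay_as_you_go_cost(hourly_rate=25, hours_per_month=400, months=MONTHS)

# Fraction saved under these assumed inputs only.
savings_fraction = 1 - cloud_total / on_prem_total
```

Raising hours_per_month toward round-the-clock use narrows the gap, which is why the pay-as-you-go advantage is strongest for intermittent or growing workloads.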

Security: A full Big Data as a Service offering addresses security concerns through a single-tenant architecture and end-to-end encryption to keep data protected. Existing IT systems for monitoring, management, governance, etc. are supported.

Focus: Big Data as a Service includes 24x7 monitoring and support as well as any software updates and other maintenance so that data people can focus on data and analytics rather than maintaining infrastructure.

Future proof: Big Data as a Service can plug in different data technologies to provide the best price/performance for different workloads, so users aren’t locked in to any specific tool. They will be able to leverage additional, new open source technologies without having to evaluate them or hire for specialized skills.

Scalable: The ability to add resources as needed to support growing workloads makes cloud services inherently scalable, at a far more granular level than on-premises data infrastructure hardware and software.

References and Further Reading

Defining Big Data as a Service

  • Dataversity, “Big Data as a Service: Is the World Ready?” This June 2013 article was among the first to identify the genesis of BDaaS in the “growing need to offload these Big Data processes to a Cloud-based, third party vendor.”
  • Forbes, “Big Data as a Service is the Next Big Thing.” Bernard Marr’s article first brought Big Data as a Service to the attention of the general business audience, defining it as “a wide variety of outsourcing of various Big Data functions to the cloud.”
  • Techopedia, “Defining Big Data as a Service.” A basic definition of Big Data as a Service; though the market is emerging, there are some fundamental characteristics that most services have in common.
  • SearchCloudApplications, “Keeping up Data Flow with Big Data as a Service.” This industry publication provides historical perspective on the Big Data as a Service market, with early coverage of the trend.

Adopting Big Data as a Service

Industry Analysts