Cazena Data Science Sandbox: Bridging the Self-Serve Analytics Chasm

February 13, 2017

By: Prat Moghe

Over the past few years, I have observed a deepening organizational divide in large data-driven companies.
 
On one hand, IT and data owners have their hands full managing their current data infrastructure and platforms. Their world is full of complexity, with intricate pieces of the ecosystem that cement together and work to do one thing well – really well. But it is a world that is hard to change, with long cycle times. Gravity rules the day.
 
On the other hand, we see analysts and data scientists brimming with promise, often sitting in line-of-business groups. These teams are running fast and hard at ad hoc analytical experiments with sample datasets from many sources. They are usually tasked with launching new initiatives that drive customer experience, top-line revenue or competitive game-changers. Time is of the essence, process be damned.
 
The reality is that neither team can, on its own, deliver the forward-thinking outcomes demanded by organizational leadership. IT and data owners hold much of the “asset,” that is, corporate and operational data, often in silos, which can reveal new insights when combined. But the analytics teams are closer to business outcomes, and can move faster – or could move faster if they had easy access to the data. What is missing is a bridge that links the two. I call it the self-serve analytics chasm.
 
We’ve watched the tension across this chasm firsthand at Cazena. After launching our platform last year, we saw tremendous excitement among CTOs, CIOs and data architects who wanted to augment, migrate or re-platform their on-premises analytic infrastructure to the public cloud for faster cycle times.
 
For instance, our Data Mart as a Service demonstrated that a production data warehouse appliance could be migrated to the cloud within four weeks. Our Data Lake as a Service similarly met the need for a fully managed Hadoop/Spark production environment for staging, ELT and other workloads, which could be up and running in days. It was all about IT and engineering agility. When I probed further about their end goal, the answer was always the same: These teams were trying to bridge the chasm between IT and LoB analytics.
 
Then we quickly started hearing from a third group. Data scientists wanted to deliver fast outcomes by consuming these data assets and any others they could get their hands on. But their requirements were different. Data scientists and analytics teams needed a fast and repeatable process and platform to quickly access data, collaborate and run advanced analytics at scale, without DevOps or data engineering expertise.
 
We realized we needed a new product: a Data Science Sandbox as a Service to bridge the self-serve analytics chasm.
 
The Data Science Sandbox as a Service has five design goals:
  1. Agnostic: The Sandbox becomes a centralized environment with support for current languages, technologies and tools, such as R, sparklyr, Python, Spark and SQL, as well as many other emerging methods.
  2. Fully Managed: The Sandbox has to be extremely light on support requirements from the team. Minimal DevOps or data engineering expertise needed to support analytics cost-effectively.
  3. Data Ready: Recall that the Sandbox bridges the chasm! It needs built-in data movers and connectors to easily ingest data from enterprise data owners or external data sources.
  4. Secure and Compliant: The Sandbox has to be trusted for enterprise data owners to connect, or bridge, their data to it. 
  5. Simple and Fast: A few clicks, and up and running in minutes.

The Cazena team is really excited to announce the launch of our Data Science Sandbox as a Service. We previewed it at Spark Summit East last week and got great feedback. We are making the Sandbox available to data scientists and analysts with a free week and full access to our team. Please sign up here. Let’s see if Cazena and our Sandbox can together help you get an analytic outcome within a week or less!
 
