As enterprises seek to migrate and manage their production analytic workloads in the public cloud, we increasingly hear of teams considering Platform-as-a-Service (PaaS) offerings (such as AWS EMR, Amazon Redshift and Microsoft Azure HDInsight) as the first stop for implementation. But many don’t seem to realize that PaaS provides only a foundational analytic capability in the cloud. Significant additional work is needed to migrate and manage analytics in production, particularly for enterprises with legacy, on-premises data systems.
Common Challenges with PaaS
- PaaS does not seamlessly integrate with the enterprise. Unless you are "cloud-native," with all of your data and tools in the cloud already, significant work is needed to deploy and manage PaaS as part of an on-premises data flow.
- PaaS is not turnkey. Production analytic processing requires developing an optimized data pipeline to move data from its sources to an analytics-ready state. With PaaS, it is left to you to figure out how to stitch together and configure the right components for data ingest, storage and compute instances, analytic tools, etc.
- PaaS does not mean fully-managed. A platform for analytics is not just about keeping a cluster up. Skilled experts are required to manage workloads, ensure performance, control costs and adhere to SLAs.
- PaaS does not check off enterprise security and compliance requirements. You must do significant work to extend PaaS controls to achieve user- and data-level security, meet enterprise compliance requirements and keep the platform compliant on an ongoing basis.
- PaaS does not provide production operations. Dev-ops (development and operations) resources are required to manage the health of the pipeline, upgrades, tool validation, etc.
Cazena's Big Data as a Service addresses these challenges and goes well beyond a PaaS to offer a “fully-managed” experience. From an adoption perspective, this broadens the use of the cloud to all enterprises, not just the do-it-yourself (DIY) early adopters. Specifically, enterprises that are data-driven, yet constrained on dev-ops expertise, can now leverage a fully-managed Big Data as a Service to scale their teams and accelerate outcomes.
A Fully-Managed Experience Means You Can:
- Get to production quickly: two weeks or less for validated enterprise deployments.
- Scale dev-ops and manage your costs. Managing analytic workloads and production operations is the hardest part. When this is fully managed for you, the dev-ops strain is vastly reduced and teams can focus on their core work.
- Keep data engineers and scientists focused on their goals, not on managing cloud platforms. The key is to hide complexity so analysts can get on with their analyses in a few clicks, without worrying about the cloud infrastructure and data platform details.
- Lower your risk by leveraging Cazena’s experience in cloud security. That helps you maintain data protection and comply with regulations in a secure, controlled environment. The fact that analytics are processed in the cloud should not change your risk posture.
- Deploy immediately with a platform that attaches to existing data flows without any changes. Any touchpoints are seamless.
- Create a best-fit, flexible architecture. Leverage and extend existing PaaS technologies.
As illustrated below, Cazena adds four new areas of capability above and beyond a typical PaaS stack to deliver a fully-managed experience. Cazena considers the complete data pipeline and tightly integrates a variety of components and capabilities for an optimized workload experience. The Cazena console orchestrates this pipeline while abstracting the complexity of the cloud and platform infrastructure, so analysts can get on with their analysis in a few clicks without worrying about the underlying technology.
For specific workloads, the stack incorporates appropriate third-party PaaS data processing engines, called workload engines in the Cazena architecture. The current Cazena stack embeds Cloudera EDH, AWS Redshift, RStudio and others as workload engines, and is easily extended to include other PaaS technologies.
Cazena Adds New Capabilities to a Typical PaaS Stack to Deliver a Fully-Managed Experience:
1. Data Ingestion and Tool Integration: For easy deployment, Cazena uses the concept of “gateways” as a single point of hybrid integration; think of a gateway as a socket where you plug in tools and data. All the complexity of networking/VPN, data source connections and tool integrations is hidden from the end user, regardless of whether data and analytic tools are on-premises or in the cloud. This makes it easy for analysts to start their work immediately without changing their existing processes.
To plug in new third-party tools, the Cazena platform provides a curated single-tenant environment called App Cloud, where new apps or tools can be landed, pre-configured and certified to securely access data or compute. Ultimately, the cloud platform is a living ecosystem that grows continually: new users bring new tools and platforms, whether a variety of analytic, ML or BI tools, notebooks or environments. Supporting this growth while maintaining SLAs and compliance is critical. The App Cloud combined with the Cazena stack provides an optimized end-to-end experience.
2. Security and Compliance: A common misunderstanding is that PaaS, with its lists of certifications, is sufficient to meet enterprise security and compliance needs. It’s true that PaaS provides a strong physical security foundation, but it’s left to you to secure data and user access. Security is an ongoing process and a lot of hard work. Is the cloud, with all of its endpoints, truly private? Is all cluster access authenticated? Is all access identity- and role-driven? Is all data encrypted end-to-end, including ingest and tool access? Are encryption keys managed appropriately? Are you monitoring for intrusions at the data or configuration level? (A recent article highlights the importance of configuration to keep data secure.) Is all logging centralized and actionable? Cazena’s Big Data as a Service includes capabilities like end-to-end authentication, authorization, encryption and auditing, along with ongoing security operations.
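The security questions above lend themselves to a recurring audit. The sketch below is purely illustrative and is not Cazena's implementation: it encodes each question as a check over a hypothetical `ClusterSecurityConfig` snapshot. The class, its field names and the `audit` function are assumptions; in practice each field would be populated from your cloud provider's and data platform's actual settings.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ClusterSecurityConfig:
    """Hypothetical snapshot of a cluster's security posture (illustrative fields only)."""
    private_endpoints_only: bool
    all_access_authenticated: bool
    role_based_access: bool
    encrypted_in_transit: bool  # includes ingest and tool connections
    encrypted_at_rest: bool
    keys_centrally_managed: bool
    intrusion_monitoring: bool  # data- and configuration-level monitoring
    centralized_logging: bool


# Each check pairs a question from the list above with a predicate over the snapshot.
CHECKS = [
    ("Is the cloud, with all of its endpoints, truly private?",
     lambda c: c.private_endpoints_only),
    ("Is all cluster access authenticated?",
     lambda c: c.all_access_authenticated),
    ("Is all access identity- and role-driven?",
     lambda c: c.role_based_access),
    ("Is all data encrypted end-to-end, including ingest and tool access?",
     lambda c: c.encrypted_in_transit and c.encrypted_at_rest),
    ("Are encryption keys managed appropriately?",
     lambda c: c.keys_centrally_managed),
    ("Are you monitoring for intrusions at the data or configuration level?",
     lambda c: c.intrusion_monitoring),
    ("Is all logging centralized and actionable?",
     lambda c: c.centralized_logging),
]


def audit(config: ClusterSecurityConfig) -> List[str]:
    """Return the checklist questions that the given snapshot fails, in checklist order."""
    return [question for question, passes in CHECKS if not passes(config)]


if __name__ == "__main__":
    snapshot = ClusterSecurityConfig(
        private_endpoints_only=True,
        all_access_authenticated=True,
        role_based_access=True,
        encrypted_in_transit=False,  # e.g. a tool connecting without TLS
        encrypted_at_rest=True,
        keys_centrally_managed=True,
        intrusion_monitoring=True,
        centralized_logging=False,
    )
    for question in audit(snapshot):
        print("FAILED:", question)
```

Running such an audit on a schedule, rather than once at deployment, reflects the point that security is an ongoing process: a configuration change made after launch can reopen a gap that was closed at launch.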
3. Workload and Pipeline SLA: Cazena’s fully-managed service extends beyond the typical cluster-availability SLA offered by PaaS to deliver an SLA based on your workloads and end-to-end analytic pipeline. The service continually profiles the workload performance (whether from data ingestion, data engineering, or data science) and optimizes the stack so you get the best experience at an acceptable cost. Cazena monitors mixed workloads and infers SLA performance against expectations. Optimization, via workload management or elastic scaling of clusters, is automatically applied as needed.
Example: Workload SLA with Cazena
As an example, the figure below illustrates a mixed workload with persistent data, where an Impala-based data engineering workload (with a batch window of 3 hours or less) and a Spark-based data science workload share a cluster. After optimization, the data engineering workload completes within 120 minutes, while the performance of the data science workload is still managed.
4. Production Operations: Cazena’s Big Data as a Service solutions include all production operations, such as white-glove support and resolution, 24x7 health monitoring, alerting, upgrades, patching with validation, the ability to bring in new tools and libraries, and certification with newer analytic tools. In other words, our automation and expertise remove most of the production operations burden, allowing your dev-ops to scale.
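The workload SLA idea (profile each workload, compare it against its expectation, then apply workload management or elastic scaling) can be illustrated with a short sketch. It is not Cazena's implementation: the `WorkloadRun` model, the `check_sla` and `choose_action` functions, and the 80% utilization threshold that separates "scale out" from "reprioritize" are all hypothetical. Only the 3-hour batch window and the 120-minute runtime come from the example.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class WorkloadRun:
    """One observed run of a workload on a shared cluster (illustrative model)."""
    name: str
    engine: str
    window_minutes: Optional[float]  # None means no hard batch window
    runtime_minutes: float


def check_sla(run: WorkloadRun) -> Tuple[bool, Optional[float]]:
    """Return (sla_met, headroom_minutes); headroom is None if there is no window."""
    if run.window_minutes is None:
        return True, None
    headroom = run.window_minutes - run.runtime_minutes
    return headroom >= 0, headroom


def choose_action(run: WorkloadRun, cluster_utilization: float,
                  saturation_threshold: float = 0.8) -> str:
    """Hypothetical remediation policy for a missed window.

    If the shared cluster is saturated, add capacity (elastic scaling);
    otherwise, reprioritize the shared queues (workload management).
    """
    sla_met, _ = check_sla(run)
    if sla_met:
        return "none"
    if cluster_utilization >= saturation_threshold:
        return "scale_out"
    return "reprioritize"


if __name__ == "__main__":
    # Figures from the example: a 3-hour (180-minute) window and a 120-minute run.
    engineering = WorkloadRun("data engineering", "Impala", 180, 120)
    print(check_sla(engineering))  # → (True, 60)
```

Modeling the data science workload with `window_minutes=None` reflects that the example only gives a batch window for the data engineering workload; a real service would also weigh cost before choosing to scale out.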
The best way to understand the fully-managed Cazena service is to experience it. Cazena goes above and beyond PaaS, so you can get more done with data. Several enterprises are developing their production analytics capabilities around this EZ-PaaS, driven by their desire to ramp up speed and scale.
Drop me a note if you would like to learn more or have questions. If you want to experience Cazena for your own team, please reach us at POC@cazena.com.