Best practices for enterprise adoption of big data in the cloud
March 31, 2016 (Thursday), 11:50am–12:30pm, Strata + Hadoop World, San Jose, Room LL20D
Strata+Hadoop World (Spring Edition) is just around the corner. This year, I’m speaking about best practices for enterprise adoption of the cloud for big data. Along the way, I’ll share some real-world stories from our collaborations with enterprises. The cloud project that looked easy, started great, and eventually took three years to production? My colleagues call it the IT zombie apocalypse. The innocent analyst who inadvertently caused the $57,650 AWS bill? Funny, if they’re not on your team.
We all agree that the public cloud is more enterprise-ready than ever, with stronger security controls and features. And we agree that the cloud is the way of the future, with compelling agility and efficiency benefits. Implemented intelligently, it can even reduce costs (cost savings are not automatic with the cloud!).
However, enterprises find that it takes significant effort to make the cloud work with their existing infrastructure. It’s challenging enough to manage on-premises data centers, let alone adding a new and different cloud environment. Savvy implementation of the cloud makes all the difference.
At Strata, I will cover five critical questions for enterprise big data projects in the cloud:
Why make the cloud an analytic platform? When it comes to data analytics, it’s important to architect and manage the cloud like a platform. This platform must act as a pipeline that processes workloads and orchestrates data through its entire lifecycle, from raw “data lakes” to data marts. That is a much better approach than tactical use of the cloud for one-off projects, which is inefficient and creates new data silos. A platform enables consistency across projects, particularly in critical areas like security, governance and compliance.
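The lake-to-mart flow above can be sketched in miniature. This is a hedged illustration only: the stage names, record fields, and sample data are hypothetical, and a production pipeline would use orchestration tools rather than plain functions.

```python
# Minimal sketch of a lake-to-mart pipeline: raw events land in a "lake"
# (unvalidated records), are cleansed into a curated layer, then aggregated
# into a "mart" table that analysts can query. All names and data are
# illustrative, not from the talk.
from collections import defaultdict

raw_lake = [
    {"user": "alice", "amount": "10.5"},
    {"user": "bob", "amount": "not-a-number"},  # malformed record, dropped at cleanse
    {"user": "alice", "amount": "4.5"},
]

def cleanse(records):
    """Curated layer: keep only records whose amount parses as a number."""
    out = []
    for r in records:
        try:
            out.append({"user": r["user"], "amount": float(r["amount"])})
        except ValueError:
            continue  # a real pipeline would quarantine these for review
    return out

def to_mart(records):
    """Data mart: total spend per user, ready for analysis."""
    totals = defaultdict(float)
    for r in records:
        totals[r["user"]] += r["amount"]
    return dict(totals)

mart = to_mart(cleanse(raw_lake))
print(mart)  # {'alice': 15.0}
```

The point of the platform approach is that every project flows through the same cleanse and publish stages, so security and governance rules are applied once, consistently.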
How do you connect the cloud with the data flow you already have? Enterprises have decades of data infrastructure built up on premises. Using the cloud for data processing adds agility, but it requires careful design of data movement, migration and integration with your existing data flow.
How can the cloud be managed reliably and predictably? The cloud is a different operating environment, which makes it particularly challenging for enterprises to manage it the same way they manage data centers. How do you provision infrastructure cost-effectively for each workload's SLA? How do you manage and monitor infrastructure, data and users?
How should you securely lock down the cloud? This is obviously a big one, though the emphasis is often misplaced: Gartner estimates that through 2020, 95% of cloud security failures will be the fault of the customer – not the cloud provider. Let’s talk about how to avoid being part of that statistic! The key is enterprise-grade cloud security policies and tools.
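Since the Gartner figure puts the burden on the customer, customer-side storage controls are a good place to start. As one hedged illustration (the bucket name is hypothetical, and this is just one of many policies an enterprise would layer on), an S3 bucket policy can deny any request that does not arrive over TLS:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "DenyInsecureTransport",
      "Effect": "Deny",
      "Principal": "*",
      "Action": "s3:*",
      "Resource": [
        "arn:aws:s3:::example-data-lake",
        "arn:aws:s3:::example-data-lake/*"
      ],
      "Condition": {
        "Bool": {"aws:SecureTransport": "false"}
      }
    }
  ]
}
```

Codifying rules like this as policy, rather than relying on each team to configure buckets correctly, is exactly the kind of enterprise-grade practice the talk covers.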
How can you best govern the cloud and manage costs? If you are not careful, the cloud can quickly get out of control across the organization. It’s designed to make spinning up hundreds of clusters easy! So, how do you set up a cloud environment with the right controls and visibility? And how do you balance the agility of self-service with the predictability of enterprise budgets?
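One lightweight control is to project each cluster’s monthly run rate and flag anything that would blow a team’s budget before it does. The sketch below is a hypothetical illustration with made-up cluster names, rates, and budgets; a real guard would pull usage from the provider’s billing API instead of a hardcoded list.

```python
# Hypothetical budget guard: flag clusters whose projected monthly cost
# exceeds a team budget. All figures here are invented for illustration.

HOURS_PER_MONTH = 730  # average hours in a month

def projected_monthly_cost(node_count, hourly_rate_per_node):
    """Project a cluster's monthly cost if it runs around the clock."""
    return node_count * hourly_rate_per_node * HOURS_PER_MONTH

def over_budget_clusters(clusters, monthly_budget):
    """Return the names of clusters whose projected cost exceeds the budget."""
    return [
        c["name"]
        for c in clusters
        if projected_monthly_cost(c["nodes"], c["rate"]) > monthly_budget
    ]

clusters = [
    {"name": "etl-prod", "nodes": 10, "rate": 0.50},         # ~$3,650/month
    {"name": "adhoc-analytics", "nodes": 40, "rate": 2.00},  # ~$58,400/month
]

print(over_budget_clusters(clusters, monthly_budget=10_000))
# ['adhoc-analytics']
```

A check like this, wired into provisioning, is how you keep the innocent analyst from discovering a $57,650 surprise at the end of the month.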
Ultimately, the most important metric to consider is “time to analytics” – the time between when an enterprise gets data and when the right stakeholder has access to that data for analysis. The cloud can make a significant impact, if implemented correctly. Our Strata discussion will focus on how you can improve your “time to analytics” with a production-ready cloud environment, where you can easily point your analytic jobs and tools to data. Yes, it’s possible! I look forward to seeing you there.