5 Crucial Tactics for Managing Hadoop
So, you spent a decade convincing the C-suite that data is a strategic competitive weapon, and they've finally seen the light. That's the good news. The bad news is that the spotlight is now focused squarely on you, and everyone expects results right now. You’ve armed yourself with Hadoop, but you haven’t had much chance to establish best practices for Hadoop management, since the first stable release was only published in May 2012. Fortunately, some of the best practices you already know from managing a data center also apply to managing Hadoop. Here are 5 items we think should make your list.
1. Who You Gonna Call?
Hadoop is an application inextricably tied to the hardware it rides on, so the first best practice we recommend is to find a support organization that takes a holistic approach – one that supports both the application and the hardware. In our experience, if you don't do this, the people supporting the hardware will point at the software folks, and the software team will point right back at the hardware guys. We don't like the cliché “one neck to choke,” but we can't sugarcoat the issue.
2. Don't Paint Yourself Into a Corner
You don't need to be Bobby Fischer, Boris Spassky, or even IBM’s Deep Blue and play 15 moves ahead, but you ought to have the next two steps of your Hadoop management plan figured out. You've seen your company's data grow exponentially, and it’s likely that curve will continue to hockey stick up and up. Do you know what to do when the data set doubles? When the need to process data doubles? Be a Boy Scout. Be prepared.
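If you want to put rough numbers on "what happens when the data set doubles," a back-of-the-envelope calculation is a reasonable place to start. The sketch below is only an illustration, not a sizing tool: the function names, the 20 TB of usable disk per node, and the 25% headroom for temporary and intermediate data are all assumptions we made up for the example. The only figure taken from Hadoop itself is the default HDFS replication factor of 3.

```python
import math


def nodes_needed(data_tb, replication=3, usable_tb_per_node=20.0, headroom=0.25):
    """Estimate how many worker nodes are needed to store data_tb of data.

    replication        -- copies HDFS keeps of each block (3 is the HDFS default)
    usable_tb_per_node -- ASSUMED disk capacity per node, in TB
    headroom           -- ASSUMED fraction of each node kept free for
                          temporary and intermediate job output
    """
    raw_tb = data_tb * replication
    effective_tb_per_node = usable_tb_per_node * (1 - headroom)
    return math.ceil(raw_tb / effective_tb_per_node)


def doubling_plan(data_tb, doublings=2, **kwargs):
    """Return (data size in TB, nodes needed) for today and each doubling."""
    plan = []
    for step in range(doublings + 1):
        size = data_tb * 2 ** step
        plan.append((size, nodes_needed(size, **kwargs)))
    return plan


if __name__ == "__main__":
    # Example: 100 TB today, looking two doublings ahead.
    for size, nodes in doubling_plan(100):
        print(f"{size} TB of data -> about {nodes} nodes")
```

Swap in your own per-node capacity and headroom. The point is less the exact answer than knowing, before the data doubles, roughly how much hardware the next two steps will take.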
3. Pick a Religion
Will you use on-site processing or will you send the data to the cloud? AWS has half a million clusters already running Hadoop. Netflix has a petabyte on Amazon. Amazon knows a thing or two about Hadoop management, but we've seen Amazon outages recently that effectively put its customers out of business for a while. And, there's always the possibility, heaven forbid, that someone will break into Amazon and make a copy of everything there, including the social security numbers, credit cards, names and addresses of your customers. Trust us: misery does not love company.
4. Mix and Match with Care
You might be tempted to throw all your spares into a cluster, but if you don’t take care, you may regret it when you realize that every piece of equipment has a different service and performance profile and that you've created a rat's nest of issues that are hard to resolve. Still, with the right management tools, you can successfully mix and match cluster components and keep them running reliably.
5. Build Fences
There's an old adage that goes, “When you have a hammer, everything looks like a nail.” When your Hadoop cluster goes into production and people start experiencing the benefits, you will be the honey attracting the flies and could be deluged with project requests. All too often, when the cheering begins, we want to ride that wave of popularity. Don't. Give yourself and your staff six to nine months to master Hadoop management. Push on that first implementation and learn all you can before taking on more.
With the advent of big data and data-centric company strategies, IT is finally getting what it's always wanted -- a seat at the big table. Hadoop is the tool we're using to leverage the data, so we need to be very thoughtful about how we do it or we'll be sent back to the kids' table.
Well, that’s our list. What do you think? Are we off base? Have we missed any? Share your experience in the comments.