Why Automation is the Secret Ingredient for Big Data Clusters

Posted by Greg Bruno on Apr 30, 2013 1:38:00 PM

The word “automatic” has been around since the 1500s, but it really came to the fore in 1939. That’s when the New York World’s Fair sparked everyone’s imagination with visions of technology that promised to solve all of our problems through automation. Recently, while working with one of our customers, I was reminded how automation can still surprise people. Let me tell you what I mean.

Prove It

A large credit card company recently asked us to participate in a “proof-of-concept” for their big data project. As a startup, we are always thrilled when one of the big boys wants to try out our wares, so we jumped at the opportunity.

When we arrived on site in their data center, they assigned a half-dozen machines for us to use. One would become the StackIQ Cluster Manager, and the other 5 would become cluster nodes running Hadoop. We are used to building clusters of all sizes using our software, and knew that a small, straightforward installation like this one would be a cakewalk. We set about our task.

We set up a few parameters for the cluster, and launched the StackIQ Cluster Manager. It was soon up and running without a hitch, as expected.

Next, we used the Cluster Manager to install the cluster machines. Twenty minutes later, all 5 backend machines were up and running Hadoop services. Smooth. No problem. Expected.

It’s A Trap!

That’s when my colleague and I noticed that the customer’s IT people were whispering to each other, and we started to wonder if we’d done something wrong. We checked our screens and found that the cluster was indeed up and running, ready to accept MapReduce jobs.

So we took a deep breath and walked over to the gathered whisperers and asked if there was a problem. One of them asked in a hushed voice, “Um, how’d you guys do that?”

“Do what?” we answered.

“Bring up that one machine?” he said, pointing at one of the cluster servers.

After we explained that we hadn’t done anything special and had simply let our Cluster Manager do its thing, the customer confessed, “We’ve been struggling to configure that machine for over 2 weeks now and haven’t been able to get it to install. There seemed to be something wrong with the configuration of the disk controller, but we haven’t been able to fix it.”

We smiled.

That’s the power of true automation. That’s what we designed our software to do. That’s what makes us very proud of the software we build. It takes the headaches out of setting up clustered infrastructure of any size by automating nearly everything — including configuring those pesky disk controllers.

What was a major problem for our customer, one they hadn’t been able to solve in weeks, wasn’t even a bump in the road for our Cluster Manager. It found the controller, configured it, and moved on to its next task. Smooth. No problem. Expected.

Less Tedium

It can take as many as 80 manual steps to correctly configure a disk controller for use in a Big Data cluster, and clusters have a lot of disks — and controllers. We knew that we had to automate the configuration of all those disks to help cluster operators build their clusters efficiently. Automating the procedure dramatically reduces the time it takes to put a cluster into production.

Here’s how we do it. On first installation of a server, our software interacts with the disk controller to configure it optimally for the node’s intended role. For example, if the machine is a Hadoop data node, the disk controller will be configured in “JBOD mode” with each disk configured as its own RAID 0 volume. However, if the machine is going to be a Cassandra data node, the data disks will be configured as a RAID 10. This all happens automatically, with no manual steps, ensuring that all cluster nodes are optimally configured from the start.
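To make the idea concrete, here is a minimal sketch of that role-based decision in Python. It is not StackIQ's actual code: the role names, the ROLE_LAYOUTS table, the VirtualDisk type, and the plan_layout function are all hypothetical, and the sketch only plans a layout rather than talking to a real disk controller. It assumes the two cases described above (one RAID 0 volume per disk for a Hadoop data node, a single RAID 10 for a Cassandra data node).

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical role-to-layout table, based only on the two examples in this post.
ROLE_LAYOUTS = {
    "hadoop-data": "raid0-per-disk",   # "JBOD mode": each disk is its own RAID 0
    "cassandra-data": "raid10",        # all data disks in one RAID 10
}


@dataclass(frozen=True)
class VirtualDisk:
    """One volume the controller would be asked to create (illustrative only)."""
    raid_level: str
    disks: Tuple[str, ...]


def plan_layout(role: str, data_disks: List[str]) -> List[VirtualDisk]:
    """Return the volumes to create for a node with the given role."""
    layout = ROLE_LAYOUTS.get(role)
    if layout is None:
        raise ValueError(f"no storage layout defined for role {role!r}")

    if layout == "raid0-per-disk":
        # One single-disk RAID 0 volume per physical disk.
        return [VirtualDisk("0", (disk,)) for disk in data_disks]

    # Remaining case: RAID 10 needs mirrored pairs, so an even count of 4 or more.
    if len(data_disks) < 4 or len(data_disks) % 2 != 0:
        raise ValueError("RAID 10 needs an even number of disks, at least 4")
    return [VirtualDisk("10", tuple(data_disks))]


if __name__ == "__main__":
    disks = ["disk0", "disk1", "disk2", "disk3"]
    for role in ("hadoop-data", "cassandra-data"):
        print(role, plan_layout(role, disks))
```

Keeping the policy in a small table means a new role only needs a new entry, and a node with an unknown role fails loudly instead of being installed with a guessed layout.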

The goal is a smooth configuration process. It’s just a bonus when we get to surprise and delight a customer who sees their cluster up and running after struggling for weeks on their own trying to solve a stubborn configuration problem.

Smooth. No problem. Expected.

 

Greg Bruno

@itsdrbruno

Topics: big data, cluster management, automation
