Cloud Computing

Cloudy with the chance of “On Prem”

Lots of Data, consumed fast and in many different ways

Over the years, the IT industry has seen tectonic shifts in the way we do business. We progressed from mainframes to microcomputers to distributed computing environments in order to handle the massive volume, velocity, and dimensionality of data in the modern age. One trend that …


StarCluster, Cloudera Manager, EC2 Part 2

This is a continuation of my previous post on this topic. First, a disclaimer: I have focused primarily on CentOS. I will try to update this to accommodate more than one Linux distro (and may even consider writing for Solaris-based implementations in the future). Here’s the skeleton of the Cloudera Manager setup plugin: import posixpath …


Cloudera Hadoop, StarCluster and Amazon EC2

I ran into an incredible tool called StarCluster, an open-source project from MIT (http://star.mit.edu/cluster/). StarCluster is built using Sun Microsystems’ N1 Grid Engine software (Sun used it to do deployment for HPC environments). The folks at MIT developed on a fork of that (SGE, Sun Grid Engine), and StarCluster was …
