Linux

Tuning the TCP stack and establishing throughput requirements matters, when data traverses over a WAN

In a past life, when I worked for a wireless service provider, they used a vended application to measure how much data bandwidth customers were consuming, and that data was sent to the billing system to generate the customers’ monthly bills. The app was poorly written (imho) and woefully single-threaded, incapable …


StarCluster, Cloudera Manager, EC2 Part 2

This is a continuation of my previous post on this topic. First, a disclaimer: I have focused primarily on CentOS. I will try to update this to accommodate more than one Linux distro (and may even consider writing for Solaris-based implementations in the future). Here’s the skeleton of the Cloudera Manager setup plugin: import posixpath …


Cloudera Hadoop, StarCluster and Amazon EC2

I ran into an incredible tool known as StarCluster, an open-source project from MIT (http://star.mit.edu/cluster/). StarCluster is built using Sun Microsystems’ N1 Grid Engine software (Sun used it for deployment in HPC environments). The folks at MIT developed on a fork of that (SGE, Sun Grid Engine), and StarCluster was …
