DevOps

Network throughput testing – Distributed iperf3

iperf (or iperf3) is a popular synthetic network load generator for measuring throughput between two nodes on a TCP/IP network. It works well for point-to-point measurements. However, when it comes to distributed computing clusters involving multiple racks of nodes, it becomes challenging to find a tool …

StarCluster, Cloudera Manager, EC2 Part 2

This is a continuation of my previous post on this topic. First, a disclaimer: I have focused primarily on CentOS. I will try to update this to cover more than one Linux distro (and may even consider writing for Solaris-based implementations in the future). Here’s the skeleton of the Cloudera Manager setup plugin: import posixpath …

Cloudera Hadoop, StarCluster and Amazon EC2

I ran into an incredible tool known as StarCluster, an open-source project from MIT (http://star.mit.edu/cluster/). StarCluster is built using Sun Microsystems’ N1 Grid Engine software (Sun used it to do deployment for HPC environments). The folks at MIT developed on a fork of that (SGE, the Sun Grid Engine) and StarCluster was …
