Jan 06, 2014

I’ve just recently revived my interest in CMS (Content Management Systems) – specifically Joomla!

Why? Because I created this “portal” (http://www.medhajournal.com), and though I seriously doubt I’ll have the time to actively maintain it, I’ve spent way too many hours working on it to just abandon it by the wayside.

Medha Journal has evolved since its inception in 2007, from a Joomla 1.x version to Joomla 1.5.x, with a large hiatus in the middle (from 2010 through the last week of December 2013), until I upgraded it (last week) from its rickety 1.5 version to the latest stable 2.5.x version of Joomla.

For those who don’t know what Joomla is, here’s a summary –

It is a PHP-based, open-source Content Management System that can be extended to do more or less anything, using a massive, community-driven extensions library (sort of like CPAN).

So, to get back to the topic at hand. Since I had not dabbled with Joomla (besides user-land/power-user oriented work – primarily content editorial stuff) in a long time, I decided to take the plunge with some downtime in hand (the holidays) and upgrade it from 1.5 to 2.5.

In the course of this migration, I also switched from the default Joomla content management to a tool called K2 (which is realistically better suited to social-media-oriented content portals such as Medha Journal).

One major issue I ran into was this –

The old version of the website used an extension called “jomcomment”, developed by some folks in Malaysia or Indonesia, which allowed for better spam control. I used it in conjunction with the MyBlog extension, developed by the same group of developers, to give my users a seamless (or relatively seamless) experience as they added content and commented on each other’s work.

However, these extensions don’t work with the currently active versions of Joomla (2.5.x and 3.x). Over the years, we had accumulated thousands of comments on the north of 2,000 articles collected on the Journal, so it was imperative to migrate them.

K2 (http://getk2.org) has a very nice commenting system with built-in Akismet spam filtering, etc., so the new site would obviously use that. Migration was an interesting proposition, since the jomcomment and K2 comment table structures were not identical.

Old table looked like this:


Field      Type                  Null  Key  Default              Extra
id         int(10)               NO    PRI  NULL                 auto_increment
parentid   int(10)               NO         0
status     int(10)               NO         0
contentid  int(10)               NO    MUL  0
ip         varchar(15)           NO
name       varchar(200)          YES        NULL
title      varchar(200)          NO
comment    text                  NO         NULL
preview    text                  NO         NULL
date       datetime              NO         0000-00-00 00:00:00
published  tinyint(1)            NO    MUL  0
ordering   int(11)               NO         0
email      varchar(100)          NO
website    varchar(100)          NO
updateme   smallint(5) unsigned  NO         0
custom1    varchar(200)          NO
custom2    varchar(200)          NO
custom3    varchar(200)          NO
custom4    varchar(200)          NO
custom5    varchar(200)          NO
star       tinyint(3) unsigned   NO         0
user_id    int(10) unsigned      NO         0
option     varchar(50)           NO    MUL  com_content
voted      smallint(6)           NO         0
referer    text                  NO         NULL

The new table looks like this:

Field         Type          Null  Key  Default  Extra
id            int(11)       NO    PRI  NULL     auto_increment
itemID        int(11)       NO    MUL  NULL
userID        int(11)       NO    MUL  NULL
userName      varchar(255)  NO         NULL
commentDate   datetime      NO         NULL
commentText   text          NO         NULL
commentEmail  varchar(255)  NO         NULL
commentURL    varchar(255)  NO         NULL
published     int(11)       NO    MUL  0

This called for some manipulation of the data extracted from the old jomcomment tables before it could be inserted into the K2 comments table in the new database (both MySQL).

So, I exported the data from the old table using phpMyAdmin, massaged it to fit the new table structure, and imported it back in.
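The column-level mapping follows directly from the two table layouts above (contentid becomes itemID, name becomes userName, and so on; the unmapped jomcomment columns are simply dropped). Here’s a rough sketch of that massaging step in Python – the CSV header names and export format are assumptions, not exactly what phpMyAdmin produced for me back then:

```python
import csv

# Old jomcomment column -> new K2 comment column (from the two layouts above).
# Columns not listed here (title, custom1..5, star, etc.) have no K2 equivalent
# and are dropped.
FIELD_MAP = {
    "id": "id",
    "contentid": "itemID",
    "user_id": "userID",
    "name": "userName",
    "date": "commentDate",
    "comment": "commentText",
    "email": "commentEmail",
    "website": "commentURL",
    "published": "published",
}

def convert_row(old_row):
    """Map one exported jomcomment row (a dict) to a K2 comments row."""
    return {new: old_row.get(old, "") for old, new in FIELD_MAP.items()}

def convert_csv(src_path, dst_path):
    """Rewrite a jomcomment CSV export as a K2-shaped CSV, ready for import."""
    with open(src_path, newline="") as src, open(dst_path, "w", newline="") as dst:
        reader = csv.DictReader(src)
        writer = csv.DictWriter(dst, fieldnames=list(FIELD_MAP.values()))
        writer.writeheader()
        for row in reader:
            writer.writerow(convert_row(row))
```

In practice you’d run `convert_csv("jomcomment_export.csv", "k2_comments.csv")` and then import the result through phpMyAdmin.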

The hardest part was ensuring that the date field of the extracted CSV imported correctly into the new table. For that, I had to manually adjust the date format. One way is as shown here.

Another way (if your data fits in Excel) is to manually set the column format corresponding to your date field to something that matches the MySQL format (in this case, yyyy-mm-dd hh:mm:ss).
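The date fix-up can also be scripted. A minimal sketch, assuming the export produced US-style dates like “1/6/2014 1:05:00 PM” (your exported format may differ, in which case adjust `src_fmt`):

```python
from datetime import datetime

def to_mysql_datetime(raw, src_fmt="%m/%d/%Y %I:%M:%S %p"):
    """Reformat a date string into MySQL's DATETIME format (yyyy-mm-dd hh:mm:ss)."""
    return datetime.strptime(raw, src_fmt).strftime("%Y-%m-%d %H:%M:%S")

print(to_mysql_datetime("1/6/2014 1:05:00 PM"))  # 2014-01-06 13:05:00
```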


Jan 06, 2014

In a past life, I worked for a wireless service provider that used a vended application to evaluate how much data bandwidth customers were consuming; that data was sent to the billing system to form the customers’ monthly bills.

The app was poorly written (IMHO) and woefully single-threaded, incapable of leveraging the oodles of compute resources provided in the form of twelve two-node VCS/RHEL clusters. There were two data centers, with one such “cluster of clusters” at each site.

The physical infrastructure was pretty standard and over-engineered to a certain degree (it was built in the 2007-2008 timeframe) – HP DL580 G5 servers with 128GB of RAM each, a pair of Gigabit Ethernet NICs for cluster interconnects, another pair bonded together as public interfaces, and 4 x 4Gb FC HBAs connected through Brocade DCX core switches (dual fabric) to an almost dedicated EMC CLARiiON CX4-960.

The application was basically a bunch of processes that watched traffic as it flowed through the network cores and calculated bandwidth usage based on end-user handset IP addresses (I’m watering it down to keep the narrative fast and fluid).

Each location (the two being separated by 200+ miles) acted as the fault-tolerant component for the other, so traffic was cross-fed between the two data centers over a pair of OC12 links.

The application was a series of processes/services that formed a queue across the various machines of each cluster (over TCP ports). Even processes within a single physical system communicated via IP addresses and TCP ports.

The problem we started observing a few months after deployment was that data would start queuing up, slowing down and skewing the usage calculations – the implications of which, for a billing feed, I don’t really need to spell out.

The software vendor’s standard response would be –

  1. It is not an application problem.
  2. The various components of the application write into FIFO queues on SAN-based filesystems, and the vendor would constantly raise the bogey of the SAN storage being slow – not enough IOPS available and/or poor response times. Their basis for this was what seemed to be an arbitrary metric: CPU I/O wait percentage rising above 5% (or perhaps even lower at times).

Our own findings, however, pointed elsewhere:

  1. After much deliberation poring over the NAR reports of the almost dedicated EMC CX4-960s (and working with EMC), we were able to ascertain that neither the storage arrays nor the SAN contributed latency in any way that could explain the poor performance of this app.
  2. The processes, being woefully single-threaded, would barely ever use more than 15% of the total CPU available on the servers (each server having 16 cores at its disposal).
  3. Memory usage was nominal and well within acceptable limits.
  4. The network throughput wasn’t anywhere near saturation.

We decided to start profiling the application (despite great protestations from the vendor) during normal as well as problematic periods, at the layer where we were seeing the issues as well as at the layers immediately upstream and downstream.

What we observed was that:

  1. Under normal circumstances, the app would spend most of its time in the read, write, send, or recv syscalls.
  2. When the app was performing poorly, it would spend most of its time in the poll syscall. It became apparent that it was waiting on TCP sockets from app instances at the remote site (and the issue was bidirectional).


Once this bit of information was vetted and provided to the vendor, they finally gave us their minimum throughput requirement: 30 Mbps. The assumption was that on an OC12 (622 Mbps of bandwidth), 30 Mbps was quite achievable.

However, it so turns out that latency plays a huge role in the actual throughput on a WAN connection! (*sarcasm alert*)

The average RTT between the two sites was 30 ms. Given that the servers were running RHEL 4.6, with untuned default TCP send and receive buffer sizes of 64KB, the 30 ms RTT capped a single connection at about 17 Mbps.
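That 17 Mbps figure falls straight out of the bandwidth-delay product: TCP can have at most one window of unacknowledged data in flight per round trip. A quick back-of-the-envelope check using the numbers above:

```python
def max_throughput_mbps(window_bytes, rtt_seconds):
    """TCP throughput ceiling: at most one window in flight per round trip."""
    return window_bytes * 8 / rtt_seconds / 1_000_000

# Untuned 64KB window over a 30 ms RTT:
print(round(max_throughput_mbps(64 * 1024, 0.030), 1))  # 17.5 Mbps
```

So no matter how fat the OC12 pipe was, a single untuned connection could never do much better than ~17 Mbps.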

It turns out that the methodologies we (*NIX admins) would typically use to generate network traffic and measure “throughput” aren’t necessarily equipped to handle wide area networks. For example, SSH has an artificial bottleneck in its code that caps its flow-control window at a default of 64K (in the version of OpenSSH we were using at the time), hardcoded in the channels.h file. Initial tests were indeed baffling, since we could never cross 22 Mbps of throughput on the WAN. After a little research, we realized that the default window and buffer sizes (for passive FTP, scp, etc.) were simply not tuned for high-RTT connections.

Thus began the process of tweaking the buffer sizes and generating synthetic loads using iperf.

After we established that the default TCP buffer sizes were inadequate, we calculated the buffer sizes required to provide at least 80 Mbps of throughput and implemented them across the environment. The queuing stopped immediately after.
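The sizing math is the bandwidth-delay product run in reverse: to sustain a target rate, the buffer must hold at least one RTT’s worth of data. A sketch of that calculation (integer arithmetic, RTT in milliseconds):

```python
def required_buffer_bytes(target_mbps, rtt_ms):
    """Minimum TCP buffer size = bandwidth-delay product for the target rate."""
    return target_mbps * 1_000_000 // 8 * rtt_ms // 1000

# 80 Mbps over a 30 ms RTT:
print(required_buffer_bytes(80, 30))  # 300000 bytes, i.e. roughly 293KB
```

On RHEL that sizing would typically land in the net.ipv4.tcp_rmem/tcp_wmem and net.core.rmem_max/wmem_max sysctls – the exact values we deployed I no longer have on hand.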