Jun 08 2015
 

It's been a long time since I've posted on this blog. So here goes – short and sweet. We've developed a couple of reference architectures for running Cloudera Enterprise (Cloudera's Distribution including Apache Hadoop) on VMware.

 

 

Mar 18 2014
 

This is an old one. I thought I’d post this before it falls off main memory.

You will need two things for this. I’ll start with the obvious.

  1. A Splunk server configured to act as a deployment server (which will result in apps, configurations, etc. being deployed centrally)
  2. A centralized "bastion" host that has root-level ssh-based trust established with your target hosts (i.e., the Splunk agents). Another option is to have unprivileged ssh key-based trusts set up, but with local passwordless sudo privileges on each of your target hosts.

Once you do that, you can run a modification of this script shown below —

I'd written it for an old shop of mine, where we managed Solaris on SPARC, Solaris on x86 and Red Hat Linux. Modify or write functions corresponding to your target OS of choice and it should do pretty much what you need rather easily.

Another thing to watch out for: make sure the version of Splunk you are working with hasn't changed the underlying mechanisms. I had written this for Splunk v4.x at least three years back (I can't remember exactly how long ago – this is one script I didn't put in version control, so there's no way for me to tell how old it is, and the timestamps changed when I copied it over from my old laptop).
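For context, here's a minimal sketch of the kind of thing the script did, assuming a bastion host with root ssh trust to each target and a forwarder package already staged on the hosts. The hostnames, paths and package name below are placeholders, and the per-OS packaging logic (Solaris SPARC/x86 vs Red Hat) that the real script handled is omitted:

# Hedged sketch (not the original script): push a Splunk forwarder install from
# a bastion host and point each agent at the central deployment server.
# Hostnames, package paths and staging details are assumptions for illustration.
import subprocess

DEPLOY_SERVER = 'splunk-ds.example.com:8089'   # hypothetical deployment server
TARGETS = ['web01', 'web02', 'db01']           # hypothetical target hosts

def remote(host, cmd):
    """Run a command on a target host over the pre-established root ssh trust."""
    subprocess.check_call(['ssh', 'root@%s' % host, cmd])

for host in TARGETS:
    # install from a package already staged on the host (per-OS logic omitted)
    remote(host, 'rpm -ivh /var/tmp/splunkforwarder.rpm')
    # accept the license, point the agent at the deployment server, restart
    remote(host, '/opt/splunkforwarder/bin/splunk start --accept-license')
    remote(host, '/opt/splunkforwarder/bin/splunk set deploy-poll %s' % DEPLOY_SERVER)
    remote(host, '/opt/splunkforwarder/bin/splunk restart')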

 

Mar 12 2014
 

Okay, this has been a long time coming, but thought I’d write this down before I forget all about it.

First a little bit of a rant. In the EMC world, we assign LUNs to hosts and identify them using a hex-id code called the LUN ID. This, in conjunction with the EMC Frame ID, helps us uniquely and “easily” identify the LUNs, especially when you consider that there are usually multiple EMC frames and hundreds or thousands of SAN-attached hosts in most modern enterprise data centers.

Okay, the challenge arises when using a disk multipathing solution other than EMC's expensive PowerPath software, or the exorbitantly expensive Veritas Storage Foundation suite from Symantec.

Most self-respecting modern operating systems have disk multipathing software built in and freely available, and the focus of this post is Solaris. Solaris has mature and efficient multipathing software called MPxIO, which cleverly creates a pseudo-device corresponding to the complex of the multiple (or single) paths via which a SAN LUN is visible to the operating system.

The MPxIO driver then uses the LUN's globally unique identifier (GUID) to identify the LUN – rightly so, one could argue. The challenge is that the GUID is a 32-character string (very nicely explained here: http://www.slideshare.net/JamesCMcPherson/what-is-a-guid).

When I first proposed to our usual user community – Oracle DBAs, that they would have to use disks named as shown below, I faced outraged howls of indignation. “How can we address these disks in ASM, etc?”

c3t60060160294122002A2ADDB296EADE11d0

To work around this, we considered creating an alternate namespace, say /dev/oracle/disk001 and so forth, retaining the major/minor numbers of the underlying devices. But that would get hairy real quick, especially on those databases where we had multiple terabytes of storage (hundreds of LUNs).

If you are working with CLARiiON arrays, then you will have to figure out some other way (than what I've shown in this blog post) to map your MPxIO/Solaris disk name to the valid hex LUN ID.

The code in this gist covers what's what. A friendly storage admin told me about this with regard to VMAX/Symmetrix LUNs. Apparently EMC generates the GUID (aka UUID) of LUNs differently depending on whether the array is a CLARiiON/VNX or a Symmetrix/VMAX.

The pertinent piece of information is in the fields extracted into the variables $xlunid1 and so forth and converted to their hex values. Having this information reduces our need to install symcli on every host just to extract the LUN ID that way.

This then gives us the ability to map an MPxIO/Solaris disk name to an EMC LUN ID.
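As an illustration of the general idea only (this is not the logic from the gist, and the encoding details are an assumption on my part), here is a hedged sketch that peels the GUID out of a Symmetrix-style MPxIO disk name and decodes part of it by treating pairs of hex digits as ASCII character codes. The field width and the sample disk name are made up; the full script linked below handles the real parsing, and remember that CLARiiON GUIDs are built differently:

# Hedged sketch: derive a Symmetrix/VMAX device ID from an MPxIO disk name by
# decoding part of the GUID. The offsets here are assumptions for illustration.

def guid_from_disk(disk_name):
    """Strip the cXt...dN wrapper and return the embedded GUID string."""
    return disk_name.split('t', 1)[1].rsplit('d', 1)[0]

def vmax_device_id(guid, width=10):
    """Decode the trailing 'width' hex digits pairwise as ASCII characters
    (assumed VMAX-style encoding; the width may differ per array family)."""
    tail = guid[-width:]
    return ''.join(chr(int(tail[i:i + 2], 16)) for i in range(0, len(tail), 2))

# hypothetical VMAX-style disk name, for illustration only
disk = 'c4t60000970000192601234533030314142d0'
print(vmax_device_id(guid_from_disk(disk)))   # -> '001AB' under these assumptions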

The full script is here — https://github.com/implicateorder/sysadmin-tools/blob/master/getluninfo.pl

Jan 17 2014
 

This post is a continuation of my previous post. I thought I’d elaborate on the foundation I have built thus far…

If you are like most engineers, you probably get a kick out of working with technology; taking things apart, seeing how they work and (hopefully) putting them back together again, in one piece.

Okay, the last bit was unnecessary…most of us engineers have never taken things apart as kids, or even if we have, we’ve always put things back together again, as good as new (I jest).

What I’m trying to articulate here is that it takes a very special type of person to be an engineer. You probably aren’t going to be an engineer if you aren’t cut out for it (the training years are laborious enough to dissuade anyone but the most highly motivated individuals).

In other words, it is in our nature to try and find out how things work. Why they do certain things and what the purpose of these “things” are. The world of the Systems Engineer (IT) is a little more daunting than that of the regular engineer (say electrical or mechanical). That is because we have to deal with both hardware and software and try and integrate a multitude of such systems, towards a certain goal.

The purpose of this post is to try and focus on that goal. As good Systems Engineers, we need to understand why we are doing what we are doing. And as a corollary thereof, we also need to understand what the implications of what we are doing are (how do we impact the bottom line).

In many organizations, where the business is not primarily IT, and the culture has not evolved yet to acknowledge IT as a partner (i.e. consider IT to be a cost center as opposed to a revenue generator), this can be hard to articulate. But there are steps that can be taken towards that end. Which is perhaps a topic for another post.

But to the task at hand — let’s look at “Why are we doing what we are doing”:

To be really able to understand the objective of anything, we need to understand the reasons behind it. Some could be super obvious, like why you are putting together an ERP system for your organization (or a Supply Chain Management component, for instance). The ERP system is the “life blood” of your business.

In some other cases, it might be harder to understand. For instance, if you are putting together infrastructure for a data warehouse, it would be a very good idea to understand what kind of analytics the business wants to run on the system. In fact, a lot of very useful information will emerge once you start asking questions to understand the role of the system.

For example:

  • What kind of data are they analyzing?
  • How many times do they want to run their calculations?
  • Do they want to run queries against the system at regular intervals?
  • Or do they want to be able to run queries against the system at any time, giving their users access to a front-end reporting tool?
  • What are the systems that feed data to this system?
  • What are the systems that the data from this system gets fed to (if any)?
  • What type of model do they want to employ – data marts feeding the warehouse, or the warehouse feeding the data marts? (I'm probably watering it down a lot – here is a good reference.)
  • What is the frequency of data load (is it batch or realtime)?

All these questions and a host of others will come up, and it's a good idea not to assume anything. When in doubt, ask questions. To repeat that old cliché – "There is no such thing as a dumb question". Sometimes we are not responsible for designing these systems – it is still a good idea to ask those questions and learn.

Mature IT organizations will have high level design documents that cover the business processes supported by various components of IT and at a high level, how data flows between these components. If you don’t have access to this information, ask someone that does. If there is an Enterprise Architecture group in your organization, reach out to them (or do what I do, ask my manager to help me get that information).

If you haven't tried to understand the "big picture" before, you will be surprised by how much a bunch of Visio diagrams or PowerPoint presentations can tell you (I love pretty pictures – so go suck an egg, those who consider it below their dignity to suffer through PowerPoint "death sessions"). Diagrams can convey information far more efficiently (a picture is worth a thousand words, as that old saying goes) than hundreds of paragraphs of text can.

Having presented that, I would like to bring focus back to strategic thinking now. In my view, it is both an art form and a particular mindset to adopt. That said, let me present some thoughts about things I consider absolutely paramount to developing strategic thinking:

  1. Learn the Big Picture
  2. Learn to separate things that are purely tactical from things that can be strategic (or can be addressed strategically)
  3. Identify engineers (whether in your group or peer groups) who can help you learn and improve your repertoire — develop a rapport with them. I’ve always found it helpful to identify who the “go-to” people were in any organization I’ve worked in. They can help in filling knowledge gaps and as partners to brain-storm with as the need arises.
  4. Always have the long term in mind (this calls for awareness of how technology is evolving – reading lots of material online helps, as does attending conferences and seminars) – maybe a starting point would be to map the medium term to the depreciation cycle of your organization. The long term could then be two depreciation cycles.
  5. Stay flexible and adaptive – don't get locked into a particular technology or decision. Keep an open mind. Technology evolves rapidly (far more rapidly than a 3- or 5-year depreciation cycle, for instance). So, while we can't expect to be at the cutting edge of technological development at all times, it is good to keep an eye on where the world is going to be when we finally pull around that corner.
  6. Don’t ever accept a narrative that limits your domain (in other words – don’t let anyone else dictate what you can or cannot think of, wrt technology). Nothing is sacrosanct :-)
  7. Get into the habit of collecting data
  8. Develop a set of tools that you can use to access and analyze data rapidly (a tool like Splunk is very powerful and multi-faceted; learn how to use it and simplify your life). Learning how to use SQL with a database (MySQL, maybe) and how to use pivot tables in MS Excel helps too. They are all valuable tools for data analysis.

NOTE: The list doesn’t necessarily need to be considered in the order presented above.

 

Jan 10 2014
 

This is in continuation of my previous post regarding Tactical vs Strategic thinking. I received some feedback about the previous post and thought I’d articulate the gist of the feedback and address the concern further.

The feedback – It seems like the previous post was too generic for some readers. The suggestion was to provide examples of what I consider “classic” tactical vs strategic thinking.

I can think of a few off the top, and please forgive me for using/re-using clichés or being hopelessly reductionist (but they are so ripe for the taking…).

I’ve seen several application “administrators” and DBAs refusing to automate startup and shutdown of their application via init scripts and/or SMF (in case of Solaris systems).

Really? An SAP basis application cannot be automatically started up? Or a Weblogic instance?

Why?

Because, *sigh and a long pause*…well because “bad things might happen!”

You know I’m not exaggerating about this. We all have faced such responses.

To get to the point, they were hard-pressed for valid answers. Sometimes we just accept their position and file it under the category of "it's their headache". That aside, a "strategic" thing to do, in the interest of increased efficiency and better delivery of services, would be to automate the startup and shutdown of these apps.

An even more strategic thing to do would be to evaluate whether the applications would benefit from some form of clustering, and then automate the startup/shutdown of the app, building resource groups for various tiers of the application and establishing affinities between various groups.

For example, a 3-tier web-UI-based app would have an RDBMS backend, a business-logic middle tier and a web front end. Build three resource/service groups using standard clustering software and devise affinities between the three, such that the web UI is dependent on the business layer and the business layer is dependent on the DB. Now, it's a different case if all three tiers are scalable (aka active/active in the true sense).
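As a toy illustration of that dependency chain (modeling the idea only, not any particular clusterware's syntax – the group names are made up), the service groups and their affinities can be expressed as a small graph and brought online in dependency order:

# Toy model of the 3-tier dependency chain described above; group names are
# hypothetical and this mirrors the concept, not a specific clusterware API.
deps = {
    'web-rg': ['app-rg'],   # web UI depends on the business-logic tier
    'app-rg': ['db-rg'],    # business logic depends on the RDBMS backend
    'db-rg': [],            # the database has no upstream dependency
}

def start_order(groups):
    """Return the groups in an order that satisfies every dependency."""
    ordered, seen = [], set()
    def visit(group):
        if group in seen:
            return
        seen.add(group)
        for upstream in groups[group]:
            visit(upstream)
        ordered.append(group)
    for group in groups:
        visit(group)
    return ordered

print(start_order(deps))   # e.g. ['db-rg', 'app-rg', 'web-rg']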

An even more strategic approach would be to establish rules of thumb regarding the use of clusterware or highly available virtual machines, and to establish service-level tiers that deliver different levels of availability (I've based these off the service levels provided by vendors in the past).

This would call for standardization of the infrastructure architecture and as a corollary thereof, associated considerations would have to be addressed.

For example:

  1. What do you need to provide a specific tier of service?
  2. Do you have the budget to meet such requirements (it’s significantly more expensive to provide a 99.99% planned uptime as opposed to 99%)?
  3. Do the business owners consider the app worthy of such a tier?
  4. If not, are you sure that the business owners actually understand the implications of this going down? (How much money are they likely to lose if this app was down – better if we can quantify it)

All these and a slew of other such questions will arise and will have to be responded to. A mature organization will already have worked a lot of these into their standard procedures. But it is good for engineers at all levels of maturity and experience to think of these things. And build an internal methodology, keep refining it, re-tooling it to adapt to changes in the IT landscape. What is valid for an on-premise implementation might not be applicable for something on a public cloud.

Actually, this topic gets even murkier when you consider public IaaS cloud offerings. Those of us who have predominantly worked in the enterprise might find it hard to believe that people actually use public IaaS clouds, because realistically an on-premise offering can provide better service levels and reliability than these measly virtual machines can. While the lure of a compute-on-demand, pay-as-you-go model is very attractive, it is perhaps not applicable to every scenario.

So then you have to adapt your methodology (if there is an acceptable policy to employ public IaaS clouds). More questions around deployments might now need to be prepended to our list above:

  1. How much compute will this application require?
  2. Will it need to be on a public cloud or a VM farm (private cloud) or on bare-metal hardware?

I used to think that almost everyone in this line of business thought about Systems Engineering along these lines. But I was very wrong. I have met and worked with some very smart engineers who have only a vague idea about these topics, or have thought through only some aspects of them.

Many don't make the effort to learn how to do a TCO calculation, or to show an RoI for projects they are involved in (even just for their own sakes). As "bean-counter-esque" as these subjects might seem, they are very important (imho) to taking your repertoire as a mature engineer to the next level.

Cost of ownership is oftentimes neglected by engineers. I've had heated discussions with colleagues at times regarding why one technology was chosen/recommended over another.

One that comes to mind was when I was asked to evaluate and recommend one of two technology stacks – Veritas Cluster Server with Veritas Storage Foundation vs the Sun Cluster/ZFS/UFS combination. The shop at that time was a heavy user of Solaris + SPARC + Veritas Storage Foundation/HA.

The project was very simple – we wanted to reduce the physical footprint of our SPARC infrastructure (then comprised of V490s, V890s, V440s, V240s, etc). So we chose to use a lightweight technology called Solaris Containers to achieve that. The apps were primarily DSS-type and batch oriented. We opted to use Sun's T5440 servers to virtualize (there were no T4s or T5s in those days, so we took our lumps on the slow single-thread performance; we weren't really trying to upgrade the performance characteristics of the servers we were P2V'ing, just to get a good consolidation ratio).

As a proof of concept, we baked off a pair of two-node clusters using Veritas Cluster Server 5.0 and Sun Cluster 3.2. Functionally, our needs were fairly simple. We needed to ascertain the following –

  • Can we use the cluster software to lash multiple physical nodes into a Zone/Container farm?
  • How easy is it to set up, use and maintain?
  • What would our TCO be, and what would our return on investment be, choosing one technology over the other?

Both cluster stacks had agents to cluster and manage Solaris Containers and ZFS pools. At the end it boiled down to two things really –

  • The team was more familiar with VCS than Sun Cluster
  • VCS + Veritas Storage Foundation cost 2-3x more than Sun Cluster + ZFS combination

The cost of ownership for the Veritas stack was overwhelmingly higher. On the other hand, while the team (with the exception of myself) wasn't trained in Sun Cluster, we weren't trying to do anything outrageous with the cluster software.

We would have a cluster group for each container we built, comprising the ZFS pools that housed the container's OS + data volumes and a Zone agent that managed the container itself. The setup for each container would then comprise the following steps (a hedged sketch follows the list):

  1. add storage to the cluster
  2. set up the ZFS pool for new container being instantiated
  3. install the container (a 10 minute task for a full root zone)
  4. create a cluster resource group for the container
  5. create a ZFS pool resource (called HAStoragePlus resource in Sun Cluster parlance)
  6. Create a HA-Zone resource (to manage the zone)
  7. bring the cluster resource group online/enable cluster-based monitoring

These literally require 9-10 commands on the command line. Lather, rinse, repeat. So, when I defended the design, I had all this information as well as the financial data to support why I recommended using Sun Cluster over Veritas. Initially a few colleagues were resistant to the suggestion, over concerns about supporting the solution. But it was a matter of spending 2-3x more up-front and continually for support vs spending $40K on training over one year.

Every organization is different, some have pressures of reducing operational costs, while others reducing capital costs. At the end, the decision makers need to make their decisions based on what’s right for their organization. But providing this degree of detail as to why a solution was recommended helps to eliminate hurdles with greater ease. Unsound arguments usually fall apart when faced with cold, hard data.

Jan 09 2014
 

Tactical vs Strategic thinking

Let me attempt to delineate the two, in context of IT and specifically Infrastructure (because without that, these are just buzz-words that are being tossed around).

Tactical – This calls for quick thinking and a quick-fixing type of mentality – for example, when we need to fix things that are broken quickly, or avert an impending disaster just this one time – essentially bandaids, repairs, and so on.

Strategic – This calls for vision, of looking at the complete picture and devising multiple responses (each as a failsafe for another possibly) to address as many things as possible that might affect your infrastructure and operations.

If any of you are chess players, you will know what I mean. A good chess player will evaluate the board, think ahead 5-7 moves, factoring in the various moves and counter-moves possible, and then take a step. That is strategic thinking. A tactical player will play the game a move at a time (or at best a couple of moves ahead). He might save his piece for that moment (i.e., avert immediate damage), but he will most likely set the stage for defeat if he is playing a strategic opponent.

Some of the readers might be thinking that this is all great for talking, but how does this translate to the role of a Systems Engineer/Administrator? Surely, managing IT infrastructure is not a game of chess?

To that, my response is – But it is!

We, by virtue of our trade have taken on a big responsibility. We provision, manage and service the infrastructure that our respective organizations depend on (and perhaps a huge part of the organization’s revenue stream depends on it). So, it is our job (as individuals as well as a collective within each organization) to be able to foresee potential pitfalls, constantly evaluate what we are doing well and what we aren’t (be it process related, or technology related) and devise plans to address any issues that might crop up.

In some organizations, the management team would like to consider the “strategic” activities to be their role. But the fact is, good managers will always take the counsel of their teams in order to make successful and strategic decisions. Autocracy is not very conducive to successful decision making. Now, while I’m no management guru, I have worked in this industry for close to two decades and my opinions are based on my observations (And I will be happy to stand and defend my informal thesis with any Management Guru that would like to discuss this with me).

Moreover, most individuals in the management space have outdated knowledge of technology. They more than likely aren't experts in any of the fields that their team supports, and even if they had been experts in the past, they likely rely on vendors and VARs to fill their knowledge gap. Now, we all know how reliable VARs and vendors can be at providing unbiased knowledge that helps with sound decision making.

But if you ask me that is not a weakness, but a strength. Managers (with a few exceptions) who are too close to technology tend to get bogged down in the minutiae of things, because they usually only look at it from the outside (as supervisors and not technicians). A good manager will necessarily include his/her team (or at least a few key individuals from the team) in their decision making process.

  • A manager I’ve had great pleasure of working with would take his team out every year for a 2-3 day planning session at the beginning of the year.
  • He would provide guidance of what the organizational goals were for the fiscal year (and what the business strategies were)
  • and then the team would brainstorm on what we would need to do to address the various components of infrastructure that we managed.
  • We would return to the office at the end of the session with maybe 120-150 items on the board, categorized as small projects, large projects or tactical initiatives, and divvy them up among the team members based on skills and interests
  • We would then, together with our manager, align these projects and tactical initiatives with the organizational goals and try and show how they help us help the organization, or help make a business case for cost savings, or increased efficiencies, and so on.
  • These would not only help us design our annual goals and objectives into measurable items, but also help the manager showcase what his team was doing to his manager(s) (and so on, up the food chain).

Strategic thinking is an art, but it is also a philosophical inclination, if you ask me. If we always assess a situation by looking not just at the short-term effects but at the underlying cause – evaluating whether it is an aberration, a flaw in a process, or whatever the reason might be – we can, over time, internalize this type of thinking, so that we can quickly look into the future, see what the implications of an event may be, and thereby take corrective actions or devise solutions to address it.

While it might seem that one-off solutions, workarounds and break-fixes are tactical, it is possible to have a strategy around using tactical solutions too. When you orchestrate a series of rules around where tactical thinking is applicable, you are in essence setting the stage for an over-arching strategy towards some end.

This is precisely what good chess players do – they strategize, practice, practice, practice. Then they do some more. Until they can recognize patterns and identify “the shape of things to come” almost effortlessly.

We, as IT professionals need to be able to discern patterns that affect our professional lives. These are patterns that are process-oriented, social or technological.

We have tools to look at the technological aspect – collect massive amounts of logs, run it through a tool such as Splunk or something similar, that will let us visualize the patterns easily.

The other two are harder – the hardest being the social one: to be aware of things as they transpire around us; in general, to be aware.

If you are interested, check this book out – The Art of Learning, by a brilliant chess player, Josh Waitzkin (whose early life was the subject of the book and subsequent movie "Searching for Bobby Fischer"). The book of course doesn't have anything to do with IT or infrastructure (it is about learning, chess and Tai Chi). But the underlying principles of the book resonated deeply with me in my quest to try and articulate my particular and peculiar philosophy that drives both my personal as well as my professional life.

As intelligent beings, we have the innate ability to look at and recognize patterns. The great 17th-century Japanese swordsman Miyamoto Musashi wrote in his famous book "Go Rin No Sho – The Book of Five Rings" about how rhythms are pervasive in everything (and while his overt focus was on swordsmanship, there was a generic message in there as well) and how a good swordsman needs to recognize these rhythms (the rhythm of the battle, the rhythm of his opponent, and so on).

So I leave you with a quote from the Go Rin No Sho –

There is rhythm in everything, however, the rhythm in strategy, in particular, cannot be mastered without a great deal of hard practice.

Among the rhythms readily noticeable in our lives are the exquisite rhythms in dancing and accomplished pipe or string playing. Timing and rhythm are also involved in the military arts, shooting bows and guns, and riding horses. In all skills and abilities there is timing.

There is also rhythm in the Void.

There is a rhythm in the whole life of the warrior, in his thriving and declining, in his harmony and discord. Similarly, there is a rhythm in the Way of the merchant, of becoming rich and of losing one’s fortune, in the rise and fall of capital. All things entail rising and falling rhythm. You must be able to discern this. In strategy there are various considerations. From the outset you must attune to your opponent, then you must learn to disconcert him. It is crucial to know the applicable rhythm and the inapplicable rhythm, and from among the large and small rhythms and the fast and slow rhythms find the relevant rhythm. You must see the rhythm of distance, and the rhythm of reversal. This is the main thing in strategy. It is especially important to understand the rhythms of reversal; otherwise your strategy will be unreliable.

Jan 06 2014
 

I’ve just recently revived my interest in CMS (Content Management Systems) – specifically Joomla!

Why? Because I created this "portal" (http://www.medhajournal.com) and though I seriously doubt I have the time to actively maintain it, I've spent way too many hours working on it to just abandon it by the wayside.

Medha Journal has evolved since its inception in 2007 from a Joomla 1.x version to Joomla 1.5.x, with a large hiatus in the middle (from 2010 through the last week of December 2013), until I upgraded it (last week) from its rickety 1.5 version to the latest stable 2.5.x version of Joomla.

For those who don't know what Joomla is, here's a summary –

It is a PHP-based open source Content Management System that can be extended to do more or less anything, using a massive, community-driven extensions library (sort of like CPAN).

So, to get back to the topic at hand. Since I had not dabbled with Joomla (besides the user-land/power-user oriented work – primarily content editorial stuff) in a long time, I decided to take the plunge with some down time in hand (the holidays) and upgrade it from 1.5 to 2.5.

In the course of this migration, I also switched from the default Joomla content management to a tool called K2 (which realistically is better suited to social-media-oriented content portals such as Medha Journal).

One major issue I ran into was this –

The old version of the website used an extension called "jomcomment", developed by some folks down in Malaysia or Indonesia, which allowed for better spam control. I used it in conjunction with the myblog extension, developed by the same group of developers, to give my users a seamless (or relatively seamless) experience as they added content and commented on each other's work.

However, these extensions don't work with the newer versions of Joomla that are currently active (2.5.x and 3.x). Over the years, we had accumulated thousands of comments on the north of 2,000 articles collected on the Journal. So, it was imperative to migrate them.

K2 (http://getk2.org) has a very nice commenting system with built-in Akismet spam filters, etc. So, the new site would obviously use this. Migration was an interesting proposition since the table structures (jomcomment vs K2 comments) were not identical.

Old table looked like this:

 

Field      Type                  Null  Key  Default              Extra
id         int(10)               NO    PRI  NULL                 auto_increment
parentid   int(10)               NO         0
status     int(10)               NO         0
contentid  int(10)               NO    MUL  0
ip         varchar(15)           NO
name       varchar(200)          YES        NULL
title      varchar(200)          NO
comment    text                  NO         NULL
preview    text                  NO         NULL
date       datetime              NO         0000-00-00 00:00:00
published  tinyint(1)            NO    MUL  0
ordering   int(11)               NO         0
email      varchar(100)          NO
website    varchar(100)          NO
updateme   smallint(5) unsigned  NO         0
custom1    varchar(200)          NO
custom2    varchar(200)          NO
custom3    varchar(200)          NO
custom4    varchar(200)          NO
custom5    varchar(200)          NO
star       tinyint(3) unsigned   NO         0
user_id    int(10) unsigned      NO         0
option     varchar(50)           NO    MUL  com_content
voted      smallint(6)           NO         0
referer    text                  NO         NULL

The new table looks like this:

Field         Type          Null  Key  Default  Extra
id            int(11)       NO    PRI  NULL     auto_increment
itemID        int(11)       NO    MUL  NULL
userID        int(11)       NO    MUL  NULL
userName      varchar(255)  NO         NULL
commentDate   datetime      NO         NULL
commentText   text          NO         NULL
commentEmail  varchar(255)  NO         NULL
commentURL    varchar(255)  NO         NULL
published     int(11)       NO    MUL  0

This called for some manipulation of the data extracted from the old jomcomment tables, before they got inserted back into the K2 comments table in the new database (both were MySQL).

So, I exported the data from the old table using phpMyAdmin, massaged it to fit the new table structure, and imported it back into the new database.

The hardest part was ensuring that the date field of the extracted CSV imported into the new table correctly. For that, I had to manually adjust the date format. One way is as shown here.

Another way is (if your data fits in Excel), manually set the column format (corresponding to your date field) to something that matches the MySQL format (in this case, yyyy-mm-dd hh:mm).
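For what it's worth, here is a hedged sketch of the kind of massaging involved. The column mapping follows the two table layouts shown above (contentid → itemID, user_id → userID, name → userName, date → commentDate, comment → commentText, email → commentEmail, website → commentURL, published → published); the file names and the source date format are assumptions based on a typical Excel export, so adjust them to whatever your dump actually contains:

# Hedged sketch: reshape a jomcomment CSV export into rows for the K2 comments
# table. File names and the source date format are assumptions.
import csv
from datetime import datetime

SRC_DATE_FMT = '%m/%d/%Y %H:%M'        # assumed Excel-style export format
DST_DATE_FMT = '%Y-%m-%d %H:%M:%S'     # MySQL DATETIME format

with open('jomcomment.csv', newline='') as src, \
     open('k2_comments.csv', 'w', newline='') as dst:
    reader = csv.DictReader(src)
    writer = csv.writer(dst)
    writer.writerow(['itemID', 'userID', 'userName', 'commentDate',
                     'commentText', 'commentEmail', 'commentURL', 'published'])
    for row in reader:
        # convert the jomcomment date into MySQL's DATETIME layout
        when = datetime.strptime(row['date'], SRC_DATE_FMT)
        writer.writerow([row['contentid'], row['user_id'], row['name'],
                         when.strftime(DST_DATE_FMT), row['comment'],
                         row['email'], row['website'], row['published']])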

 

Jan 06 2014
 

In a past life, I worked for a wireless service provider that used a vended application to evaluate how much data bandwidth customers were consuming; that data was sent to the billing system to form the customers' monthly bills.

The app was poorly written (imho) and woefully single-threaded, incapable of leveraging the oodles of compute resources provided in the form of twelve two-node VCS/RHEL clusters. There were two data centers, and there was one such "cluster of clusters" at each site.

The physical infrastructure was pretty standard and over-engineered to a certain degree (the infrastructure under consideration having been built in the 2007-2008 timeframe) – DL580 G5 HP servers with 128GB of RAM each, a pair of gigabit ethernet NICs for cluster interconnects, another pair bonded together as public interfaces, and 4 x 4Gb FC HBAs through Brocade DCX core switches (dual fabric) to an almost dedicated EMC CLARiiON CX4-960.

The application was basically a bunch of processes that watched traffic as it flowed through the network cores and calculated bandwidth usage based on end-user handset IP addresses (I'm watering it down to help keep the narrative fast and fluid).

Each location (separated by about 200+ miles) acted as the fault tolerant component for the other. So, the traffic was cross-fed across the two data centers over a pair of OC12 links.

The application was a series of processes/services that formed a queue across the various machines of each cluster (over TCP ports). Even the processes within a single physical system communicated via IP addresses and TCP ports.

The problem we started observing a few months after deployment was that data would start queuing up, slowing down and skewing the calculations of data usage – the implications of which I don't necessarily have to spell out (ridiculous).

The software vendor's standard response would be –

  1. It is not an application problem
  2. The various components of the application would write into FIFO queues on SAN-based filesystems. The vendor would constantly raise the bogey of SAN storage being slow, not enough IOPS being available and/or response times being poor. Their basis for this claim was what seemed to be an arbitrary metric of CPU I/O wait percentage rising above 5% (or perhaps even lower at times).
  3. After much deliberation poring over the NAR reports of the almost dedicated EMC CX4-960s (and working with EMC), we were able to ascertain that neither the storage arrays nor the SAN contributed in any way to latency (that resulted in poor performance of this app).
  4. The processes, being woefully single threaded, would barely ever use more than 15% of the total CPUs available on the servers (each server having 16 cores at its disposal).
  5. Memory usage was nominal and well within acceptable limits
  6. The network throughput wasn’t anywhere near saturation

We decided to start profiling the application (despite great protestations from the vendor) during normal as well as problematic periods, at the layer where we were seeing the issues, as well as the immediately upstream and downstream layers.

What we observed was that:

  1. Under normal circumstances, the app would spend most of its time in either read, write, send or recv syscalls
  2. When the app was performing poorly, it would spend most of its time in the poll syscall. It became apparent that it was waiting on TCP sockets from app instances at the remote site (and the issue was bidirectional).

 

Once this bit of information was carefully vetted and provided to the vendor, they decided to give us their minimum throughput requirement – they needed a minimum of 30 Mbps. The assumption made was that on an OC12 (bandwidth of 622 Mbps), 30 Mbps was quite achievable.

However, it so turns out that latency plays a huge role in the actual throughput on a WAN connection! (*sarcasm alert*)

The average RTT between the two sites was 30ms. Given that the servers were running RHEL 4.6, with the default (untuned) TCP send and receive buffer sizes of 64K, the 30ms RTT resulted in a max throughput of about 17 Mbps.
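The arithmetic here is just the TCP window (bandwidth-delay product) limit, and it's easy to sanity-check using the numbers already quoted above:

# Throughput ceiling imposed by a TCP window over a given round-trip time:
#   max_throughput = window_size / RTT
window = 64 * 1024          # bytes: the untuned RHEL 4 default mentioned above
rtt = 0.030                 # seconds: the ~30 ms RTT between the two sites

max_bps = window * 8 / rtt
print('max throughput ~ %.1f Mbit/s' % (max_bps / 1e6))    # ~17.5 Mbit/s

# Window/buffer needed to sustain a target rate over the same RTT:
target_bps = 80e6           # the ~80 Mbps we later sized the buffers for
needed_bytes = target_bps * rtt / 8
print('buffer needed ~ %.0f KB' % (needed_bytes / 1024))   # ~293 KB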

It turns out that the methodologies we (*NIX admins) would use to generate some network traffic and measure "throughput" aren't necessarily equipped to handle wide area networks. For example, SSH has an artificial bottleneck in its code that throttles the TCP window size to a default of 64K (in the version of OpenSSH we were using at that time) – as hardcoded in the channels.h file. Initial tests were indeed baffling, since we would never cross 22 Mbps of throughput on the WAN. After a little research we realized that all the default TCP window sizes (for passive ftp, scp, etc) were not really tuned for high-RTT connections.

Thus began the process of tweaking the buffer sizes and generating synthetic loads using iperf.

After we established that the default TCP buffer sizes were inadequate, we calculated the buffer sizes required to provide at least 80 Mbps of throughput and implemented them across the environment. The queuing stopped immediately after.

Jan 02 2014
 

This is a continuation of my previous post on this topic.

First, a disclaimer –

  • I have focused on CentOS primarily. I will try and update this to accommodate more than one Linux distro (and even consider writing for Solaris-based implementations in the future).

Here’s the skeleton of the cloudera manager setup plugin:

import posixpath

from starcluster import threadpool
from starcluster import clustersetup
from starcluster.logger import log

class ClouderaMgr(clustersetup.ClusterSetup):

        def __init__(self,cdh_mgr_agentdir='/etc/cloudera-scm-agent'):
                self.cdh_mgr_agentdir = cdh_mgr_agentdir
                self.cdh_mgr_agent_conf = '/etc/cloudera-scm-agent/config.ini'
                self.cdh_mgr_repo_conf = '/etc/yum.repos.d/cloudera-cdh4.repo'
                self.cdh_mgr_repo_url = 'http://archive.cloudera.com/cm4/redhat/6/x86_64/cm/cloudera-manager.repo'
                self._pool = None

        @property
        def pool(self):
                if self._pool is None:
                        self._pool = threadpool.get_thread_pool(20, disable_threads=False)
                return self._pool

        def _install_cdhmgr_repo(self,node):
                node.ssh.execute('wget %s' % self.cdh_mgr_repo_url)
                node.ssh.execute('cat /root/cloudera-manager.repo >> %s' % self.cdh_mgr_repo_conf)

        def _install_cdhmgr_agent(self,node):
                node.ssh.execute('yum install -y cloudera-manager-agent')
                node.ssh.execute('yum install -y cloudera-manager-daemons')

        def _install_cdhmgr(self,master):
                master.ssh.execute('/sbin/service cloudera-scm-agent stop')
                master.ssh.execute('/sbin/chkconfig cloudera-scm-agent off')
                master.ssh.execute('/sbin/chkconfig hue off')
                master.ssh.execute('/sbin/chkconfig oozie off')
                master.ssh.execute('/sbin/chkconfig hadoop-httpfs off')
                master.ssh.execute('yum install -y cloudera-manager-server')
                master.ssh.execute('yum install -y cloudera-manager-server-db')
                master.ssh.execute('/sbin/service cloudera-scm-server-db start')
                master.ssh.execute('/sbin/service cloudera-scm-server start')

        def _setup_hadoop_user(self,node,user):
                node.ssh.execute('gpasswd -a %s hadoop' %user)

        def _install_agent_conf(self, node):
                node.ssh.execute('/bin/sed -e "s/server_host=localhost/server_host=master/g" %s > /tmp/config.ini; mv /tmp/config.ini %s' % (self.cdh_mgr_agent_conf, self.cdh_mgr_agent_conf))

        def _open_ports(self,master):
                ports = [7180,50070,50030]
                ec2 = master.ec2
                for group in master.cluster_groups:
                        for port in ports:
                                has_perm = ec2.has_permission(group, 'tcp', port, port, '0.0.0.0/0')
                                if not has_perm:
                                        ec2.conn.authorize_security_group(group_id=group.id,
                                                                        ip_protocol='tcp',
                                                                        from_port=port,
                                                                        to_port=port,
                                                                        cidr_ip='0.0.0.0/0')

        def run(self,nodes, master, user, user_shell, volumes):
                for node in nodes:
                     self._install_cdhmgr_repo(node)
                     self._install_cdhmgr_agent(node)
                     self._install_agent_conf(node)
                self._install_cdhmgr(master)
                self._open_ports(master)

        def on_add_node(self, node, nodes, master, user, user_shell, volumes):
                for node in nodes:
                     self._install_cdhmgr_repo(node)
                     self._install_cdhmgr_agent(node)
                     self._install_agent_conf(node)

And this plugin can be referenced and executed in the cluster configuration as follows:

[plugin cdhmgr]
setup_class = cdh_mgr.ClouderaMgr

[cluster testcluster]
keyname = testcluster
cluster_size = 2
cluster_user = sgeadmin
cluster_shell = bash
master_image_id = ami-232b034a
master_instance_type = t1.micro
node_image_id =  ami-232b034a
node_instance_type = t1.micro
plugins = wgetter,rpminstaller,repoinstaller,pkginstaller,cdhmgr

Once this step is successfully executed by StarCluster, you will be able to access the Cloudera Manager web GUI on port 7180 of your master node's public IP and/or DNS entry.

<...more to follow...>

Dec 13 2013
 

I ran into an incredible tool known as StarCluster, which is an open-source project from MIT (http://star.mit.edu/cluster/).

StarCluster is built around Sun Microsystems' N1 Grid Engine software (Sun used it to do deployments for HPC environments). The folks at MIT developed on a fork of that (SGE – Sun Grid Engine), and StarCluster was formed.

StarCluster has hooks into the AWS API and is used to dynamically and automatically provision multi-node clusters in Amazon’s EC2 cloud. The base StarCluster software provisions a pre-selected virtual machine image (AMI) onto a pre-defined VM type (e.g.: t1.micro, m1.small, etc). The distributed StarCluster software has a hadoop plugin (http://star.mit.edu/cluster/docs/0.93.3/plugins/hadoop.html) which installs the generic Apache Foundation Hadoop stack on the cluster of nodes (or single node) provisioned thereof. It then uses a tool called dumbo (http://klbostee.github.io/dumbo/) to drive the Hadoop Framework. This is great.

But I want to leverage Cloudera's Hadoop distribution (CDH), managed via Cloudera Manager, and roll it into the StarCluster framework. Therefore I have to start developing the necessary plugin to implement Cloudera Hadoop on top of StarCluster.

I won’t delve deeper into StarCluster’s inner-workings in this post, as I just started working on it a couple of days ago. I do intend to see how I can leverage this toolkit to add more functionality into it (like integrating Ganglia-based monitoring, etc) in due time.

Why StarCluster and not a Vagrant, Jenkins, Puppet combination?

Tools like StarCluster are meant to do rapid and effective deployments of distributed computing environments on a gigantic scale (N1 grid engine was used to deploy hundreds of physical servers back in the days when Sun still shone on the IT industry). While Vagrant, Puppet, Jenkins etc can be put together to build such a framework, it is in essence easier to use StarCluster (with less effort) to operationalize a Hadoop Cluster on EC2 (and after I spend some more time on the inner-workings, I hope to be able to use it work on other virtualization technologies too – perhaps integrate with an on-premise Xen or vmware or KVM cluster).

Let’s Skim the surface:

How StarCluster works

Let’s start with how StarCluster works (and this pre-supposes a working knowledge of Amazon’s AWS/EC2 infrastructure).

It is in essence a toolkit that you can install on your workstation/laptop and use it to drive configuration and deployment of your distributed clustering environment on EC2.

To set it up, follow the instructions on MIT's website. It was pretty straightforward on my MacBook Pro running OS X 10.9 (Mavericks). I already had the Xcode software installed on my Mac, so I cloned the StarCluster Git distribution and ran through the two commands that needed to be run, as listed here – http://star.mit.edu/cluster/docs/latest/installation.html

After that, I set up my preliminary configuration file (~/.starcluster/config) and we were ready to roll. I chose to test the StarCluster functionality without any plugins by setting up a single node.

It's important to choose the right AMI, because not all AMIs can be deployed on the VM type of your choice. For instance, I wanted to use the smallest instance type, t1.micro, to do my testing, since I won't really be using this for any actual work. I settled on a CentOS 5.4 AMI from RealStack.

After verifying that my single-node cluster was being set up correctly, I proceeded to develop a plugin to do the initial Cloudera Hadoop setup along with a Cloudera Manager installation on the master node. As a result thereof, we can then roll out a cluster, connect to Cloudera Manager and use the Cloudera Manager interface to configure the Hadoop cluster. If further automation is possible with Cloudera Manager, I will work on that as an addendum to this effort.

The Plugins

Following the instructions here, I started building out the framework for my plugin. I called the foundational plugin centos (since my selected OS is CentOS). The CDH-specific one will be called Cloudera.

The following directory structure is set up after StarCluster is installed –

In your home directory, a .starcluster directory, under which we have:

total 8
drwxr-xr-x   9 dwai  dlahiri   306 Dec 12 18:15 logs
drwxr-xr-x   4 dwai  dlahiri   136 Dec 13 10:01 plugins
-rw-r--r--   1 dwai  dlahiri  2409 Dec 13 10:01 config
drwxr-xr-x+ 65 dwai  dlahiri  2210 Dec 13 10:01 ..
drwxr-xr-x   5 dwai  dlahiri   170 Dec 13 10:01 .

Your new plugins will go into the plugins directory listed above (for now).

It is as simple as this (the plugin in essence is a Python script – we call it centos.py):

from starcluster.clustersetup import ClusterSetup
from starcluster.logger import log

global repo_to_install
global pkg_to_install

class WgetPackages(ClusterSetup):
	def __init__(self,pkg_to_wget):
		self.pkg_to_wget = pkg_to_wget
		log.debug('pkg_to_wget = %s' % pkg_to_wget)
	def run(self, nodes, master, user, user_shell, volumes):
		for node in nodes:
			log.info("Wgetting %s on %s" % (self.pkg_to_wget, node.alias))
			node.ssh.execute('wget %s' % self.pkg_to_wget)

class RpmInstaller(ClusterSetup):
	def __init__(self,rpm_to_install):
		self.rpm_to_install = rpm_to_install
		log.debug('rpm_to_install = %s' % rpm_to_install)
	def run(self, nodes, master, user, user_shell, volumes):
		for node in nodes:
			log.info("Installing %s on %s" % (self.rpm_to_install, node.alias))
			node.ssh.execute('yum -y --nogpgcheck localinstall %s' %self.rpm_to_install)

class RepoConfigurator(ClusterSetup):
	def __init__(self,repo_to_install):
		self.repo_to_install  = repo_to_install
		log.debug('repo_to_install = %s' % repo_to_install)
	def run(self, nodes, master, user, user_shell, volumes):
		for node in nodes:
			log.info("Installing %s on %s" % (self.repo_to_install, node.alias))
			node.ssh.execute('rpm --import %s' % self.repo_to_install)

class PackageInstaller(ClusterSetup):
	def __init__(self,pkg_to_install):
		self.pkg_to_install  = pkg_to_install
		log.debug('pkg_to_install = %s' % pkg_to_install)
	def run(self, nodes, master, user, user_shell, volumes):
		for node in nodes:
			log.info("Installing %s on %s" % (self.pkg_to_install, node.alias))
			node.ssh.execute('yum -y install %s' % self.pkg_to_install)

Now we can reference the plugin in our configuration file:

[cluster testcluster]
keyname = testcluster
cluster_size = 1
cluster_user = sgeadmin
cluster_shell = bash
node_image_id = ami-0db22764
node_instance_type = t1.micro
plugins = wgetter,rpminstaller,repoinstaller,pkginstaller,sge
permissions = zookeeper.1, zookeeper.2, accumulo.monitor, hdfs, jobtracker, tablet_server, master_server, accumulo_logger, accumulo_tracer, datanode_data, datanode_metadata, tasktrackers, namenode_http_monitor, datanode_http_monitor, accumulo, accumulo_http_monitor

[plugin wgetter]
setup_class = centos.WgetPackages
pkg_to_wget = "http://archive.cloudera.com/cdh4/one-click-install/redhat/5/x86_64/cloudera-cdh-4-0.x86_64.rpm"

[plugin rpminstaller]
setup_class = centos.RpmInstaller
rpm_to_install = cloudera-cdh-4-0.x86_64.rpm

[plugin repoinstaller]
setup_class = centos.RepoConfigurator
repo_to_install = "http://archive.cloudera.com/cdh4/redhat/5/x86_64/cdh/RPM-GPG-KEY-cloudera"

[plugin pkginstaller]
setup_class = centos.PackageInstaller
pkg_to_install = hadoop-0.20-mapreduce-jobtracker

This is not complete by any means – I intend to post the complete plugin + framework in a series of subsequent posts.