Jan 17 2014
 

This post is a continuation of my previous post. I thought I’d elaborate on the foundation I have built thus far…

If you are like most engineers, you probably get a kick out of working with technology: taking things apart, seeing how they work, and (hopefully) putting them back together again, in one piece.

Okay, that last bit was unnecessary… most of us engineers never took things apart as kids, and even if we did, we always put them back together again, good as new (I jest).

What I’m trying to articulate here is that it takes a very special type of person to be an engineer. You probably aren’t going to be an engineer if you aren’t cut out for it (the training years are laborious enough to dissuade anyone but the most highly motivated individuals).

In other words, it is in our nature to try and find out how things work: why they do certain things, and what the purpose of these “things” is. The world of the Systems Engineer (IT) is a little more daunting than that of the regular engineer (say, electrical or mechanical), because we have to deal with both hardware and software, and try to integrate a multitude of such systems toward a certain goal.

The purpose of this post is to focus on that goal. As good Systems Engineers, we need to understand why we are doing what we are doing. And as a corollary thereof, we also need to understand the implications of what we are doing (how we impact the bottom line).

In many organizations where the business is not primarily IT, and the culture has not yet evolved to acknowledge IT as a partner (i.e. they consider IT a cost center as opposed to a revenue generator), this can be hard to articulate. But there are steps that can be taken toward that end, which is perhaps a topic for another post.

But to the task at hand — let’s look at “Why are we doing what we are doing”:

To really understand the objective of anything, we need to understand the reasons behind it. Some could be super obvious, like why you are putting together an ERP system for your organization (or a Supply Chain Management component, for instance): the ERP system is the “life blood” of your business.

In other cases, it might be harder to understand. For instance, if you are putting together infrastructure for a data warehouse, it would be a very good idea to understand what kind of analytics the business wants to run on this system. In fact, a lot of very useful information will emerge once you start asking questions to understand the role of the system.

For example:

  • What kind of data are they analyzing?
  • How often do they want to run their calculations?
  • Do they want to run queries against the system at regular intervals, or do they want to be able to run queries at any time, giving their users access to a front-end reporting tool?
  • What systems feed data into this system?
  • What systems does the data from this system get fed to (if any)?
  • What type of model do they want to employ: data marts feeding the warehouse, or the warehouse feeding the data marts? (I’m probably watering it down a lot; here is a good reference.)
  • What is the frequency of data load (batch or realtime)? (A small illustration follows this list.)
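
To make that last question concrete: a “batch” answer often boils down to something as mundane as a cron entry, whereas a realtime feed implies a long-running service. A minimal sketch, with a hypothetical script path and schedule:

    # Hypothetical nightly batch load at 02:00; a realtime feed would
    # instead be a long-running daemon or message consumer, not a cron job.
    # m h dom mon dow  command
    0 2 * * *  /opt/etl/bin/load_warehouse.sh >> /var/log/etl/load.log 2>&1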

All these questions and a host of others will come up, and it’s a good idea not to assume anything. When in doubt, ask questions. To repeat that old cliché: “There is no such thing as a dumb question”. Even when we are not responsible for designing these systems, it is still a good idea to ask those questions and learn.

Mature IT organizations will have high-level design documents that cover the business processes supported by various components of IT and, at a high level, how data flows between those components. If you don’t have access to this information, ask someone who does. If there is an Enterprise Architecture group in your organization, reach out to them (or do what I do: ask my manager to help me get that information).

If you haven’t tried to understand the “big picture” before, you will be surprised at how much a bunch of Visio diagrams or PowerPoint presentations can tell you (I love pretty pictures, so go suck an egg, those of you who consider it beneath your dignity to suffer through PowerPoint “death sessions”). Diagrams can tell us far more, far more efficiently, than hundreds of paragraphs of text can (a picture is worth a thousand words, as that old saying goes).

Having presented that, I would like to bring the focus back to strategic thinking. To my mind, it is both an art form and a particular mindset to adopt. That said, let me present some thoughts on things I consider absolutely paramount to developing strategic thinking:

  1. Learn the Big Picture
  2. Learn to separate things that are purely tactical from things that can be strategic (or can be addressed strategically)
  3. Identify engineers (whether in your group or peer groups) who can help you learn and improve your repertoire — develop a rapport with them. I’ve always found it helpful to identify who the “go-to” people were in any organization I’ve worked in. They can help fill knowledge gaps and serve as partners to brainstorm with as the need arises.
  4. Always have the long term in mind (this calls for awareness of how technology is evolving – reading lots of material online helps, as does attending conferences and seminars). Maybe a starting point would be to map the medium term to your organization’s depreciation cycle; the long term could then be two depreciation cycles.
  5. Stay flexible and adaptive – don’t get locked into a particular technology or decision. Keep an open mind. Technology evolves rapidly (far more rapidly than a 3- or 5-year depreciation cycle, for instance). So, while we can’t expect to be at the cutting edge of technological development at all times, it is good to keep an eye on where the world is going to be when we finally pull around that corner.
  6. Don’t ever accept a narrative that limits your domain (in other words – don’t let anyone else dictate what you can or cannot think of, wrt technology). Nothing is sacrosanct :-)
  7. Get into the habit of collecting data
  8. Develop a set of tools that you can use to access and analyze data rapidly (a tool like Splunk is very powerful and multi-faceted; learn how to use it and simplify your life). Learning how to use SQL with a database (MySQL, maybe) and how to use pivot tables in MS Excel helps too. They are all valuable tools for data analysis (a quick illustration follows this list).
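
As a small illustration of point 8, a quick ad-hoc query from the shell can answer simple questions before anyone opens a BI tool. This assumes a MySQL client and a hypothetical “webstats” database with an access_log table:

    # Top status codes in a hypothetical web access log table.
    # -p prompts for the password; table and column names are illustrative.
    mysql -u analyst -p webstats -e "
        SELECT status_code, COUNT(*) AS hits
        FROM   access_log
        GROUP  BY status_code
        ORDER  BY hits DESC;"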

NOTE: The list doesn’t necessarily need to be considered in the order presented above.

 

Jan 10 2014
 

This is a continuation of my previous post regarding Tactical vs Strategic thinking. I received some feedback about that post and thought I’d articulate the gist of the feedback and address the concern further.

The feedback – It seems like the previous post was too generic for some readers. The suggestion was to provide examples of what I consider “classic” tactical vs strategic thinking.

I can think of a few off the top of my head, and please forgive me for using/re-using clichés or being hopelessly reductionist (but they are so ripe for the taking…).

I’ve seen several application “administrators” and DBAs refusing to automate startup and shutdown of their application via init scripts and/or SMF (in case of Solaris systems).

Really? An SAP basis application cannot be automatically started up? Or a Weblogic instance?

Why?

Because, *sigh and a long pause*…well because “bad things might happen!”

You know I’m not exaggerating about this. We all have faced such responses.

To get to the point, they were hard-pressed for valid answers. Sometimes we just accept their position and file it under the category of “it’s their headache”. That aside, a “strategic” thing to do, in the interest of increased efficiency and better delivery of services, would be to automate the startup and shutdown of these apps.
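
There is nothing exotic about the automation, either. Here is a minimal init-script sketch (paths, user, and script names are all hypothetical; on Solaris 10+, wrapping the same logic in an SMF manifest is better still, since SMF gives you dependency ordering and restart-on-failure):

    #!/bin/sh
    # /etc/init.d/weblogic -- hypothetical start/stop wrapper, symlinked
    # as /etc/rc3.d/S90weblogic and /etc/rc0.d/K10weblogic.
    case "$1" in
    start)
        su - weblogic -c '/opt/wls/bin/startWebLogic.sh' >/dev/null 2>&1 &
        ;;
    stop)
        su - weblogic -c '/opt/wls/bin/stopWebLogic.sh'
        ;;
    *)
        echo "Usage: $0 {start|stop}"
        exit 1
        ;;
    esac
    exit 0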

An even more strategic thing to do would be to evaluate whether the applications would benefit from some form of clustering, and then automate the startup/shutdown of the app, building resource groups for various tiers of the application and establishing affinities between various groups.

For example, a 3-tier web UI based app would have an RDBMS backend, a business-logic middle tier, and a web front end. Build three resource/service groups using standard clustering software and devise affinities between the three, such that the web UI depends on the business layer and the business layer depends on the DB (sketched below). Now, it’s a different case if all three tiers are scalable (aka active/active in the true sense).
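
As a rough sketch in Sun Cluster 3.2 syntax (group names are hypothetical, and exact property names vary by version; VCS expresses the same idea with service groups and “requires” links):

    # Three resource groups, one per tier (names hypothetical).
    clresourcegroup create db-rg
    clresourcegroup create app-rg
    clresourcegroup create web-rg
    # Order startup: the app layer needs the DB; the web tier needs the app layer.
    clresourcegroup set -p RG_dependencies=db-rg app-rg
    clresourcegroup set -p RG_dependencies=app-rg web-rg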

An even more strategic approach would be to establish rules of thumb regarding the use of clusterware or highly available virtual machines, and to establish service level tiers that deliver different levels of availability (in the past, I’ve based these on the service levels provided by vendors).

This would call for standardization of the infrastructure architecture, and as a corollary thereof, associated considerations would have to be addressed.

For example:

  1. What do you need to provide a specific tier of service?
  2. Do you have the budget to meet such requirements? (It’s significantly more expensive to provide 99.99% planned uptime than 99%; see the quick arithmetic after this list.)
  3. Do the business owners consider the app worthy of such a tier?
  4. If not, are you sure the business owners actually understand the implications of the app going down? (How much money are they likely to lose if it were down? Better if we can quantify it.)
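
To put rough numbers on point 2, each added nine shrinks the permitted downtime dramatically. The quick arithmetic, runnable from any shell:

    # Allowable downtime per year at each availability tier (8760 h/year).
    awk 'BEGIN {
        h = 24 * 365
        printf "99%%    -> %.1f hours/year\n",   h * 0.01          # ~87.6 h
        printf "99.9%%  -> %.1f hours/year\n",   h * 0.001         # ~8.8 h
        printf "99.99%% -> %.1f minutes/year\n", h * 0.0001 * 60   # ~52.6 min
    }'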

All these and a slew of other such questions will arise and will have to be answered. A mature organization will already have worked a lot of these into its standard procedures. But it is good for engineers at all levels of maturity and experience to think about these things: build an internal methodology, keep refining it, and keep re-tooling it to adapt to changes in the IT landscape. What is valid for an on-premise implementation might not be applicable to something on a public cloud.

Actually, this topic gets even murkier when we consider public IaaS cloud offerings. Those of us who have predominantly worked in the enterprise often find it hard to believe that people actually use public IaaS clouds, because, realistically, an on-premise offering can provide better service levels and reliability than these measly virtual machines can. While the lure of a compute-on-demand, pay-as-you-go model is very attractive, it is perhaps not applicable to every scenario.

So then you have to adapt your methodology (if there is an acceptable policy for employing public IaaS clouds). More questions around deployments might now need to be prepended to our list above:

  1. How much compute will this application require?
  2. Will it need to be on a public cloud or a VM farm (private cloud) or on bare-metal hardware?

I used to think that almost everyone in this line of business thought about Systems Engineering along these lines. But I was very wrong. I have met and worked with some very smart engineers who have only a vague idea about these topics, or have thought through only some aspects of them.

Many don’t make the effort to learn how to do a TCO (total cost of ownership) calculation, or to show an RoI for the projects they are involved in (even just for their own sakes). As “bean-counter-esque” as these subjects might seem, they are very important (imho) to taking your repertoire as a mature engineer to the next level. A toy calculation follows.
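
A TCO calculation doesn’t have to be fancy. Here is a toy comparison over a five-year horizon; every number is hypothetical, purely to show the shape of the exercise:

    # All figures hypothetical -- the point is the shape of the math.
    awk 'BEGIN {
        yrs = 5
        a = 300000 + yrs * 60000   # Option A: licenses + annual support
        b =  40000 + yrs * 20000   # Option B: one-time training + cheaper support
        printf "Option A 5-yr TCO: $%d\n", a
        printf "Option B 5-yr TCO: $%d\n", b
        printf "Difference:        $%d\n", a - b
    }'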

Cost of ownership is oftentimes neglected by engineers. I’ve had heated discussions with colleagues at times about why one technology was chosen or recommended over another.

One case that comes to mind was when I was asked to evaluate and recommend one of two stacks: Veritas Cluster Server with Veritas Storage Foundation, versus a Sun Cluster/ZFS/UFS combination. The shop at that time was a heavy user of Solaris + SPARC + Veritas Storage Foundation/HA.

The project was very simple: we wanted to reduce the physical footprint of our SPARC infrastructure (then comprising V490s, V890s, V440s, V240s, etc.), so we chose a lightweight technology called Solaris Containers to achieve that. The apps were primarily DSS-type and batch oriented. We opted to use Sun’s T5440 servers to virtualize (there were no T4s or T5s in those days, so we took our lumps on slow single-thread performance; we weren’t really upgrading the performance characteristics of the servers we were P2V’ing, but we got a good consolidation ratio).

As a proof of concept, we baked off a pair of two-node clusters using Veritas Cluster Server 5.0 and Sun Cluster 3.2. Functionally, our needs were fairly simple. We needed to ascertain the following –

  • Can we use the cluster software to lash multiple physical nodes into a Zone/Container farm?
  • How easy is it to set up, use, and maintain?
  • What would our TCO be, and what would our return on investment be in choosing one technology over the other?

Both cluster stacks had agents to cluster and manage Solaris Containers and ZFS pools. In the end, it boiled down to two things, really –

  • The team was more familiar with VCS than Sun Cluster
  • VCS + Veritas Storage Foundation cost 2-3x more than Sun Cluster + ZFS combination

The difference in cost of ownership was overwhelming. On the other hand, while the team (with the exception of myself) wasn’t trained in Sun Cluster, we weren’t trying to do anything outrageous with the cluster software.

We would have a cluster group for each container we built, comprising the ZFS pools that housed the container’s OS and data volumes, plus a Zone agent that managed the container itself. The setup then comprised the following steps (sketched in commands after the list):

  1. add storage to the cluster
  2. set up the ZFS pool for the new container being instantiated
  3. install the container (a 10-minute task for a full root zone)
  4. create a cluster resource group for the container
  5. create a ZFS pool resource (called an HAStoragePlus resource in Sun Cluster parlance)
  6. create an HA-Zone resource (to manage the zone)
  7. bring the cluster resource group online/enable cluster-based monitoring
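
In rough command form, an iteration looked something like this (all names here are hypothetical, and the HA-zone registration details vary by agent version):

    # Sun Cluster 3.2 sketch; pool, zone, and resource names are illustrative.
    cldevice populate                              # 1. make the new LUNs known
    zpool create zone1pool c4t0d0                  # 2. pool for the new container
    zonecfg -z zone1 -f zone1.cfg                  # 3a. define the zone
    zoneadm -z zone1 install                       # 3b. install it (~10 min full-root)
    clresourcegroup create zone1-rg                # 4. resource group for the zone
    clresource create -g zone1-rg -t SUNW.HAStoragePlus \
        -p Zpools=zone1pool zone1-hasp-rs          # 5. ZFS pool (HAStoragePlus) resource
    ./sczbt_register -f ./sczbt_config             # 6. register the HA-zone resource
    clresourcegroup online -eM zone1-rg            # 7. bring online with monitoring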

These literally require 9-10 commands on the command line. Lather, rinse, repeat. So, when I defended the design, I had all this information, as well as the financial data, to support why I recommended Sun Cluster over Veritas. Initially, a few colleagues were resistant to the suggestion over concerns about supporting the solution. But it was a matter of spending 2-3x more, up-front and continually for support, versus spending $40K on training over one year.

Every organization is different: some face pressure to reduce operational costs, others to reduce capital costs. In the end, the decision makers need to make their decisions based on what’s right for their organization. But providing this degree of detail as to why a solution was recommended helps clear hurdles with greater ease. Unsound arguments usually fall apart when faced with cold, hard data.