Solaris 8 Zone — etude/BrandZ trial run

January 10th, 2008

Solaris 8 BrandZ prerequisites

Requires kernel patch 127111-05 (or a later revision) for SPARC. Find all of its dependencies and fulfill them (i.e., the patch requirements).
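Before adding the packages, the global zone's kernel patch level can be read from `uname -v`, which on this host reports Generic_127111-05. The sketch below is a minimal, hypothetical helper for a POSIX shell such as ksh or /usr/xpg4/bin/sh (the legacy Solaris /bin/sh does not support the `${var#...}` expansions used here). It assumes the version string has the form Generic_PATCHID-REV; a newer kernel patch that obsoletes 127111 would report a different ID and fail this naive check.

```sh
# kernel_patch_ok VERSION PATCHID MINREV  (hypothetical helper)
# Succeeds if VERSION, in the form printed by `uname -v` on Solaris 10
# (e.g. "Generic_127111-05"), names kernel patch PATCHID at revision
# MINREV or later. Anything else, such as "Generic_Virtual", fails.
kernel_patch_ok() {
    ver=$1 want_id=$2 want_rev=$3
    patch=${ver#Generic_}            # "127111-05"
    case $patch in
        [0-9]*-[0-9]*) ;;            # must look like ID-REV
        *) return 1 ;;
    esac
    id=${patch%-*}                   # "127111"
    rev=${patch#*-}                  # "05"
    case "$id$rev" in
        *[!0-9]*) return 1 ;;        # reject stray characters
    esac
    # test(1) compares decimally, so the leading zero in "05" is harmless
    [ "$id" = "$want_id" ] && [ "$rev" -ge "$want_rev" ]
}
```

For example, `kernel_patch_ok "$(uname -v)" 127111 5 && echo "kernel patch OK"` in the global zone.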

# ls
SUNWs8brandr  SUNWs8brandu  SUNWs8p2v
# pwd
/mypool/software/sol8p2v/s8ma-1_0-rr/Product
# pkgadd -d .

The following packages are available:
  1  SUNWs8brandr     Solaris 8 Migration Assistant: solaris8 brand support (Root)
                      (sparc) 11.10.0,REV=2007.10.08.16.51
  2  SUNWs8brandu     Solaris 8 Migration Assistant: solaris8 brand support (Usr)
                      (sparc) 11.10.0,REV=2007.10.08.16.51
  3  SUNWs8p2v        Solaris 8 p2v Tool
                      (sparc) 11.10.0,REV=2007.10.08.16.51

Select package(s) you wish to process (or 'all' to process
all packages). (default: all) [?,??,q]:

The SUNWs8brandr and SUNWs8brandu packages need to be added to the Solaris 10 Host OS (Global Zone).

Zone configuration

Then configure the zone:

# zonecfg -z s8-zone
s8-zone: No such zone configured
Use 'create' to begin configuring a new zone.
zonecfg:s8-zone> create -t SUNWsolaris8
zonecfg:s8-zone> set zonepath=/mypool/zones/s8-zone
zonecfg:s8-zone>
zonecfg:s8-zone> set autoboot=true
zonecfg:s8-zone> add net
zonecfg:s8-zone:net> set address=192.168.99.100
zonecfg:s8-zone:net> set physical=bge1
zonecfg:s8-zone:net> end
zonecfg:s8-zone> add fs
zonecfg:s8-zone:fs> set type=zfs
zonecfg:s8-zone:fs> set dir=/mypool/vol1
zonecfg:s8-zone:fs> end
special not specified
zonecfg:s8-zone:fs> set special=share/zone/s8-zone
zonecfg:s8-zone:fs> end
zonecfg:s8-zone>
zonecfg:sol8zone> add attr
zonecfg:sol8zone:attr> set name=hostid
zonecfg:sol8zone:attr> set type=string
zonecfg:sol8zone:attr> set value=8325f14d
zonecfg:sol8zone:attr> end
zonecfg:sol8zone> verify
zonecfg:sol8zone> commit
zonecfg:sol8zone> exit
dwailsun:$() # zonecfg -z sol8zone info
zonename: sol8zone
zonepath: /mypool/zones/sol8zone
brand: solaris8
autoboot: false
bootargs:
pool:
limitpriv:
scheduling-class:
ip-type: shared
fs:
        dir: /mypool/vol1
        special: share/zone/sol8zone
        raw not specified
        type: zfs
        options: []
net:
        address: 192.168.99.100
        physical: bge1
attr:
        name: hostid
        type: string
        value: 8325f14d
dwailsun:$() # zonecfg -z sol8zone info attr
attr:
        name: hostid
        type: string
        value: 8325f14d
dwailsun:$() #

Install the zone

dwailsun:$() # zonecfg -z sol8zone export > /var/tmp/safe/sol8zone.config
dwailsun:$(safe) # zoneadm -z s8-zone install -u -a /mypool/software/sol8p2v/solaris8-image.flar
could not verify fs /mypool/vol1: could not access zfs dataset 'share/zone/s8-zone'
zoneadm: zone s8-zone failed to verify
dwailsun:$(safe) # zfs list
NAME                  USED  AVAIL  REFER  MOUNTPOINT
mypool               3.75G  15.4G  39.3K  /mypool
mypool/software      3.22G  6.78G  3.22G  /mypool/software
mypool/vol1          66.6K  5.00G  34.0K  /mypool/vol1
mypool/vol1/s8-zone  32.6K  5.00G  32.6K  /mypool/vol1/s8-zone
mypool/www            544M  3.47G   544M  /mypool/www
mypool/zones         34.0K  5.00G  34.0K  /mypool/zones
dwailsun:$(safe) # zfs set mountpoint=legacy mypool/vol1/s8-zone
dwailsun:$(safe) # zfs list
NAME                  USED  AVAIL  REFER  MOUNTPOINT
mypool               3.75G  15.4G  39.3K  /mypool
mypool/software      3.22G  6.78G  3.22G  /mypool/software
mypool/vol1          65.3K  5.00G  32.6K  /mypool/vol1
mypool/vol1/s8-zone  32.6K  5.00G  32.6K  legacy
mypool/www            544M  3.47G   544M  /mypool/www
mypool/zones         34.0K  5.00G  34.0K  /mypool/zones
dwailsun:$(safe) # zoneadm -z s8-zone install -u -a /mypool/software/sol8p2v/solaris8-image.flar
      Log File: /var/tmp/s8-zone.install.987.log
        Source: /mypool/software/sol8p2v/solaris8-image.flar
    Installing: This may take several minutes...
Postprocessing: This may take several minutes...
        Result: Installation completed successfully.
      Log File: /mypool/zones/sol8zone/root/var/log/s8-zone.install.987.log
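The first install attempt failed verification because the fs resource's special named share/zone/s8-zone, a dataset that does not appear in the zfs list output; the dataset that does exist is mypool/vol1/s8-zone, which was then switched to mountpoint=legacy. The configuration session was also captured across two attempts, so it mixes the zone names s8-zone and sol8zone. One way to keep this reproducible is to generate the whole configuration as a zonecfg command file and replay it with `zonecfg -z ZONE -f FILE`. The sketch below is a hypothetical helper; it assumes special should name mypool/vol1/s8-zone and otherwise reuses the values from the session above.

```sh
# s8_zone_cmds ZONEPATH ADDRESS NIC DIR DATASET HOSTID  (hypothetical helper)
# Prints a zonecfg command file for a solaris8 branded zone. DATASET is
# the ZFS dataset (mountpoint=legacy) that zonecfg mounts at DIR.
s8_zone_cmds() {
    cat <<EOF
create -t SUNWsolaris8
set zonepath=$1
set autoboot=true
add net
set address=$2
set physical=$3
end
add fs
set type=zfs
set dir=$4
set special=$5
end
add attr
set name=hostid
set type=string
set value=$6
end
verify
commit
EOF
}
```

Usage: `s8_zone_cmds /mypool/zones/s8-zone 192.168.99.100 bge1 /mypool/vol1 mypool/vol1/s8-zone 8325f14d > /var/tmp/s8-zone.cfg`, review the file, then `zonecfg -z s8-zone -f /var/tmp/s8-zone.cfg`.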

Solaris8 P2V

Run the Solaris 8 p2v tool (s8_p2v):

dwailsun:$(safe) # /usr/lib/brand/solaris8/s8_p2v s8-zone
[Fri Dec 28 12:36:01 PST 2007]         S20_apply_patches:  Unpacking patch:  109 147-44
[Fri Dec 28 12:36:01 PST 2007]         S20_apply_patches: Installing patch:  109 147-44
Checking installed patches...
Patch 109147-44 has already been applied.
See patchadd(1M) for instructions.
Patchadd is terminating.
[Fri Dec 28 12:36:09 PST 2007]         S20_apply_patches:  Unpacking patch:  111 023-03
[Fri Dec 28 12:36:09 PST 2007]         S20_apply_patches: Installing patch:  111 023-03
Checking installed patches...
Patch 111023-03 has already been applied.
See patchadd(1M) for instructions.
Patchadd is terminating.
[Fri Dec 28 12:36:11 PST 2007]         S20_apply_patches:  Unpacking patch:  111 431-01
[Fri Dec 28 12:36:11 PST 2007]         S20_apply_patches: Installing patch:  111 431-01
Checking installed patches...
This patch is obsoleted by patch 108993-67 which has already
been applied to this system.
Patchadd is terminating.
[Fri Dec 28 12:36:13 PST 2007]         S20_apply_patches:  Unpacking patch:  112 605-04
[Fri Dec 28 12:36:13 PST 2007]         S20_apply_patches: Installing patch:  112 605-04
Checking installed patches...
This patch is obsoleted by patch 108993-67 which has already
been applied to this system.
Patchadd is terminating.
[Fri Dec 28 12:36:15 PST 2007]         S20_apply_patches:  Unpacking patch:  112 050-04
[Fri Dec 28 12:36:15 PST 2007]         S20_apply_patches: Installing patch:  112 050-04
Checking installed patches...
Patch 112050-04 has already been applied.
See patchadd(1M) for instructions.
Patchadd is terminating.
[Fri Dec 28 12:36:17 PST 2007]         S20_apply_patches:  Unpacking patch:  109 221-01
[Fri Dec 28 12:36:17 PST 2007]         S20_apply_patches: Installing patch:  109 221-01
Checking installed patches...
This patch is obsoleted by patch 109318-39 which has already
been applied to this system.
Patchadd is terminating.
dwailsun:$(safe) #
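The p2v run prints a block of patchadd chatter for every patch. A hypothetical helper to condense it to one line per patch is sketched below; it assumes the log format shown above (the patch ID follows "Installing patch:", possibly with a space in the middle) and prints see-log for any patch whose outcome it does not recognize. On Solaris, set AWK=nawk, because the old /usr/bin/awk does not support user-defined functions.

```sh
# p2v_summary  (hypothetical helper)
# Reads s8_p2v output on stdin and prints "PATCHID OUTCOME" per patch.
p2v_summary() {
    "${AWK:-awk}" '
    # Print the current patch together with what patchadd said about it
    function flush() {
        if (id != "") print id, (status == "" ? "see-log" : status)
    }
    /S20_apply_patches: Installing patch:/ {
        flush()
        id = $0
        sub(/.*Installing patch:/, "", id)
        gsub(/[ \t]+/, "", id)         # "109 147-44" becomes "109147-44"
        status = ""
        next
    }
    /^Patch [0-9-]+ has already been applied/ { status = "already-applied" }
    /obsoleted by patch/ {
        sub(/.*obsoleted by patch /, "")   # $1 is now the obsoleting patch
        status = "obsoleted-by-" $1
    }
    END { flush() }
    '
}
```

For example: `/usr/lib/brand/solaris8/s8_p2v s8-zone 2>&1 | tee /var/tmp/s8_p2v.log | AWK=nawk p2v_summary`.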

dwailsun:$(safe) # zoneadm -z s8-zone boot
dwailsun:$(safe) # zoneadm list -v
  ID NAME             STATUS     PATH                           BRAND    IP
   0 global           running    /                              native   shared
   3 s8-zone          running    /mypool/zones/sol8zone         solaris8 shared
dwailsun:$(safe) # zlogin -C s8-zone
[Connected to zone 's8-zone' console]
You did not enter a selection.
What type of terminal are you using?
 1) ANSI Standard CRT
 2) DEC VT52
 3) DEC VT100
 4) Heathkit 19
 5) Lear Siegler ADM31
 6) PC Console
 7) Sun Command Tool
 8) Sun Workstation
 9) Televideo 910
 10) Televideo 925
 11) Wyse Model 50
 12) X Terminal Emulator (xterms)
 13) Other
Type the number of your choice and press Return: 12
Configuring network interface addresses: bge1.
RPC: Timed out

Then it goes through and does the sysidcfg bit…

System identification is completed.
rebooting system due to change(s) in /etc/default/init
Dec 28 12:41:25 rpcbind: rpcbind terminating on signal.
System identification is completed.
[NOTICE: Zone rebooting]
SunOS Release 5.8 Version Generic_Virtual 64-bit
Copyright 1983-2000 Sun Microsystems, Inc.  All rights reserved
Hostname: sol8virt
The system is coming up.  Please wait.
starting rpc services: rpcbind done.
syslog service starting.
Print services started.
Dec 28 14:41:37 sol8virt sendmail[4102]: My unqualified host name (sol8virt) unknown; sleeping for retry
The system is ready.
sol8virt console login:

# uname -a
SunOS sol8virt 5.8 Generic_Virtual sun4u sparc SUNW,A70
# exit
[Connection to zone 's8-zone' pts/5 closed]
dwailsun:$(safe) # uname -a
SunOS dwailsun 5.10 Generic_127111-05 sun4u sparc SUNW,A70
dwailsun:$(safe) # zlogin s8-zone
[Connected to zone 's8-zone' pts/5]
Last login: Fri Dec 28 14:43:35 on pts/5
Sun Microsystems Inc.   SunOS 5.8       Generic Patch   February 2004
# uname -a
SunOS sol8virt 5.8 Generic_Virtual sun4u sparc SUNW,A70
#
# cat /etc/release
                       Solaris 8 2/04 s28s_hw4wos_05a SPARC
           Copyright 2004 Sun Microsystems, Inc.  All Rights Reserved.
                            Assembled 08 January 2004
#

/!\ Think of an optimal battery of tests that can help us determine whether this virtualized Solaris 8 is a viable platform for servers that cannot be migrated…
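One way to structure such a battery is a tiny pass/fail harness that each check plugs into, so the whole set can be rerun after patches or reboots and the results compared. The sketch below is a minimal, hypothetical helper in POSIX shell; the check names and commands are up to you, and nothing in it is specific to BrandZ.

```sh
# run_check NAME COMMAND [ARGS...]  (hypothetical helper)
# Runs COMMAND quietly, prints PASS/FAIL with NAME, and counts failures
# in $fails so a whole battery can be summarized at the end.
fails=0
run_check() {
    name=$1
    shift
    if "$@" >/dev/null 2>&1; then
        echo "PASS: $name"
    else
        echo "FAIL: $name"
        fails=$((fails + 1))
    fi
}
```

For example, inside the zone: `run_check "release is 5.8" test "$(uname -r)" = 5.8`, `run_check "OpenSSH package" pkginfo -q SMCosh471`, then `echo "$fails check(s) failed"`.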

  • Adding packages — pkgadd works

# uname -a
SunOS sol8virt 5.8 Generic_Virtual sun4u sparc SUNW,A70
# pkginfo|grep -i smc
application SMCgcc         gcc
application SMCliconv      libiconv
application SMClintl       libintl
application SMCosh471      openssh
application SMCossl        openssl
application SMCzlib        zlib

(!) After adding these packages, set up sshd, complete with startup scripts and the sshd privilege-separation user in the system account files (passwd and shadow).

# /etc/init.d/sshd start
Could not load host key: /usr/local/etc/ssh_host_key
Could not load host key: /usr/local/etc/ssh_host_dsa_key
Disabling protocol version 1. Could not load host key
# ps -ef|grep sshd
    root  5086  4609  0 15:18:13 ?        0:00 /usr/local/sbin/sshd
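The sshd start above complains that it cannot load /usr/local/etc/ssh_host_key and /usr/local/etc/ssh_host_dsa_key, which usually means the host keys were never generated after the package was added. A minimal sketch that creates whichever of the three standard OpenSSH host keys is missing and leaves existing keys alone. It assumes ssh-keygen lives at /usr/local/bin/ssh-keygen (the SMC prefix, judging from /usr/local/sbin/sshd above) and that this OpenSSH 4.7 build still accepts the rsa1 key type; since protocol version 1 was disabled anyway, the rsa1 entry can be dropped. gen_missing_host_keys is a hypothetical name.

```sh
# gen_missing_host_keys DIR [KEYGEN]  (hypothetical helper)
# Creates any missing OpenSSH host key in DIR with an empty passphrase.
# KEYGEN defaults to an assumed SMC OpenSSH location.
gen_missing_host_keys() {
    dir=$1 keygen=${2:-/usr/local/bin/ssh-keygen}
    for pair in rsa1:ssh_host_key dsa:ssh_host_dsa_key rsa:ssh_host_rsa_key; do
        type=${pair%%:*}                 # key type, e.g. "dsa"
        file=$dir/${pair#*:}             # key file, e.g. DIR/ssh_host_dsa_key
        [ -f "$file" ] && continue       # never overwrite an existing key
        "$keygen" -t "$type" -f "$file" -N '' || return 1
    done
    return 0
}
```

For example, `gen_missing_host_keys /usr/local/etc` followed by `/etc/init.d/sshd start` again.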

Installing Oracle 8i

Setting up Oracle 8i was a breeze. I simply copied the two CDs of Oracle 8i 64-bit installation media onto a filesystem visible to the solaris8 zone and ran runInstaller with all defaults, with the demo database (scott/tiger) created as the final step.

/!\ Make sure to copy the media to local disk when installing inside the zone. The reason: even though the CD-ROM can be exported from the global zone to the local zone this way:

add fs
set dir=/mnt
set special=/cdrom
set type=lofs
add options ro
add options nodevices
end

We would still have issues ejecting and inserting new CD-ROMs, etc.
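Copying each disc to local disk before swapping in the next one takes a single pipeline. The sketch below uses the classic tar-to-tar pipe so it works with the stock Solaris tar; the source /cdrom/cdrom0 (where vold usually mounts a CD) and the staging directory /export/stage/oracle8i are assumptions, and copy_media is a hypothetical helper name.

```sh
# copy_media SRC DEST  (hypothetical helper)
# Copies the contents of a mounted disc at SRC into DEST, preserving the
# directory tree and file modes.
copy_media() {
    src=$1 dest=$2
    mkdir -p "$dest" || return 1
    # Subshells keep the caller's working directory unchanged
    (cd "$src" && tar cf - .) | (cd "$dest" && tar xpf -)
}
```

For example: `copy_media /cdrom/cdrom0 /export/stage/oracle8i/disk1`, then `eject cdrom`, insert the second CD, and repeat with `disk2`.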

dwailsun:$() # ssh oracle@sol8virt
oracle@sol8virt's password:
Last login: Thu Jan  3 11:27:25 2008 from 10.119.10.4
Sun Microsystems Inc.   SunOS 5.8       Generic Patch   February 2004
Sun Microsystems Inc.   SunOS 5.8       Generic Patch   February 2004
$ ps -ef|grep ora
  oracle 22608 22152  0 11:25:48 ?        0:00 ora_reco_brandz
  oracle 22610 22152  0 11:25:48 ?        0:00 ora_snp0_brandz
  oracle 22626 22152  0 11:26:55 ?        0:00 /export/shared/oracle/OraHome1/bin/tnslsnr LISTENER -inherit
  oracle 22614 22152  0 11:25:48 ?        0:00 ora_snp2_brandz
  oracle 22687 22685  0 11:56:04 ?        0:00 /usr/local/sbin/sshd -R
  oracle 22695 22689  0 11:56:09 pts/6    0:00 grep ora
  oracle 22689 22687  0 11:56:04 pts/6    0:00 -ksh
  oracle 22604 22152  4 11:25:48 ?        1:04 ora_ckpt_brandz
  oracle 22600 22152  0 11:25:48 ?        0:00 ora_dbw0_brandz
  oracle 22598 22152  0 11:25:48 ?        0:00 ora_pmon_brandz
  oracle 22620 22152  0 11:25:48 ?        0:00 ora_d000_brandz
  oracle 22602 22152  0 11:25:48 ?        0:02 ora_lgwr_brandz
  oracle 22618 22152  0 11:25:48 ?        0:00 ora_s000_brandz
  oracle 22616 22152  0 11:25:48 ?        0:00 ora_snp3_brandz
  oracle 22612 22152  0 11:25:48 ?        0:00 ora_snp1_brandz
  oracle 22606 22152  0 11:25:48 ?        0:00 ora_smon_brandz
$
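The listing above shows the background processes of the brandz instance. For a repeatable check that could join the test battery, a hypothetical function can read `ps -ef` output and insist on a core set of processes. Treating pmon, smon, dbw0, lgwr and ckpt as that core set is an assumption based on what appears above.

```sh
# ora_bg_ok SID  (hypothetical helper)
# Reads `ps -ef` output on stdin; succeeds only if the assumed core
# Oracle background processes for SID are all present.
ora_bg_ok() {
    sid=$1
    ps_out=$(cat)
    for p in pmon smon dbw0 lgwr ckpt; do
        printf '%s\n' "$ps_out" | grep "ora_${p}_${sid}\$" >/dev/null || {
            echo "missing: ora_${p}_${sid}" >&2
            return 1
        }
    done
}
```

For example: `ps -ef | ora_bg_ok brandz && echo "instance brandz is up"`.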

Using a Wiki for quick Documentation

October 10th, 2007

A few thoughts on using new(er) technology to manage, maintain, and collaborate on docs and projects.

In my previous gig with a major Pharmaceutical company, I had taken on the onus of documenting the environment at the onset. That way I could learn the “whole picture” while creating good reference material. But the ordinary way of writing was both hard and cumbersome.

1) Old hacks and tricks kept in vi-edited text files on my workstation were hard to manage and hard to share publicly.
2) Environment designs, etc., were painstakingly created with Visio and took hours.
3) Word is a lousy product to use for technical documentation, imho.

So I decided to try out a bunch of technologies. I set up two content management engines (both FOSS: Joomla! and Drupal) that were basically LAMP apps (Linux, Apache, MySQL, PHP) and very powerful. Joomla! caught my eye due to the flexibility and out-of-the-box toolset it provided. Drupal was minimalistic and needed more effort to “mold”. I went on to create a couple of public-consumption web sites using Joomla! (http://www.medhajournal.com and http://www.invadingthesacred.com).

However, the most powerful tool, the one that struck an optimistic chord with me and the rest of the team, was a minimalist tool called MoinMoin (http://moinmoin.wikiwikiweb.de) that ran on a Python-based engine, stored its content without any RDBMS backing, and was a breeze to set up and use.

With excellent markup tags, it made documentation easy; we ended up creating a lot of docs with it and used it to manage some multi-million dollar (infrastructure-side) projects as well.

Of course there were the PM-drones who ran Project, GANTT charts, etc, and they have their own rightful place in the hierarchy of things in the PM framework. But to make a dreary task (that most admins fear) surprisingly easy, the Wiki was the perfect tool.

Thoughts on Virtualization

August 22nd, 2007

I am a regular reader of Paul Murphy’s ZDNet blog and thought I’d add to his thoughts on virtualization and all the brouhaha going on these days:

Virtualization? uh huh… by ZDNet’s Paul Murphy — Virtualization is popular because it was popular - and not because there’s a practical reason to do it.

The most interesting thing I discovered in the process of working on a “high-visibility” project (ERP solution) is that most mgt-types don’t understand what Virtualization has to offer. Someone high up (high-up enough I guess) decides that Virtualization is the answer to all evils that haunt a modern datacenter. The claims are that –

  1. Virtualization reduces server sprawl
  2. Virtualization reduces power and cooling footprints
  3. It empowers the IT support organization to be agile (read build more boxes fast) and really support a dynamic business (with lots of development type activities going on)
  4. It is a cure for many problems... blah blah

But when you look at what you’re actually saving on the standard UNIX platforms (except Sun), the costs are exorbitant. I won’t name the vendor, but it charges for everything, from its multipathing software to resource management software to virtualization, and it charges by the core.

Soon you start thinking, does this really buy me the cost savings by reducing server-sprawl?
Then the vendor will say, “Why look at this as a consolidation platform? Why don’t you think about the flexibility you’ll get by using this model? Moving workloads around on the fly, etc?”

The problem with that is that workload management (called SLOs, I believe) calls for very detailed, in-depth recording of metrics (what kinds of load applications generate, starting with categorizing applications by type, etc.).

So you first identify the right kinds of metrics to track. Then you collect the data for a reasonable period of time (say 3-4 months). Only after munging all that data is it possible to say with any authority that a certain amount of resources is required for a particular workload (and to build a system that can manage those resource requirements on the fly).

This entire process might take about a year from start to finish before it becomes a viable option (some shops I’ve been in are better equipped to do these kinds of measurements than others, depending on how “modern” the IT organization is: does it “REALLY” employ standards such as ITIL or not, etc.).

I’d say that something like Sun’s container model on the CoolThreads servers would be more appropriate for all the above criteria: consolidation, resource management, flexibility, etc.

  • SRM has been free with Solaris since Solaris 9.
  • Solaris 10 has the virtualization pieces completely free.
  • The hardware is cheap(er than the competition’s for sure)

Setting up Veritas Cluster server

August 22nd, 2007

Install the VCS packages after patching the server to the appropriate/recommended patch list.

VCS LICENSE KEY : !@$-@$%-(*&^-$%@-$%%-!

List of VCS Packages:

VRTSappqw VRTSvcs VRTSvcsqw
VRTScscm VRTSvcsag VRTSvcsw
VRTSgab VRTSvcsdc VRTSvlic
VRTSllt VRTSvcsmg VRTSweb
VRTSoraqw VRTSvcsmn VRTSperl VRTSvcsor

edit /etc/llthosts (on both servers - for a 2 node cluster)

0 hostd02
1 hostd03

edit /etc/llttab

set-node hostd03 #here the nodename will change with each host
set-cluster 54 #Set the appropriate cluster ID
link qfe1 /dev/qfe:1 - ether - - #heartbeat 1
link qfe5 /dev/qfe:5 - ether - - #heartbeat 2
link-lowpri qfe0 /dev/qfe:0 - ether - - #Low-pri heartbeat

Edit the /etc/gabtab file with

cat > /etc/gabtab <<EOGAB
gabconfig -c -n 2
EOGAB

#Here the number after the “-n” varies with the number of nodes in cluster
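llthosts and gabtab are identical on both nodes and llttab differs only in its set-node line, so the three files can be generated rather than hand-edited. A minimal sketch, assuming the qfe heartbeat layout and cluster ID 54 shown above; the function names are hypothetical, and the full /sbin path in gabtab is an assumption (the file above uses a bare gabconfig).

```sh
# make_llthosts NODE...: number the nodes from 0 in the order given
make_llthosts() {
    n=0
    for node in "$@"; do
        echo "$n $node"
        n=$((n + 1))
    done
}

# make_llttab NODE CLUSTER_ID: same links on every node, only set-node differs
make_llttab() {
    cat <<EOF
set-node $1
set-cluster $2
link qfe1 /dev/qfe:1 - ether - -
link qfe5 /dev/qfe:5 - ether - -
link-lowpri qfe0 /dev/qfe:0 - ether - -
EOF
}

# make_gabtab NODE...: seed GAB once this many nodes are present
make_gabtab() {
    echo "/sbin/gabconfig -c -n $#"
}
```

On each node, assuming the node names match `uname -n`: `make_llthosts hostd02 hostd03 > /etc/llthosts`, `make_llttab "$(uname -n)" 54 > /etc/llttab`, and `make_gabtab hostd02 hostd03 > /etc/gabtab`.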

Edit the main.cf (/etc/VRTSvcs/conf/config) to match your reqs

##Only on the first/main server of the Cluster

##Start of main.cf##

include "types.cf"
include "OracleTypes.cf"

cluster OneBill_Prod (
UserNames = { admin = "cDRpdxPmHpzS." }
Administrators = { admin }
CounterInterval = 5
)

system hostd02 (
)

system hostd03 (
)

group network_grp (
SystemList = { hostd02 = 0, hostd03 = 1 }
PrintTree = 0
Parallel = 1
AutoStartList = { hostd02, hostd03 }
)

NIC OneBillv1_nic (
Device = qfe0
NetworkType = ether
)

Phantom OneBillv1_phantom (
)

group oracle_grp (
SystemList = { hostd02 = 0, hostd03 = 1 }
PrintTree = 0
AutoStartList = { hostd02 }
)

DiskGroup orashrdg_dg (
DiskGroup = orashrdg
)

IP OneBillv1_vip (
Device = qfe0
Address = "112.64.90.54"
NetMask = "255.255.255.0"
IfconfigTwice = 1
)

Mount au1_mnt (
MountPoint = "/au1"
BlockDevice = "/dev/vx/dsk/orashrdg/au1"
FSType = vxfs
MountOpt = rw
FsckOpt = "-y"
)

Mount bu1_mnt (
MountPoint = "/bu1"
BlockDevice = "/dev/vx/dsk/orashrdg/bu1"
FSType = vxfs
MountOpt = rw
FsckOpt = "-y"
)

Mount u01_mnt (
MountPoint = "/u01"
BlockDevice = "/dev/vx/dsk/orashrdg/u01"
FSType = vxfs
MountOpt = rw
FsckOpt = "-y"
)

Mount u02_mnt (
MountPoint = "/u02"
BlockDevice = "/dev/vx/dsk/orashrdg/u02"
FSType = vxfs
MountOpt = rw
FsckOpt = "-y"
)

Mount u03_mnt (
MountPoint = "/u03"
BlockDevice = "/dev/vx/dsk/orashrdg/u03"
FSType = vxfs
MountOpt = rw
FsckOpt = "-y"
)

Mount u04_mnt (
MountPoint = "/u04"
BlockDevice = "/dev/vx/dsk/orashrdg/u04"
FSType = vxfs
MountOpt = rw
FsckOpt = "-y"
)

Mount u05_mnt (
MountPoint = "/u05"
BlockDevice = "/dev/vx/dsk/orashrdg/u05"
FSType = vxfs
MountOpt = rw
FsckOpt = "-y"
)

Proxy OneBillv1_proxy (
TargetResName = OneBillv1_nic
)

Volume au1_vol (
Volume = au1
DiskGroup = orashrdg
)

Volume bu1_vol (
Volume = bu1
DiskGroup = orashrdg
)

Volume u01_vol (
Volume = u01
DiskGroup = orashrdg
)

Volume u02_vol (
Volume = u02
DiskGroup = orashrdg
)

Volume u03_vol (
Volume = u03
DiskGroup = orashrdg
)

Volume u04_vol (
Volume = u04
DiskGroup = orashrdg
)

Volume u05_vol (
Volume = u05
DiskGroup = orashrdg
)

OneBillv1_vip requires OneBillv1_proxy
au1_mnt requires au1_vol
au1_vol requires orashrdg_dg
bu1_mnt requires bu1_vol
bu1_vol requires orashrdg_dg
u01_mnt requires u01_vol
u01_vol requires orashrdg_dg
u02_mnt requires u02_vol
u02_vol requires orashrdg_dg
u03_mnt requires u03_vol
u03_vol requires orashrdg_dg
u04_mnt requires u04_vol
u04_vol requires orashrdg_dg
u05_mnt requires u05_vol
u05_vol requires orashrdg_dg

##End of main.cf##
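Most of this main.cf is repetition: one Mount and one Volume resource per VxVM volume, plus two requires lines each. Copy-and-paste is how duplicate or mismatched stanzas creep in, and hacf -verify should reject duplicate resource names. A sketch of a hypothetical generator, assuming the naming convention used here (VOL_mnt, VOL_vol, DG_dg, mount point /VOL):

```sh
# vcs_mount_stanzas DG VOL...  (hypothetical helper)
# Prints a Mount and a Volume resource for each VxVM volume VOL in disk
# group DG, followed by the matching dependency lines.
vcs_mount_stanzas() {
    dg=$1
    shift
    for v in "$@"; do
        cat <<EOF
Mount ${v}_mnt (
MountPoint = "/$v"
BlockDevice = "/dev/vx/dsk/$dg/$v"
FSType = vxfs
MountOpt = rw
FsckOpt = "-y"
)

Volume ${v}_vol (
Volume = $v
DiskGroup = $dg
)

EOF
    done
    for v in "$@"; do
        echo "${v}_mnt requires ${v}_vol"
        echo "${v}_vol requires ${dg}_dg"
    done
}
```

For example, `vcs_mount_stanzas orashrdg au1 bu1 u01 u02 u03 u04 u05 > /var/tmp/mounts.cf`, then paste the result into the oracle_grp section and run hacf -verify.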

Copy OracleTypes.cf, etc to the config directory

From /etc/VRTSvcs/conf/config run

/opt/VRTSvcs/bin/hacf -verify .

###(Fix errors as you get them)

Setting up GAB and LLT

/sbin/gabconfig -U
/sbin/lltconfig -U
/sbin/lltconfig -c
/sbin/gabconfig -c -n 2
/sbin/lltconfig -a list

##Make sure Filesystems (Shared Filesystems) are commented out of the /etc/vfstab file

#Make sure each node in the cluster has the host/IP information of every other node in its local hosts file#

Reboot the servers, bringing the main server/node up first

On each node of the cluster

  • /sbin/vxlicinst -k <KEY>
  • /opt/VRTSvcs/bin/hastop -local -force
  • /opt/VRTSvcs/bin/hastart

Create Mount points on all nodes for Shared Filesystems

for i in au1 bu1 u01 u02 u03 u04 u05
do
if [ ! -d /$i ]; then
mkdir /$i
fi
done

Test failovers by bringing down resources and checking the failover

Sun Cluster Cheat Sheet — 4

July 18th, 2007

Data Services in the Cluster

HAStoragePlus helps configure a local filesystem into a highly available one. It provides the following capabilities:

  • additional filesystem checks
  • mounts and unmounts
  • enables Sun Cluster to fail over local file systems (to fail over, a local file system must reside on global dgs with affinity switchovers enabled)

Data Service Agent: specially written software that allows a data service to operate properly in a cluster.

Data Service Agent (or Agent) does the following to a standard application:

  • stop/start application
  • monitor faults
  • validate configuration
  • provides a registration information file that allows Sun Cluster to store all the info about the methods.

Sun Cluster 2.x runs fault monitoring components on the failover node, which can initiate a takeover. Sun Cluster 3.x does not allow this: the fault monitor runs on the primary (active) node and can either restart the service there or trigger a failover.

Failover resource groups:

  • Logical host resource: SUNW.LogicalHostname
  • Data storage resource: SUNW.HAStoragePlus
  • NFS resource: SUNW.nfs

Shut down a resource group:

scswitch -F -g <resource-group>

Turn on a resource group:

scswitch -Z -g <resource-group>

Switch a failover group over to another node:

scswitch -z -g <resource-group> -h <node>

Restart a resource group:

scswitch -R -h <node> -g <resource-group>

Evacuate all resources and rgs from a node:

scswitch -S -h node

Disable a resource and its fault monitor:

scswitch -n -j <resource>

Enable a resource and its fault monitor:

scswitch -e -j <resource>

Clear the STOP_FAILED flag:

scswitch -c -j <resource> -h <node> -f STOP_FAILED

How to add a disk group and volume to the Cluster configuration

1. Create the disk group and volume.

2. Register the local disk group with the cluster.

        root@aesnsra1:../ # scconf -a -D type=vxvm,name=patroldg2,nodelist=aesnsra2
        root@aesnsra2:../ # scswitch -z -h aesnsra2 -D patroldg2

3. Create your file system.

4. Update /etc/vfstab, replacing the ‘-’ placeholders in the fsck pass, mount-at-boot, and mount-options fields

  • example:

        /dev/vx/dsk/patroldg2/patroldg02 /dev/vx/rdsk/patroldg2/patroldg02 /patrol02 vxfs 3 no suid

5. Set up a resource group with a HAStoragePlus resource for local filesystem:

        root@aesnsra2:../ # scrgadm -a -g aescib1-hastp-rg -h aescib1
        root@aesnsra2:../ # scrgadm -a -g aescib1-hastp-rg -j sapmntdg01-rs -t SUNW.HAStoragePlus -x FilesystemMountPoints=/sapmnt

6. Bring the resource group online which will mount the specified filesystem:

        root@aesnsra2:../ # scswitch -Z -g hastp-aesnsra2-rg

7. Enable resource

        root@aesnsra2:../# scswitch -e -j osdumps-dev-rs

Optional step:

8. reboot and test.

Fault monitor operations

Disable the fault monitor for a resource:

scswitch -n -M -j <resource>

Enable the fault monitor for a resource:

scswitch -e -M -j <resource>

scstat -g       #shows status of all resource groups

Using scrgadm to register and configure Data service software

eg:

scrgadm -a -t SUNW.nfs
scrgadm -a -t SUNW.HAStoragePlus
scrgadm -p

Create a fail over res:

scrgadm -a -g nfs-rg -h node1,node2 \
-y Pathprefix=/global/nfs/admin

Add logical host name res to rg:

scrgadm -a -L -g nfs-rg -l clustername-nfs

Create a HAStoragePlus res:

scrgadm -a -j nfs-stor -g nfs-rg \
-t SUNW.HAStoragePlus \
-x FilesystemMountpoints=/global/nfs -x AffinityOn=True

Create SUNW.nfs resource:

scrgadm -a -j nfs-res -g nfs-rg \
-t SUNW.nfs -y Resource_dependencies=nfs-stor

Print the various resource/resource group dependencies via scrgadm:

scrgadm -pvv|grep -i depend     #And then parse this output

Enable res and res monitors, manage rg and switch rg to online state:

scswitch -Z -g nfs-rg
scstat -g
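The NFS steps above can be strung together into one script. The sketch below wraps each command in a hypothetical run helper that echoes it first and only prints it when DRY_RUN is set, so the sequence can be reviewed before it touches the cluster. The resource names nfs-stor and nfs-res come from the examples above; the script stops at the first failing command, which likely includes trying to re-register a resource type that is already registered.

```sh
# run CMD...: echo the command, then execute it unless DRY_RUN is set
run() {
    echo "+ $*"
    [ -n "${DRY_RUN:-}" ] || "$@"
}

# setup_nfs_rg RG NODES PATHPREFIX LOGICALHOST MOUNTPOINT  (hypothetical)
setup_nfs_rg() {
    rg=$1 nodes=$2 prefix=$3 lh=$4 mnt=$5
    run scrgadm -a -t SUNW.nfs || return 1
    run scrgadm -a -t SUNW.HAStoragePlus || return 1
    run scrgadm -a -g "$rg" -h "$nodes" -y Pathprefix="$prefix" || return 1
    run scrgadm -a -L -g "$rg" -l "$lh" || return 1
    run scrgadm -a -j nfs-stor -g "$rg" -t SUNW.HAStoragePlus \
        -x FilesystemMountpoints="$mnt" -x AffinityOn=True || return 1
    run scrgadm -a -j nfs-res -g "$rg" -t SUNW.nfs \
        -y Resource_dependencies=nfs-stor || return 1
    run scswitch -Z -g "$rg"
}
```

Preview with `DRY_RUN=1 setup_nfs_rg nfs-rg node1,node2 /global/nfs/admin clustername-nfs /global/nfs`; drop DRY_RUN to run it for real, then check with `scstat -g`.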

Show current RG configuration:

scrgadm -p[v[v]] [ -t resource_type_name ] [ -g resgrpname ] \
[ -j resname ]

Resizing a VxVM/VxFS volume/filesystem under Sun Cluster

# vxassist -g aesnfsp growby saptrans 5g
# scconf -c -D name=aesnfsp,sync
root@aesrva1:../ # vxprint -g aesnfsp -v saptrans
TY NAME         ASSOC        KSTATE   LENGTH   PLOFFS   STATE    TUTIL0  PUTIL0
v  saptrans     fsgen        ENABLED  188743680 -       ACTIVE   -       -
root@aesrva1:../ # fsadm -F vxfs -b 188743680 /saptrans
UX:vxfs fsadm: INFO: /dev/vx/rdsk/aesnfsp/saptrans is currently 178257920 sectors - size will be increased
# root@aesrva1:../ # scconf -c -D name=aesnfsp,sync
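The numbers in this session are consistent if vxprint’s LENGTH is in 512-byte sectors (the usual unit for VxVM on Solaris): 188743680 sectors is 92160 MB (90 GB), and the file system’s current 178257920 sectors is 87040 MB, exactly the 5 GB added by growby 5g. A small sketch that pulls LENGTH from the volume (v) record of vxprint output and builds the matching fsadm command, plus a sector-to-MB helper; both function names are hypothetical.

```sh
# fsadm_grow_cmd MOUNTPOINT  (hypothetical helper)
# Reads `vxprint -g DG -v VOL` output on stdin and prints the fsadm
# command that grows the VxFS file system to the volume's LENGTH.
fsadm_grow_cmd() {
    len=$(awk '$1 == "v" { print $5; exit }')
    [ -n "$len" ] || return 1
    echo "fsadm -F vxfs -b $len $1"
}

# sectors_to_mb SECTORS: 512-byte sectors to MB (2048 sectors per MB)
sectors_to_mb() {
    echo $(( $1 / 2048 ))
}
```

For example, `vxprint -g aesnfsp -v saptrans | fsadm_grow_cmd /saptrans` prints the fsadm line used above.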

Command Quick Reference

scstat
scconf
scrgadm
scha_*
scdidadm

Sun Terminal Concentrator (Annex NTS)

Enable setup mode by pressing TC test button until TC power indicator starts to blink rapidly, then release the button and press it briefly.

On entering the Setup mode, a “monitor:” prompt is displayed.

Set up IP address using:

monitor::addr

Setting up Load source:

monitor::seq

Specifying image:

monitor::image

Telnet into the TC IP address:
enter "cli"
Elevate to privileged acct using "su"
Run "admin" at the TC OS prompt to get the admin: subprompt:
show port=1 type mode
set port=<n> type <type> mode <mode>     #Choose various options
quit (to exit the admin: subprompt)
boot

Sun cluster Cheat Sheet — 3

July 18th, 2007

Displays existing DG resources in the Cluster

scstat -D

Registering VxVM DGs

scconf -a -D type=vxvm,name=<dg-name> \
nodelist=<node1>:<node2> \
preferenced=true,failback=enabled

  • nodelist should contain only nodes that are physically connected to the disks of that dg.
  • preferenced=true/false affects whether nodelist indicates an order of failover preference. On a two-node cluster, this option is only meaningful if failback is enabled.
  • failback=disabled/enabled affects whether a preferred node “takes back” its device group when it joins the cluster. The default value is disabled. When failback is disabled, preferenced is set to false. If it is enabled, preferenced must also be set to true.

Moving DGs across nodes of a cluster

When VxVM dgs are registered as Sun Cluster resources, NEVER USE vxdg import/deport commands to change ownership (node-wise) of the dgs. This will cause SC to treat the dg as a failed resource.

Use the following command instead:

# scswitch -z -D <device-group> -h <node>

Resyncing Device Groups

scconf -c -D name=<device-group>,sync

Changing DG configuration

scconf -c -D name=<device-group>,preferenced=<true|false>,failback=<enabled|disabled>

Maintenance mode

scswitch -m -D <device-group>

NOTE: all volumes in the dg must be unopened or unmounted (not being used) in order to do that.

To come back out of maintenance mode

scswitch -z -D <device-group> -h <node>

Repairing DID device database after replacing JBOD disks

  • ‘Make sure you know which disk to update …’

scdidadm -l c1t1d0

returns node1:/dev/rdsk/c1t1d0 /dev/did/rdsk/d7

scdidadm -l d7

returns node1:/dev/rdsk/c1t1d0 /dev/did/rdsk/d7

Then use following cmds to update and verify the DID info:

scdidadm -R d7
scdidadm -l -o diskid d7

returns a large string with disk id.

Replacing a failed disk in a A5200 Array (similar concept with other FC disk arrays)

vxdisk list                  #get the failed disk name
vxprint -g dgname            #determine state of the volume(s) that might be affected

On the hosting node, replace the failed disk:

luxadm remove_device enclosure,position
luxadm insert_device enclosure,position

On either node of the cluster (that hosts the dg):

scdidadm -l c#t#d#
scdidadm -R d#

On the hosting node:

vxdctl enable
vxdiskadm                    (replace failed disk in vxvm)
vxprint -g dgname
vxtask list                  #ensure that resyncing is completed

Remove any relocated submirrors/plexes (if hot-relocation had to move something out of the way):

vxunreloc repaired-diskname

Solaris Vol Mgr (SDS) in Sun Clustered Env

Preferred method of using Soft partitions is to use single slices to create mirrors and then create volumes (soft partitions) from that (kind of similar to VxVM public region in an initialized disk).

Shared Disksets and Local Disksets

Only disks that are physically located in the multi-ported storage will be members of shared disksets. Only disks that are in the same diskset operate as a unit; they can be used together to build mirrored volumes, and primary ownership of the diskset transfers as a whole from node to node.

Boot disks belong to the local diskset, which is a prerequisite for having shared disksets.

Replica management

  • Add local replicas manually.
  • Put local state db replicas on slice 7 of disks (as a convention) in order to maintain uniformity. Shared disksets have to have replicas on slice 7.
  • Spread local replicas evenly across disks and controllers.
  • Support for Shared disksets is provided by Pkg SUNWmdm

Modifying /kernel/drv/md.conf

nmd == max number of volumes (default 128)
md_nsets == max number of disksets (max 32, default 4)
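md.conf holds both tunables on one line, typically of the form `name="md" parent="pseudo" nmd=128 md_nsets=4;` (an assumption worth checking against your own file). A cautious sketch that keeps a backup and rewrites just the two values; set_md_limits is a hypothetical helper, and the new limits only take effect after a reboot.

```sh
# set_md_limits FILE NMD NSETS  (hypothetical helper)
# Rewrites nmd= and md_nsets= in an md.conf-style FILE, keeping a copy in
# FILE.orig and refusing md_nsets values above 32.
set_md_limits() {
    file=$1 nmd=$2 nsets=$3
    if [ "$nsets" -gt 32 ]; then
        echo "md_nsets cannot exceed 32" >&2
        return 1
    fi
    cp "$file" "$file.orig" || return 1
    sed -e "s/nmd=[0-9]*/nmd=$nmd/" -e "s/md_nsets=[0-9]*/md_nsets=$nsets/" \
        "$file.orig" > "$file"
}
```

For example, `set_md_limits /kernel/drv/md.conf 256 8`, then review the diff against /kernel/drv/md.conf.orig before rebooting.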

Creating shared disksets and mediators

scdidadm -l c1t3d0

  • – returns d17 as DID device

scdidadm -l d17
metaset -s <setname> -a -h <node1> <node2>     # creates metaset
metaset -s <setname> -a -m <node1> <node2>     # creates mediator
metaset -s <setname> -a /dev/did/rdsk/d9 /dev/did/rdsk/d17
metaset                                        # returns values
metadb -s <setname>
medstat -s <setname>                           (reports mediator status)

Remaining syntax vis-a-vis Sun Cluster is identical to that for VxVM.

IPMP and sun cluster

IPMP is cluster-unaware. To work around that, Sun Cluster uses a cluster-specific public network management daemon (pnmd) to integrate IPMP into the cluster.

The pnmd daemon has two capabilities:

  • populate CCR with public network adapter status
  • facilitate application failover

When pnmd detects that all members of a local IPMP group have failed, it consults a file called /var/cluster/run/pnm_callbacks. This file contains entries created by the activation of LogicalHostname and SharedAddress resources. It is the job of hafoip_ipmp_callback to decide whether to migrate resources to another node.

scstat -i       #view IPMP configuration

Sun Cluster Cheat Sheet — 2

July 18th, 2007

Sun Cluster Set up

  • don’t mix PCI and SBus SCSI devices

Quorum Device Rules

  • A quorum device must be available to both nodes in a 2-node cluster
  • QD info is maintained globally in the CCR db
  • A QD can contain user data
  • Max and optimal number of votes contributed by QDs must be N -1
    • (where N == number of nodes in the cluster)
  • If # of QDs >= # of nodes, the cluster cannot come up easily if there are too many failed/errored QDs.
  • QDs are not required in clusters with more than 2 nodes, but are recommended for higher cluster availability.
  • QDs are manually configured after Sun Cluster s/w installation is done.
  • QDs are configured using DID devices

Quorum Math and Consequences

  • A running cluster is always aware of (math):
    • Total possible Q votes (number of nodes + disk quorum votes)
    • Total present Q votes (number of booted nodes + available QD votes)
    • Total needed Q votes (a majority, i.e. more than 50% of possible votes)

    Consequences:

    • A node that cannot find adequate Q votes will freeze, waiting for other nodes to join the cluster.
    • A node that is booted in the cluster but can no longer find the needed number of votes kernel panics.
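A quick way to sanity-check the quorum math: with one vote per node and the QD votes described above, the cluster needs a strict majority of all possible votes (more than half, not merely half). For the usual two-node cluster with one quorum disk, that is 3 possible votes and 2 needed, so a surviving node keeps quorum only if it also holds the quorum disk. A minimal sketch with hypothetical function names:

```sh
# quorum_needed NODE_VOTES QD_VOTES: strict majority of all possible votes
quorum_needed() {
    total=$(( $1 + $2 ))
    echo $(( total / 2 + 1 ))
}

# has_quorum PRESENT NODE_VOTES QD_VOTES: succeed if PRESENT votes suffice
has_quorum() {
    [ "$1" -ge "$(quorum_needed "$2" "$3")" ]
}
```

For example, `has_quorum 2 2 1` succeeds (one node plus the quorum disk), while `has_quorum 1 2 1` fails (a node on its own).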

installmode flag: allows cluster nodes to be rebooted during/after initial installation without causing the other (active) node(s) to panic.

Cluster status

# Reporting the cluster membership and quorum vote information

# /usr/cluster/bin/scstat -q

Verifying cluster configuration info

# scconf -p

Run scsetup to correct any configuration mistakes and/or to:

  • add or remove quorum disks
  • add, remove, enable, or disable cluster transport components
  • register/unregister VxVM dgs
  • add/remove node access from a VxVM dg
  • change cluster private host names
  • change the cluster name

Shutting down cluster on all nodes:

# scshutdown -y -g 15

# scstat (verifies cluster status)

Cluster Daemons

lahirdx@aescib1:/home/../lahirdx > ps -ef|grep cluster|grep -v grep
    root     4     0  0   May 07 ?       352:39 cluster
    root   111     1  0   May 07 ?        0:00 /usr/cluster/lib/sc/qd_userd
    root   120     1  0   May 07 ?        0:00 /usr/cluster/lib/sc/failfastd
    root   123     1  0   May 07 ?        0:00 /usr/cluster/lib/sc/clexecd
    root   124   123  0   May 07 ?        0:00 /usr/cluster/lib/sc/clexecd
    root  1183     1  0   May 07 ?       46:45 /usr/cluster/lib/sc/rgmd
    root  1154     1  0   May 07 ?        0:07 /usr/cluster/lib/sc/rpc.fed
    root  1125     1  0   May 07 ?       23:49 /usr/cluster/lib/sc/sparcv9/rpc.pmfd
    root  1153     1  0   May 07 ?        0:03 /usr/cluster/lib/sc/cl_eventd
    root  1152     1  0   May 07 ?        0:04 /usr/cluster/lib/sc/cl_eventlogd
    root  1336     1  0   May 07 ?        2:17 /var/cluster/spm/bin/scguieventd -d
    root  1174     1  0   May 07 ?        0:03 /usr/cluster/bin/pnmd
    root  1330     1  0   May 07 ?        0:01 /usr/cluster/lib/sc/scdpmd
    root  1339     1  0   May 07 ?        0:00 /usr/cluster/lib/sc/cl_ccrad

  • FF (failfast) panic rule: failfast will shut down the node (panic the kernel) if the specified daemon is not restarted within 30s.
  • cluster: a system process created by the kernel to encapsulate the kernel threads that make up the core kernel range of operations. It directly panics the kernel if it is sent a KILL signal (SIGKILL). Other signals have no effect.
  • clexecd: used by cluster kernel threads to execute userland commands (such as the run_reserve and dofsck commands). It is also used to run cluster commands remotely (e.g. scshutdown). A failfast driver panics the kernel if this daemon is killed and not restarted in 30s.
  • cl_eventd: registers and forwards cluster events (e.g. nodes entering and leaving the cluster). Starting with SC 3.1 10/03, user applications can register themselves to receive cluster events. The daemon is automatically respawned by rpc.pmfd if it is killed.
  • rgmd: the resource group manager, which manages the state of all cluster-unaware applications. A failfast driver panics the kernel if this daemon is killed and not restarted in 30s.
  • rpc.fed: the "fork-and-exec" daemon, which handles requests from rgmd to spawn methods for specific data services. Failfast will panic the node if this daemon is killed and not restarted in 30s.
  • scguieventd: processes cluster events for the SunPlex Manager GUI, so that the display can be updated in real time. It is not automatically restarted if it stops. If you are having trouble with SunPlex Manager, you might have to restart the daemon or reboot the specific node.
  • rpc.pmfd: the process monitoring facility. It is used as a general mechanism to initiate restarts and failure action scripts for some cluster framework daemons, and for most application daemons and application fault monitors. The FF panic rule applies.
  • pnmd: the public network management daemon, which manages network status information received from the local IPMP daemon (in.mpathd) running on each node in the cluster. It is automatically restarted by rpc.pmfd if it dies.
  • scdpmd: the multi-threaded DPM (disk path monitoring) daemon that runs on each node. It is started by an rc script when a node boots, and monitors the availability of the logical paths visible through the various multipath drivers (MPxIO, HDLM, PowerPath, etc.). It is automatically restarted by rpc.pmfd if it dies.
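The daemon list above can double as a quick checklist. A minimal sketch, assuming you feed it a saved `ps -ef` listing on stdin: `check_cluster_daemons` is a hypothetical helper, not a Sun Cluster tool. It only checks that each named userland daemon is present, not that it is healthy, and it skips the `cluster` kernel process, which has no command path.

```shell
# Hypothetical helper: report which of the cluster daemons described above
# appear in a "ps -ef" listing read from stdin.
check_cluster_daemons() {
    listing=$(cat)
    missing=0
    for d in qd_userd failfastd clexecd rgmd rpc.fed rpc.pmfd cl_eventd \
             cl_eventlogd scguieventd pnmd scdpmd cl_ccrad; do
        # Match the daemon as the last component of a command path,
        # followed by a space (arguments) or the end of the line.
        if printf '%s\n' "$listing" | grep -Eq "/${d}( |\$)"; then
            echo "OK      $d"
        else
            echo "MISSING $d"
            missing=$((missing + 1))
        fi
    done
    # Exit status is the number of missing daemons (0 = all present).
    return "$missing"
}
```

On a cluster node this would be run as `ps -ef | check_cluster_daemons`. A nonzero status is only a prompt to look closer, since some daemons (for example scguieventd) are not restarted automatically by design.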

Validating basic cluster config

  • The sccheck command (/usr/cluster/bin/sccheck) validates the cluster configuration.
  • /var/cluster/sccheck is the repository where it stores the generated reports.

Disk Path Monitoring

  • scdpm -p all:all   # prints all disk paths in the cluster and their status
  • scinstall -pv      # checks the cluster installation status: package revisions, patches applied, etc.

  • Cluster release file: /etc/cluster/release

Shutting down cluster

  • scshutdown -y -g 30

Booting nodes in non-cluster mode
  • boot -x   (from the OpenBoot ok prompt)

Placing node in maintenance mode
  • scconf -c -q node=,maintstate

Reset the maintenance mode by rebooting the node or by running:
  • scconf -c -q reset

By placing a node in maintenance mode, we reduce the number of required quorum votes, which ensures that cluster operation is not disrupted as a result.

    SunPlex Manager is available on https::3000

VxVM Rootdg requirements for Sun Cluster

* vxio major number has to be identical on all nodes of the cluster (check for vxio entry in /etc/name_to_major)

* VxVM must be installed on all nodes physically connected to shared storage. On non-storage nodes, VxVM can be used to encapsulate and mirror the boot disk. If you are not using VxVM on a non-storage node (e.g. you use SVM instead), all that is required is that its vxio major number be identical to that of all other nodes in the cluster (add an entry to the /etc/name_to_major file).

* A VxVM license is required on all nodes not connected to an A5x00 StorEdge array.

* A standard rootdg must be created on all nodes where VxVM is installed. Options for initializing rootdg on each node are:

  • Encapsulate the boot disk so it can be mirrored. This preserves all data and creates volumes inside rootdg, including one to encapsulate /global/.devices/node@#. If the disk has more than 5 slices on it, it cannot be encapsulated.
  • Initialize other local disks into rootdg.

* Unique volume names and minor numbers across the nodes for the /global/.devices/node@# file system if the boot disk is encapsulated. The /global/.devices/node@# file system must be on devices with a unique name on each node, because every node's copy is mounted globally on all nodes. The normal Solaris OS /etc/mnttab logic predates global file systems and still demands that each device have a unique major/minor number. VxVM doesn't support changing the minor numbers of individual volumes; the entire disk group has to be re-minored.

Use the following command:

# vxdg [ -g diskgroup ] [ -f ] reminor [ diskgroup ] new-base-minor

From the vxdg man pages:

     reminor   Changes the base minor number for a disk group, and renumbers all devices in the disk group to a range starting at that number. If the device for a volume is open, then the old device number remains in effect until the system is rebooted or until the disk group is deported and re-imported. Also, if you close an open volume, then the user can execute vxdg reminor again to cause the renumbering to take effect without rebooting or reimporting.

               A new device number may also overlap with a temporary renumbering for a volume device. This also requires a reboot or reimport for the new device numbering to take effect. A temporary renumbering can happen in the following situations: when two volumes (for example, volumes in two different disk groups) share the same permanently assigned device number, in which case one of the volumes is renumbered temporarily to use an alternate device number; or when the persistent device number for a volume was changed, but the active device number could not be changed to match. The active number may be left unchanged after a persistent device number change either because the volume device was open, or because the new number was in use as the active device number for another volume.

               vxdg fails if you try to use a range of numbers that is currently in use as a persistent (not a temporary) device number. You can force use of the number range with use of the -f option. With -f, some device renumberings may not take effect until a reboot or a re-import (just as with open volumes). Also, if you force volumes in two disk groups to use the same device number, then one of the volumes is temporarily renumbered on the next reboot. Which volume device is renumbered should be considered random, except that device numberings in the rootdg disk group take precedence over all others.

               The -f option should be used only when swapping the device number ranges used by two or more disk groups. To swap the number ranges for two disk groups, you would use -f when renumbering the first disk group to use the range of the second disk group. Renumbering the second disk group to the first range does not require the use of -f.
  • Sun Cluster does not work with Veritas DMP. DMP can be disabled before installing the software by putting in dummy symlinks, etc.
  • scvxinstall is a shell script that automates VxVM installation in a Sun Cluster environment.
  • scvxinstall automates the following things:
    • tries to disable DMP (vxdmp)
    • installs the correct cluster package
    • automatically negotiates a vxio major number and properly edits /etc/name_to_major
    • automates the rootdg initialization process and encapsulates the boot disk:
      • gives different device names to the /global/.devices/node@# volumes on each side
      • edits the vfstab properly for this same volume (the problem is that this particular line has a DID device in it, and VxVM doesn't understand DID devices)
      • installs a script to "reminor" the rootdg on the reboot
      • reboots the node so that VxVM operates properly
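Whether the vxio major number was set by scvxinstall or by hand, the "identical on all nodes" requirement is easy to double-check. The following is a minimal sketch, assuming you have copied each node's /etc/name_to_major to one local file per node (the file names are up to you). `check_vxio_major` is a hypothetical helper.

```shell
# Hypothetical helper: verify that the vxio major number is the same in
# every name_to_major file given as an argument (one file per node).
check_vxio_major() {
    first=""
    status=0
    for f in "$@"; do
        # name_to_major lines look like: "<driver> <major>"
        major=$(awk '$1 == "vxio" { print $2 }' "$f")
        if [ -z "$major" ]; then
            echo "$f: no vxio entry"
            status=1
            continue
        fi
        echo "$f: vxio=$major"
        if [ -z "$first" ]; then
            first=$major
        elif [ "$major" != "$first" ]; then
            status=1
        fi
    done
    if [ "$status" -eq 0 ]; then
        echo "vxio major numbers match"
    else
        echo "vxio major numbers differ or are missing"
    fi
    return "$status"
}
```

Example: `check_vxio_major node1.name_to_major node2.name_to_major` returns 0 only if both files contain the same vxio entry.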

Sun Cluster Cheat Sheet — 1

July 18th, 2007

Cluster Configuration Repository (CCR)

  • /etc/cluster/ccr (directory)

Important Files

  • /etc/cluster/ccr/infrastructure

Global Services

  • For each global service, one node acts as the primary. All other nodes communicate with the global services (devices, file systems) via the cluster interconnect.

Global Naming (DID Devices)

  • /dev/did/dsk and /dev/did/rdsk

  • DID devices are used only for global naming, not for global access
  • DID device names are not (and cannot be) used in VxVM
  • DID device names are used in Sun/Solaris Volume Manager

Global Devices

  • Provide global access to devices irrespective of their physical location.
  • Most commonly, SDS/SVM/VxVM devices are used as global devices. The LVM software is unaware of the global nature of these devices.

/global/.devices/node@nodeID

  • nodeID is an integer representing the node in the cluster

Global Filesystems

  • # mount -o global,logging /dev/vx/dsk/nfsdg/vol01 /global/nfs

    or edit the /etc/vfstab file to contain the following:

            /dev/vx/dsk/nfsdg/vol01    /dev/vx/rdsk/nfsdg/vol01    /global/nfs    ufs    2    yes    global,logging    
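When auditing a node, it helps to list which vfstab entries are mounted globally. A minimal sketch, assuming the standard seven-field /etc/vfstab layout shown above (options in field 7). `list_global_fs` is a hypothetical helper.

```shell
# Hypothetical helper: print the device and mount point of every vfstab
# entry whose mount options (7th field) include "global".
list_global_fs() {
    vfstab=${1:-/etc/vfstab}
    awk '
        /^[ \t]*#/ || NF < 7 { next }      # skip comments and short lines
        {
            n = split($7, opts, ",")
            for (i = 1; i <= n; i++)
                if (opts[i] == "global") { print $1, $3; break }
        }
    ' "$vfstab"
}
```

Matching "global" as a whole option, rather than as a substring, keeps the check from being fooled by other option names.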

The global file system is also known as the Cluster File System (CFS) or PxFS (Proxy File System).

NOTE: Local failover file systems (i.e. directly attached to a storage device) cannot be used for scalable services; global file systems must be used for them instead.

Console Software

  • The SUNWccon package provides three variants of the cluster console software:
    • cconsole ( access the node consoles through the TC or other remote console access method )
    • crlogin (uses rlogin as underlying transport)
    • ctelnet (uses telnet as underlying transport)

      /opt/SUNWcluster/bin/ &

Cluster Control Panel

/opt/SUNWcluster/bin/ccp [ clustername ] &

All necessary info for cluster admin is stored in the following two files:

  • /etc/clusters, e.g.:

        sc-cluster sc-node1 sc-node2

  • /etc/serialports, e.g.:

        sc-node1     sc-tc          5002   # Connect via TCP port on TC
        sc-node2     sc-tc          5003
        sc-10knode1  sc10k-ssp      23     # Connect via E10K SSP
        sc-10knode2  sc10k-ssp      23
        sc-15knode1  sf15k-mainsc   23     # Connect via 15K Main SC
        e250node1    RSCIPnode1     23     # Connect via LAN RSC on an E250
        node1        sc-tp-ws       23     # Connect via a tip launchpad
        sf1_node1    sf1_mainsc     5001   # Connect via passthru on midframe
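Both files are simple whitespace-separated tables, so they are easy to query when the console tools aren't available. A minimal sketch, assuming the formats shown above (cluster name followed by its nodes; node, console host, port). `cluster_nodes` and `console_for` are hypothetical helpers.

```shell
# Hypothetical helper: list the nodes of a cluster from /etc/clusters.
cluster_nodes() {
    awk -v c="$1" '$1 == c { for (i = 2; i <= NF; i++) print $i }' \
        "${2:-/etc/clusters}"
}

# Hypothetical helper: print "console-host port" for a node from
# /etc/serialports; exits nonzero if the node is not listed.
console_for() {
    awk -v n="$1" '$1 == n { print $2, $3; found = 1; exit }
                   END     { exit !found }' "${2:-/etc/serialports}"
}
```

For instance, `console_for sc-node1` would print `sc-tc 5002`, which is the terminal-concentrator host and TCP port that cconsole connects to for that node.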

Playing with ideas for an Enterprise File Transfer mechanism

April 3rd, 2007


Overview

This is a post of an old high-level architecture design I’d worked on, to see how Open-source technology might fit into a Financial Organization’s (my then-employer’s client) Enterprise.

Long-term Goal

  • To design an Enterprise File Transfer mechanism based on the strong encryption and compression of the Secure Shell (SSH) transport.
  • Solution should be easy to use, easy to deploy and cost-effective.
  • Solution should be scalable and robust.
  • Solution should integrate the two major OS platforms (UNIX and Windows) "seamlessly".

Technical Solution Specs

Secure FTP solution for the Enterprise
Breakup of the requirements

  • Centralized “drop box” or “landing zone” type facility
  • Automated “feeds” type mechanism for further propagation/distribution of data/files
  • Should we try and integrate a transparent layer of version control that’ll maintain audit trails, etc?
  • Easy client access (preferably web-based)
  • Additional client access from Windows environment (fat clients as required)
  • Tight integration with existing authentication mechanism(s) – Active Directory + UNIX logins
  • Low learning curve (meaning, simple solution)
  • Fine-grained access control
  • Ease of administration (centralized user account + privilege management)
  • Traceable usage history (auditable transfer logs, user access logs, user activity logs)
  • Easily replicable process of deployment/re-deployment

Possible Solutions (commercial and open-source)

  • Tumbleweed-based Secure transfer mechanism (already investigated and POC’ed)
  • Combination of Windows SSH/SCP/SFTP servers and clients and the pre-existing OpenSSH servers (on UNIX) with Jscape’s SFTP/FTPS applet based web front-end (opensource solution).
  • Vandyke Software’s Vshell software (which gives SSH servers + client command lines for both Windows and UNIX)
  • SSH Tectia software suite

Other Considerations

  • Inter-operability of different technologies
    • The biggest challenge in designing a comprehensive “seamless” Secure FTP solution for any enterprise is in the inter-operability between the disparate platforms. For example, the UNIX servers and the Windows servers need to be able to seamlessly communicate with each other in order for such a solution to be viable.
  • Integrated authentication mechanism
    • The UNIX systems can be made to authenticate against an LDAP server or Microsoft Active Directory, either by introducing an LDAP PAM module for LDAP-based authentication or by using third-party plug-ins (such as Vintela's VAS module) that "hook up" with AD.
    • Keeping such an environment in mind, our solution should be designed to be able to automatically adapt to an eventual roll-out of such an authentication system.
    • This would, as a result, allow for easy, centralized management of user accounts and privileges.

Overview of the Open-technology solution

  • Web-based SFTP system
    • This solution is designed around existing OpenSSH server and client software currently running in the Enterprise (UNIX/UNIX-like platforms). Additional software components like Windows commercial SSH product(s) and Jscape Secure FTP applet will be required to realize this design.
    • Major components:
  • UNIX (Solaris/Linux platforms)
    • OpenSSH Server
    • Apache Web server
    • Jscape’s Secure FTP applet
    • HTML page to load the SFTP applet
  • Windows
    • Commercial SSH/SFTP server
    • IIS web server
    • Jscape’s Secure FTP applet
    • HTML page to load the SFTP applet

Application Client-side Requirements

Operating Systems (supported): Windows 98/2000/XP/ME, Linux, Solaris, Mac OS X
Browser: Internet Explorer or Netscape Navigator/Mozilla (gecko-based) browsers
Java VM: Java Plug-in 1.4.2 or higher enabled

NOTE: For Macintosh users, the MRJ (Mac Runtime for Java) does not include the crypto classes required to establish a secure connection. If using MRJ, you will need to install the Sun JCE (Java Cryptography Extension) reference implementation.

Strengths and Advantages

  • This solution leverages the existing SSH infrastructure in-house (OpenSSH on UNIX platform already exists in most shops or is available for free downloads) and a cost-effective OpenSource Java Applet based Web interface.
  • This is an extremely simple solution: with pre-determined "drop zone" servers in place at each location, administrators will be able to use key-based authentication and command-line tools to automate and schedule "feeds" transmissions to requested targets.
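To make the "key-based authentication plus command-line tools" idea concrete, here is a minimal sketch of one automated feed push using OpenSSH's `sftp` in batch mode (`-b`). It assumes a key pair already trusted by the drop-zone server. `FEED_KEY`, `FEED_USER`, `FEED_HOST`, `DRYRUN` and `push_feed` are all hypothetical names, and this is an illustration of the approach, not part of the original design.

```shell
# Sketch: push one file to a "drop zone" with OpenSSH sftp in batch mode.
# FEED_KEY, FEED_USER and FEED_HOST are hypothetical environment variables;
# DRYRUN=1 only prints the generated batch file.
push_feed() {
    local_file=$1
    remote_dir=$2
    batch=$(mktemp) || return 1

    # sftp batch commands: change to the drop-zone directory, upload the
    # file, then list the directory so the transfer shows up in the log.
    {
        printf 'cd %s\n' "$remote_dir"
        printf 'put %s\n' "$local_file"
        printf 'ls -l\n'
    } > "$batch"

    if [ "${DRYRUN:-0}" = 1 ]; then
        cat "$batch"
        rc=0
    else
        # -b runs the batch non-interactively and aborts if a command fails;
        # BatchMode=yes makes ssh fail instead of prompting for a password.
        sftp -b "$batch" -o BatchMode=yes -o IdentityFile="$FEED_KEY" \
            "$FEED_USER@$FEED_HOST"
        rc=$?
    fi
    rm -f "$batch"
    return "$rc"
}
```

Scheduled from cron, the exit status of `push_feed` gives a simple hook for alerting and for the auditable transfer logs listed in the requirements.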

Additional Requirements

Design a customizable scripting framework, using Perl (and/or a similar programming language) and XML, that would allow automated feeds to be implemented.

Using the Centralized SSH2-key based root access method to automate network inventory

March 9th, 2007

Leveraging centralized SSH2-based trusts to monitor network interface status on Solaris servers

Since SSH2 key-based trusts have been established in this landscape (at root level), automating a variety of tasks becomes easily achievable. The SSH2 key-based trust provides a secure, encrypted transport mechanism (which reinforces a security-oriented approach to system administration). By leveraging tools such as sudo(1m) or PowerBroker, an additional layer of security and auditability can be added.

Using TLRC and ndd_get.sh to collect Network-related information

The following two scripts can be used to collect network-interface metrics.

tlrc.pl (Test Login Run Command) is a perl script that reads input from a colon-separated text file (of very specific format) or from the command-line and can execute any command on the remote host(s) specified with STDOUT/STDERR logging, etc.

tlrc.pl (test login run command) –

#!/usr/bin/env perl

use Getopt::Std;
use Net::Ping;

my %Args;

getopts( 'l:i:c:o:n:adT:th', \%Args );

if ( $Args{h} ) {
    &printUsage && exit 0;
}

my $hlist   = $Args{i} || "/path/to/inventory.txt";
my $ssh     = "/usr/bin/ssh";
my $rsh     = "/usr/bin/rsh";
my $p       = Net::Ping->new();
my $lid     = $Args{l} || "nobody";
my $outfile = $Args{o} || "tlrc.out";
my @shlcmd  = $Args{c};

# "||" rather than "or": "or" binds more loosely than "=", so the default
# would never be assigned.
my $conprot = $Args{T} || "ssh";

# Normalise the rsh-style aliases; anything other than ssh or rsh is an error.
if ( $conprot =~ /^(rsh|remsh|rlogin)$/ ) {
    $conprot = "rsh";
}
elsif ( $conprot ne "ssh" ) {
    &printUsage && exit 1;
}

# Read the whole inventory; runCmd() and loginTest() walk @rhl.
open( RHL, "< $hlist" ) or die "Unable to open input file $hlist: $! \n";
@rhl = <RHL>;
close(RHL);

# Everything printed to WOF is shown on screen and saved to $outfile.
open( WOF, "| tee $outfile" )
    or die "Unable to open output file $outfile for writes: $! \n";
open( WHL, ">> hlist.tlrc" ) or die "Unable to open hlist.tlrc: $! \n";

if ( $Args{c} ) {
    die "Can't execute $Args{c} with the \"-t\" or \"-d\" switch \n"
        if ( $Args{t} or $Args{d} );
    runCmd(@shlcmd);
}

if ( $Args{d} ) {
    die "Can't munge dmesg and run login tests at the same time! \n"
        if $Args{t};
    &dmesgMunger;
}

if ( $Args{t} ) {
    &loginTest;
}

sub printUsage {
    print <<"EOF";
Usage: $0 [ -l login ][ -i inputfile ][ -o outfile ][ -T ssh|rsh ]
          [ -c "cmdstring" ][ -n hostlist ]|[ -a ]|[ -d ]|[ -t ]|[ -h ]
\t -l  login name you want to use for this session (default: nobody)
\t -i  input file (colon-delimited) with the list of hosts and their pingability status
\t -o  output file (default: tlrc.out)
\t -c  quoted command you want to run remotely
\t -n  comma-delimited list of hosts to run the "cmdstring" command on
\t -a  run the "cmdstring" command (or -d) on all hosts in the input file
\t -T  connection type for -n: ssh (default) or rsh
\t -d  scan today's /var/adm/messages on the remote hosts for errors (with -a or -n)
\t -t  run only the login-testing portion of the script (not with -c or -d)
\t -h  print this message
EOF
}

sub runCmd {

    my @cmdstring = @_;

    if ( $Args{a} ) {
        foreach my $line (@rhl) {
            next if ( $line =~ m/^#/ );
            next if ( $line =~ m/^$/ );
            my ( $name, $domain, $ip, $pstate, $canlogin, $contype, $serial,
                $hid, $usage )
                = split( ':', $line );
            chomp( $name, $domain, $ip, $pstate, $canlogin, $contype, $serial,
                $hid, $usage );

            # Inventory flags: 0 == yes, 1 == no; connection 0 = ssh, 1 = rsh.
            if ( $pstate == 0 ) {
                if ( $canlogin == 0 ) {
                    if ( $contype == 0 ) {
                        ssh_cmd( $lid, $name, @cmdstring );
                    }
                    elsif ( $contype == 1 ) {
                        rsh_cmd( $lid, $name, @cmdstring );
                    }
                    else {
                        print "Cannot understand connection type for $name! \n";
                    }
                }
                else {
                    print "Cannot log into $name! \n";
                }
            }
            else {
                print "$name is unpingable: can't reach! \n";
            }
        }
    }
    elsif ( $Args{n} ) {
        # -n takes a comma- (or space-) delimited list of hosts.
        my @hostlist = split( /[,\s]+/, $Args{n} );
        foreach my $name (@hostlist) {
            if ( $conprot eq "ssh" ) {
                ssh_cmd( $lid, $name, @cmdstring );
            }
            elsif ( $conprot eq "rsh" ) {
                rsh_cmd( $lid, $name, @cmdstring );
            }
            else {
                die "Unknown option with \"-T\" switch! \n";
            }
        }
    }
}

sub ssh_cmd {

    my ( $id, $host, @cmd ) = @_;
    print "$ssh $id\@$host '@cmd' \n";
    # Capture both STDOUT and STDERR of the remote command.
    my @sshout = qx/$ssh $id\@$host '@cmd' 2>&1/;
    warn "Command on $host exited with status ", $? >> 8, "\n" if $?;
    print WOF "$host \n";
    print WOF "@sshout \n";
}

sub rsh_cmd {

    my ( $id, $host, @cmd ) = @_;
    print "$rsh -l $id $host '@cmd' \n";
    # Capture both STDOUT and STDERR of the remote command.
    my @rshout = qx/$rsh -l $id $host '@cmd' 2>&1/;
    warn "Command on $host exited with status ", $? >> 8, "\n" if $?;
    print WOF "$host \n";
    print WOF "@rshout \n";
}

sub dmesgMunger {

    &getToday;
    # Show today's /var/adm/messages entries that look like disk, memory,
    # link or volume-manager problems, minus known noise.
    &runCmd(
        "cat /var/adm/messages|grep \"$today\"|egrep -v \"vas|auth|lw8|mail.info|Waiting|Networker savegroup|local1|checked|wrap|Normal\"|egrep -i \"scsi|disk|err|fatal|pers|mem|link|fcp|AFT|ASFR|PSYND|ESYND|full|vx_nospace|vxfs|vxvm\""
    );
}

sub getToday {
    my ( $sec, $min, $hour, $mday, $mon, $year, $wday, $yday, $isdst ) =
        localtime(time);
    $year += 1900;
    $mon  += 1;

    my %months = (
        1  => 'Jan',
        2  => 'Feb',
        3  => 'Mar',
        4  => 'Apr',
        5  => 'May',
        6  => 'Jun',
        7  => 'Jul',
        8  => 'Aug',
        9  => 'Sep',
        10 => 'Oct',
        11 => 'Nov',
        12 => 'Dec',
    );

    # syslog pads single-digit days with a space, e.g. "Sep  5".
    if ( $mday < 10 ) {
        $mday = " $mday";
    }
    $today = "$months{$mon} $mday";
}

sub loginTest {
    foreach my $line (@rhl) {
        next if ( $line =~ m/^#/ );
        next if ( $line =~ m/^$/ );
        my ( $name, $domain, $ip, $pstate ) = split( ':', $line );
        chomp( $name, $domain, $ip, $pstate );
        next unless ( $pstate == 0 or $pstate == 1 );
        if ( $pstate == 1 ) {
            print "$hlist says $name is inaccessible.\n"
                . "But I will try to ping $name again anyway...\n";
        }
        tryLogin( $name, $domain, $ip, $pstate );
    }
    close(WOF);
}

# Ping a host and, if it answers, attempt a non-interactive ssh login,
# recording the exit status in the output file and in hlist.tlrc.
sub tryLogin {
    my ( $name, $domain, $ip, $pstate ) = @_;

    # Net::Ping->ping returns true when the host is reachable.
    if ( $p->ping( $name, 1 ) ) {
        print "Running \"$ssh -l $lid $name\"...\n";
        # BatchMode=yes makes ssh fail instead of prompting for a password.
        system( $ssh, "-o", "BatchMode=yes", "-l", $lid, $name, "exit" );
        my $exitval = $? >> 8;
        print WOF "attempt to log into $name ended with status $exitval \n";
        print WHL "$name:$domain:$ip:$pstate:$exitval\n";
    }
    else {
        print WOF "Unable to ping $name \n";
    }
}

inventory.txt (the input file passed to tlrc.pl) –

#HOSTNAME:DOMAIN NAME:IP:PINGABLE(1 == no; 0 == yes):Login(1 == no;0 == yes):Connection(0=ssh:1=telnet/rsh):SERIAL:HOSTID:USAGE(P|NP)


This is a colon-delimited file with fields as listed above. Not all of them are required for running the script, but can be useful in certain cases (eg: hostid, serial #).
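Because the inventory is plain colon-delimited text, it can also be queried outside tlrc.pl, for instance to preview which hosts `-a` will actually touch. The following is a minimal sketch, assuming the field order in the header above. `reachable_hosts` is a hypothetical helper, and any sample rows you feed it are your own.

```shell
# Hypothetical helper: list hosts from a tlrc.pl-style inventory that are
# flagged pingable (field 4 == 0) and loginable (field 5 == 0), together
# with the transport implied by field 6 (0 = ssh, 1 = telnet/rsh).
reachable_hosts() {
    awk -F: '
        /^#/ || NF < 6 { next }            # skip header, comments, blanks
        $4 == 0 && $5 == 0 {
            proto = ($6 == 0) ? "ssh" : ($6 == 1) ? "rsh" : "unknown"
            print $1, proto
        }
    ' "${1:-inventory.txt}"
}
```

This mirrors the checks runCmd() performs for `-a`: hosts marked unpingable or not loginable are skipped before any connection is attempted.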

ndd_get.sh is a Korn-shell based script that returns the NIC link-related statistics in a comma-separated output format.

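#!/usr/bin/env ksh

NDD=/usr/sbin/ndd
ID=`/usr/xpg4/bin/id -u`
HOSTNAME=`/usr/bin/hostname`

printUsage() {
    echo "Usage: $0 [ -a ]|[ -n adapter -i instance ]|[ -h ]"
}

# Split an interface name such as qfe0, hme1, ce0, bge2 or dmfe0 into its
# driver name and instance. These driver names all end in "e", so awk
# splits on "e" and the "e" is appended back to the driver name.
splitter() {
    interface=$1
    INSTANCE=`echo $interface | awk -Fe '{print $2}'`
    BASEDEV=`echo $interface | awk -Fe '{print $1}'`
    ADAPTER="${BASEDEV}e"
}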

macipget() {
    IF=$1
    IFCONFIG=/usr/sbin/ifconfig
    IP=`$IFCONFIG $IF | grep inet | awk '{print $2}'`
    MAC=`$IFCONFIG $IF | grep ether | awk '{print $2}'`
}

nddget() {
    #set -x
    AD=$1
    INST=$2
    # qfe, hme and eri share one device node per driver, so select the
    # instance first, then read its parameters.
    $NDD -set /dev/$AD instance $INST
    LSTAT=`$NDD -get /dev/$AD link_status`
    LSPEED=`$NDD -get /dev/$AD link_speed`
    LMODE=`$NDD -get /dev/$AD link_mode`
    IS_100FDX=`$NDD -get /dev/$AD adv_100fdx_cap`
    IS_100HDX=`$NDD -get /dev/$AD adv_100hdx_cap`
    IS_10FDX=`$NDD -get /dev/$AD adv_10fdx_cap`
    IS_10HDX=`$NDD -get /dev/$AD adv_10hdx_cap`
    AUTONEG=`$NDD -get /dev/$AD adv_autoneg_cap`
    LP_100FDX=`$NDD -get /dev/$AD lp_100fdx_cap`
    LP_100HDX=`$NDD -get /dev/$AD lp_100hdx_cap`
    LP_10FDX=`$NDD -get /dev/$AD lp_10fdx_cap`
    LP_10HDX=`$NDD -get /dev/$AD lp_10hdx_cap`
    LP_AUTONEG=`$NDD -get /dev/$AD lp_autoneg_cap`
    if [ "$LSTAT" -eq 0 ]; then
        linkstat="down"
    else
        linkstat="up"
    fi
    # link_speed: 0 = 10 Mb/s, anything else is reported as 100 Mb/s.
    if [ "$LSPEED" -eq 0 ]; then
        linkspeed="10"
    else
        linkspeed="100"
    fi
    if [ "$LMODE" -eq 0 ]; then
        linkmode="Half Duplex"
    else
        linkmode="Full Duplex"
    fi
    if [ "$AUTONEG" -eq 0 ]; then
        autoneg="Off"
    else
        autoneg="On"
    fi
    if [ "$LP_AUTONEG" -eq 0 ]; then
        lp_autoneg="Off"
    else
        lp_autoneg="On"
    fi
    IF=$AD$INST
    macipget $IF
    print "$HOSTNAME,$IF,$IP,$MAC,$linkstat,$linkspeed,$linkmode,$autoneg,$lp_autoneg"
}

kstatget() {
    #set -x
    AD=$1
    INST=$2

    # kstat -p prints "module:instance:name:statistic<TAB>value"; anchoring on
    # "module:instance:" keeps, e.g., ce:1 from also matching ce:10.
    linkspeed=`/usr/bin/kstat -p $AD | grep -i link_ | \
        grep "^$AD:$INST:" | grep link_speed | awk '{print $2}'`

    is_up=`/usr/bin/kstat -p $AD | grep -i link_ | \
        grep "^$AD:$INST:" | grep link_up | awk '{print $2}'`
    if [ "$is_up" -eq 1 ]; then
        linkstat="UP"
    else
        linkstat="DOWN"
    fi
    LINK_MODE=`/usr/bin/kstat -p $AD | grep -i link_ | \
        grep "^$AD:$INST:" | grep link_duplex | awk '{print $2}'`
    case $LINK_MODE in
        2) linkmode="Full Duplex";;
        1) linkmode="Half Duplex";;
        *) linkmode="Unknown";;
    esac

    $NDD -set /dev/$AD instance $INST
    AUTONEG=`$NDD -get /dev/$AD adv_autoneg_cap`
    LP_AUTONEG=`/usr/bin/kstat -p $AD | \
        grep "^$AD:$INST:" | grep lp_cap_autoneg | awk '{print $2}'`
    if [ "$AUTONEG" -eq 0 ]; then
        autoneg="Off"
    else
        autoneg="On"
    fi
    if [ "$LP_AUTONEG" -eq 0 ]; then
        lp_autoneg="Off"
    else
        lp_autoneg="On"
    fi
    IF=$AD$INST
    macipget $IF
    print "$HOSTNAME,$IF,$IP,$MAC,$linkstat,$linkspeed,$linkmode,$autoneg,$lp_autoneg"

}

bgekstatget() {
    #set -x
    AD=$1
    INST=$2

    # Read the bge link settings from the per-instance "parameters" kstat.
    linkspeed=`/usr/bin/kstat -m $AD -i $INST -n parameters | \
        grep -i link_ | grep link_speed | awk '{print $2}'`

    is_up=`/usr/bin/kstat -m $AD -i $INST -n parameters | \
        grep -i link_ | grep link_status | awk '{print $2}'`
    if [ "$is_up" -eq 1 ]; then
        linkstat="UP"
    else
        linkstat="DOWN"
    fi
    LINK_MODE=`/usr/bin/kstat -m $AD -i $INST -n parameters | \
        grep -i link_ | grep link_duplex | awk '{print $2}'`
    case $LINK_MODE in
        2) linkmode="Full Duplex";;
        1) linkmode="Half Duplex";;
        *) linkmode="Unknown";;
    esac

    AUTONEG=`/usr/bin/kstat -m $AD -i $INST -n parameters | \
        grep -i link_ | grep autoneg | awk '{print $2}'`
    LP_AUTONEG=`/usr/bin/kstat -m $AD -i $INST -n parameters | \
        grep lp_ | grep autoneg | awk '{print $2}'`
    if [ "$AUTONEG" -eq 0 ]; then
        autoneg="Off"
    else
        autoneg="On"
    fi
    if [ "$LP_AUTONEG" -eq 0 ]; then
        lp_autoneg="Off"
    else
        lp_autoneg="On"
    fi

    IF=$AD$INST
    macipget $IF
    print "$HOSTNAME,$IF,$IP,$MAC,$linkstat,$linkspeed,$linkmode,$autoneg,$lp_autoneg"

}

dmfeget() {

    AD=$1
    INST=$2
    EADAPT=$AD$INST
    # Note: no "ndd -set ... instance" is required here, since dmfe interfaces
    # are set up as per-instance device files (such as /dev/dmfe0, /dev/dmfe1).

    LSTAT=`$NDD -get /dev/$EADAPT link_status`
    LSPEED=`$NDD -get /dev/$EADAPT link_speed`
    LMODE=`$NDD -get /dev/$EADAPT link_mode`
    IS_100FDX=`$NDD -get /dev/$EADAPT adv_100fdx_cap`
    IS_100HDX=`$NDD -get /dev/$EADAPT adv_100hdx_cap`
    IS_10FDX=`$NDD -get /dev/$EADAPT adv_10fdx_cap`
    IS_10HDX=`$NDD -get /dev/$EADAPT adv_10hdx_cap`
    AUTONEG=`$NDD -get /dev/$EADAPT adv_autoneg_cap`
    LP_AUTONEG=`$NDD -get /dev/$EADAPT lp_autoneg_cap`
    if [ "$LSTAT" -eq 0 ]; then
        linkstat="down"
    else
        linkstat="up"
    fi
    if [ "$LSPEED" -eq 0 ]; then
        linkspeed="10"
    else
        linkspeed="100"
    fi
    if [ "$LMODE" -eq 0 ]; then
        linkmode="Half Duplex"
    else
        linkmode="Full Duplex"
    fi
    if [ "$AUTONEG" -eq 0 ]; then
        autoneg="Off"
    else
        autoneg="On"
    fi
    if [ "$LP_AUTONEG" -eq 0 ]; then
        lp_autoneg="Off"
    else
        lp_autoneg="On"
    fi
    macipget $EADAPT

    print "$HOSTNAME,$EADAPT,$IP,$MAC,$linkstat,$linkspeed,$linkmode,$autoneg,$lp_autoneg"

}

getParms() {
    #set -x
    case $ADAPTER in
        qfe|hme|eri) nddget $ADAPTER $INSTANCE;;
        ce) kstatget $ADAPTER $INSTANCE;;
        bge) bgekstatget $ADAPTER $INSTANCE;;
        dmfe) dmfeget $ADAPTER $INSTANCE;;
        *) echo "Error: Unknown adapter $ADAPTER!" && exit 1;;
    esac
}

nicStatAll() {
    #set -x
    # List the plumbed, UP interfaces (minus loopback and the cluster private
    # interconnect), strip logical-interface suffixes such as ":1", and dedupe.
    /usr/sbin/ifconfig -a | nawk '/UP/{print $1}' | egrep -v "lo0|clprivnet" | \
        awk -F: '{print $1}' | sort -nr | uniq > /tmp/iflist
    for interface in `cat /tmp/iflist`
    do
        # Skip any logical interface (e.g. bge0:1) that slipped through.
        case $interface in
            *:*) continue;;
        esac
        # Deprecated code, left behind for old time's sake
        #count=`echo $interface|wc -m|sed -e"s!^[ /t]!!g"`
        #count1=`expr $count - 2`
        #count2=`expr $count - 1`
        #int=`echo $interface|cut -c 1-${count1}`
        #dev=/dev/${int}
        #inst=`echo $interface|cut -c ${count2}`
        # eri starts with "e", so split it on "i" instead of "e".
        case $interface in
            eri*) INSTANCE=`echo $interface | awk -Fi '{print $2}'`
                  BASEDEV=`echo $interface | awk -Fi '{print $1}'`
                  ADAPTER="${BASEDEV}i";;
            *) splitter $interface;;
        esac
        getParms
    done
}

if [ "$ID" -ne 0 ]; then
    echo "ERROR: You are not root! Only root can run this script!"
    exit 1
fi

while getopts an:i:h arg
do
    case $arg in
        a) nicStatAll; exit $?;;
        n) ADAPTER=${OPTARG};;
        i) INSTANCE=${OPTARG};;
        h) printUsage && exit 0;;
        *) printUsage && exit 1;;
    esac
done
shift $(($OPTIND - 1))

if [ -n "${ADAPTER}" ]; then
    if [ -n "${INSTANCE}" ]; then
        getParms
    else
        printUsage && exit 1
    fi
else
    printUsage && exit 1
fi
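The splitter() trick depends on every supported driver name ending in "e" (with eri special-cased in nicStatAll()). The following is a more general sketch that uses POSIX parameter expansion. It assumes the instance number is the trailing run of digits, so names such as e1000g0 also split correctly. `split_ifname` is a hypothetical helper, not part of ndd_get.sh.

```shell
# Hypothetical helper: split an interface name into "driver instance".
# Assumes the instance is the trailing run of digits (eri0, qfe12, e1000g0).
split_ifname() {
    ifname=${1%%:*}              # drop a logical-interface suffix (bge0:1)
    inst=${ifname##*[!0-9]}      # strip everything up to the last non-digit
    drv=${ifname%"$inst"}        # what remains before the digits is the driver
    echo "$drv $inst"
}
```

For example, `split_ifname eri0` prints `eri 0` without the special case, and `split_ifname bge0:1` prints `bge 0`.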

On the centralized management host (whose SSH2-based Key is trusted by the monitored hosts) run the following command to perform the inventory:

admin:(dev) $ sudo ./tlrc.pl -l root -a \
    -c "/path/to/ndd_get.sh -a" \
    -o ~/logs/ndd_get_today.txt
/usr/bin/ssh root@host1 '/path/to/ndd_get.sh -a'
/usr/bin/ssh root@host2 '/path/to/ndd_get.sh -a'
host1
host1,bge2,IP,MAC,UP,100,Full Duplex,On,On
host1,bge1,IP,MAC,UP,100,Full Duplex,On,On
host1,bge0,IP,MAC,UP,100,Full Duplex,On,On
host2
host2,bge2,10.228.147.62,0:3:ba:49:45:51,UP,100,Full Duplex,On,On
host2,bge1,10.228.143.62,0:3:ba:49:45:50,UP,100,Full Duplex,On,On
host2,bge0,10.228.139.62,0:3:ba:49:45:4f,UP,100,Full Duplex,On,On
/usr/bin/rsh -l root host3 '/path/to/ndd_get.sh -a'
/usr/bin/rsh -l root host4 '/path/to/ndd_get.sh -a'

host3
host3,qfe1,IP,MAC,up,100,Full Duplex,Off,Off
host3,qfe0,IP,MAC,up,100,Full Duplex,Off,Off
host3,ce0,IP,MAC,UP,1000,Full Duplex,On,On

Look at the text output created thus:

admin:(logs) $ more ndd_get_today.txt

host1
host1,bge2,IP,MAC,UP,100,Full Duplex,On,On
host1,bge1,IP,MAC,UP,100,Full Duplex,On,On
host1,bge0,IP,MAC,UP,100,Full Duplex,On,On
host2
host2,bge2,IP,MAC,UP,100,Full Duplex,On,On
host2,bge1,IP,MAC,UP,100,Full Duplex,On,On
host2,bge0,IP,MAC,UP,100,Full Duplex,On,On
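Since every data line has the same nine comma-separated fields (HOSTNAME,IF,IP,MAC,linkstat,linkspeed,linkmode,autoneg,lp_autoneg), the collected file is easy to post-process. The following is a minimal sketch, assuming that field order. `flag_nics` is a hypothetical helper. It flags links that are down, running half duplex, or whose autonegotiation setting differs from the link partner's, which is a common source of duplex mismatches.

```shell
# Hypothetical post-processing for ndd_get.sh output (files or stdin):
# HOSTNAME,IF,IP,MAC,linkstat,linkspeed,linkmode,autoneg,lp_autoneg
flag_nics() {
    awk -F, '
        NF < 9 { next }                    # skip the bare hostname lines
        {
            issues = ""
            if (tolower($5) != "up")        issues = issues " link-down"
            if ($7 ~ /Half/)                issues = issues " half-duplex"
            if (tolower($8) != tolower($9)) issues = issues " autoneg-mismatch"
            if (issues != "") print $1 "," $2 ":" issues
        }
    ' "$@"
}
```

Running `flag_nics ~/logs/ndd_get_today.txt` against the output above would print nothing, because every link shown is up, full duplex and consistent with its partner. Only problem interfaces are listed.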

Now look at the sudo log file to see if there’s associated logging captured.

admin:(log) $ sudo tail sudo.log
Sep  5 16:32:36 : lahirdx : TTY=pts/27 ; PWD=/export/home/lahirdx/dev ;
    USER=root ; COMMAND=/usr/bin/ssh aesdbc1
Sep  6 09:49:31 : lahirdx : TTY=pts/30 ; PWD=/export/home/lahirdx/dev ;
    USER=root ; COMMAND=./tlrc.pl -a -c /export/patches/Scripts/bin/ndd_get.sh
    -a -o /export/home/lahirdx/logs/ndd_get_9606.txt
Sep  6 09:49:40 : lahirdx : TTY=pts/30 ; PWD=/export/home/lahirdx/dev ;
    USER=root ; COMMAND=./tlrc.pl -l root -a -c
    /export/patches/Scripts/bin/ndd_get.sh -a -o /export/home/lahirdx/logs/ndd_get_9606.txt

NOTE: The full command line, who executed a particular command, and when are all captured in the logs. Also, it is imperative to ensure that "/path/to/ndd_get.sh" is the same on all the monitored hosts. This author recommends creating a System V package to deploy commonly used scripts and tools under /opt/tools (or a similar directory structure) to ensure standardization of the environment.