Jul 18, 2007
 

Data Services in the Cluster

HAStoragePlus makes a local file system highly available. It provides the following capabilities (a short example follows the list):

  • additional filesystem checks
  • mounts and unmounts
  • enables Sun Cluster to fail over local file systems (to fail over, the local file system must reside on global device groups with affinity switchovers enabled)
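
A minimal sketch of putting a local file system under HAStoragePlus control, using the same commands that appear later in these notes (the group, resource, and mount-point names here are hypothetical):

    scrgadm -a -t SUNW.HAStoragePlus                 # register the resource type
    scrgadm -a -g app-rg -h node1,node2              # create a failover resource group
    scrgadm -a -j app-hastp-rs -g app-rg -t SUNW.HAStoragePlus \
        -x FilesystemMountPoints=/apps -x AffinityOn=True
    scswitch -Z -g app-rg                            # bring the group (and the mount) online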

A Data Service Agent is specially written software that allows an application to operate properly as a data service in a cluster.

The Data Service Agent (or Agent) does the following for a standard application (a registration example follows the list):

  • stops/starts the application
  • monitors faults
  • validates the configuration
  • provides a registration information file that allows Sun Cluster to store all the info about the agent's methods
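
For instance, once an agent package is installed, its resource type is registered so that the cluster can read that registration file; SUNW.nfs below is the type used later in these notes:

    scrgadm -a -t SUNW.nfs        # register the agent's resource type
    scrgadm -pv -t SUNW.nfs       # display the registered type and its properties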

Sun Cluster 2.x runs fault monitoring components on the failover (secondary) node, which can initiate a takeover. In Sun Cluster 3.x this is not allowed: the fault monitor runs on the primary (active) node and can either restart the application there or request a failover.

Failover resource groups:

  • Logical hostname resource — SUNW.LogicalHostname
  • Data storage resource — SUNW.HAStoragePlus
  • NFS resource — SUNW.nfs

Shut down a resource group:

    scswitch -F -g <rgname>

Bring a resource group online:

    scswitch -Z -g <rgname>

Switch a failover group over to another node:

    scswitch -z -g <rgname> -h <node>

Restart a resource group:

    scswitch -R -h <node> -g <rgname>

Evacuate all resources and rgs from a node:

    scswitch -S -h node

Disable a resource and its fault monitor:

    scswitch -n -j <resource>

Enable a resource and its fault monitor:

    scswitch -e -j <res>

Clear the STOP_FAILED flag:

    scswitch -c -j <resname> -h <nodename> -f STOP_FAILED

How to add a disk group and volume to the cluster configuration

1. Create the disk group and volume.

2. Register the disk group with the cluster and switch it to the desired node:

    root@aesnsra1:../ # scconf -a -D type=vxvm,name=patroldg2,nodelist=aesnsra2
    root@aesnsra2:../ # scswitch -z -h aesnsra2 -D patroldg2

3. Create your file system.

4. Update /etc/vfstab with an entry for the new file system, setting the mount-at-boot field to "no". Example:

    /dev/vx/dsk/patroldg2/patroldg02 /dev/vx/rdsk/patroldg2/patroldg02 /patrol02 vxfs 3 no suid

5. Set up a resource group with an HAStoragePlus resource for the local file system:

    root@aesnsra2:../ # scrgadm -a -g aescib1-hastp-rg -h aescib1
    root@aesnsra2:../ # scrgadm -a -g aescib1-hastp-rg -j sapmntdg01-rs -t SUNW.HAStoragePlus -x FilesystemMountPoints=/sapmnt

6. Bring the resource group online, which mounts the specified file system:

    root@aesnsra2:../ # scswitch -Z -g hastp-aesnsra2-rg

7. Enable the resource:

    root@aesnsra2:../# scswitch -e -j osdumps-dev-rs

Optional step:

8. Reboot and test.
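
A quick verification sketch once the group is online (the mount point is the one from the vfstab example in step 4):

    scstat -g            # the resource group and its resources should show as online
    df -k /patrol02      # the failover file system should now be mounted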

Fault monitor operations

Disable the fault monitor for a resource:

    scswitch -n -M -j <resname>

Enable the Fault monitor for a resource:

    scswitch -e -M -j <resname>

    scstat -g       # shows the status of all resource groups

Using scrgadm to register and configure Data service software

eg:

    scrgadm -a -t SUNW.nfs

    scrgadm -a -t SUNW.HAStoragePlus
    scrgadm -p

Create a failover resource group:

    scrgadm -a -g nfs-rg -h node1,node2 \
    -y Pathprefix=/global/nfs/admin

Add a logical hostname resource to the resource group:

    scrgadm -a -L -g nfs-rg -l clustername-nfs

Create an HAStoragePlus resource:

    scrgadm -a -j nfs-stor -g nfs-rg \
    -t SUNW.HAStoragePlus \
    -x FilesystemMountPoints=/global/nfs -x AffinityOn=True

Create SUNW.nfs resource:

    scrgadm -a -j nfs-res -g nfs-rg \
    -t SUNW.nfs -y Resource_dependencies=nfs-stor

Print the various resource/resource group dependencies via scrgadm:

    scrgadm -pvv | grep -i depend     # and then parse this output

Enable the resources and resource monitors, bring the resource group under management, and switch it online:

    scswitch -Z -g nfs-rg

    scstat -g

Show current RG configuration:

    scrgadm -p[v[v]] [ -t resource_type_name ] [ -g resgrpname ] \
    [ -j resname ]

Resizing a VxVM/VxFS volume and file system under Sun Cluster

    # vxassist -g aesnfsp growby saptrans 5g

    # scconf -c -D name=aesnfsp,sync

    root@aesrva1:../ # vxprint -g aesnfsp -v saptrans
    TY NAME         ASSOC        KSTATE   LENGTH    PLOFFS   STATE    TUTIL0  PUTIL0
    v  saptrans     fsgen        ENABLED  188743680 -        ACTIVE   -       -

    root@aesrva1:../ # fsadm -F vxfs -b 188743680 /saptrans
    UX:vxfs fsadm: INFO: /dev/vx/rdsk/aesnfsp/saptrans is currently 178257920 sectors - size will be increased

    root@aesrva1:../ # scconf -c -D name=aesnfsp,sync

Command Quick Reference

    scstat       # display cluster status (nodes, quorum, device groups, resource groups, IPMP)
    scconf       # view and update the cluster configuration
    scrgadm      # register resource types; add or change resource groups and resources
    scha_*       # data service API commands (e.g. scha_resource_get) used by agents and fault monitors
    scdidadm     # DID device administration
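
For example, one illustrative scha_ query (the resource and group names are the ones from the NFS example above; the TYPE tag simply returns the resource's type):

    scha_resource_get -O TYPE -R nfs-res -G nfs-rg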

Sun Terminal Concentrator (Annex NTS)

Enable setup mode by pressing the TC test button until the TC power indicator starts to blink rapidly, then release the button and briefly press it again.

On entering setup mode, a "monitor::" prompt is displayed.

Set up IP address using:

    monitor::addr

Setting up Load source:

    monitor::seq

Specifying image:

    monitor::image

Telnet to the TC IP address and enter "cli". Elevate to the privileged account using "su", then run "admin" at the TC OS prompt to get the "admin:" subprompt:

    show port=1 type mode
    set port=<num> type <type> mode <mode>    # choose options, e.g. type hardwired, mode cli
    quit                                      # exit the admin subprompt
    boot

Jul 18, 2007
 

Display the existing device group (DG) resources in the cluster:

    scstat -D

Registering VxVM DGs

    scconf -a -D type=vxvm,name=<dgname>, \
    nodelist=<node1>:<node2>, \
    preferenced=true,failback=enabled

  • nodelist should contain only nodes that are physically connected to the disks of that dg.
  • preferenced=true/false controls whether nodelist indicates an order of failover preference. On a two-node cluster, this option is only meaningful if failback is enabled.
  • failback=enabled/disabled controls whether a preferred node "takes back" its device group when it rejoins the cluster. The default value is disabled. When failback is disabled, preferenced is set to false; if failback is enabled, preferenced must also be set to true.

Moving DGs across nodes of a cluster

When VxVM dgs are registered as Sun Cluster resources, NEVER use the vxdg import/deport commands to change node ownership of the dgs; doing so causes Sun Cluster to treat the dg as a failed resource.

Use the following command instead:

    # scswitch -z -D <dgname> -h <node_to_switch_to>

Resyncing Device Groups

    scconf -c -D name=<dgname>,sync

Changing DG configuration

    scconf -c -D name=<dgname>,preferenced=<true|false>,failback=<enabled|disabled>

Maintenance mode

    scswitch -m -D <dgname>

NOTE: all volumes in the dg must be unopened or unmounted (not being used) in order to do that.

To come back out of maintenance mode

    scswitch -z -D <dgname> -h <new_primary_node>

Repairing DID device database after replacing JBOD disks

  • Make sure you know which disk to update:

    scdidadm -l c1t1d0

returns node1:/dev/rdsk/c1t1d0 /dev/did/rdsk/d7

    scdidadm -l d7

returns node1:/dev/rdsk/c1t1d0 /dev/did/rdsk/d7

Then use the following commands to update and verify the DID info:

    scdidadm -R d7
    scdidadm -l -o diskid d7

returns a large string with disk id.

Replacing a failed disk in an A5200 array (similar concept with other FC disk arrays)

    vxdisk list              # get the failed disk name
    vxprint -g dgname        # determine the state of the volume(s) that might be affected

On the hosting node, replace the failed disk:

    luxadm remove enclosure,position
    luxadm insert enclosure,position

On either node of the cluster (that hosts the dg):

    scdidadm -l c#t#d#
    scdidadm -R d#

On the hosting node:

    vxdctl enable

    vxdiskadm           # replace the failed disk in VxVM

    vxprint -g <dgname>
    vxtask list         # ensure that resyncing is completed

Move back any relocated subdisks/plexes (if hot-relocation had to move something out of the way):

    vxunreloc repaired-diskname

Solaris Volume Manager (SDS) in a Sun Clustered Environment

The preferred way to use soft partitions is to build mirrors from single full slices and then create volumes (soft partitions) on top of the mirror (roughly analogous to carving volumes out of the VxVM public region of an initialized disk).
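
A minimal sketch, assuming a shared diskset named nfsds and the DID devices d9/d17 used later in these notes (the metadevice names are hypothetical):

    metainit -s nfsds d11 1 1 /dev/did/rdsk/d9s0      # concat on a single full slice
    metainit -s nfsds d12 1 1 /dev/did/rdsk/d17s0
    metainit -s nfsds d10 -m d11                      # one-way mirror
    metattach -s nfsds d10 d12                        # attach the second submirror
    metainit -s nfsds d100 -p d10 2g                  # 2 GB soft partition carved from the mirror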

Shared Disksets and Local Disksets

Only disks that are physically located in the multi-ported storage will be members of shared disksets. Disks in the same diskset operate as a unit; they can be used together to build mirrored volumes, and primary ownership of the diskset transfers as a whole from node to node.

Boot disks belong to the local diskset; having a local diskset (local state database replicas) is a prerequisite for creating shared disksets.

Replica management

  • Add local replicas manually (see the example below).
  • Put local state database replicas on slice 7 of the disks (as a convention) to maintain uniformity; shared disksets must have their replicas on slice 7.
  • Spread local replicas evenly across disks and controllers.
  • Support for shared disksets is provided by the package SUNWmdm.
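
For instance, local replicas might be added as follows (the boot-mirror slices are hypothetical; -f is needed when creating the very first replicas):

    metadb -a -f -c 3 c0t0d0s7 c1t0d0s7    # three replicas on each slice 7
    metadb                                 # verify replica placement and status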

Modifying /kernel/drv/md.conf

    nmd      == max number of volumes (default 128)
    md_nsets == max number of disksets (max is 32, default 4)
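
A sketch of the corresponding md.conf line (assuming the stock one-line format; a reconfiguration reboot is needed for the change to take effect):

    name="md" parent="pseudo" nmd=256 md_nsets=8;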

Creating shared disksets and mediators

    scdidadm -l c1t3d0

  • returns d17 as the DID device

    scdidadm -l d17
    metaset -s <disksetname> -a -h <node1> <node2>                    # create the diskset
    metaset -s <disksetname> -a -m <node1> <node2>                    # add mediators
    metaset -s <disksetname> -a /dev/did/rdsk/d9 /dev/did/rdsk/d17    # add the disks
    metaset                                                           # display disksets
    metadb -s <disksetname>                                           # display the set's replicas
    medstat -s <disksetname>                                          # report mediator status

Remaining syntax vis-a-vis Sun Cluster is identical to that for VxVM.

IPMP and Sun Cluster

IPMP is cluster-unaware. To work around that, Sun Cluster uses the cluster-specific public network management daemon (pnmd) to integrate IPMP into the cluster.

The pnmd daemon has two capabilities:

  • populate CCR with public network adapter status
  • facilitate application failover

When pnmd detects that all members of a local IPMP group have failed, it consults the file /var/cluster/run/pnm_callbacks. This file contains entries created by the activation of LogicalHostname and SharedAddress resources. It is the job of hafoip_ipmp_callback to decide whether to migrate resources to another node.

    scstat -i       # view IPMP configuration

Jul 18, 2007
 

Sun Cluster Setup

  • don’t mix PCI and SBus SCSI devices

Quorum Device Rules

  • A quorum device must be available to both nodes in a 2-node cluster.
  • QD info is maintained globally in the CCR database.
  • A QD can contain user data.
  • The maximum and optimal number of votes contributed by QDs must be N - 1 (where N is the number of nodes in the cluster).
  • If the number of QDs is greater than or equal to the number of nodes, the cluster cannot come up easily when too many QDs are failed/errored.
  • QDs are not required in clusters with more than 2 nodes, but they are recommended for higher cluster availability.
  • QDs are manually configured after Sun Cluster software installation is done.
  • QDs are configured using DID devices (example below).
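
For example, adding a shared DID device as a quorum device and checking the result (d12 is a hypothetical DID device visible to both nodes):

    scconf -a -q globaldev=d12     # add DID device d12 as a quorum device
    scstat -q                      # verify quorum votes and QD status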

Quorum Math and Consequences

  • A running cluster is always aware of (math):
    • total possible Q votes (number of nodes + disk quorum votes)
    • total present Q votes (number of booted nodes + available QD votes)
    • total needed Q votes (>= 50% of the possible votes)
  • Consequences:
    • a node that cannot find an adequate number of Q votes will freeze, waiting for other nodes to join the cluster
    • a node that is booted in the cluster but can no longer find the needed number of votes panics the kernel
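
Worked example: a two-node cluster with one QD has 3 possible votes (2 node votes + 1 QD vote) and needs 2 to operate; a single surviving node that still holds the QD therefore keeps the cluster running, while a node that loses both its peer and the QD panics.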

installmode flag — allows cluster nodes to be rebooted after/during the initial installation without causing the other (active) node(s) to panic.

Cluster status

Report the cluster membership and quorum vote information:

# /usr/cluster/bin/scstat -q

Verifying cluster configuration info

# scconf -p

Run scsetup to correct any configuration mistakes and/or to:

* add or remove quorum disks
* add, remove, enable, or disable cluster transport components
* register/unregister VxVM dgs
* add/remove node access from a VxVM dg
* change cluster private host names
* change the cluster name

Shutting down cluster on all nodes:

# scshutdown -y -g 15

# scstat (verifies cluster status)

Cluster Daemons

    lahirdx@aescib1:/home/../lahirdx > ps -ef|grep cluster|grep -v grep
        root     4     0  0   May 07 ?       352:39 cluster
        root   111     1  0   May 07 ?        0:00 /usr/cluster/lib/sc/qd_userd
        root   120     1  0   May 07 ?        0:00 /usr/cluster/lib/sc/failfastd
        root   123     1  0   May 07 ?        0:00 /usr/cluster/lib/sc/clexecd
        root   124   123  0   May 07 ?        0:00 /usr/cluster/lib/sc/clexecd
        root  1183     1  0   May 07 ?       46:45 /usr/cluster/lib/sc/rgmd
        root  1154     1  0   May 07 ?        0:07 /usr/cluster/lib/sc/rpc.fed
        root  1125     1  0   May 07 ?       23:49 /usr/cluster/lib/sc/sparcv9/rpc.pmfd
        root  1153     1  0   May 07 ?        0:03 /usr/cluster/lib/sc/cl_eventd
        root  1152     1  0   May 07 ?        0:04 /usr/cluster/lib/sc/cl_eventlogd
        root  1336     1  0   May 07 ?        2:17 /var/cluster/spm/bin/scguieventd -d
        root  1174     1  0   May 07 ?        0:03 /usr/cluster/bin/pnmd
        root  1330     1  0   May 07 ?        0:01 /usr/cluster/lib/sc/scdpmd
        root  1339     1  0   May 07 ?        0:00 /usr/cluster/lib/sc/cl_ccrad
  • FF panic rule — failfast will shut down the node (panic the kernel) if the specified daemon is not restarted within 30s.
  • cluster — system process created by the kernel to encapsulate the kernel threads that make up the core kernel operations. It directly panics the kernel if it is sent a KILL signal (SIGKILL); other signals have no effect.
  • clexecd — used by cluster kernel threads to execute userland commands (such as the run_reserve and dofsck commands). It is also used to run cluster commands remotely (e.g. scshutdown). A failfast driver panics the kernel if this daemon is killed and not restarted in 30s.
  • cl_eventd — registers and forwards cluster events (e.g. nodes entering and leaving the cluster). Starting with SC 3.1 10/03, user applications can register themselves to receive cluster events. The daemon is automatically respawned by rpc.pmfd if it is killed.
  • rgmd — the resource group manager, which manages the state of all cluster-unaware applications. A failfast driver panics the kernel if this daemon is killed and not restarted in 30s.
  • rpc.fed — the "fork-and-exec" daemon, which handles requests from rgmd to spawn methods for specific data services. Failfast will panic the node if this is killed and not restarted in 30s.
  • scguieventd — processes cluster events for the SunPlex Manager GUI, so that the display can be updated in real time. It is not automatically restarted if it stops; if you are having trouble with SunPlex Manager, you might have to restart the daemon or reboot the specific node.
  • rpc.pmfd — the process monitoring facility. It is used as a general mechanism to initiate restarts and failure action scripts for some cluster framework daemons, and for most application daemons and application fault monitors. The FF panic rule applies.
  • pnmd — the public network management daemon; it manages network status information received from the local IPMP (in.mpathd) running on each node of the cluster. It is automatically restarted by rpc.pmfd if it dies.
  • scdpmd — the multi-threaded DPM daemon, which runs on each node and is started by an rc script when the node boots. It monitors the availability of the logical paths visible through the various multipath drivers (MPxIO, HDLM, PowerPath, etc.). It is automatically restarted by rpc.pmfd if it dies.

Validating basic cluster config

  • The sccheck command (/usr/cluster/bin/sccheck) validates the cluster configuration.
  • /var/cluster/sccheck is the repository where the generated reports are stored.

Disk Path Monitoring

  • scdpm -p all:all     # prints all disk paths in the cluster and their status
  • scinstall -pv        # check the cluster installation status -- package revisions, patches applied, etc.
  • Cluster release file: /etc/cluster/release

Shutting down cluster

  • scshutdown -y -g 30

Booting nodes in non-cluster mode
  • boot -x

Placing node in maintenance mode
  • scconf -c -q node=<nodename>,maintstate

Reset the maintenance state by rebooting the node or running:

  • scconf -c -q reset

Placing a cluster node in maintenance mode reduces the number of required quorum votes, ensuring that cluster operation is not disrupted while that node is down.
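
A typical sequence (the node name is hypothetical; both commands appear above):

    scconf -c -q node=node1,maintstate   # node1's quorum vote drops to 0
    scstat -q                            # verify the reduced vote count
    scconf -c -q reset                   # restore normal quorum votes (or reboot node1 back into the cluster)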

SunPlex Manager is available at https://<nodename>:3000

VxVM Rootdg requirements for Sun Cluster

* The vxio major number has to be identical on all nodes of the cluster (check the vxio entry in /etc/name_to_major).

* VxVM must be installed on all nodes physically connected to shared storage. On non-storage nodes, VxVM can be used to encapsulate and mirror the boot disk; if not using VxVM on a non-storage node, use SVM. All that is required in such a case is that the vxio major number be identical to that on all other nodes of the cluster (add an entry to /etc/name_to_major).

* A VxVM license is required on all nodes not connected to an A5x00 StorEdge array.

* A standard rootdg must be created on all nodes where VxVM is installed. Options to initialize rootdg on each node are:

  • encapsulate the boot disk so it can be mirrored; this preserves all data and creates volumes inside rootdg to encapsulate /global/.devices/node@#
  • a disk with more than 5 slices on it cannot be encapsulated
  • initialize other local disks into rootdg

* A unique volume name and minor number are needed across the nodes for the /global/.devices/node@# file system if the boot disk is encapsulated — the /global/.devices/node@# file system must be on a device with a unique name on each node, because it is mounted globally on every node. The normal Solaris OS /etc/mnttab logic predates global file systems and still demands that each device have a unique major/minor number. VxVM doesn't support changing the minor numbers of individual volumes; the entire disk group has to be re-minored.

Use the following command:

    # vxdg [ -g diskgroup ] [ -f ] reminor [ diskgroup ] new-base-minor

From the vxdg man pages:

    reminor    Changes the base minor number for a disk group, and renumbers
               all devices in the disk group to a range starting at that
               number. If the device for a volume is open, then the old
               device number remains in effect until the system is rebooted
               or until the disk group is deported and re-imported. Also, if
               you close an open volume, then the user can execute vxdg
               reminor again to cause the renumbering to take effect without
               rebooting or reimporting.

               A new device number may also overlap with a temporary
               renumbering for a volume device. This also requires a reboot
               or reimport for the new device numbering to take effect. A
               temporary renumbering can happen in the following situations:
               when two volumes (for example, volumes in two different disk
               groups) share the same permanently assigned device number, in
               which case one of the volumes is renumbered temporarily to
               use an alternate device number; or when the persistent device
               number for a volume was changed, but the active device number
               could not be changed to match. The active number may be left
               unchanged after a persistent device number change either
               because the volume device was open, or because the new number
               was in use as the active device number for another volume.

               vxdg fails if you try to use a range of numbers that is
               currently in use as a persistent (not a temporary) device
               number. You can force use of the number range with use of the
               -f option. With -f, some device renumberings may not take
               effect until a reboot or a re-import (just as with open
               volumes). Also, if you force volumes in two disk groups to
               use the same device number, then one of the volumes is
               temporarily renumbered on the next reboot. Which volume
               device is renumbered should be considered random, except that
               device numberings in the rootdg disk group take precedence
               over all others.

               The -f option should be used only when swapping the device
               number ranges used by two or more disk groups. To swap the
               number ranges for two disk groups, you would use -f when
               renumbering the first disk group to use the range of the
               second disk group. Renumbering the second disk group to the
               first range does not require the use of -f.
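
A hypothetical invocation, following the usage shown above (the base minor number 100 is arbitrary):

    vxdg -g rootdg reminor 100        # renumber rootdg's devices starting at minor 100
    ls -lL /dev/vx/dsk/rootdg         # inspect the resulting major/minor numbers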
  • Sun Cluster does not work with Veritas DMP. DMP can be disabled before installing the software by putting in dummy symlinks, etc.
  • scvxinstall — is a shell script that automates VxVM installation in a Sun Clustered env.
  • scvxinstall automates the following things:
    • tries to disable DMP (vxdmp)
    • installs correct cluster package
    • automatically negotiates a vxio major number and properly edits /etc/name_to_major
    • automates rootdg initialization process and encapsulates boot disk
      • gives different device names to the /global/.devices/node@# volumes on each node
      • edits the vfstab entry properly for this same volume (the problem is that this particular line has a DID device on it, and VxVM doesn't understand DID devices)
      • installs a script to "reminor" the rootdg on the next reboot
      • reboots the node so that VxVM operates properly

Jul 18, 2007
 

Cluster Configuration Repository (CCR)

  • /etc/cluster/ccr (directory)

Important Files

  • /etc/cluster/ccr/infrastructure

Global Services

  • One node is the primary for each specific global service. All other nodes communicate with the global services (devices, file systems) via the cluster interconnect.

Global Naming (DID Devices)

  • /dev/did/dsk and /dev/did/rdsk

  • DID is used only for global naming, not for access (see the listing example below)
  • DID device names cannot/are not used in VxVM
  • DID device names are used in Sun/Solaris Volume Manager
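
For example, to inspect the DID namespace (scdidadm -L lists the mappings cluster-wide, while -l, used elsewhere in these notes, lists only the local node):

    scdidadm -L          # list DID instances and their /dev/rdsk paths on all nodes
    ls /dev/did/rdsk     # DID device nodes on the local node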

Global Devices

  • provide global access to devices irrespective of their physical location.
  • Most commonly, SDS/SVM or VxVM devices are used as global devices. The volume manager (LVM) software itself is unaware of the global-access layer implemented on top of these devices.

/global/.devices/node@nodeID

  • nodeID is an integer representing the node in the cluster

Global Filesystems

  • # mount -o global,logging /dev/vx/dsk/nfsdg/vol01 /global/nfs

    or edit the /etc/vfstab file to contain the following:

        /dev/vx/dsk/nfsdg/vol01    /dev/vx/rdsk/nfsdg/vol01    /global/nfs    ufs    2    yes    global,logging

The global file system is also known as the Cluster File System (CFS) or PxFS (Proxy File System).

NOTE: Local failover file systems (i.e. directly attached to a storage device) cannot be used for scalable services; global file systems are required there.

Console Software

  • The console software is in package SUNWccon. There are three variants of the cluster console software:
    • cconsole ( access the node consoles through the TC or other remote console access method )
    • crlogin (uses rlogin as underlying transport)
    • ctelnet (uses telnet as underlying transport)

      /opt/SUNWcluster/bin/<variant> [clustername] &

Cluster Control Panel

/opt/SUNWcluster/bin/ccp [ clustername ] &

All necessary info for cluster admin is stored in the following two files:

  • /etc/clusters, e.g.:

        sc-cluster sc-node1 sc-node2

  • /etc/serialports, e.g.:

        sc-node1 sc-tc 5002             # Connect via TCP port on TC
        sc-node2 sc-tc 5003
        sc-10knode1 sc10k-ssp 23        # connect via E10K SSP
        sc-10knode2 sc10k-ssp 23
        sc-15knode1 sf15k-mainsc 23     # Connect via 15K Main SC
        e250node1 RSCIPnode1 23         # Connect via LAN RSC on a E250
        node1 sc-tp-ws 23               # Connect via a tip launchpad
        sf1_node1 sf1_mainsc 5001       # Connect via passthru on midframe
