Sun Cluster mediator problem in a two-node cluster

Recently we found a "bug" in Sun Cluster 3.2. In a two-node "metropolitan cluster", a site separation ends in the death of the SVM-based device group. The situation is the following.

We have two nodes (V1, V2), and we use a quorum server (qserver) to provide the necessary votes. We run an SVM device group (test) and a resource group (nfs-rg) with a HAStoragePlus resource (nfs-stor) that manages a volume from the disk set test.

V1:~# clrg status
=== Cluster Resource Groups ===  
Group Name       Node Name       Suspended      Status 
----------       ---------       ---------      ------ 
nfs-rg           V1              No             Online
                 V2              No             Offline
V1:~# clrs status
=== Cluster Resources ===

Resource Name       Node Name      State        Status Message
-------------       ---------      -----        --------------
nfs-server          V1             Online       Online - LogicalHostname online.
                    V2             Offline      Offline

nfs-stor            V1             Online       Online
                    V2             Offline      Offline

nfs-res             V1             Online       Online - Service is online.
                    V2             Offline      Offline
V1:~# cldg status

=== Cluster Device Groups ===

--- Device Group Status ---

Device Group Name     Primary     Secondary     Status
-----------------     -------     ---------     ------
test                  V1          V2            Online
V1:~# df -h
Filesystem             size   used  avail capacity  Mounted on
/dev/dsk/c1t0d0s0       31G   7.3G    23G    24%    /
...
/dev/did/dsk/d2s3      480M   3.6M   428M     1%    /global/.devices/node@1
/dev/md/test/dsk/d0    188M    23M   146M    14%    /test
/dev/did/dsk/d11s3     480M   3.6M   428M     1%    /global/.devices/node@2
V1:~# metaset -s test 
Set name = test, Set number = 1

Host                Owner
  V1                 Yes
  V2                 

Mediator Host(s)    Aliases
  V1                  
  V2                  

Driv Dbase

d4   Yes  

d7   Yes  
V1:~# clq status
=== Cluster Quorum ===
--- Quorum Votes Summary ---

            Needed   Present   Possible
            ------   -------   --------
            2        3         3

--- Quorum Votes by Node ---

Node Name       Present       Possible       Status
---------       -------       --------       ------
V1              1             1              Online
V2              1             1              Online

--- Quorum Votes by Device ---

Device Name       Present      Possible      Status
-----------       -------      --------      ------
qs                1            1             Online

Now, simulating a "site failure", we shut down V1 and one storage box, so that half of the metadb replicas and half of the mediators are lost. According to a Sunsolve document, the mediators are intended to provide in-memory state database replicas in case of disk failures, but they fail too if a mediator vote is lost as well:

3. If the replica quorum is not met, half of the replicas is accessible, the mediator quorum is not met, half of the mediator hosts is accessible, and the replica and mediator data match, the system prompts you to grant or deny access to the diskset.
  • Replicas (diskset) == half
  • Mediator hosts (diskset) == half
  • Replicas (diskset) ~= Mediator hosts (diskset)
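
The rule quoted above can be sketched as a small shell function. This is only an illustration of the arithmetic, not what SVM actually runs: the function name diskset_access, its arguments and the result strings are made up here, and cases 1 and 2 are our assumption about the rules that precede the quoted case 3. It counts replica drives, assuming each drive carries the same number of replicas.

```shell
# Hypothetical helper: models the replica/mediator arithmetic only.
# usage: diskset_access REPLICAS_OK REPLICAS_TOTAL MEDIATORS_OK MEDIATORS_TOTAL DATA_MATCH(yes|no)
diskset_access() {
    r_ok=$1; r_total=$2; m_ok=$3; m_total=$4; match=$5

    # Case 1 (assumed): more than half of the replicas -> replica quorum met.
    if [ $((2 * r_ok)) -gt "$r_total" ]; then
        echo granted
        return
    fi

    # Mediators only matter when exactly half of the replicas remain
    # and the replica and mediator data match.
    if [ $((2 * r_ok)) -eq "$r_total" ] && [ "$match" = yes ]; then
        # Case 2 (assumed): more than half of the mediator hosts -> mediator quorum met.
        if [ $((2 * m_ok)) -gt "$m_total" ]; then
            echo granted
            return
        fi
        # Case 3 (quoted above): exactly half of the mediator hosts.
        if [ $((2 * m_ok)) -eq "$m_total" ]; then
            echo prompt
            return
        fi
    fi

    echo denied
}

# Our site failure: 1 of 2 replica drives and 1 of 2 mediator hosts survive.
diskset_access 1 2 1 2 yes    # prints "prompt"
```

With only two mediator hosts, one per site, a site failure always lands in the "prompt" case, and during an automatic takeover there is nobody to answer the prompt.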

In our case it means the death of our disk set:

Feb 18 14:56:41 V2 Cluster.Framework: [ID 801593 daemon.error] stderr: metaset: V2: test: 50replicas & 50mediator hosts available, user intervention required
Feb 18 14:56:41 V2 Cluster.Framework: [ID 801593 daemon.notice] stdout: Only 50% replicas and 50% mediator hosts available for diskset test
Feb 18 14:56:41 V2 cl_runtime: [ID 371250 kern.warning] WARNING: Failed to set this node as primary for service 'test'.
Feb 18 14:56:41 V2 SC[,SUNW.HAStoragePlus:6,nfs-rg,nfs-stor,hastorageplus_prenet_start]: [ID 579208 daemon.error] Global service test associated with path /test is found to be in maintenance state.

As the mediator votes did not help, what about adding more disk-based votes? Let's try iSCSI!

qserver:~# mkdir /iscsi
qserver:~# iscsitadm modify admin -d /iscsi
qserver:~# iscsitadm create target --size 200m qs-disk
V1:~# iscsiadm add discovery-address qserver
V1:~# iscsiadm modify discovery --sendtargets enable
V1:~# iscsiadm list target
Target: iqn.1986-03.com.sun:02:06552d12-12f1-482b-d7ba-951dc13e0187.qs-disk
        Alias: qs-disk
        TPGT: 1
        ISID: 4000002a0000
        Connections: 1

V1:~# devfsadm -i iscsi
V1:~# cldev refresh -v
Successfully refreshed DID devices on node V1

V1:~# cldev list -v
DID Device          Full Device Path
----------          ----------------
...
d13                 V2:/dev/rdsk/c5t010000E081587B3900002A00499C08A1d0
d13                 V1:/dev/rdsk/c5t010000E081587B3900002A00499C08A1d0

This procedure must be carried out on both nodes. Unfortunately, iSCSI devices cannot be used as shared devices (so they cannot serve as a quorum disk), but for holding an additional replica, it is perfect. So let's include it in the disk set.

V1:~# metaset -s test -a /dev/did/rdsk/d13
V1:~#  metaset -s test
Set name = test, Set number = 1

Host                Owner
  V1                 Yes
  V2                 

Mediator Host(s)    Aliases
  V1                  
  V2                  

Drive Dbase

d4    Yes  

d7    Yes  

d13   Yes  

Let's try a site failure again! As seen below, the disk set and the corresponding volume need maintenance (because one of the mirror legs became unavailable), but the volume is still available.

V2:~# clrg status

=== Cluster Resource Groups ===

Group Name       Node Name       Suspended      Status
----------       ---------       ---------      ------
nfs-rg           V1              No             Offline
                 V2              No             Online

V2:~# clq status

=== Cluster Quorum ===

--- Quorum Votes Summary ---

            Needed   Present   Possible
            ------   -------   --------
            2        2         3


--- Quorum Votes by Node ---

Node Name       Present       Possible       Status
---------       -------       --------       ------
V1              0             1              Offline
V2              1             1              Online


--- Quorum Votes by Device ---

Device Name       Present      Possible      Status
-----------       -------      --------      ------
qs                1            1             Online


V2:~# metastat -s test -c
test/d0          p  200MB test/d30
    test/d30     m   68GB test/d10 (maint) test/d20
        test/d10 s   68GB d4s0 (maint)
        test/d20 s   68GB d7s0
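
Why the takeover works this time is plain majority arithmetic, which can be checked with a few lines of shell. This is a rough sketch: the helper name has_replica_quorum is hypothetical, and it assumes every drive in the disk set carries the same number of replicas.

```shell
# Hypothetical helper: succeeds if the surviving drives hold a strict majority.
# usage: has_replica_quorum SURVIVING_DRIVES TOTAL_DRIVES
has_replica_quorum() {
    surviving=$1; total=$2
    [ $((2 * surviving)) -gt "$total" ]
}

# Before: only d4 and d7 in the disk set, one storage box lost.
if has_replica_quorum 1 2; then echo "2 drives: quorum"; else echo "2 drives: no quorum"; fi

# After: d4, d7 and the iSCSI drive d13, one storage box lost.
if has_replica_quorum 2 3; then echo "3 drives: quorum"; else echo "3 drives: no quorum"; fi
```

The first check prints "2 drives: no quorum" and the second "3 drives: quorum". Since d13 lives on the quorum server, it survives the loss of either site, so the remaining node always sees two of the three drives and the mediators are never consulted.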

Well, that smells like a hack, but it is a solution if you want to avoid buying a VxVM licence or upgrading to Sun Cluster 3.2u3. (In the latest release of Sun Cluster, you are allowed to add an additional server, such as our quorum server, as a mediator to any disk set.)
