Free Blog Themes and Templates

Component Soft Ltd - Blogs

Founded in 1995, the Component Soft Ltd has 15 years of experience on Solaris education and support. Our instructors use this site to publicate some nice or nasty features discovered in Solaris, Linux or Mysql.

Free Blog Themes and Templates

Sun Cluster 3.3 - Bypassing amnesia prevention

by Attila Szlovencsák on 06.16.2011

The problem might be known to all of you. If shut down my nodes one after the other, the last node leaving the cluster (host v1 in our case) should be the first booting up in order the provide CCR consistency.
However, it that node refuses to start up (because of some HW error), I am in trouble: all the other nodes stuck in a "attempting to join cluster", since the quorum key on the quorum device belongs to v1 (the last node).
So did we have to wait since the first one repaired? Not really, here is a solution.


So our starting point is the following. An operational cluster, 3 votes, everything is fine.

bash-3.00# clq status

=== Cluster Quorum ===

--- Quorum Votes Summary from (latest node reconfiguration) ---

            Needed   Present   Possible
            ------   -------   --------
            2        3         3


--- Quorum Votes by Node (current status) ---

Node Name       Present       Possible       Status
---------       -------       --------       ------
v1              1             1              Online
v2              1             1              Online


--- Quorum Votes by Device (current status) ---

Device Name       Present      Possible      Status
-----------       -------      --------      ------
d3                1            1             Online

Then, let us stop v2 first, then v1. If trying to start up v2, it is going to wait forever.

NOTICE: CMM: Node v1 (nodeid = 1) with votecount = 1 added.
NOTICE: CMM: Node v2 (nodeid = 2) with votecount = 1 added.
WARNING: CMM: Open failed for quorum device /dev/did/rdsk/d3s2 with error 1.
NOTICE: clcomm: Adapter bge3 constructed
NOTICE: clcomm: Adapter bge2 constructed
NOTICE: CMM: Node v2: attempting to join cluster.
...
Jan  2 16:33:19 v2 cl_runtime: NOTICE: CMM: Cluster doesn't have operational quorum yet; waiting for quorum.

Let us check the reason of hanging.

root@v2# egrep -i 'nodes...name|key' /etc/cluster/ccr/global/infrastructure
cluster.nodes.1.name    v1
cluster.nodes.1.properties.quorum_resv_key      0x4DF9565B00000001
cluster.nodes.2.name    v2
cluster.nodes.2.properties.quorum_resv_key      0x4DF9565B00000002
root@v2# egrep -i 'quorum_dev' /etc/cluster/ccr/global/infrastructure
cluster.quorum_devices.1.name   d3
cluster.quorum_devices.1.state  enabled
cluster.quorum_devices.1.properties.votecount   1
cluster.quorum_devices.1.properties.gdevname    /dev/did/rdsk/d3s2
cluster.quorum_devices.1.properties.path_1      enabled
cluster.quorum_devices.1.properties.path_2      enabled
cluster.quorum_devices.1.properties.access_mode scsi2
cluster.quorum_devices.1.properties.type        shared_disk
root@v2# /usr/cluster/lib/sc/pgre  -c pgre_inkeys -d /dev/did/rdsk/d3s2 
key[0]=0x4df9565b00000001.

OK, so v1's key is there, blocking our way to boot up in cluster mode.
I might be an obvious solution to wipe this key out with pgre -c pgre_scrub , but that would end up with the same results: if no keys in the quorum device, we should wait for the other node to have the operational quourum. So what to do then. Edit the CCR, of course B)

root@v2$ reboot -- -x
root@v2$ cd /etc/cluster/ccr/global/
root@v2$ vi infrastructure

First enable install mode

cluster.name    test
cluster.state   enabled
cluster.properties.cluster_id   0x4DF9565B
cluster.properties.installmode  enabled

Set the vote count of v1 to 0.

cluster.nodes.1.name    v1
cluster.nodes.1.state   enabled
cluster.nodes.1.properties.private_hostname     clusternode1-priv
cluster.nodes.1.properties.quorum_vote  0

And finally, remove any reference to quorum device.
Do not forget to update the checksum in the file, otherwise the node end up saying "Corrupted CCR".

root@v2# /usr/cluster/lib/sc/ccradm recover -o infrastructure
root@v2# reboot 

After reboot, v2 comes up as a single cluster node. The quorum votes are like the following.

root@v2# clq status

=== Cluster Quorum ===

--- Quorum Votes Summary from (latest node reconfiguration) ---

            Needed   Present   Possible
            ------   -------   --------
            1        1         1


--- Quorum Votes by Node (current status) ---

Node Name       Present       Possible       Status
---------       -------       --------       ------
v1              0             0              Offline
v2              1             1              Online

So are we there yet?

bash-3.00# clrs status

=== Cluster Resources ===

Resource Name       Node Name      State        Status Message
-------------       ---------      -----        --------------
test-lh             v1             Offline      Offline
                    v2             Online       Online - LogicalHostname online.

Our dummy resource is up, and running. That's fine. But can I configure an other one?


bash-3.00# clrslh create -g test-rg -h test-ip2 test-lh2
clrslh:  v1 not a cluster member

It seems we fail to validate the existence of the hostname test-ip2 on the other side (v1).
Until v1 is repaired, we have to postpone any change in the configuration. So really?
Well, if repairing v1 seems to be a long run, we might choose to wipe the entire config of v1, so end up in a single node cluster.

bash-3.00# clnode clear v1
clnode:  Node "v1" is still in use by resource group "test-rg".

Well, that's not so easy. First you have to clear the definitions of v1 from all shared disk paths, interconnects, services, resource groups, whatever.... That can be long, but if no other choose....

But let us see, how to get back, if v1 finally repaired.

=== Cluster Quorum ===

--- Quorum Votes Summary from (latest node reconfiguration) ---

            Needed   Present   Possible
            ------   -------   --------
            1        1         1


--- Quorum Votes by Node (current status) ---

Node Name       Present       Possible       Status
---------       -------       --------       ------
v1              0             0              Online
v2              1             1              Online

To get out of install mode, simply define a quorum device, the simply the same way, as in the "One node at once" type of installation of cluster.

bash-3.00# clq add d3

Jan  2 17:05:21 v2 cl_runtime: NOTICE: CMM: Cluster members: v1 v2.
Jan  2 17:05:21 v2 cl_runtime: NOTICE: CMM: node reconfiguration #5 completed.

bash-3.00# clq reset 

Jan  2 17:05:29 v2 cl_runtime: NOTICE: CMM: Votecount changed from 0 to 1 for node v1.
Jan  2 17:05:29 v2 cl_runtime: NOTICE: CMM: Cluster members: v1 v2.
Jan  2 17:05:29 v2 cl_runtime: NOTICE: CMM: node reconfiguration #6 completed.
Jan  2 17:05:30 v2 cl_runtime: NOTICE: CMM: Quorum device 1 (/dev/did/rdsk/d3s2) added; votecount = 1, bitmask of nodes with configured paths = 0x3.
Jan  2 17:05:30 v2 cl_runtime: NOTICE: CMM: Registered key on and acquired quorum device 1 (gdevname /dev/did/rdsk/d3s2).
Jan  2 17:05:30 v2 cl_runtime: NOTICE: CMM: Quorum device /dev/did/rdsk/d3s2: owner set to node 2.
Jan  2 17:05:30 v2 cl_runtime: NOTICE: CMM: Cluster members: v1 v2.
Jan  2 17:05:30 v2 cl_runtime: NOTICE: CMM: node reconfiguration #7 completed.
Jan  2 17:05:31 v2 cl_runtime: NOTICE: CMM: Quorum device /dev/did/rdsk/d3s2: owner set to node 2.

So, finally.

== Cluster Quorum ===

--- Quorum Votes Summary from (latest node reconfiguration) ---

            Needed   Present   Possible
            ------   -------   --------
            2        3         3


--- Quorum Votes by Node (current status) ---

Node Name       Present       Possible       Status
---------       -------       --------       ------
v1              1             1              Online
v2              1             1              Online


--- Quorum Votes by Device (current status) ---

Device Name       Present      Possible      Status
-----------       -------      --------      ------
d3                1            1             Online

Running mysql instances of multiple versions on the same machine

by Attila Szlovencsák on 04.20.2010

Running multiple instances of the same binary of mysql is not a big deal. However trying to install and run mysql 5.0, 5.1 and 5.5 on the same machine... that's somewhat more tricky.
Mysql documentation does not give an exact way how to do that (AFAIK), only
doc1 and
doc2
is available as reference.

The first problem is around the installation. I you already have and rpm based install of one mysql version, trying to install an other one results in conflict.

# rpm -qa |grep -i mysql
MySQL-client-percona-highperf-5.0.87-b20.29.rhel5
MySQL-server-percona-highperf-5.0.87-b20.29.rhel5
MySQL-shared-percona-highperf-5.0.87-b20.29.rhel5
rpm -i MySQL-shared-community-5.1.42-0.rhel5.x86_64.rpm 
	package MySQL-shared-community-5.1.42-0.rhel5.x86_64 is already installed
	file /usr/lib64/libmysqlclient.so from install of MySQL-shared-community-5.1.42-0.rhel5.x86_64 conflicts with file from package MySQL-shared-percona-highperf-5.0.87-b20.29.rhel5.x86_64
	file /usr/lib64/libmysqlclient_r.so from install of MySQL-shared-community-5.1.42-0.rhel5.x86_64 conflicts with file from package MySQL-shared-percona-highperf-5.0.87-b20.29.rhel5.x86_64

So we need to install them to an alternate location.

rpm -i --relocate /=/opt/mysql_5.1.42/  MySQL-client-community-5.1.42-0.rhel5.x86_64.rpm
rpm -i --relocate /=/opt/mysql_5.1.42/ MySQL-shared-community-5.1.42-0.rhel5.x86_64.rpm
rpm -i --relocate /=/opt/mysql_5.1.42/ --noscripts MySQL-server-community-5.1.42-0.rhel5.x86_64.rpm

The last command's --noscript option is because the post install scripts in the package, responsible to install the basic tables of mysql (mysql_install_db) are not affected by the --relocate option, so they fail as well. So let us do the job of the post install script by hand.

mkdir -p /opt/mysql_5.1.42/data
/opt/mysql_5.1.42/usr/bin/mysql_install_db --datadir=/opt/mysql_5.1.42/data --basedir=/opt/mysql_5.1.42/usr/
ln -s /opt/mysql_5.1.42/etc/init.d/mysql /etc/init.d/mysql_5.1.42
chown -R mysql:mysql /opt/mysql_5.1.42/data

Also we define a new defaults file for this install: /opt/mysql_5.1.42/my.cnf

[mysqld]
basedir=/opt/mysql_5.1.42/usr/
datadir=/opt/mysql_5.1.42/data
socket=/opt/mysql_5.1.42/data/mysql.sock
user=mysql
language=/opt/mysql_5.1.42/usr/share/mysql/english/
port=3307

[mysqld_safe]
log-error=/var/log/mysqld_5.1.42.log
pid-file=/opt/mysql_5.1.42/data/mysqld_5.1_42.pid
ledir=/opt/mysql_5.1.42/usr/sbin

Obviously, we redefined the base and data directories, and also we created new socket, port, and pid file to avoid collisions. The later two new options requires some explaination. The language option shows the location of the errmsg.sys. If not not the correct file is loaded, the server will cry about "the incorrect number of lines" (and we have fake error messages as well :) ).

The option ledir is a hint for mysqld_safe for looking for mysqld-bin. If not defined, our script will start the mysql server at the "default" location: /usr/sbin/mysqld.

So let's have a try with the new instance of mysql.

#/opt/mysql_5.1.42/usr/sbin/mysqld  --defaults-file=/opt/mysql_5.1.42/my.cnf

#mysql -uroot  --port=3307 --host=127.0.0.1
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 1
Server version: 5.1.42-community MySQL Community Server (GPL)

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql>

Starting the mysql alone is nice, but using mysqld_safe or rather the rc script mysql.server is a bigger headache..
A big problem with the mysql.server rc script and the mysqld_safe script, that they use somewhat hardcoded values, and also use the my_print_defaults program to gather config info. The resulting algorithm to find the final set of attributes like socket,port, pid-file is so complex, that is hard to define our alternate path for mysql ( /opt/mysql_5.1.42 in our case) in an elegant way. So take a shortcut :) The mysql.server script is to be modified at two points. The line containing:

 parse_server_arguments `$print_defaults $extra_args mysqld server mysql_server mysql.server`

has to be replaced with the following 3 lines

 conf=/opt/mysql_5.1.42/my.cnf
 print_defaults="/opt/mysql_5.1.42/usr/bin/my_print_defaults" 
 parse_server_arguments `$print_defaults --defaults-file=$conf $extra_args mysqld server mysql_server mysql.server`

and also the call for mysqld_safe is to be modified:

 $bindir/mysqld_safe --defaults-file=/opt/mysql_5.1.42/my.cnf --datadir=$datadir --pid-file=$server_pid_file $other_args >/dev/null 2>&1 &

(it is important the --defaults-file option to be first in the row, otherwise it is ignored :( )

With these changes:

# /etc/init.d/mysql_5.1.42 start
Starting MySQL.                                            [  OK  ]

# mysql -uroot  --port=3307 --host=127.0.0.1
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 1
Server version: 5.1.42-community MySQL Community Server (GPL)

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql>

And here we have our processes:

 6796 /bin/sh /usr/bin/mysqld_safe --datadir=/var/lib/mysql --pid-file=/var/lib/mysql/host20.terem4.hu.pid
 6833 /usr/sbin/mysqld --basedir=/ --datadir=/var/lib/mysql --user=mysql --pid-file=/var/lib/mysql/host20.terem4.hu.pid --skip-external-locking --socket=/var/lib/mysql/mysql.sock
 6835 logger -p daemon.err -t mysqld_safe -i -t mysqld
13742 /bin/sh /opt/mysql_5.1.42/usr//bin/mysqld_safe --defaults-file=/opt/mysql_5.1.42/my.cnf --datadir=/opt/mysql_5.1.42/var/lib/mysql --pid-file=/opt/mysql_5.1.42/data/host20.terem4.hu.pid
13839 /opt/mysql_5.1.42/usr/sbin/mysqld --defaults-file=/opt/mysql_5.1.42/my.cnf --basedir=/opt/mysql_5.1.42/usr/ --datadir=/opt/mysql_5.1.42/var/lib/mysql --user=mysql --log-error=/var/log/mysqld_5.1.42.log --pid-file=/opt/mysql_5.1.42/data/host20.terem4.hu.pid --socket=/opt/mysql_5.1.42/data/mysql.sock.5.1.42 --port=3307

Mysql lock debugging... find the locker.

by Attila Szlovencsák on 04.11.2010

A common problem for transactional databases to identify what some transactions/thread waiting for. With MYISAM, where we could use table locks only, that is quite easy. For innodb, with row (or rather page) locks it is much harder. So let's see:

Our small example table comes from the Mysql Self Study Guide example database.

mysql> prompt #\u (\R:\m:\s \d)> 
PROMPT set to '\u (\R:\m:\s \d)> '
#atis (14:00:30 world)> show create table City;

 City  | CREATE TABLE `City` (
  `ID` int(11) NOT NULL AUTO_INCREMENT,
  `Name` char(35) NOT NULL DEFAULT '',
  `CountryCode` char(3) NOT NULL DEFAULT '',
  `District` char(20) NOT NULL DEFAULT '',
  `Population` int(11) NOT NULL DEFAULT '0',
  PRIMARY KEY (`ID`)
) ENGINE=MyISAM AUTO_INCREMENT=4080 DEFAULT CHARSET=latin1 |
#atis (14:00:44 world)> lock tables City read;
Query OK, 0 rows affected (0.00 sec)
#jozsi (14:01:03 world)> lock tables City write;
(thread is locked here)
#root (14:01:34 world)> show processlist;
+----+------+-----------+-------+---------+------+--------+------------------------+
| Id | User | Host      | db    | Command | Time | State  | Info                   |
+----+------+-----------+-------+---------+------+--------+------------------------+
| 16 | jozsi | localhost | world | Query   |  102 | Locked | lock tables City write |
| 17 | atis | localhost | world | Sleep   |   122 | NULL   | NULL       |
+----+------+-----------+-------+---------+------+--------+------------------------+
2 rows in set (0.00 sec)

A rarely known feature of mysql is mysqladmin debug. Nice stuff, gives you info mainly of MYISAM engine, an also about locking. (the debug info is dumped into the mysql error log )

lock: 0x7fda84007988: write_wait read
write_wait: 0x7fda840080b0 (16:12);
read      : 0x7fda8800ad70 (17:5);
...
Thread database.table_name          Locked/Waiting        Lock_type

16      world.City                  Waiting - write       High priority write lock
17      world.City                  Locked - read         Read lock without concurrent inserts
...

It seems to be obvious, that thread 17 is the bad guy, so it is the be killed to let the other threads proceed. Unfortunately, for MYISAM, there is no mechanism to detect long waiting threads, so nor the locker is killed, nor the locked one is aborted. For innodb, this problem is solved, so the locked threads are aborted after a innodb_lock_wait_timeout, but identifying the locker is much harder

#atis (15:13:20 world)>alter table City engine=innodb;#atis (15:13:29 world)>set autocommit=0;

#atis (15:14:22 world)>start transaction;


#atis (15:21:40 world)>select * from City where id=1 for update;
+----+-------+-------------+----------+------------+
| ID | Name  | CountryCode | District | Population |
+----+-------+-------------+----------+------------+
|  1 | Kabul | AFG         | Kabol    |    1780000 |
+----+-------+-------------+----------+------------+
1 row in set (0.00 sec)
#jozsi (14:28:03 world)> update City set Population=id where id=1;
(locked)
root@localhost (none)>show processlist;
+----+-------+-----------+-------+---------+------+----------+----------------------------------------------------+
| Id | User  | Host      | db    | Command | Time | State    | Info                                               |
+----+-------+-----------+-------+---------+------+----------+----------------------------------------------------+
|  2 | atis  | localhost | world | Sleep   |  239 |          | NULL                                               |
|  5 | jozsi | localhost | world | Query   |  237 | Updating | update City set Population=Population+1 where id=1 |
...

Let's see the info provided by the mysqladmin debug.

lock: 0x7fb52800ba40: write
write     : 0x7fb52c00b550 (5:6);

Thread database.table_name          Locked/Waiting        Lock_type

5       world.City                  Locked - write        High priority write lock

The locked thread (thread 5) is still visible here, but I can't find the locker (thread 2). A further investigation is required. Fortunately, INNODB provides an engine status info, with a transactional section.

mysql> show innodb status \G
....
---TRANSACTION 0 504327, ACTIVE 20 sec, process no 2405, OS thread id 140416321279760 starting index read
mysql tables in use 1, locked 1
LOCK WAIT 2 lock struct(s), heap size 368, 1 row lock(s)
MySQL thread id 5, query id 75 localhost jozsi Updating
update City set Population=Population+1 where id=1
------- TRX HAS BEEN WAITING 20 SEC FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 0 page no 3335 n bits 248 index `PRIMARY` of table `world`.`City` trx id 0 504327 lock_mode X locks rec but not gap waiting
Record lock, heap no 2 PHYSICAL RECORD: n_fields 7; compact format; info bits 0
 0: len 4; hex 80000001; asc     ;; 1: len 6; hex 00000007aef9; asc       ;; 2: len 7; hex 8000000c4b0110; asc     K  ;; 3: len 30; hex 4b6162756c20202020202020202020202020202020202020202020202020; asc Kabul                         ;...(truncated); 4: len 3; hex 414647; asc AFG;; 5: len 20; hex 4b61626f6c202020202020202020202020202020; asc Kabol               ;; 6: len 4; hex 801b2920; asc   ) ;;

------------------
---TRANSACTION 0 504326, ACTIVE 22 sec, process no 2405, OS thread id 140416321812240
2 lock struct(s), heap size 368, 1 row lock(s)
MySQL thread id 2, query id 74 localhost atis

So we have a page lock, for page 3335, but I have no info of who is holding a exclusive lock on the page. The status should be turned into a debug mode. INNODB provides a somewhat orthodox way for this. We should create a table with a specific name. See in the manual

root@localhost world>create table innodb_lock_monitor (a int) engine=innodb;

After this, the transaction section is gives much more detail.

---TRANSACTION 0 504327, ACTIVE 596 sec, process no 2405, OS thread id 140416321279760 starting index read
mysql tables in use 1, locked 1
LOCK WAIT 2 lock struct(s), heap size 368, 1 row lock(s)
MySQL thread id 5, query id 75 localhost jozsi Updating
update City set Population=Population+1 where id=1
------- TRX HAS BEEN WAITING 596 SEC FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 0 page no 3335 n bits 248 index `PRIMARY` of table `world`.`City` trx id 0 504327 lock_mode X locks rec but not gap waiting
Record lock, heap no 2 PHYSICAL RECORD: n_fields 7; compact format; info bits 0
 0: len 4; hex 80000001; asc     ;; 1: len 6; hex 00000007aef9; asc       ;; 2: len 7; hex 8000000c4b0110; asc     K  ;; 3: len 30; hex 4b6162756c20202020202020202020202020202020202020202020202020; asc Kabul                         ;...(truncated); 4: len 3; hex 414647; asc AFG;; 5: len 20; hex 4b61626f6c202020202020202020202020202020; asc Kabol               ;; 6: len 4; hex 801b2920; asc   ) ;;

------------------
TABLE LOCK table `world`.`City` trx id 0 504327 lock mode IX
RECORD LOCKS space id 0 page no 3335 n bits 248 index `PRIMARY` of table `world`.`City` trx id 0 504327 lock_mode X locks rec but not gap waiting
Record lock, heap no 2 PHYSICAL RECORD: n_fields 7; compact format; info bits 0
 0: len 4; hex 80000001; asc     ;; 1: len 6; hex 00000007aef9; asc       ;; 2: len 7; hex 8000000c4b0110; asc     K  ;; 3: len 30; hex 4b6162756c20202020202020202020202020202020202020202020202020; asc Kabul                         ;...(truncated); 4: len 3; hex 414647; asc AFG;; 5: len 20; hex 4b61626f6c202020202020202020202020202020; asc Kabol               ;; 6: len 4; hex 801b2920; asc   ) ;;

---TRANSACTION 0 504326, ACTIVE 598 sec, process no 2405, OS thread id 140416321812240
2 lock struct(s), heap size 368, 1 row lock(s)
MySQL thread id 2, query id 74 localhost atis
TABLE LOCK table `world`.`City` trx id 0 504326 lock mode IX
RECORD LOCKS space id 0 page no 3335 n bits 248 index `PRIMARY` of table `world`.`City` trx id 0 504326 lock_mode X locks rec but not gap
Record lock, heap no 2 PHYSICAL RECORD: n_fields 7; compact format; info bits 0
 0: len 4; hex 80000001; asc     ;; 1: len 6; hex 00000007aef9; asc       ;; 2: len 7; hex 8000000c4b0110; asc     K  ;; 3: len 30; hex 4b6162756c20202020202020202020202020202020202020202020202020; asc Kabul                         ;...(truncated); 4: len 3; hex 414647; asc AFG;; 5: len 20; hex 4b61626f6c202020202020202020202020202020; asc Kabol               ;; 6: len 4; hex 801b2920; asc   ) ;;

Well, that seems pretty hard to find the appropriate thread/page number pair. We are missing a tool here. And here comes innotop

________________________________ InnoDB Locks _________________________________
CXN        ID  Type    Waiting  Wait   Active  Mode  DB     Table  Index    Ins
localhost   5  RECORD        1  06:23   19:26  X     world  City   PRIMARY     
localhost   5  TABLE         0  06:23   19:26  IX    world  City               
localhost   5  RECORD        1  06:23   19:26  X     world  City   PRIMARY     
localhost   2  TABLE         0  00:00   19:28  IX    world  City               
localhost   2  RECORD        0  00:00   19:28  X     world  City   PRIMARY    

Well, the output is nice, but still we are missing the page numbers... A patch might be required here. :)

Free Blog Themes and Templates
Free Blog Themes and Templates