
Founded in 1995, the Component Soft Ltd has 15 years of experience on Solaris education and support. Our instructors use this site to publicate some nice or nasty features discovered in Solaris, Linux or Mysql.
The problem might be known to all of you. If shut down my nodes one after the other, the last node leaving the cluster (host v1 in our case) should be the first booting up in order the provide CCR consistency.
However, it that node refuses to start up (because of some HW error), I am in trouble: all the other nodes stuck in a "attempting to join cluster", since the quorum key on the quorum device belongs to v1 (the last node).
So did we have to wait since the first one repaired? Not really, here is a solution.
So our starting point is the following. An operational cluster, 3 votes, everything is fine.
bash-3.00# clq status
=== Cluster Quorum ===
--- Quorum Votes Summary from (latest node reconfiguration) ---
Needed Present Possible
------ ------- --------
2 3 3
--- Quorum Votes by Node (current status) ---
Node Name Present Possible Status
--------- ------- -------- ------
v1 1 1 Online
v2 1 1 Online
--- Quorum Votes by Device (current status) ---
Device Name Present Possible Status
----------- ------- -------- ------
d3 1 1 Online
Then, let us stop v2 first, then v1. If trying to start up v2, it is going to wait forever.
NOTICE: CMM: Node v1 (nodeid = 1) with votecount = 1 added. NOTICE: CMM: Node v2 (nodeid = 2) with votecount = 1 added. WARNING: CMM: Open failed for quorum device /dev/did/rdsk/d3s2 with error 1. NOTICE: clcomm: Adapter bge3 constructed NOTICE: clcomm: Adapter bge2 constructed NOTICE: CMM: Node v2: attempting to join cluster. ... Jan 2 16:33:19 v2 cl_runtime: NOTICE: CMM: Cluster doesn't have operational quorum yet; waiting for quorum.
Let us check the reason of hanging.
root@v2# egrep -i 'nodes...name|key' /etc/cluster/ccr/global/infrastructure cluster.nodes.1.name v1 cluster.nodes.1.properties.quorum_resv_key 0x4DF9565B00000001 cluster.nodes.2.name v2 cluster.nodes.2.properties.quorum_resv_key 0x4DF9565B00000002
root@v2# egrep -i 'quorum_dev' /etc/cluster/ccr/global/infrastructure cluster.quorum_devices.1.name d3 cluster.quorum_devices.1.state enabled cluster.quorum_devices.1.properties.votecount 1 cluster.quorum_devices.1.properties.gdevname /dev/did/rdsk/d3s2 cluster.quorum_devices.1.properties.path_1 enabled cluster.quorum_devices.1.properties.path_2 enabled cluster.quorum_devices.1.properties.access_mode scsi2 cluster.quorum_devices.1.properties.type shared_disk
root@v2# /usr/cluster/lib/sc/pgre -c pgre_inkeys -d /dev/did/rdsk/d3s2 key[0]=0x4df9565b00000001.
OK, so v1's key is there, blocking our way to boot up in cluster mode.
I might be an obvious solution to wipe this key out with pgre -c pgre_scrub , but that would end up with the same results: if no keys in the quorum device, we should wait for the other node to have the operational quourum. So what to do then. Edit the CCR, of course B)
root@v2$ reboot -- -x
root@v2$ cd /etc/cluster/ccr/global/ root@v2$ vi infrastructure
First enable install mode
cluster.name test cluster.state enabled cluster.properties.cluster_id 0x4DF9565B cluster.properties.installmode enabled
Set the vote count of v1 to 0.
cluster.nodes.1.name v1 cluster.nodes.1.state enabled cluster.nodes.1.properties.private_hostname clusternode1-priv cluster.nodes.1.properties.quorum_vote 0
And finally, remove any reference to quorum device.
Do not forget to update the checksum in the file, otherwise the node end up saying "Corrupted CCR".
root@v2# /usr/cluster/lib/sc/ccradm recover -o infrastructure root@v2# reboot
After reboot, v2 comes up as a single cluster node. The quorum votes are like the following.
root@v2# clq status
=== Cluster Quorum ===
--- Quorum Votes Summary from (latest node reconfiguration) ---
Needed Present Possible
------ ------- --------
1 1 1
--- Quorum Votes by Node (current status) ---
Node Name Present Possible Status
--------- ------- -------- ------
v1 0 0 Offline
v2 1 1 Online
So are we there yet?
bash-3.00# clrs status
=== Cluster Resources ===
Resource Name Node Name State Status Message
------------- --------- ----- --------------
test-lh v1 Offline Offline
v2 Online Online - LogicalHostname online.
Our dummy resource is up, and running. That's fine. But can I configure an other one?
bash-3.00# clrslh create -g test-rg -h test-ip2 test-lh2 clrslh: v1 not a cluster member
It seems we fail to validate the existence of the hostname test-ip2 on the other side (v1).
Until v1 is repaired, we have to postpone any change in the configuration. So really?
Well, if repairing v1 seems to be a long run, we might choose to wipe the entire config of v1, so end up in a single node cluster.
bash-3.00# clnode clear v1 clnode: Node "v1" is still in use by resource group "test-rg".
Well, that's not so easy. First you have to clear the definitions of v1 from all shared disk paths, interconnects, services, resource groups, whatever.... That can be long, but if no other choose....
But let us see, how to get back, if v1 finally repaired.
=== Cluster Quorum ===
--- Quorum Votes Summary from (latest node reconfiguration) ---
Needed Present Possible
------ ------- --------
1 1 1
--- Quorum Votes by Node (current status) ---
Node Name Present Possible Status
--------- ------- -------- ------
v1 0 0 Online
v2 1 1 Online
To get out of install mode, simply define a quorum device, the simply the same way, as in the "One node at once" type of installation of cluster.
bash-3.00# clq add d3 Jan 2 17:05:21 v2 cl_runtime: NOTICE: CMM: Cluster members: v1 v2. Jan 2 17:05:21 v2 cl_runtime: NOTICE: CMM: node reconfiguration #5 completed. bash-3.00# clq reset Jan 2 17:05:29 v2 cl_runtime: NOTICE: CMM: Votecount changed from 0 to 1 for node v1. Jan 2 17:05:29 v2 cl_runtime: NOTICE: CMM: Cluster members: v1 v2. Jan 2 17:05:29 v2 cl_runtime: NOTICE: CMM: node reconfiguration #6 completed. Jan 2 17:05:30 v2 cl_runtime: NOTICE: CMM: Quorum device 1 (/dev/did/rdsk/d3s2) added; votecount = 1, bitmask of nodes with configured paths = 0x3. Jan 2 17:05:30 v2 cl_runtime: NOTICE: CMM: Registered key on and acquired quorum device 1 (gdevname /dev/did/rdsk/d3s2). Jan 2 17:05:30 v2 cl_runtime: NOTICE: CMM: Quorum device /dev/did/rdsk/d3s2: owner set to node 2. Jan 2 17:05:30 v2 cl_runtime: NOTICE: CMM: Cluster members: v1 v2. Jan 2 17:05:30 v2 cl_runtime: NOTICE: CMM: node reconfiguration #7 completed. Jan 2 17:05:31 v2 cl_runtime: NOTICE: CMM: Quorum device /dev/did/rdsk/d3s2: owner set to node 2.
So, finally.
== Cluster Quorum ===
--- Quorum Votes Summary from (latest node reconfiguration) ---
Needed Present Possible
------ ------- --------
2 3 3
--- Quorum Votes by Node (current status) ---
Node Name Present Possible Status
--------- ------- -------- ------
v1 1 1 Online
v2 1 1 Online
--- Quorum Votes by Device (current status) ---
Device Name Present Possible Status
----------- ------- -------- ------
d3 1 1 Online
Running multiple instances of the same binary of mysql is not a big deal. However trying to install and run mysql 5.0, 5.1 and 5.5 on the same machine... that's somewhat more tricky.
Mysql documentation does not give an exact way how to do that (AFAIK), only
doc1 and
doc2
is available as reference.
The first problem is around the installation. I you already have and rpm based install of one mysql version, trying to install an other one results in conflict.
# rpm -qa |grep -i mysql MySQL-client-percona-highperf-5.0.87-b20.29.rhel5 MySQL-server-percona-highperf-5.0.87-b20.29.rhel5 MySQL-shared-percona-highperf-5.0.87-b20.29.rhel5
rpm -i MySQL-shared-community-5.1.42-0.rhel5.x86_64.rpm package MySQL-shared-community-5.1.42-0.rhel5.x86_64 is already installed file /usr/lib64/libmysqlclient.so from install of MySQL-shared-community-5.1.42-0.rhel5.x86_64 conflicts with file from package MySQL-shared-percona-highperf-5.0.87-b20.29.rhel5.x86_64 file /usr/lib64/libmysqlclient_r.so from install of MySQL-shared-community-5.1.42-0.rhel5.x86_64 conflicts with file from package MySQL-shared-percona-highperf-5.0.87-b20.29.rhel5.x86_64
So we need to install them to an alternate location.
rpm -i --relocate /=/opt/mysql_5.1.42/ MySQL-client-community-5.1.42-0.rhel5.x86_64.rpm rpm -i --relocate /=/opt/mysql_5.1.42/ MySQL-shared-community-5.1.42-0.rhel5.x86_64.rpm rpm -i --relocate /=/opt/mysql_5.1.42/ --noscripts MySQL-server-community-5.1.42-0.rhel5.x86_64.rpm
The last command's --noscript option is because the post install scripts in the package, responsible to install the basic tables of mysql (mysql_install_db) are not affected by the --relocate option, so they fail as well. So let us do the job of the post install script by hand.
mkdir -p /opt/mysql_5.1.42/data /opt/mysql_5.1.42/usr/bin/mysql_install_db --datadir=/opt/mysql_5.1.42/data --basedir=/opt/mysql_5.1.42/usr/ ln -s /opt/mysql_5.1.42/etc/init.d/mysql /etc/init.d/mysql_5.1.42 chown -R mysql:mysql /opt/mysql_5.1.42/data
Also we define a new defaults file for this install: /opt/mysql_5.1.42/my.cnf
[mysqld] basedir=/opt/mysql_5.1.42/usr/ datadir=/opt/mysql_5.1.42/data socket=/opt/mysql_5.1.42/data/mysql.sock user=mysql language=/opt/mysql_5.1.42/usr/share/mysql/english/ port=3307 [mysqld_safe] log-error=/var/log/mysqld_5.1.42.log pid-file=/opt/mysql_5.1.42/data/mysqld_5.1_42.pid ledir=/opt/mysql_5.1.42/usr/sbin
Obviously, we redefined the base and data directories, and also we created new socket, port, and pid file to avoid collisions. The later two new options requires some explaination. The language option shows the location of the errmsg.sys. If not not the correct file is loaded, the server will cry about "the incorrect number of lines" (and we have fake error messages as well
).
The option ledir is a hint for mysqld_safe for looking for mysqld-bin. If not defined, our script will start the mysql server at the "default" location: /usr/sbin/mysqld.
So let's have a try with the new instance of mysql.
#/opt/mysql_5.1.42/usr/sbin/mysqld --defaults-file=/opt/mysql_5.1.42/my.cnf #mysql -uroot --port=3307 --host=127.0.0.1 Welcome to the MySQL monitor. Commands end with ; or \g. Your MySQL connection id is 1 Server version: 5.1.42-community MySQL Community Server (GPL) Type 'help;' or '\h' for help. Type '\c' to clear the current input statement. mysql>
Starting the mysql alone is nice, but using mysqld_safe or rather the rc script mysql.server is a bigger headache..
A big problem with the mysql.server rc script and the mysqld_safe script, that they use somewhat hardcoded values, and also use the my_print_defaults program to gather config info. The resulting algorithm to find the final set of attributes like socket,port, pid-file is so complex, that is hard to define our alternate path for mysql ( /opt/mysql_5.1.42 in our case) in an elegant way. So take a shortcut
The mysql.server script is to be modified at two points. The line containing:
parse_server_arguments `$print_defaults $extra_args mysqld server mysql_server mysql.server`
has to be replaced with the following 3 lines
conf=/opt/mysql_5.1.42/my.cnf print_defaults="/opt/mysql_5.1.42/usr/bin/my_print_defaults" parse_server_arguments `$print_defaults --defaults-file=$conf $extra_args mysqld server mysql_server mysql.server`
and also the call for mysqld_safe is to be modified:
$bindir/mysqld_safe --defaults-file=/opt/mysql_5.1.42/my.cnf --datadir=$datadir --pid-file=$server_pid_file $other_args >/dev/null 2>&1 &
(it is important the --defaults-file option to be first in the row, otherwise it is ignored
)
With these changes:
# /etc/init.d/mysql_5.1.42 start Starting MySQL. [ OK ] # mysql -uroot --port=3307 --host=127.0.0.1 Welcome to the MySQL monitor. Commands end with ; or \g. Your MySQL connection id is 1 Server version: 5.1.42-community MySQL Community Server (GPL) Type 'help;' or '\h' for help. Type '\c' to clear the current input statement. mysql>
And here we have our processes:
6796 /bin/sh /usr/bin/mysqld_safe --datadir=/var/lib/mysql --pid-file=/var/lib/mysql/host20.terem4.hu.pid 6833 /usr/sbin/mysqld --basedir=/ --datadir=/var/lib/mysql --user=mysql --pid-file=/var/lib/mysql/host20.terem4.hu.pid --skip-external-locking --socket=/var/lib/mysql/mysql.sock 6835 logger -p daemon.err -t mysqld_safe -i -t mysqld 13742 /bin/sh /opt/mysql_5.1.42/usr//bin/mysqld_safe --defaults-file=/opt/mysql_5.1.42/my.cnf --datadir=/opt/mysql_5.1.42/var/lib/mysql --pid-file=/opt/mysql_5.1.42/data/host20.terem4.hu.pid 13839 /opt/mysql_5.1.42/usr/sbin/mysqld --defaults-file=/opt/mysql_5.1.42/my.cnf --basedir=/opt/mysql_5.1.42/usr/ --datadir=/opt/mysql_5.1.42/var/lib/mysql --user=mysql --log-error=/var/log/mysqld_5.1.42.log --pid-file=/opt/mysql_5.1.42/data/host20.terem4.hu.pid --socket=/opt/mysql_5.1.42/data/mysql.sock.5.1.42 --port=3307
A common problem for transactional databases to identify what some transactions/thread waiting for. With MYISAM, where we could use table locks only, that is quite easy. For innodb, with row (or rather page) locks it is much harder. So let's see:
Our small example table comes from the Mysql Self Study Guide example database.
mysql> prompt #\u (\R:\m:\s \d)> PROMPT set to '\u (\R:\m:\s \d)> ' #atis (14:00:30 world)> show create table City; City | CREATE TABLE `City` ( `ID` int(11) NOT NULL AUTO_INCREMENT, `Name` char(35) NOT NULL DEFAULT '', `CountryCode` char(3) NOT NULL DEFAULT '', `District` char(20) NOT NULL DEFAULT '', `Population` int(11) NOT NULL DEFAULT '0', PRIMARY KEY (`ID`) ) ENGINE=MyISAM AUTO_INCREMENT=4080 DEFAULT CHARSET=latin1 |
#atis (14:00:44 world)> lock tables City read; Query OK, 0 rows affected (0.00 sec)
#jozsi (14:01:03 world)> lock tables City write; (thread is locked here)
#root (14:01:34 world)> show processlist; +----+------+-----------+-------+---------+------+--------+------------------------+ | Id | User | Host | db | Command | Time | State | Info | +----+------+-----------+-------+---------+------+--------+------------------------+ | 16 | jozsi | localhost | world | Query | 102 | Locked | lock tables City write | | 17 | atis | localhost | world | Sleep | 122 | NULL | NULL | +----+------+-----------+-------+---------+------+--------+------------------------+ 2 rows in set (0.00 sec)
A rarely known feature of mysql is mysqladmin debug. Nice stuff, gives you info mainly of MYISAM engine, an also about locking. (the debug info is dumped into the mysql error log )
lock: 0x7fda84007988: write_wait read write_wait: 0x7fda840080b0 (16:12); read : 0x7fda8800ad70 (17:5); ... Thread database.table_name Locked/Waiting Lock_type 16 world.City Waiting - write High priority write lock 17 world.City Locked - read Read lock without concurrent inserts ...
It seems to be obvious, that thread 17 is the bad guy, so it is the be killed to let the other threads proceed. Unfortunately, for MYISAM, there is no mechanism to detect long waiting threads, so nor the locker is killed, nor the locked one is aborted. For innodb, this problem is solved, so the locked threads are aborted after a innodb_lock_wait_timeout, but identifying the locker is much harder
#atis (15:13:20 world)>alter table City engine=innodb;#atis (15:13:29 world)>set autocommit=0; #atis (15:14:22 world)>start transaction; #atis (15:21:40 world)>select * from City where id=1 for update; +----+-------+-------------+----------+------------+ | ID | Name | CountryCode | District | Population | +----+-------+-------------+----------+------------+ | 1 | Kabul | AFG | Kabol | 1780000 | +----+-------+-------------+----------+------------+ 1 row in set (0.00 sec)
#jozsi (14:28:03 world)> update City set Population=id where id=1; (locked)
root@localhost (none)>show processlist; +----+-------+-----------+-------+---------+------+----------+----------------------------------------------------+ | Id | User | Host | db | Command | Time | State | Info | +----+-------+-----------+-------+---------+------+----------+----------------------------------------------------+ | 2 | atis | localhost | world | Sleep | 239 | | NULL | | 5 | jozsi | localhost | world | Query | 237 | Updating | update City set Population=Population+1 where id=1 | ...
Let's see the info provided by the mysqladmin debug.
lock: 0x7fb52800ba40: write write : 0x7fb52c00b550 (5:6); Thread database.table_name Locked/Waiting Lock_type 5 world.City Locked - write High priority write lock
The locked thread (thread 5) is still visible here, but I can't find the locker (thread 2). A further investigation is required. Fortunately, INNODB provides an engine status info, with a transactional section.
mysql> show innodb status \G .... ---TRANSACTION 0 504327, ACTIVE 20 sec, process no 2405, OS thread id 140416321279760 starting index read mysql tables in use 1, locked 1 LOCK WAIT 2 lock struct(s), heap size 368, 1 row lock(s) MySQL thread id 5, query id 75 localhost jozsi Updating update City set Population=Population+1 where id=1 ------- TRX HAS BEEN WAITING 20 SEC FOR THIS LOCK TO BE GRANTED: RECORD LOCKS space id 0 page no 3335 n bits 248 index `PRIMARY` of table `world`.`City` trx id 0 504327 lock_mode X locks rec but not gap waiting Record lock, heap no 2 PHYSICAL RECORD: n_fields 7; compact format; info bits 0 0: len 4; hex 80000001; asc ;; 1: len 6; hex 00000007aef9; asc ;; 2: len 7; hex 8000000c4b0110; asc K ;; 3: len 30; hex 4b6162756c20202020202020202020202020202020202020202020202020; asc Kabul ;...(truncated); 4: len 3; hex 414647; asc AFG;; 5: len 20; hex 4b61626f6c202020202020202020202020202020; asc Kabol ;; 6: len 4; hex 801b2920; asc ) ;; ------------------ ---TRANSACTION 0 504326, ACTIVE 22 sec, process no 2405, OS thread id 140416321812240 2 lock struct(s), heap size 368, 1 row lock(s) MySQL thread id 2, query id 74 localhost atis
So we have a page lock, for page 3335, but I have no info of who is holding a exclusive lock on the page. The status should be turned into a debug mode. INNODB provides a somewhat orthodox way for this. We should create a table with a specific name. See in the manual
root@localhost world>create table innodb_lock_monitor (a int) engine=innodb;
After this, the transaction section is gives much more detail.
---TRANSACTION 0 504327, ACTIVE 596 sec, process no 2405, OS thread id 140416321279760 starting index read mysql tables in use 1, locked 1 LOCK WAIT 2 lock struct(s), heap size 368, 1 row lock(s) MySQL thread id 5, query id 75 localhost jozsi Updating update City set Population=Population+1 where id=1 ------- TRX HAS BEEN WAITING 596 SEC FOR THIS LOCK TO BE GRANTED: RECORD LOCKS space id 0 page no 3335 n bits 248 index `PRIMARY` of table `world`.`City` trx id 0 504327 lock_mode X locks rec but not gap waiting Record lock, heap no 2 PHYSICAL RECORD: n_fields 7; compact format; info bits 0 0: len 4; hex 80000001; asc ;; 1: len 6; hex 00000007aef9; asc ;; 2: len 7; hex 8000000c4b0110; asc K ;; 3: len 30; hex 4b6162756c20202020202020202020202020202020202020202020202020; asc Kabul ;...(truncated); 4: len 3; hex 414647; asc AFG;; 5: len 20; hex 4b61626f6c202020202020202020202020202020; asc Kabol ;; 6: len 4; hex 801b2920; asc ) ;; ------------------ TABLE LOCK table `world`.`City` trx id 0 504327 lock mode IX RECORD LOCKS space id 0 page no 3335 n bits 248 index `PRIMARY` of table `world`.`City` trx id 0 504327 lock_mode X locks rec but not gap waiting Record lock, heap no 2 PHYSICAL RECORD: n_fields 7; compact format; info bits 0 0: len 4; hex 80000001; asc ;; 1: len 6; hex 00000007aef9; asc ;; 2: len 7; hex 8000000c4b0110; asc K ;; 3: len 30; hex 4b6162756c20202020202020202020202020202020202020202020202020; asc Kabul ;...(truncated); 4: len 3; hex 414647; asc AFG;; 5: len 20; hex 4b61626f6c202020202020202020202020202020; asc Kabol ;; 6: len 4; hex 801b2920; asc ) ;; ---TRANSACTION 0 504326, ACTIVE 598 sec, process no 2405, OS thread id 140416321812240 2 lock struct(s), heap size 368, 1 row lock(s) MySQL thread id 2, query id 74 localhost atis TABLE LOCK table `world`.`City` trx id 0 504326 lock mode IX RECORD LOCKS space id 0 page no 3335 n bits 248 index `PRIMARY` of table `world`.`City` trx id 0 504326 lock_mode X locks rec but not gap Record lock, heap no 2 PHYSICAL RECORD: n_fields 7; compact format; info bits 0 0: len 4; hex 80000001; asc ;; 1: len 6; hex 00000007aef9; asc ;; 2: len 7; hex 8000000c4b0110; asc K ;; 3: len 30; hex 4b6162756c20202020202020202020202020202020202020202020202020; asc Kabul ;...(truncated); 4: len 3; hex 414647; asc AFG;; 5: len 20; hex 4b61626f6c202020202020202020202020202020; asc Kabol ;; 6: len 4; hex 801b2920; asc ) ;;
Well, that seems pretty hard to find the appropriate thread/page number pair. We are missing a tool here. And here comes innotop
________________________________ InnoDB Locks _________________________________ CXN ID Type Waiting Wait Active Mode DB Table Index Ins localhost 5 RECORD 1 06:23 19:26 X world City PRIMARY localhost 5 TABLE 0 06:23 19:26 IX world City localhost 5 RECORD 1 06:23 19:26 X world City PRIMARY localhost 2 TABLE 0 00:00 19:28 IX world City localhost 2 RECORD 0 00:00 19:28 X world City PRIMARY
Well, the output is nice, but still we are missing the page numbers... A patch might be required here.