Upgraded GFS2 Cluster Tools from 2.2 to 3.0.4
With a few words of warning, we upgraded one of our clusters from 2.2 to 3.0.4. While this is normally a seamless project, it needed to be coordinated with both storage nodes in the cluster since the changes from 2.2 to 3.0 in openais were incompatible. Some minor changes to the cluster config file were needed which results in a cleaner file, and, a new dependency for rgmanager was added for the upgrade to 3.0.
This meant some downtime while openais was upgraded. Since we run behind a pair of load balancers, we were able to shut down the first filesystem, disconnect it from cman, upgrade one side, shut off the services on the other, bring this side up, bring up services, then upgrade the second node.
While this should have worked, cman on the primary node had no problem, but the secondary node refused to start dlm_controld.
Dec 10 12:29:20 dlm_controld dlm_controld 3.0.4 started Dec 10 12:29:30 dlm_controld cannot find device /dev/misc/lock_dlm_plock with minor 58
For some odd reason, lock_dlm_plock was created in /dev rather than /dev/misc after the udev upgrade. Moving it into place allowed cman to start on the second node, and, allowed the cluster to run in non-degraded mode.
Why lock_dlm_plock was in the wrong place on one node and in the correct place on the other node, I’m not sure. I think prior to rgmanager being installed, the init script for cman didn’t stop when dlm couldn’t be loaded, and since the /dev/misc folder hadn’t been created, it created the lock file in /dev. Subsequent restarts of the machine have resulted in it coming up without an issue, so, it appears to be an issue somewhere in one of the startup scripts.