Cluster Restart Crib


Restarting the Cluster

Emergency Restart

Use the following method when something has gone wrong with your RAC environment, and the decision has been taken to restart everything (i.e. database, ASM, the whole cluster) ASAP.

Down

Log onto node 03 as root (03 is used as an example).

Run standard checks beforehand:

/u01/app/12.1.0/grid/bin/crsctl check cluster

/u01/app/12.1.0/grid/bin/crsctl status resource -t
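The full Grid home path is typed repeatedly throughout this crib; as a convenience, you can export it once per session and shorten the commands (path taken from this document; adjust for your installation):

export GRID_HOME=/u01/app/12.1.0/grid
$GRID_HOME/bin/crsctl check cluster
$GRID_HOME/bin/crsctl status resource -t

The rest of this document keeps the full paths so each line can be copied and pasted on its own.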

Next, bring down the cluster as follows:

If you want to shut down the cluster on both nodes, run the following:

/u01/app/12.1.0/grid/bin/crsctl stop cluster -all -f

If you want to shut down the cluster on a single node (03, in this example), run the following:

/u01/app/12.1.0/grid/bin/crsctl stop cluster -n db03 -f

(NOTE: we use the ‘force’ option because something is currently wrong with the environment and we need to bring it down ASAP.)

Confirm everything has come down by running standard checks again:

/u01/app/12.1.0/grid/bin/crsctl check cluster -all

/u01/app/12.1.0/grid/bin/crsctl status resource -t

It doesn’t hurt to confirm the database and ASM are definitely down (especially if we are restarting the cluster because of issues):

ps -ef | grep smon
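To tell the database and ASM instances apart in the output, you can filter on the standard background process name prefixes (ora_smon_<SID> for database instances, asm_smon_<SID> for ASM):

ps -ef | grep -E 'ora_smon|asm_smon' | grep -v grep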

You can also bring down HA (High Availability) at this point:

# crsctl stop has

NOTE: this command will stop HA on the local node only.

Therefore, depending on your situation (you might be restarting only one node, for example), you will need to log onto node 04 to stop it there as well.
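For example, assuming ssh as root is permitted between the nodes (it may not be in your environment), you can stop HA on node 04 without logging on interactively:

ssh root@db04 /u01/app/12.1.0/grid/bin/crsctl stop has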

Up

Start everything back up again:

If you took down HA, then simply restarting this will bring up the rest of the cluster:

# crsctl start has

If you didn’t take down HA, then run this command to bring up the clusterware:

/u01/app/12.1.0/grid/bin/crsctl start cluster -all

Confirm everything has come up again:

/u01/app/12.1.0/grid/bin/crsctl check cluster -all

/u01/app/12.1.0/grid/bin/crsctl status resource -t
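Because ‘crsctl status resource -t’ lists every resource (some of which, like ora.gsd, are OFFLINE by design), it can help to filter to just the database resources when confirming the restart. A sketch using the documented -w filter (the type name below is the standard one):

/u01/app/12.1.0/grid/bin/crsctl status resource -w "TYPE = ora.database.type"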

Alternative Commands


NOTE: You can also manage the stack with the following CRS-level commands. Unlike “crsctl stop cluster -all”, “crsctl stop crs” acts on the local node only, and it stops the High Availability Services stack on that node as well:

crsctl check crs

crsctl stop crs

crsctl start crs
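Because the CRS variants act on the local node only, stopping the whole cluster this way means running the command on each node in turn, e.g. (again assuming ssh as root between the nodes):

/u01/app/12.1.0/grid/bin/crsctl stop crs
ssh root@db04 /u01/app/12.1.0/grid/bin/crsctl stop crs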

Controlled Restart

If you are taking down the cluster as part of some maintenance activity (i.e. not an emergency scenario), then it’s better to take the cluster components down using srvctl commands as described in this section.

From [1]: taking the whole cluster down in one go using CRSCTL, as in the previous section, "can lead to the database instances being stopped similar to shutdown abort, which requires an instance recovery on startup. If you use SRVCTL to stop the database instances manually before stopping the cluster, then you can prevent a shutdown abort, but this requires that you manually restart the database instances after restarting Oracle Clusterware."

Therefore, it is better to use the method below:

Down

Database

Log onto node 03 as user ‘oracle’.

srvctl status database -db ism_dbname

srvctl stop database -db ism_dbname

srvctl status database -db ism_dbname
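The shutdown mode can also be given explicitly if you want something other than the default (IMMEDIATE, TRANSACTIONAL and ABORT are among the documented values), for example:

srvctl stop database -db ism_dbname -stopoption immediate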

Alternatively, you can do this one instance at a time (e.g. if one node is already down, as is currently the case with rpap2):

srvctl status instance -db ism_dbname -instance rpap1

srvctl stop instance -db ism_dbname -instance rpap1

srvctl status instance -db ism_dbname -instance rpap1

-- only if you’re doing both nodes:

srvctl status instance -db ism_dbname -instance rpap2

srvctl stop instance -db ism_dbname -instance rpap2

srvctl status instance -db ism_dbname -instance rpap2

ASM

Log onto node 03 as user ‘grid’.

srvctl status asm -n db03

srvctl stop asm -n db03

srvctl status asm -n db03

-- only if you’re doing both nodes:

srvctl status asm -n db04

srvctl stop asm -n db04

srvctl status asm -n db04
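If the ASM stop fails because clients are still connected, srvctl accepts a force flag; treat this as a last resort rather than the normal path:

srvctl stop asm -n db03 -f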

Nodeapps

Still logged on as ‘grid’:

srvctl status nodeapps -node db03

srvctl stop nodeapps -node db03

srvctl status nodeapps -node db03

-- only if you’re doing both nodes:

srvctl status nodeapps -node db04

srvctl stop nodeapps -node db04

srvctl status nodeapps -node db04
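Note that the listener is managed as its own clusterware resource rather than as part of nodeapps, so it is worth checking it separately (no listener name given, so this reports all listeners):

srvctl status listener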

CRS

Log onto node 03 as ‘root’.

/u01/app/12.1.0/grid/bin/crsctl check cluster -all

/u01/app/12.1.0/grid/bin/crsctl status resource -t

If you are taking down the clusterware on both nodes:

/u01/app/12.1.0/grid/bin/crsctl stop cluster -all

If you are taking down the clusterware on one node only:

/u01/app/12.1.0/grid/bin/crsctl stop cluster -n db03

Check everything is now down.

/u01/app/12.1.0/grid/bin/crsctl check cluster -all

/u01/app/12.1.0/grid/bin/crsctl status resource -t

HA

You can also bring down HA (High Availability) at this point:

# crsctl stop has

NOTE: this command will stop HA on the local node only.

Therefore, depending on your situation (you might be restarting only one node, for example), you will need to log onto node 04 to stop it there as well.

Up

To bring everything up again, follow the steps in the “Up” part of the “Emergency Restart” section above; the steps are the same.

Reference

[1] Oracle® Real Application Clusters Administration and Deployment Guide, 12c Release 1 (12.1), E48838-09.



Additional: Shutting Down One Node

ps -ef | grep pmon

This will show the instances running on this node.

srvctl status database -db datadb -v

This will show the status of the named database, the instances associated with it, and which server each instance is running on.

srvctl status instance -db datadb -instance datadb1 -v

This will show the status of the named instance of the database and any services associated with it (srvctl requires the instance, or node, to be named here).

srvctl stop service -db ods -service ods_1 -instance ods1

This will stop the service named ods_1 running on instance ods1 of database ods.
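Before stopping a service, it can be worth confirming where it is currently running:

srvctl status service -db ods -service ods_1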

srvctl stop service -db datadb -service "prodcat,datadb_1" -instance datadb1

This will stop the listed services (comma-separated, no spaces) running on instance datadb1 of database datadb.

NB: you may want to disable 'has' (e.g. for patching), otherwise the services will come back up following a reboot; see the sketch below.
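As per the NB above, a sketch for the patching case: disable automatic restart of the stack before the reboot and re-enable it afterwards (run as root; ‘crsctl disable crs’ stops the stack auto-starting on the local node only, so repeat on each node being patched):

/u01/app/12.1.0/grid/bin/crsctl disable crs

(patch / reboot the node here)

/u01/app/12.1.0/grid/bin/crsctl enable crs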




