RAC - Healthchecking the Startup of Oracle for a given RAC LPAR

Posted by: Oracle DBA World. Posted date: July 28, 2011

The objective/scope here is to healthcheck (& complete as necessary) the startup of all normal running Oracle-related aspects for a given production RAC LPAR.

This includes the minimum/typical processes post-LPAR-startup that should be running under "oracle" username to support each running local database-instance.

The order of tasks is based on the RAC Administration manual. It tallies with several procedures/experiences found documented on the internet & takes into account our own configurations/running-services.

This procedure has been tested & used in a 10.2.0.3.0 production-environment.

0. The assumption is made here that a production LPAR boot or reboot has just taken place......in which case (by design of our production RAC environments) everything should start up automatically - including:

Clusterware (ie CRS) & general services

ASM instance

Database instances (& associated services, as applicable)

Listeners

Grid Control agent

1. So, when tasked with healthchecking the state of things after the boot/reboot, log in under “oracle” username & do the following to check that CRS is up & running & that ALL instances have started up......

crsctl check crs

(wait for confirmation that all appears healthy)

more /etc/oratab

ps -ef | grep -i pmon

......& the resulting list of processes should eventually comprise all instances listed in “/etc/oratab” (including the ASM instance).
Note1: the RAC auto-start does NOT depend on “/etc/oratab” having the “Y” value set for an instance......rather, the RAC Repository (ie the Oracle Cluster Registry, OCR) dictates what auto-starts.
Note2: this is very rare, but an instance may fail to auto-start because it is not using the “SPFILE” method for startup-parameters. In that case, the instance must be manually started using the normal SQLPLUS method. As at June 2009, this only seems to apply to RACxxxThiru’s “CTC” database instances (which will be changed in due course to “SPFILE” method).
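The oratab-versus-pmon comparison above can be sketched as a small shell function. This is only an illustration, not part of the original procedure: the function name “missing_instances” is made up, & it assumes that the first field of each “/etc/oratab” line is an instance name & that pmon processes are named “ora_pmon_SID” (or “asm_pmon_SID” for ASM). Any oratab entry that is not an instance (eg an “agent” entry) will also be reported, so read the output with that in mind.

```shell
# Hypothetical helper: list oratab entries that have no matching pmon process.
# $1 = path to an oratab-style file; "ps -ef" output is read from stdin.
missing_instances() {
  ps_out=$(cat)
  # The SID is the first colon-separated field; skip comments and blank lines.
  awk -F: '!/^[ \t]*(#|$)/ { print $1 }' "$1" | while read -r sid; do
    running=$(printf '%s\n' "$ps_out" | awk -v sid="$sid" '
      {
        # Strip the pmon prefix from the last field and compare exactly.
        p = $NF
        sub(/^(ora|asm)_pmon_/, "", p)
        if (p != $NF && p == sid) found = 1
      }
      END { print found + 0 }')
    [ "$running" = 1 ] || echo "$sid"
  done
}

# Example (on the LPAR itself):
#   ps -ef | missing_instances /etc/oratab
```

An empty result suggests that every oratab entry has a pmon process. Anything listed is a candidate for the SRVCTL start described next.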

If you find that an instance hasn’t auto-started (but does use “SPFILE” method), then use the following command syntax to start it:

srvctl start instance -d DBNAME -i INSNAME

(eg srvctl start instance -d ThiruDB_GEP -i ThiruDBGEP2)

Repeat that SRVCTL command for each instance that should be running but hasn’t started.

2. Once all instances have started, check all alert-logs for any issues via:

more /u01/app/oracle/admin/*/bdump/alert_*.log

Note: page back to the time of instance-startup & work through to double-check.
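As an aid to paging through each alert-log by hand, the sketch below prints only the “ORA-” lines written since the most recent instance startup. It is a rough illustration under assumptions: the function name is invented, & it relies on the alert-log containing a “Starting ORACLE instance” line at each startup. It is no substitute for reading the log itself.

```shell
# Hypothetical helper: show ORA- errors logged after the last instance startup.
# $1 = path to an alert log.
alert_errors_since_startup() {
  awk '
    # Each startup marker discards the errors collected so far.
    /Starting ORACLE instance/ { n = 0 }
    /ORA-[0-9]+/               { hits[++n] = FNR ": " $0 }
    END { for (i = 1; i <= n; i++) print hits[i] }
  ' "$1"
}

# Example (on the LPAR itself):
#   for f in /u01/app/oracle/admin/*/bdump/alert_*.log; do
#     echo "== $f"; alert_errors_since_startup "$f"
#   done
```

Each output line is prefixed with its line number in the log, so the surrounding context can be found quickly with “more”.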

3. Now ensure the Grid Control agent is running:

ps -ef | grep -i emagent

There should be a process running under “oracle” username running the program “/u01/app/oracle/product/10.2.0/agent10g/bin/emagent”. So, if not then do:

. oraenv

(respond with “agent”)
(nb: sometimes the SID must be entered in uppercase - depending on the entry in "/etc/oratab" file - so, if the utility prompts for the ORACLE_HOME then abort & retry using the opposite case)

/u01/app/oracle/product/10.2.0/agent10g/bin/emctl start agent

4. Now ensure the local listeners are running:

ps -ef | grep -i lsnr

There should be one process running the Oracle-database listener under “oracle” username via the program “/u01/app/oracle/product/10.2.0/db_1/bin/tnslsnr” & (typically) with a listener named in the form “LISTENER_RACNAME_FULLNODENAME”. If not, start it (& note that the listener names are case-sensitive):

lsnrctl start LISTENER_RACNAME_FULLNODENAME

(eg lsnrctl start LISTENER_RAC2Thiru_P13504Thiru020)

There should be one process running the Oracle-ASM listener under “oracle” username via the program “/u01/app/oracle/product/10.2.0/asm/bin/tnslsnr” & (typically) with a listener named in the form “ASM_LISTENER_FULLNODENAME”. If not, start it (& note that the listener names are case-sensitive):

lsnrctl start ASM_LISTENER_FULLNODENAME

(eg lsnrctl start ASM_LISTENER_P13504Thiru020)

Finally, check the listener status via:

lsnrctl status ASM_LISTENER_FULLNODENAME

lsnrctl status LISTENER_RACNAME_FULLNODENAME
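The “ps -ef” listener check can also be scripted, for example as below. This is only a sketch: the function name is made up, & it assumes that the listener name appears as the argument immediately after the “tnslsnr” program path in the process listing. The name comparison is exact, in keeping with the case-sensitivity noted above.

```shell
# Hypothetical helper: succeed if a tnslsnr process exists for the named listener.
# $1 = listener name (case-sensitive); "ps -ef" output is read from stdin.
listener_running() {
  awk -v name="$1" '
    {
      # Look for ".../tnslsnr <listener name>" anywhere in the command line.
      for (i = 1; i < NF; i++)
        if ($i ~ /\/tnslsnr$/ && $(i + 1) == name) found = 1
    }
    END { exit (found ? 0 : 1) }'
}

# Example (on the LPAR itself):
#   ps -ef | listener_running LISTENER_RACNAME_FULLNODENAME || echo "listener is down"
```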

5. Now check in Grid Control how much space is currently in-use in the cluster's DATADG & FLASHDG disk-groups & ensure it’s sufficient.

6. At this point, we’ve completed basic checks that everything that should be running is indeed running.

Now we need to perform a check that all “services” are running on their correct LPARs (to ensure that the workload is correctly “balanced” across the cluster).
Note: when an instance is taken down, some listener-services specific (as applicable) to it may have been automatically moved across to an alternative LPAR......however, the services will NOT automatically move back to their normal instance - hence, this check/task must be undertaken.

While under ORACLE username, first record the state of things as they stand (& the assumption is that the healthcheck being undertaken is part of a change......hence the file-name suffix):

crs_stat > $HOME/crs_stat.post_chxxxxxx

ps -ef > $HOME/psminusef.post_chxxxxxx

df -g > $HOME/dfminusg.post_chxxxxxx

Now check the state of non-ASM database/instance services:

ps -ef | grep -i pmon

srvctl status service -d DBNAME

(eg srvctl status service -d CCTM)

nb: if nothing is returned, then skip to the next-listed instance-name as this means there are no specific listener-services applicable to the database/instance

srvctl config service -d DBNAME

This command shows each service’s preferred & available instances (ie where it should normally run & where it may fail over to). If possible, follow the preferred instance when relocating services with the ‘srvctl relocate service’ command detailed below.

Alternatively, if the SRVCTL STATUS output includes a service-name that is clearly particular to this LPAR (by implication of the naming convention used for the listed services), but it’s shown as running on another instance in the cluster, then relocate that service to this LPAR now as follows:

srvctl relocate service -d DBNAME -s SERVICE -i CURRENT -t TARGET

(eg srvctl relocate service -d CCTM -s WAS_CCTM_02 -i CCTM4 -t CCTM2)

That example causes “WAS_CCTM_02” service to move from CCTM4 instance/lpar to CCTM2 instance/lpar......on the basis that the SRVCTL STATUS output showed:

“Service WAS_CCTM_02 is running on instance(s) CCTM4”

Repeat those SRVCTL checks (& relocate as necessary) for each non-ASM database that has an instance listed by the “ps -ef” command.
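To spot misplaced services without reading every line of SRVCTL STATUS output, something like the following could be used. It is a sketch under assumptions: the function name is invented, the output lines are taken to have the “Service X is running on instance(s) Y” form quoted above, & the regular expression describing which service-names belong on this LPAR (eg “_02$”) is only a guess at the naming convention, so supply your own.

```shell
# Hypothetical helper: list services that belong here but run elsewhere.
# $1 = local instance name, $2 = regex matching service names that belong on it.
# "srvctl status service -d DBNAME" output is read from stdin.
services_elsewhere() {
  awk -v inst="$1" -v pat="$2" '
    $1 == "Service" && /is running on instance\(s\)/ {
      svc = $2
      if (svc !~ pat) next
      # Keep only the instance list that follows "instance(s)".
      line = $0
      sub(/^.*instance\(s\) */, "", line)
      n = split(line, running, /, */)
      here = 0
      for (i = 1; i <= n; i++) if (running[i] == inst) here = 1
      if (!here) print svc, line
    }'
}

# Example (on the LPAR itself):
#   srvctl status service -d CCTM | services_elsewhere CCTM2 '_02$'
```

Each output line gives the service-name followed by the instance(s) it is currently running on, which supplies the “-s” & “-i” values for the relocate command.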

7. Now check to see where "tsmorasched" & "rmarchivelogs" services are currently running:

NB: we normally run these on the cluster node whose instance-names end with “1”.

crs_stat

(tip: they'll typically be listed at the end of the output)

So, if they are normally supposed to be running on this LPAR but they are currently elsewhere, then they must be relocated back as follows (where FULLNODENAME = this LPAR’s full name):

crs_relocate rmarchivelogs -c FULLNODENAME

(eg crs_relocate rmarchivelogs -c P13704Thiru024)

AND

crs_relocate tsmorasched -c FULLNODENAME

When done, double-check via "crs_stat"......& make sure they're back on this LPAR.
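Rather than scanning the whole “crs_stat” listing by eye, the node hosting a given resource can be pulled out with a short filter. This is a sketch only: the function name is made up, & it assumes the usual “NAME=” / “STATE=ONLINE on NODENAME” layout of “crs_stat” output.

```shell
# Hypothetical helper: print the node on which a CRS resource is ONLINE.
# $1 = resource name; "crs_stat" output is read from stdin.
resource_node() {
  awk -F= -v res="$1" '
    $1 == "NAME" { name = $2 }
    $1 == "STATE" && name == res {
      state = $2
      # Nothing is printed unless the resource is online somewhere.
      if (sub(/^ONLINE on /, "", state)) print state
    }'
}

# Example (on the LPAR itself):
#   crs_stat | resource_node rmarchivelogs
#   crs_stat | resource_node tsmorasched
```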

8. If the healthcheck at this time is part of a change that has just rebooted (ie both shut down & restarted) an LPAR......& if during the shutdown phase of the change you also recorded the state of how things stood (just as you did earlier in this procedure for post-reboot), then action the following final check as an added comfort-factor.
NB: if the above is not applicable, then just use “crs_stat” & ensure all’s ok.

So, compare the contents of......

$HOME/crs_stat.pre_chxxxxxx

&

$HOME/crs_stat.post_chxxxxxx

......& resolve (ie relocate) any remaining service discrepancies accordingly.
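The pre/post comparison can be done with a plain “diff” of the two files, or with a filter that reports only the resources whose state changed. The sketch below takes the second approach. It is an illustration under assumptions: the function name is invented, both files are assumed to hold non-empty “crs_stat” output in the usual “NAME=” / “STATE=” layout, & the order of the reported lines is not guaranteed.

```shell
# Hypothetical helper: report CRS resources whose STATE differs between two
# saved crs_stat listings.  $1 = pre-change file, $2 = post-change file.
crs_changes() {
  awk -F= '
    $1 == "NAME"  { name = $2 }
    $1 == "STATE" {
      if (FNR == NR) before[name] = $2   # still reading the first file
      else           after[name]  = $2   # reading the second file
    }
    END {
      for (n in after) {
        was = (n in before) ? before[n] : "(absent)"
        if (was != after[n]) printf "%s: %s -> %s\n", n, was, after[n]
      }
      for (n in before)
        if (!(n in after)) printf "%s: %s -> (absent)\n", n, before[n]
    }' "$1" "$2"
}

# Example (on the LPAR itself):
#   crs_changes $HOME/crs_stat.pre_chxxxxxx $HOME/crs_stat.post_chxxxxxx
```

No output means that every resource is in the same state, & on the same node, as before the change.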

9. At this point, this procedure is complete. If applicable, advise whoever required/requested the healthcheck that all’s ok.
