RAC - Healthchecking the Startup of Oracle for a given RAC LPAR
The objective here is to healthcheck (& complete as necessary) the startup of all normally running Oracle-related components on a given production RAC LPAR.
This includes the minimum set of processes that should be running under the "oracle" username after LPAR startup to support each local database instance.
The order of tasks is based on the RAC Administration manual, tallies with several procedures & experiences documented on the internet, & takes into account our own configurations & running services.
This procedure has been tested & used in a 10.2.0.3.0 production-environment.
0. The assumption here is that a production LPAR boot or reboot has just taken place, in which case (by design of our production RAC environments) everything should start up automatically, including:
Clusterware (ie CRS) & general services
ASM instance
Database instances (& associated services, as applicable)
Listeners
Grid Control agent
1. So, when tasked with healthchecking the state of things after the boot/reboot, log in as the "oracle" user & do the following to check that CRS is up & running & that ALL instances have started:
crsctl check crs
(wait for confirmation that all appears healthy)
more /etc/oratab
ps -ef | grep -i pmon
......& the resulting list of processes should eventually include every instance listed in “/etc/oratab” (including the ASM instance).
Note1: RAC auto-start does NOT depend on “/etc/oratab” having a “Y” value set for any instance; rather, the RAC Repository dictates what auto-starts.
Note2: very rarely, an instance may fail to auto-start because it does not use the “SPFILE” method for startup parameters. In that case, the instance must be started manually using the normal SQLPLUS method. As at June 2009, this only seems to apply to RACxxxThiru’s “CTC” database instances (which will be changed to the “SPFILE” method in due course).
If you find that an instance hasn’t auto-started (but does use “SPFILE” method), then use the following command syntax to start it:
srvctl start instance -d DBNAME -i INSNAME
(eg srvctl start instance -d ThiruDB_GEP -i ThiruDBGEP2)
Repeat that SRVCTL command for each instance that should be running but hasn’t started.
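The comparison between “/etc/oratab” and the running pmon processes can also be scripted. The following is only a minimal sketch & not part of the tested procedure: the function name missing_instances is hypothetical, & it assumes that each oratab entry is an instance name. Entries that are not instances (eg the “agent” home, or a database name rather than an instance name) will be reported as missing & can be ignored.

```shell
# Sketch only: list oratab entries that have no matching pmon process.
# $1 = oratab-style file, $2 = file containing saved "ps -ef" output.
missing_instances() {
    # SIDs that currently have a pmon process (ora_pmon_<SID> or asm_pmon_<SID>),
    # taken from the last field of each ps line.
    running=$(awk '$NF ~ /^(ora|asm)_pmon_/ { s = $NF; sub(/^(ora|asm)_pmon_/, "", s); print s }' "$2")

    # First colon-separated field of every non-comment, non-blank oratab line.
    grep -v '^[[:space:]]*#' "$1" | grep -v '^[[:space:]]*$' | cut -d: -f1 |
    while read -r sid; do
        # -F -x: compare as a fixed whole-line string, so "+ASM1" is not treated as a regex.
        printf '%s\n' "$running" | grep -F -x -q -- "$sid" || echo "$sid"
    done
}

# Example usage on the LPAR itself:
#   ps -ef > /tmp/ps.out
#   missing_instances /etc/oratab /tmp/ps.out
```

An empty result suggests that every oratab entry has a pmon process; anything printed is a candidate for the SRVCTL start command above.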
2. Once all instances have started, check all alert-logs for any issues via:
more /u01/app/oracle/admin/*/bdump/alert_*.log
Note: page back to the time of instance-startup & work through to double-check.
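Paging back by hand can be slow on large alert-logs. As a rough aid (not a replacement for reading the log), the sketch below prints only the ORA- lines written since the most recent startup. It assumes that each startup writes a line containing “Starting ORACLE instance” to the alert-log; the function name errors_since_startup is hypothetical.

```shell
# Sketch only: print the ORA- lines logged since the most recent instance startup.
# $1 = path to one alert log.
errors_since_startup() {
    awk '
        /Starting ORACLE instance/ { n = 0 }          # new startup: discard older errors
        /ORA-[0-9]+/               { err[++n] = $0 }  # remember each error line
        END { for (i = 1; i <= n; i++) print err[i] }
    ' "$1"
}

# Example usage:
#   for f in /u01/app/oracle/admin/*/bdump/alert_*.log; do
#       echo "== $f"; errors_since_startup "$f"
#   done
```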
3. Now ensure the Grid Control agent is running:
ps -ef | grep -i emagent
There should be a process owned by the “oracle” username running the program “/u01/app/oracle/product/10.2.0/agent10g/bin/emagent”. If not, then do:
. oraenv
(respond with “agent”)
(nb: sometimes the SID must be entered in uppercase - depending on the entry in "/etc/oratab" file - so, if the utility prompts for the ORACLE_HOME then abort & retry using the opposite case)
/u01/app/oracle/product/10.2.0/agent10g/bin/emctl start agent
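Note that “ps -ef | grep -i emagent” also lists the grep command itself, & can match other processes that merely mention emagent in their arguments. The sketch below is a stricter check: it succeeds only if a process owned by “oracle” has a program path ending in “/bin/emagent”. The function name agent_running is hypothetical, & it assumes the owner is the first column of the “ps -ef” output.

```shell
# Sketch only: succeed if the ps output on stdin shows emagent running as "oracle".
agent_running() {
    awk '
        $1 == "oracle" {
            # Look for an argument that is the emagent binary itself.
            for (i = 2; i <= NF; i++)
                if ($i ~ /\/bin\/emagent$/) found = 1
        }
        END { exit(found ? 0 : 1) }
    '
}

# Example usage:
#   ps -ef | agent_running && echo "agent is up" || echo "agent is NOT running"
```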
4. Now ensure the local listeners are running:
ps -ef | grep -i lsnr
There should be one process running the Oracle-database listener under “oracle” username via the program “/u01/app/oracle/product/10.2.0/db_1/tnslsnr” & (typically) with a listener named in the form “LISTENER_RACNAME_FULLNODENAME”. So, if not then do (& note that the listener names are case-sensitive):
lsnrctl start LISTENER_RACNAME_FULLNODENAME
(eg lsnrctl start LISTENER_RAC2Thiru_P13504Thiru020)
There should be one process running the Oracle-ASM listener under “oracle” username via the program “/u01/app/oracle/product/10.2.0/asm/tnslsnr” & (typically) with a listener named in the form “ASM_LISTENER_FULLNODENAME”. So, if not then do (& note that the listener names are case-sensitive):
lsnrctl start ASM_LISTENER_FULLNODENAME
(eg lsnrctl start ASM_LISTENER_P13504Thiru020)
Finally, check the listener status via:
lsnrctl status ASM_LISTENER_FULLNODENAME
lsnrctl status LISTENER_RACNAME_FULLNODENAME
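To see at a glance which listeners are up, the listener names can be pulled out of the “ps -ef” output. This is a sketch under the assumption that the listener name is the argument immediately following the tnslsnr program on the process command line; the function name running_listeners is hypothetical.

```shell
# Sketch only: print the name of every tnslsnr process owned by "oracle".
# stdin = "ps -ef" output.
running_listeners() {
    awk '
        $1 == "oracle" {
            for (i = 2; i < NF; i++)
                # The field after the tnslsnr program is taken to be the listener name.
                if ($i ~ /\/tnslsnr$/ || $i == "tnslsnr") print $(i + 1)
        }
    '
}

# Example usage:
#   ps -ef | running_listeners
```

On a healthy LPAR the output would be expected to contain both the database listener & the ASM listener named above.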
5. Now check in Grid Control how much space is currently in use in the cluster's DATADG & FLASHDG disk-groups & ensure that enough free space remains.
6. At this point, we’ve completed basic checks that everything that should be running is indeed running.
Now we need to perform a check that all “services” are running on their correct LPARs (to ensure that the workload is correctly “balanced” across the cluster).
Note: when an instance is taken down, any listener-services specific to it may have been automatically moved to an alternative LPAR. However, the services will NOT automatically move back to their normal instance, hence this check/task must be undertaken.
While logged in as the "oracle" user, first record the state of things as they stand (the assumption is that the healthcheck being undertaken is part of a change, hence the file-name suffix):
crs_stat > $HOME/crs_stat.post_chxxxxxx
ps -ef > $HOME/psminusef.post_chxxxxxx
df -g > $HOME/dfminusg.post_chxxxxxx
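Because the same three files are useful both before & after a change, the commands above could be wrapped in one small function. This is only a sketch: the function name snapshot_state & its arguments are hypothetical, & it simply reuses the file-naming pattern shown above.

```shell
# Sketch only: save crs_stat, ps and df output under a common suffix.
# $1 = phase ("pre" or "post"), $2 = change reference, $3 = output directory (default $HOME).
snapshot_state() {
    phase=$1
    change=$2
    dir=${3:-$HOME}
    case $phase in
        pre|post) ;;
        *) echo "phase must be pre or post" >&2; return 1 ;;
    esac
    suffix="${phase}_${change}"
    crs_stat > "$dir/crs_stat.$suffix"
    ps -ef   > "$dir/psminusef.$suffix"
    df -g    > "$dir/dfminusg.$suffix"     # -g is the form of df used in this procedure
}

# Example usage:
#   snapshot_state post chxxxxxx
```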
Now check the state of non-ASM database/instance services:
ps -ef | grep -i pmon
srvctl status service -d DBNAME
(eg srvctl status service -d CCTM)
nb: if nothing is returned, then skip to the next-listed instance-name as this means there are no specific listener-services applicable to the database/instance
srvctl config service -d DBNAME
This command shows where each service should be located, including its first-preference instance. If possible, follow this when relocating services with the ‘srvctl relocate service’ command detailed below.
Alternatively, if the SRVCTL STATUS output includes a service-name that is clearly particular to this LPAR (by implication of the naming convention used for the listed services), but it’s shown as running on another instance in the cluster, then relocate that service to this LPAR now as follows:
srvctl relocate service -d DBNAME -s SERVICE -i CURRENT -t TARGET
(eg srvctl relocate service -d CCTM -s WAS_CCTM_02 -i CCTM4 -t CCTM2)
That example causes “WAS_CCTM_02” service to move from CCTM4 instance/lpar to CCTM2 instance/lpar......on the basis that the SRVCTL STATUS output showed:
“Service WAS_CCTM_02 is running on instance(s) CCTM4”
Repeat those two SRVCTL commands (as necessary) for each non-ASM instance listed by the “ps -ef” command.
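Where the naming convention holds, the relocation commands can be drafted from the SRVCTL STATUS output rather than typed by hand. The sketch below only prints suggested commands; it does not run them. It assumes (as in the WAS_CCTM_02 example) that the number at the end of a service name matches the number at the end of its home instance name, & that status lines have the single-instance form quoted above. The function name suggest_relocations is hypothetical.

```shell
# Sketch only: print (do not run) relocate commands for services that belong on this LPAR.
# $1 = database name, $2 = this LPAR's instance name
# stdin = output of "srvctl status service -d DBNAME"
suggest_relocations() {
    awk -v db="$1" -v me="$2" '
        # Only handle: Service <name> is running on instance(s) <one instance>
        NF == 7 && $1 == "Service" && $4 == "running" {
            svc = $2; inst = $7
            if (!match(svc, /[0-9]+$/)) next
            want = substr(svc, RSTART) + 0            # "02" becomes 2
            if (!match(inst, /[0-9]+$/)) next
            home = substr(inst, 1, RSTART - 1) want   # eg "CCTM" followed by 2
            if (home == me && inst != me)
                printf "srvctl relocate service -d %s -s %s -i %s -t %s\n", db, svc, inst, me
        }
    '
}

# Example usage:
#   srvctl status service -d CCTM | suggest_relocations CCTM CCTM2
```

Review each printed command against the SRVCTL CONFIG output before running it.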
7. Now check to see where "tsmorasched" & "rmarchivelogs" services are currently running:
NB: we normally run these on the cluster node whose instance-names end with “1”.
crs_stat
(tip: they'll typically be listed at the end of the output)
So, if they are normally supposed to be running on this LPAR but they are currently elsewhere, then they must be relocated back as follows (where FULLNODENAME = this LPAR’s full name):
crs_relocate rmarchivelogs -c FULLNODENAME
(eg crs_relocate rmarchivelogs -c P13704Thiru024)
AND
crs_relocate tsmorasched -c FULLNODENAME
When done, double-check via "crs_stat"......& make sure they're back on this LPAR.
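Rather than scanning the whole "crs_stat" listing by eye, the node for a single resource can be extracted. This sketch assumes the default "crs_stat" layout of NAME=, TYPE=, TARGET= & STATE= lines, with the state shown as "ONLINE on <node>"; the function name resource_node is hypothetical.

```shell
# Sketch only: print the node on which a CRS resource is ONLINE.
# $1 = resource name; stdin = crs_stat output.
resource_node() {
    awk -F= -v want="$1" '
        $1 == "NAME"  { name = $2 }                     # remember the current resource
        $1 == "STATE" && name == want && $2 ~ /^ONLINE on / {
            sub(/^ONLINE on /, "", $2)                  # keep only the node name
            print $2
        }
    '
}

# Example usage:
#   crs_stat | resource_node tsmorasched
#   crs_stat | resource_node rmarchivelogs
```

If nothing is printed, the resource is either not ONLINE or not registered under that name.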
8. If the healthcheck at this time is part of a change that has just rebooted (ie both shut down & restarted) an LPAR, & if during the shutdown phase of the change you also recorded the state of how things stood (just as you did earlier in this procedure for post-reboot), then action the following final check as an added comfort-factor.
NB: if the above is not applicable, then just use “crs_stat” & ensure all’s ok.
So, compare the contents of......
$HOME/crs_stat.pre_chxxxxxx
&
$HOME/crs_stat.post_chxxxxxx
......& resolve (ie relocate) any remaining service discrepancies accordingly.
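The comparison of the two files can be narrowed to the resources whose state actually changed. The following is a sketch that assumes both files hold default "crs_stat" output (NAME= & STATE= lines); the function name compare_crs_state is hypothetical. It reports resources whose STATE line differs between the files, & does not report resources that appear only in the pre-change file.

```shell
# Sketch only: report CRS resources whose STATE differs between two crs_stat dumps.
# $1 = pre-change file, $2 = post-change file.
compare_crs_state() {
    awk -F= '
        $1 == "NAME" { name = $2 }
        $1 == "STATE" {
            if (NR == FNR) pre[name] = $2          # first file: remember the state
            else if (pre[name] != $2)              # second file: report any change
                printf "%s: was [%s], now [%s]\n", name, pre[name], $2
        }
    ' "$1" "$2"
}

# Example usage:
#   compare_crs_state $HOME/crs_stat.pre_chxxxxxx $HOME/crs_stat.post_chxxxxxx
```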
9. At this point, this procedure is complete. If applicable, advise whoever required/requested the healthcheck that all’s ok.