Thirupal Boreddy. Powered by Blogger.
Restarting SSH broker in the event of a hang.
Prerequisites:
This should only be attempted in the event of outbound connections hanging.

• If you are having problems performing a direct SSH connection “into” a system, this is not the correct procedure.

• You should first attempt SSH connections to a few other systems, to confirm that they all hang, before proceeding. If ANY of the connections work, this is not a broker hang problem, but a problem on the specific system you need to connect to.


Steps to restart the broker:
• Ensure you are logged onto the system as the affected user

• "ps -aef | grep -i broker" — take note of the PID of the SSH broker process

• Try command "ssh-broker-g3 --exit" to stop the process

• "ps -aef | grep -i broker" to check that it is now gone

Note: if the "ssh-broker-g3 --exit" command did not work, try it a 2nd, and maybe even a 3rd time before going to the kill option.

• If the broker process will not go, a “kill -9 PID” will need to be used

Note: A new broker will start up the first time the user attempts an outbound SSH connection, so the connection can be tested at this point.
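The graceful-stop-then-kill pattern above can be sketched as a small script. This is a generic illustration only: a background sleep stands in for the broker process, and SIGTERM stands in for "ssh-broker-g3 --exit" (the real broker should always be asked to exit cleanly first).

```shell
#!/bin/sh
# Generic "try a clean exit, then kill" pattern from the steps above.
# A background 'sleep' stands in for the ssh-broker-g3 process.
sleep 300 &
pid=$!

# Note the PID (mirrors: ps -aef | grep -i broker)
ps -p "$pid" > /dev/null && echo "broker stand-in running as PID $pid"

# Ask it to stop cleanly (stands in for: ssh-broker-g3 --exit)
kill -TERM "$pid" 2>/dev/null
sleep 1

# Only if it survived the clean stop, fall back to kill -9
if ps -p "$pid" > /dev/null 2>&1; then
    kill -9 "$pid"
fi
wait "$pid" 2>/dev/null || true

# Confirm it is gone (mirrors the final ps check)
ps -p "$pid" > /dev/null 2>&1 || echo "process gone"
```

As the note above says, a fresh broker starts automatically on the next outbound SSH connection, so nothing needs to be restarted by hand.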
SQL> select name from v$database;


NAME

---------

XXXXXXX


SQL> show parameter db_recovery



NAME TYPE VALUE

------------------------------------ ----------- ------------------------------

db_recovery_file_dest string +FLASHDG

db_recovery_file_dest_size big integer 13000M


SQL> alter system set db_recovery_file_dest_size=20G scope=both;


System altered.


SQL> show parameter db_recovery


NAME TYPE VALUE

------------------------------------ ----------- ------------------------------

db_recovery_file_dest string +FLASHDG

db_recovery_file_dest_size big integer 20G
Adding the space to tablespace:
===============================

Step 1: Check the space utilization

select a.tbl "Name",a.tsz "Total Size",b.fsz "Free Space",
round((1-(b.fsz/a.tsz))*100) "Pct Used",round((b.fsz/a.tsz)*100) "Pct Free" from
(select tablespace_name tbl,sum(bytes)/1024/1024 TSZ from dba_data_files
where tablespace_name ='MTHLY_DM_DATA' group by tablespace_name) a,
(select tablespace_name tblsp,sum(bytes)/1024/1024 FSZ from dba_free_space
where tablespace_name ='MTHLY_DM_DATA' group by tablespace_name) b
Where a.tbl=b.tblsp;

Step 2: Check the sizes of the datafiles:


col file_name for a55
select file_name,BYTES/1024/1024 from dba_data_files where tablespace_name ='MTHLY_DM_DATA' order by file_name;


Step 3: Add space to the tablespace (three options):

1. Add a new datafile:

alter tablespace MTHLY_DM_DATA add datafile '/db031/oradata/BAPROD/mthly_dm_data_11.dbf' size 5120M;

2. Resize an existing datafile:

ALTER DATABASE DATAFILE '/u02/oradata/ENGAGPRD/eng_data1.dbf' RESIZE 4608M;

3. ASM: add a datafile with autoextend:

ALTER TABLESPACE "IONBOARD" ADD DATAFILE '+DATADG' SIZE 31457280K REUSE AUTOEXTEND ON NEXT 1024M MAXSIZE 32767M;
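The "Pct Used"/"Pct Free" arithmetic in Step 1 can be sanity-checked outside the database. The figures below (10240 MB total, 2560 MB free) are made-up examples, not from any real tablespace:

```shell
# Recompute the Step 1 percentages for sample (hypothetical) figures.
total_mb=10240   # sum(bytes)/1024/1024 from dba_data_files
free_mb=2560     # sum(bytes)/1024/1024 from dba_free_space
awk -v t="$total_mb" -v f="$free_mb" 'BEGIN {
    # round((1-(fsz/tsz))*100)  and  round((fsz/tsz)*100)
    printf "Pct Used: %d\nPct Free: %d\n",
           (1 - f / t) * 100 + 0.5, (f / t) * 100 + 0.5
}'
# prints: Pct Used: 75 / Pct Free: 25
```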
Export:

Description:

i. datetime=`date +"%y%m%d_%H%M%S"`


ii. mknod /home/oracle/aakash/devrelease/Exp_Pipe_Test/exp_pipe p


iii. gzip -cNf < /home/oracle/aakash/devrelease/Exp_Pipe_Test/exp_pipe > /home/oracle/aakash/devrelease/Exp_Pipe_Test/bstbuy_lkup_$datetime.dmp.gz &


iv. exp username/password file=/home/oracle/aakash/devrelease/Exp_Pipe_Test/exp_pipe compress=y rows=y tables=CMGT_TASK buffer=8192000 log=/home/oracle/aakash/devrelease/Exp_Pipe_Test/bstbuy_lkup_$datetime.dmp.log


v. rm -f /home/oracle/aakash/devrelease/Exp_Pipe_Test/exp_pipe


Import:


Description:



i. datetime=`date +"%y%m%d_%H%M%S"`


ii. mknod /home/oracle/aakash/devrelease/Exp_Pipe_Test/imp_pipe p


iii. gunzip -c bstbuy_lkup_090102_100114.dmp.gz >/home/oracle/aakash/devrelease/Exp_Pipe_Test/imp_pipe &


iv. imp username/password file=/home/oracle/aakash/devrelease/Exp_Pipe_Test/imp_pipe full=y ignore=y buffer=1024000 commit=y log=/home/oracle/aakash/devrelease/Exp_Pipe_Test/imp_bstbuy_lkup_$datetime.log


v. rm -f /home/oracle/aakash/devrelease/Exp_Pipe_Test/imp_pipe
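The pipe mechanics in the export/import recipes above can be exercised without Oracle. In this sketch, cat-style redirection stands in for exp/imp, mkfifo replaces "mknod ... p" (equivalent for named pipes), and the scratch paths are made up:

```shell
#!/bin/sh
# Demonstrate export-through-a-named-pipe with plain shell redirection
# standing in for exp/imp.
dir=$(mktemp -d)
mkfifo "$dir/exp_pipe"              # same effect as: mknod exp_pipe p

# Reader side first: gzip drains the pipe into the compressed "dump".
gzip -c < "$dir/exp_pipe" > "$dir/demo.dmp.gz" &

# Writer side: exp would write its dump to the pipe; we use printf.
printf 'fake dump contents\n' > "$dir/exp_pipe"
wait                                # let gzip finish

# "Import" direction: decompress and read it back.
gunzip -c "$dir/demo.dmp.gz"        # prints: fake dump contents

rm -rf "$dir"                       # mirrors: rm -f exp_pipe
```

The key point, as in the original recipe, is that the gzip/gunzip side must be started in the background before exp/imp opens the pipe; otherwise the open blocks.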
Change the parameters for RAC database:

1) Login to prod1 server as ORACLE user.

2) Set the environment with . oraenv (respond with orcl1)

3) Login to the database as sysdba

3.1) select STATUS,INSTANCE_NUMBER,INSTANCE_NAME from gv$instance;

3.2) show parameter sga (this should show the current value of sga_max_size and sga_target to be 500MB)

3.3) alter system set sga_max_size=1g scope=spfile SID='*';

3.4) alter system set sga_target=1g scope=spfile SID='*';

3.5) exit

4) srvctl stop instance -d ORCL -i ORCL1

5) srvctl start instance -d ORCL -i ORCL1

6) Check the alertlog and the listener logs and confirm all ok.

*********************

7) ssh PROD2 server.

8) set the environment .oraenv to ORCL2

9) srvctl stop instance -d ORCL -i ORCL2

10) srvctl start instance -d ORCL -i ORCL2

11) Check the alertlog and the listener logs and confirm all ok.

***********************

12) ssh PROD3 server.

13) set the environment .oraenv to ORCL3

14) srvctl stop instance -d ORCL -i ORCL3

15) srvctl start instance -d ORCL -i ORCL3

16) Check the alertlog and the listener logs and confirm all ok.

************************

17) ssh PROD4 server

18) set the environment .oraenv to ORCL4

19) srvctl stop instance -d ORCL -i ORCL4

20) srvctl start instance -d ORCL -i ORCL4

21) Check the alertlog and the listener logs and confirm all ok
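Steps 7-21 repeat the same bounce on nodes 2-4, so the sequence can be generated in a loop. Because srvctl only exists on the cluster nodes, the sketch below is a dry run that just prints each command; remove the echo wrappers to run it for real (names follow the ORCL/PROD convention above):

```shell
#!/bin/sh
# Dry run of the per-node instance bounce from steps 7-21.
for n in 2 3 4; do
    echo "ssh PROD${n}"
    echo ". oraenv    # respond with ORCL${n}"
    echo "srvctl stop instance -d ORCL -i ORCL${n}"
    echo "srvctl start instance -d ORCL -i ORCL${n}"
    echo "# check alert log and listener logs on PROD${n}"
done
```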
RAC Troubleshooting

This is the one section that will be updated frequently as my experience with RAC grows. As RAC has been around for a while, most problems can be resolved with a simple Google lookup, but a basic understanding of where to look for the problem is required. In this section I will point you to where to look for problems. Every instance in the cluster has its own alert log, which is where you should start looking. Alert logs contain startup and shutdown information, nodes joining and leaving the cluster, etc.

Here is my complete alert log file of my two node RAC starting up.

The cluster itself has a number of log files that can be examined to gain insight into occurring problems; the list below describes the information you may need about the CRS components:

1. $ORA_CRS_HOME/crs/log : contains trace files for the CRS resources

2. $ORA_CRS_HOME/crs/init : contains trace files for the CRS daemon during startup, a good place to start

3. $ORA_CRS_HOME/css/log : contains cluster reconfigurations, missed check-ins, connects and disconnects from the client CSS listener. Look here to determine when reboots occurred

4. $ORA_CRS_HOME/css/init : contains core dumps from the Cluster Synchronization Service daemon (OCSSD)


5. $ORA_CRS_HOME/evm/log : log files for the Event Manager (EVM) and evmlogger daemon

6. $ORA_CRS_HOME/evm/init : pid and lock files for EVM

7. $ORA_CRS_HOME/srvm/log : log files for Oracle Cluster Registry (OCR)


8. $ORA_CRS_HOME/log : log files for Oracle clusterware which contains diagnostic messages at the Oracle cluster level

As in a normal Oracle single-instance environment, a RAC environment contains the standard RDBMS log files; these files are located by the parameter background_dump_dest. The most important of these are

9. $ORACLE_BASE/admin/udump : contains any trace file generated by a user process

10. $ORACLE_BASE/admin/cdump : contains core files that are generated due to a core dump in a user process

Now let's look at a two-node startup and the sequence of events.

First you must check that the RAC environment is using the correct interconnect; this can be done by any of the following:

Logfile: /u01/app/oracle/admin/racdb/bdump/alert_racdb1.log
## The location of my alert log; yours may be different

ifcfg command: oifcfg getif

table check: select inst_id, pub_ksxpia, picked_ksxpia, ip_ksxpia from x$ksxpia;

oradebug:

SQL> oradebug setmypid

SQL> oradebug ipc

Note: check the trace file, which can be located via the parameter user_dump_dest

system parameter: cluster_interconnects

Note: used to specify which address to use
Post database creation CRS Health check:

2.1 Check that the crs daemons are running on each node, if not then start them.
ps -ef | grep d.bin should return

evmd.bin

ocssd.bin

crsd.bin

oprocd.bin

To start them login as root and source the .profile in /home/oracle and then run the following command

crsctl start crs

2.2 Run the following command as the oracle user on one of the nodes in the cluster and check that everything is online.

crs_stat -t -v


2.3 Run the following command as the oracle user on all nodes in the cluster

olsnodes

This should return all of the nodes of the cluster, confirming that CSS is up and running and that CSS on each node can talk to the CSS on the other nodes in the cluster.

2.4 Run the following commands as the root user on each node

crsctl check crs

crsctl check cssd

crsctl check crsd

crsctl check evmd

All of these commands should return that the daemon appears healthy

2.5 As the oracle user check the OCR integrity

ocrcheck

2.6 Backup the OCR by running the following command as the root user

ocrconfig -export $CRS_HOME/manual_backups/ocr_export.dmp -s online


2.7 As the root user check the automatic backups of the OCR

ocrconfig -showbackup
Voting Disk


The voting disk, as I mentioned in the architecture section, is used to resolve membership issues in the event of a partitioned cluster; the voting disk protects data integrity.


querying : crsctl query css votedisk

adding : crsctl add css votedisk

deleting : crsctl delete css votedisk
Flashback setup in RAC

## Make sure that the database is running in archive log mode

SQL> archive log list

## Setup the flashback

SQL> alter system set cluster_database=false scope=spfile sid='prod1';

SQL> alter system set DB_RECOVERY_FILE_DEST_SIZE=200M scope=spfile;

SQL> alter system set DB_RECOVERY_FILE_DEST='/ocfs2/flashback' scope=spfile;

srvctl stop database -d prod

SQL> startup mount

SQL> alter database flashback on;

SQL> alter system set cluster_database=true scope=spfile sid='prod1';

SQL> shutdown;

srvctl start database -d prod
Export of table Partition:

exp file=exp_f_usd_022007.dmp \
    log=exp_f_usd_022007.log \
    tables=IPDS.F_USD:F_USD001366,IPDS.F_USD:F_USD001367,IPDS.F_USD:F_USD001368,IPDS.F_USD:F_USD001369,IPDS.F_USD:F_USD001370,IPDS.F_USD:F_USD001371 \
    buffer=40480000 \
    compress=n



Truncate table Partition:

ALTER TABLE IPDS.F_USD TRUNCATE PARTITION F_USD001366;

ALTER TABLE IPDS.F_USD TRUNCATE PARTITION F_USD001367;

ALTER TABLE IPDS.F_USD TRUNCATE PARTITION F_USD001368;

ALTER TABLE IPDS.F_USD TRUNCATE PARTITION F_USD001369;

ALTER TABLE IPDS.F_USD TRUNCATE PARTITION F_USD001370;

ALTER TABLE IPDS.F_USD TRUNCATE PARTITION F_USD001371;



Import :



imp file=exp_f_usd_022007.dmp log=imp_f_usd_022007.log fromuser=IPDS touser=IPDS buffer=40480000 ignore=y rows=y
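Long multi-line exp invocations like the one above are easier to keep correct in a parameter file. This is a sketch; the parfile name is an assumption, and only two of the six partitions are shown:

```shell
#!/bin/sh
# Keep the export parameters in a parfile instead of a long command line.
cat > exp_f_usd_022007.par <<'EOF'
file=exp_f_usd_022007.dmp
log=exp_f_usd_022007.log
tables=(IPDS.F_USD:F_USD001366,IPDS.F_USD:F_USD001367)
buffer=40480000
compress=n
EOF

# exp would then be run as:
#   exp parfile=exp_f_usd_022007.par
cat exp_f_usd_022007.par
rm -f exp_f_usd_022007.par
```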
Adding a new vote disk

As root

cd /u01/crs/oracle/product/10.2.0/crs/bin

1. see where your voting disks currently are (run on one node)

./crsctl query css votedisk

2. shut down the clusterware on every node

./crsctl stop crs

3. add the new voting disk (run on one node)

./crsctl add css votedisk /u01/rac4prd_nfs_votedisk_p3/nfs_votedisk_3 -force

4. delete the old one you don’t want (run on one node)

./crsctl delete css votedisk /dev/p3_votedisk_3 -force

5. check what’s there (run on one node)

./crsctl query css votedisk

5.1 Change permissions on new votedisk as per Unix standards


6. bring everything back up on every node

./crsctl start crs



Trace srvctl command

To trace the srvctl command and see the commands that are being run in the background, plus any errors:

$> export SRVM_TRACE=TRUE

$> srvctl start database -d DB
Enabling Archive Logs in a RAC Environment - (Oracle10g)
Whether a single instance or clustered database, Oracle tracks (logs) all changes to database blocks in online redolog files. In an Oracle RAC environment, each instance will have its own set of online redolog files known as a thread. Each Oracle instance will use its set (group) of online redologs in a circular manner. Once an online redolog fills, Oracle moves to the next one. If the database is in "Archive Log Mode", Oracle will make a copy of the online redo log before it gets reused. A thread must contain at least two online redologs (or online redolog groups). The same holds true for a single instance configuration. The single instance must contain at least two online redologs (or online redolog groups).
The size of an online redolog file is completely independent of another instance's redolog size. Although in most configurations the size is the same, it may be different depending on the workload and backup/recovery considerations for each node. It is also worth mentioning that each instance has exclusive write access to its own online redolog files. In a correctly configured RAC environment, however, each instance can read another instance's current online redolog file to perform instance recovery if that instance was terminated abnormally. It is therefore a requirement that online redo logs be located on a shared storage device (just like the database files).
As already mentioned, Oracle writes to its online redolog files in a circular manner. When the current online redolog fills, Oracle will switch to the next one. To facilitate media recovery, Oracle allows the DBA to put the database into "Archive Log Mode" which makes a copy of the online redolog after it fills (and before it gets reused). This is a process known as archiving.
The Database Configuration Assistant (DBCA) allows users to configure a new database to be in archive log mode, however most DBAs opt to bypass this option. In cases like this, where the database is in noarchivelog mode, it is a simple task to put the database into archive log mode. Note however that this will require a short database outage. From one of the nodes in the Oracle RAC 10g configuration, use the following steps to put a RAC-enabled database into archive log mode. For the purpose of this article, I will use the node linux1, which runs the orcl1 instance:

1. Login to one of the nodes (i.e. linux1) and disable the cluster instance parameter by setting cluster_database to FALSE from the current instance:

$ sqlplus "/ as sysdba"

SQL> alter system set cluster_database=false scope=spfile sid='orcl1';

2. Shutdown all instances accessing the clustered database:

$ srvctl stop database -d orcl

3. Using the local instance, MOUNT the database:

$ sqlplus "/ as sysdba"

SQL> startup mount

4. Enable archiving:

SQL> alter database archivelog;

5. Re-enable support for clustering by modifying the instance parameter cluster_database to TRUE from the current instance:

SQL> alter system set cluster_database=true scope=spfile sid='orcl1';

6. Shutdown the local instance:

SQL> shutdown immediate

7. Bring all instances back up using srvctl:

$ srvctl start database -d orcl

8. (Optional) Bring any services (i.e. TAF) back up using srvctl:

$ srvctl start service -d orcl

9. Login to the local instance and verify Archive Log Mode is enabled:

$ sqlplus "/ as sysdba"

SQL> archive log list

Database log mode              Archive Mode

Automatic archival             Enabled

Archive destination            USE_DB_RECOVERY_FILE_DEST

Oldest online log sequence     83

Next log sequence to archive   84

Current log sequence           84

After enabling Archive Log Mode, each instance in the RAC configuration can automatically archive redologs!
RAC - Stopping everything Oracle-related for a given RAC LPAR
The objective/scope here is to cleanly close everything that's running under "oracle" username on a given RAC LPAR.

This order of tasks is based on the RAC Administration manual, & tallies with several procedures/experiences found documented on the internet & takes into account our own configurations/running-services.

This procedure has been tested & used in a 10.2.0.3.0 production-environment.


0. When it's confirmed that preparatory tasks (as applicable) have all completed & that the LPAR's ready for you to take everything (Oracle-related) down......contact Ops & warn them that they'll likely see alerts relating to this LPAR (including some possibly coming from - but seemingly (to Ops) relating to - the Grid Control LPARs, too).


1. Login under ORACLE username & record the state of things as they stand (for comparison later):

crs_stat -t > $HOME/crs_stat.pre_chxxxxxx


ps -ef > $HOME/psminusef.pre_chxxxxxx


df -g > $HOME/dfminusg.pre_chxxxxxx


Finally, check in Grid Control how much space is currently in-use in the cluster's DATADG & FLASHDG disk-groups & note for comparison later.



1a. Now check to see where "tsmorasched" & "rmarchivelogs" services are currently running:


crs_stat

(tip: they'll typically be listed at the end of the output)



So, if they are currently running on this LPAR, then they must be moved to an alternative LPAR in the cluster - as follows:



crs_relocate rmarchivelogs -c FULLNODENAME

(eg crs_relocate rmarchivelogs -c P13704Thirupal024)




AND



crs_relocate tsmorasched -c FULLNODENAME



When done, double-check via "crs_stat"......& make sure they're now running on the chosen alternative LPAR.





2. Now determine which instances need to be shutdown for this LPAR:



ps -ef | grep -i pmon

(ie show which instances are currently running)





3. For each just-listed non-ASM instance-name, do:

**** WARNING **** Be sure to specify the correct instance NUMBER in the following commands, because it WILL work for a remote instance if you accidentally specify a valid number!



srvctl stop instance -d DBNAME -i INSNAME -o immediate

(eg srvctl stop instance -d SUJITH_GEP -i SUJITHGEP2 -o immediate)



Repeat that SRVCTL command for each just-listed non-ASM instance (nb: changing "immediate" to "abort" only as a last resort).



When done, double-check via "ps -ef | grep -i pmon"......& make sure only the +ASM instance remains.



Note: in each case (& as applicable), any listener-services for this LPAR's database-instance will automatically move across to an alternative LPAR in the cluster during the "srvctl stop instance......" command.



4. Now take down the ASM instance:

**** WARNING **** Be sure to specify the correct instance NUMBER in the following command, because it WILL work for a remote ASM instance if you accidentally specify a valid number!


srvctl stop asm -n FULLNODENAME -i ASMINSTANCENAME

(eg srvctl stop asm -n P13504Thirupal020 -i +ASM2)


When done, double-check via "ps -ef | grep -i pmon"......& make sure nothing remains.



5. Now stop the local listeners:



ps -ef | grep -i lsnr


lsnrctl stop LISTENERNAME

(eg lsnrctl stop LISTENER_RAC2Thirupal_P13504Thirupal020)



Repeat that LSNRCTL command for each just-listed listener, specifying the full-listed-name each time.



When done, double-check via "ps -ef | grep -i lsnr"......& make sure nothing remains.





6. Now stop the Grid Control agent:



ps -ef | grep -i emagent



/u01/app/oracle/product/10.2.0/agent10g/bin/emctl stop agent

(nb: might say it's already stopped/down......kill the process only as a last resort)



When done, double-check via "ps -ef | grep -i emagent"......& make sure nothing remains.





7. Logout from ORACLE username & login under ROOT username, then stop Clusterware as follows:



. oraenv

(respond with “crs”)

(nb: sometimes the SID must be entered in uppercase - depending on the entry in "/etc/oratab" file - so, if the utility prompts for the ORACLE_HOME then abort & retry using the opposite case)



crsctl stop crs



Now wait 2 minutes.





8. Now check for any remaining CRS-related processes running under "oracle" username:


ps -ef | grep -i oracle | grep -i crs


If any remain that are running a program in the "/u01/crs/oracle/product......" directory path, then retry the "crsctl stop crs" command at least once......killing such remaining (CRS-related) processes only as a last resort.



9. Now check for anything else still running under "oracle" username:



ps -ef | grep -i oracle



If anything remains & you know which utility cleanly closes the process, then use it.





Otherwise, kill the process.





10. Now there should be no remaining processes running under "oracle" username. At this point, this procedure is complete. If applicable, advise whoever required/requested the close-down that the
LPAR's ready for them.


NOTE:


If the LPAR is to be subsequently rebooted, then everything SHOULD return automatically by virtue of how the LPAR's startup procedures are configured......& that includes:


Clusterware (ie CRS) & general services

ASM instance

Database instances (& associated listener-services, as applicable)

Listeners

Grid Control agent



Bear in mind, however, that any listener-services for a given database-instance that automatically moved across to an alternative LPAR in the cluster during "srvctl stop instance......" commands you issued
earlier will NOT automatically move back to their normal LPAR - & neither will the "tsmorasched" or "rmarchivelogs" services.
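The ordering that matters in steps 1-10 is: relocate the cluster services, stop the database instances, stop ASM, stop the listeners, stop the Grid Control agent, then stop Clusterware as root. The sketch below is a dry run that only prints that sequence (all capitalised names are placeholders for your environment):

```shell
#!/bin/sh
# Dry-run summary of the shutdown ordering from steps 1-10.
# DBNAME, INSNAME, FULLNODENAME, ASMINSTANCENAME, LISTENERNAME and
# OTHERNODENAME are placeholders.
echo "crs_relocate rmarchivelogs -c OTHERNODENAME"
echo "crs_relocate tsmorasched -c OTHERNODENAME"
echo "srvctl stop instance -d DBNAME -i INSNAME -o immediate"
echo "srvctl stop asm -n FULLNODENAME -i ASMINSTANCENAME"
echo "lsnrctl stop LISTENERNAME"
echo "emctl stop agent"
echo "crsctl stop crs    # run as root, then wait 2 minutes"
```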
RAC - Healthchecking the Startup of Oracle for a given RAC LPAR

The objective/scope here is to healthcheck (& complete as necessary) the startup of all normal running Oracle-related aspects for a given production RAC LPAR.

This includes the minimum/typical processes post-LPAR-startup that should be running under "oracle" username to support each running local database-instance.

This order of tasks is based on the RAC Administration manual, & tallies with several procedures/experiences found documented on the internet & takes into account our own configurations/running-services.

This procedure has been tested & used in a 10.2.0.3.0 production-environment.


0. The assumption is made here that a production LPAR boot or reboot has just taken place......in which case (by design of our production RAC environments) everything should startup automatically - including:

Clusterware (ie CRS) & general services

ASM instance

Database instances (& associated services, as applicable)

Listeners

Grid Control agent


1. So, when tasked with healthchecking the state of things after the boot/reboot, login under ORACLE username & do the following to check that CRS is up & running & ALL instances have started up......

crsctl check crs

(wait for confirmation that all appears healthy)

more /etc/oratab

ps -ef | grep -i pmon

......& the resulting list of processes should eventually comprise all instances listed in “/etc/oratab” (including the ASM instance).
Note1: the RAC auto-start is NOT dependent on “/etc/oratab” having “Y” value set for any instance to auto-start......rather, the RAC Repository dictates what auto-starts.
Note2: this is very rare, but there may be an instance that fails to auto-start because it is not using the “SPFILE” method for startup-parameters......in which case, the instance must be manually started using the normal SQLPLUS method......&, as at June 2009, this only seems to apply to RACxxxThiru’s “CTC” database instances (which will be changed in due course to “SPFILE” method).

If you find that an instance hasn’t auto-started (but does use “SPFILE” method), then use the following command syntax to start it:

srvctl start instance -d DBNAME -i INSNAME

(eg srvctl start instance -d ThiruDB_GEP -i ThiruDBGEP2)


Repeat that SRVCTL command for each instance that hasn’t started, but should be.


2. Once all instances have started, check all alert-logs for any issues via:

more /u01/app/oracle/admin/*/bdump/alert_*.log

Note: page back to the time of instance-startup & work through to double-check.


3. Now ensure the Grid Control agent is running:


ps -ef | grep -i emagent


There should be a process running under “oracle” username running the program “/u01/app/oracle/product/10.2.0/agent10g/bin/emagent”. So, if not then do:

. oraenv

(respond with “agent”)
(nb: sometimes the SID must be entered in uppercase - depending on the entry in "/etc/oratab" file - so, if the utility prompts for the ORACLE_HOME then abort & retry using the opposite case)

/u01/app/oracle/product/10.2.0/agent10g/bin/emctl start agent


4. Now ensure the local listeners are running:


ps -ef | grep -i lsnr


There should be one process running the Oracle-database listener under “oracle” username via the program “/u01/app/oracle/product/10.2.0/db_1/tnslsnr” & (typically) with a listener named in the form “LISTENER_RACNAME_FULLNODENAME”. So, if not then do (& note that the listener names are case-sensitive):

lsnrctl start LISTENER_RACNAME_FULLNODENAME

(eg lsnrctl start LISTENER_RAC2Thiru_P13504Thiru020)

There should be one process running the Oracle-ASM listener under “oracle” username via the program “/u01/app/oracle/product/10.2.0/asm/tnslsnr” & (typically) with a listener named in the form “ASM_LISTENER_FULLNODENAME”. So, if not then do (& note that the listener names are case-sensitive):

lsnrctl start ASM_LISTENER_FULLNODENAME

(eg lsnrctl start ASM_LISTENER_P13504Thiru020)

Finally, check the listener status via:

lsnrctl status ASM_LISTENER_FULLNODENAME

lsnrctl status LISTENER_RACNAME_FULLNODENAME


5. Now check in Grid Control how much space is currently in-use in the cluster's DATADG & FLASHDG disk-groups & ensure it’s sufficient.


6. At this point, we’ve completed basic checks that everything that should be running is indeed running.

Now we need to perform a check that all “services” are running on their correct LPARs (to ensure that the workload is correctly “balanced” across the cluster).
Note: when an instance is taken down, some listener-services specific (as applicable) to it may have been automatically moved across to an alternative LPAR......however, the services will NOT automatically move back to their normal instance - hence, this check/task must be undertaken.

While under ORACLE username, first record the state of things as they stand (& the assumption is that the healthcheck being undertaken is part of a change......hence the file-name suffix):

crs_stat > $HOME/crs_stat.post_chxxxxxx

ps -ef > $HOME/psminusef.post_chxxxxxx

df -g > $HOME/dfminusg.post_chxxxxxx

Now check the state of non-ASM database/instance services:

ps -ef | grep -i pmon

srvctl status service -d DBNAME

(eg srvctl status service -d CCTM)

nb: if nothing is returned, then skip to the next-listed instance-name as this means there are no specific listener-services applicable to the database/instance

srvctl config service -d DBNAME

This command shows where the service should be located and a 1st preference. If possible follow this to relocate the services using the ‘srvctl relocate service’ command detailed below.

Alternatively, if the SRVCTL STATUS output includes a service-name that is clearly particular to this LPAR (by implication of the naming convention used for the listed services), but it’s shown as running on another instance in the cluster, then relocate that service to this LPAR now as follows:

srvctl relocate service -d DBNAME -s SERVICE -i CURRENT -t TARGET

(eg srvctl relocate service -d CCTM -s WAS_CCTM_02 -i CCTM4 -t CCTM2)

That example causes “WAS_CCTM_02” service to move from CCTM4 instance/lpar to CCTM2 instance/lpar......on the basis that the SRVCTL STATUS output showed:

“Service WAS_CCTM_02 is running on instance(s) CCTM4”

Repeat those two SRVCTL commands (as necessary) for each non-ASM instance listed by the “ps -ef” command.


7. Now check to see where "tsmorasched" & "rmarchivelogs" services are currently running:

NB: we normally run these on the cluster node whose instance-names end with “1”.

crs_stat

(tip: they'll typically be listed at the end of the output)

So, if they are normally supposed to be running on this LPAR but they are currently elsewhere, then they must be relocated back as follows (where FULLNODENAME = this LPAR’s full name):

crs_relocate rmarchivelogs -c FULLNODENAME

(eg crs_relocate rmarchivelogs -c P13704Thiru024)

AND

crs_relocate tsmorasched -c FULLNODENAME

When done, double-check via "crs_stat"......& make sure they're back on this LPAR.


8. If the healthcheck at this time is part of a change that has just rebooted (ie both shutdown & restarted) an LPAR......& if during the shutdown phase of the change you also recorded the state of how things
stood (just as you did earlier in this procedure for post-reboot), then action the following final check as an added comfort-factor.
NB: if the above is not applicable, then just use “crs_stat” & ensure all’s ok.

So, compare the contents of......

$HOME/crs_stat.pre_chxxxxxx

&

$HOME/crs_stat.post_chxxxxxx


......& resolve (ie relocate) any remaining service discrepancies accordingly.


9. At this point, this procedure is complete. If applicable, advise whoever required/requested the healthcheck that all’s ok.
1. System state dump (take three dumps, about a minute apart)

sqlplus / as sysdba

SQL> oradebug setmypid

SQL> oradebug unlimit

SQL> oradebug tracefile_name

SQL> oradebug -g all dump systemstate 267



SQL> oradebug -g all dump systemstate 267



SQL> oradebug -g all dump systemstate 267

SQL> exit

2. Hanganalyze (again three times, about a minute apart)


sqlplus / as sysdba

SQL> oradebug setmypid

SQL> oradebug setinst all

SQL> oradebug unlimit

SQL> oradebug tracefile_name

SQL> oradebug -g def hanganalyze 3



SQL> oradebug -g def hanganalyze 3



SQL> oradebug -g def hanganalyze 3

SQL> exit
Log file locations in RAC

The locations to view CRS logs are the following directories: -

1. CRS log files:

$CRS_HOME/crs/init

$CRS_HOME/crs/node_name.log

2. OCR log files:

$CRS_HOME/srvm/log

3. CSS log files:

$CRS_HOME/log/ocssdnumber.log

$CRS_HOME/css/init/node_name.log

4. Event Manager log files (EVM):

$CRS_HOME/evm/log/evmdaemon.log

$CRS_HOME/evm/init/node_name.log

5. Oracle High Availability log files:

$ORACLE_BASE/DB_NAME/admin/hdump

$ORACLE_HOME/racg/log
The Archiver hung alerts:

1. Check space in DB_RECOVERY_FILE_DEST_SIZE

2. If there is no space left, increase the size.

ALTER SYSTEM set DB_RECOVERY_FILE_DEST_SIZE = 15G;


3. Connect to the target and catalog databases

RMAN> connect target /

RMAN> connect catalog rman/rm2n10g@(DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=TCP)(HOST=xxxxx.xxxxx.co.uk)(PORT=1523)))(CONNECT_DATA=(SERVICE_NAME=xxxxx.lloydstsb.co.uk)))

4. Take the backup of archive logs

RMAN>

run {
allocate channel t1 type 'sbt_tape' parms 'ENV=(TDPO_OPTFILE=/u01/app/oracle/admin/tsm_rman/xxxxxx_tdpo.opt)';
backup archivelog all delete input;
}
Check cluster name

smitty hacmp

lssrc -ls clstrmgrES

/usr/es/sbin/cluster/clstat -a
Tracing a SQL session is an excellent way to determine if a user is causing particular problems on a database/instance. The following commands can be used to determine what a user is doing in the database:

SQL_TRACE = TRUE/FALSE -- deprecated, DBMS_MONITOR & DBMS_SESSION should be used instead.

Speaking in terms of overhead, SQL_TRACE = TRUE is one of the most expensive tracing tools.

DBMS_MONITOR:

A. Start tracing an Oracle session: exec dbms_monitor.session_trace_enable(SID) -- you get the SID from gv$session
Stop tracing an Oracle session: exec dbms_monitor.session_trace_disable(SID) -- you get the SID from gv$session

The overhead of the tracing above is average (around 10%), which is why, in practice, the more targeted call below is advisable:

B. Start tracing an Oracle session:

exec dbms_monitor.session_trace_enable(session_id=>125, serial_num=>1375, waits=>true, binds=>false);

Stop tracing an Oracle session:

exec dbms_monitor.session_trace_disable(session_id=>125, serial_num=>1375);

The overhead for this is approx. 3% - 4%. This works especially well when there is a client-server application, where the application user creates its own Oracle session (SID + SERIAL#).

C. An alternative way to achieve this is to use dbms_support package:

1. run dbmssupp.sql as sysdba (the script resides in $ORACLE_HOME/rdbms/admin)

2. exec dbms_support.start_trace_in_session (sid=>123, serial#=>1290, waits=>true, binds=>false); --start tracing

3. exec dbms_support.stop_trace_in_session (sid=>123, serial#=>1290); -- stop tracing

D. In the case of a middle-tier environment, the following procedure should be used (the developer of the application should have created the module with dbms_application_info):

exec dbms_monitor.serv_mod_act_trace_enable (service_name=>'..', module_name=>'..');

--the parameters are taken from gv$session

The trace file will be located by default in udump directory.

tkprof tracefile outputfile waits=yes sort=prsela,exeela,fchela will generate a human-readable report from the trace file.
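The enable/disable calls and the tkprof step can be wired together in a small helper. A minimal sketch; the sid/serial values are examples and the trace file name pattern is an assumption (the real name comes from the udump directory):

```shell
#!/bin/sh
# Sketch: generate the DBMS_MONITOR calls and the tkprof command for one session.
# sid/serial are example values; the trace file name pattern is an assumption.
sid=125
serial=1375
enable_sql="exec dbms_monitor.session_trace_enable(session_id=>${sid}, serial_num=>${serial}, waits=>true, binds=>false);"
disable_sql="exec dbms_monitor.session_trace_disable(session_id=>${sid}, serial_num=>${serial});"
tkprof_cmd="tkprof ora_${sid}.trc ora_${sid}.txt waits=yes sort=prsela,exeela,fchela"
echo "$enable_sql"
echo "$disable_sql"
echo "$tkprof_cmd"
```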
Operating System Commands:

This section will look at some commands that the dba can run to quickly check the status of the RAC cluster: -

System logs:

The following commands will show system logs that can help in finding the root cause of faults and also current system states or errors. Sample output shows the errpt from a system.

p13404prDXXXX-> errpt -a | more

Error Logs:

Viewing the error logs located under /usr/local/logs is also an excellent way to check the state of a system when the Oracle side of things is experiencing problems.

The example below, searching the error logs, shows that machine p13404prd019 could not talk to p13604prd023.

{p13404XXXXX}/usr/local/logs$ egrep "error" /usr/local/logs/errorlog*


More error log interrogation :


Running an egrep command on the daemon logs will also show whether any errors point towards a possible failure. From the example below you can see tcp_connect_failed errors; this confirms, as in the output above, that node 19 cannot communicate with node 23 via TCP, which would tend towards a network or network card problem...

{p13404XXXXX}/usr/local/logs$ egrep "Broker_tcp_connect_failed"
Command Description

1. srvctl stop database -d <db_name>  Stops a database across all nodes

2. srvctl start database -d <db_name>  Starts a database across all nodes

3. srvctl stop instance -d <db_name> -i <inst_name>  Stops a particular instance on a node

4. srvctl start instance -d <db_name> -i <inst_name>  Starts a particular instance on a node

5. srvctl start service -d <db_name>  Starts all services associated with a database across all nodes

6. srvctl stop service -d <db_name>  Stops all services associated with a database across all nodes

7. srvctl start service -d <db_name> -s <service_name>  Starts a particular service on a database

8. srvctl stop service -d <db_name> -s <service_name>  Stops a particular service on a database

9. srvctl stop asm -n <node_name>  Stops an ASM instance on a particular node

10. srvctl start asm -n <node_name>  Starts an ASM instance on a particular node

11. srvctl stop nodeapps -n <node_name>  Stops the VIP, listener, ONS and GSD daemons on a particular node

12. srvctl start nodeapps -n <node_name>  Starts the VIP, listener, ONS and GSD daemons on a particular node

13. srvctl status  Various options to show the status of a service, instance or database

14. srvctl modify  Various commands used to modify the OCR configuration - used with Oracle Support

15. crsctl stop crs  Stops CRS and all resources on a particular node

16. crsctl start crs  Starts CRS and all resources on a node

17. crs_stop -all  Stops all CRS-related resources and daemons

18. crs_stop <resource>  Stops a particular resource - used in conjunction with crs_stat; a resource will be named ora.<db>.<service>.cs for example

19. crs_start -all  Starts all RAC and CRS resources on a node

20. crs_start <resource>  Starts a particular resource on a node
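Since the flags above follow one pattern, a tiny wrapper can compose the commands consistently before you run them. A sketch; the helper name and the database/instance names (PROD, PROD1) are made up:

```shell
#!/bin/sh
# Sketch: compose srvctl commands from action/object/name arguments.
# The srv helper and the PROD/PROD1 names are illustrative, not a real API.
srv() {
  # $1=action  $2=object  $3=database  [$4=instance]
  cmd="srvctl $1 $2 -d $3"
  [ -n "${4:-}" ] && cmd="$cmd -i $4"
  echo "$cmd"
}
stop_inst=$(srv stop instance PROD PROD1)
start_db=$(srv start database PROD)
echo "$stop_inst"
echo "$start_db"
```

Echoing the composed command before executing it makes it easy to paste into a change record first.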
Killing a Hanging CRSD Daemon

If, after careful diagnosis, the option is to restart the CRS software, you can follow these steps:

crsctl stop crs

ps -ef | egrep "crsd|ocssd|evmd|oprocd|racg"

If there are still crsd.bin and racg processes around:

kill -9 <crsd.bin PID>

Wait a couple of minutes and then reissue the crsctl stop crs command.

ps -ef | egrep "crsd|ocssd|evmd|oprocd|racg" - should show that the processes have died.
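The leftover-process check can be scripted against ps output. A sketch that parses a canned sample file (the PIDs and paths are invented) so the filtering can be shown without a live cluster:

```shell
#!/bin/sh
# Sketch: pull the PIDs of leftover CRS daemons out of ps output.
# ps.out is a canned sample; on a real node you would feed it from:
#   ps -ef | egrep "crsd|ocssd|evmd|oprocd|racg"
cat > ps.out <<'EOF'
oracle  4321     1  0 10:00 ?  00:00:01 /u01/crs/bin/crsd.bin reboot
oracle  4400  4321  0 10:00 ?  00:00:00 /u01/crs/bin/racgmain check
root     901     1  0 10:00 ?  00:00:00 /usr/sbin/sshd
EOF
pids=$(egrep "crsd|ocssd|evmd|oprocd|racg" ps.out | awk '{print $2}')
echo "$pids"
# kill -9 $pids    # only after crsctl stop crs has been retried
```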
ASM is quite a big topic, but the basic things to look for are pretty much the same, in that you still set the environment variables with '. oraenv' and the log files are still under $ORACLE_BASE/admin/+ASM/bdump etc. There are, however, other things to take into consideration.

The DBA may be called out because a disk needs to be added, or because there are errors. ASM in 10g is a single point of failure: if ASM goes down on a node, all instances on that node go down as well. Moreover, if ASM cannot mount a diskgroup, then all instances and databases on the node will go down, meaning a full outage! So understand the problems you can have and how to find your way around quickly.
What generally happens when adding disks is the following:

1. The SAN guy will present the disks.

2. The unix guy will then run cfgmgr down the fibre, and this will present the hdisks in lspv.

3. The unix guy will then need to do a mknod (character device) for the hdisks, which will create the '/dev/asm' names.

4. The unix guy will then initialise the disk and remove the IBM header using the dd if=<source> of=<device> command.

5. Once this is all done, then and only then, will you be able to see the disks in ASM.

List disk (lspv and SQLPLUS asm views) and adding disks
lspv lists the disks from an OS point of view... the disks must be visible here before they can be added to ASM

p13504xxxx20-> lspv
hdisk0 00cfbdad335d716f rootvg active

hdisk1 00cfbdad41c6b4e5 rootvg active

hdisk2 00cfbdad57a03fc9 lo_oracvg_301 active

hdisk3 00cfbdad7bc7c818 lo_oracvg_301 active

hdisk4 none None

hdisk5 none None

hdisk6 none None

hdisk7 none None

hdisk8 none None

You can then query the v$asm_disk view in ASM to see if any disks are ready to be added, i.e. they are 'candidates':

select group_number, state, name, path, header_status from v$asm_disk
SQL> /

GROUP_NUMBER STATE NAME PATH HEADER_STATU
------------ -------- --------------- ------------------------- ------------
0 NORMAL /dev/p2_asmdisk_20 CANDIDATE

0 NORMAL /dev/p2_asmdisk_19 CANDIDATE

0 NORMAL /dev/p3_asmdisk_14 CANDIDATE

0 NORMAL /dev/p3_asmdisk_13 CANDIDATE

0 NORMAL /dev/p3_asmdisk_12 CANDIDATE

0 NORMAL /dev/p3_asmdisk_11 CANDIDATE

0 NORMAL /dev/p3_asmdisk_21 CANDIDATE

0 NORMAL /dev/p3_asmdisk_20 CANDIDATE

0 NORMAL /dev/p3_asmdisk_19 CANDIDATE

0 NORMAL /dev/p3_asmdisk_18 CANDIDATE

0 NORMAL /dev/p3_asmdisk_17 CANDIDATE

0 NORMAL /dev/p3_asmdisk_16 CANDIDATE

0 NORMAL /dev/p3_asmdisk_15 CANDIDATE

0 NORMAL /dev/p2_asmdisk_11 CANDIDATE

0 NORMAL /dev/p2_asmdisk_12 CANDIDATE

0 NORMAL /dev/p2_asmdisk_13 CANDIDATE

0 NORMAL /dev/p2_asmdisk_14 CANDIDATE

0 NORMAL /dev/p2_asmdisk_21 CANDIDATE

0 NORMAL /dev/p2_asmdisk_15 CANDIDATE

0 NORMAL /dev/p2_asmdisk_16 CANDIDATE

0 NORMAL /dev/p2_asmdisk_17 CANDIDATE

0 NORMAL /dev/p2_asmdisk_18 CANDIDATE

1 NORMAL DATADG_0005 /dev/p2_asmdisk_1 MEMBER

1 NORMAL DATADG_0004 /dev/p2_asmdisk_2 MEMBER

2 NORMAL FLASHDG_0004 /dev/p2_asmdisk_3 MEMBER

2 NORMAL FLASHDG_0005 /dev/p2_asmdisk_4 MEMBER

1 NORMAL DATADG_0001 /dev/p2_asmdisk_5 MEMBER

1 NORMAL DATADG_0002 /dev/p2_asmdisk_6 MEMBER

1 NORMAL DATADG_0009 /dev/p2_asmdisk_7 MEMBER

1 NORMAL DATADG_0013 /dev/p2_asmdisk_10 MEMBER

1 NORMAL DATADG_0015 /dev/p2_asmdisk_9 MEMBER

1 NORMAL DATADG_0014 /dev/p2_asmdisk_8 MEMBER

1 NORMAL DATADG_0000 /dev/p3_asmdisk_1 MEMBER

1 NORMAL DATADG_0003 /dev/p3_asmdisk_2 MEMBER

2 NORMAL FLASHDG_0000 /dev/p3_asmdisk_3 MEMBER

2 NORMAL FLASHDG_0003 /dev/p3_asmdisk_4 MEMBER

1 NORMAL DATADG_0006 /dev/p3_asmdisk_5 MEMBER

1 NORMAL DATADG_0007 /dev/p3_asmdisk_6 MEMBER

1 NORMAL DATADG_0008 /dev/p3_asmdisk_7 MEMBER

1 NORMAL DATADG_0011 /dev/p3_asmdisk_8 MEMBER

1 NORMAL DATADG_0012 /dev/p3_asmdisk_9 MEMBER

1 NORMAL DATADG_0010 /dev/p3_asmdisk_10 MEMBER

42 rows selected.
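When the listing is long, spooling it to a file and filtering for CANDIDATE rows gives the paths quickly. A sketch against a cut-down two-row sample; the spool file name asm_disks.lst is an assumption:

```shell
#!/bin/sh
# Sketch: list candidate disk paths from a spooled v$asm_disk query.
# asm_disks.lst stands in for the real spool output; two sample rows shown.
cat > asm_disks.lst <<'EOF'
0 NORMAL /dev/p2_asmdisk_20 CANDIDATE
1 NORMAL DATADG_0005 /dev/p2_asmdisk_1 MEMBER
EOF
# Candidate rows have no NAME column, so the path is the third field.
candidates=$(awk '$NF == "CANDIDATE" {print $3}' asm_disks.lst)
echo "$candidates"
```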


Add disks into ASM - dependent upon the disks being presented first

Once the disks are there, as shown in the above output from v$asm_disk:


alter diskgroup <diskgroup_name> add disk '<path>';


Then from the v$asm_operation view you can see how long the rebalance will take - this can be sped up using the rebalance power clause.


select * from v$asm_operation;
RMAN Checks

During fault diagnosis it is prudent to quickly check whether the database has a backup and the status of that backup. First set the environment variables for the database in question (follow the .oraenv section); the following commands will quickly give a synopsis of the backup status.

Connect to the target and catalog. Make sure that NLS_LANG and NLS_DATE_FORMAT are set in the environment when rman is run, and that NLS_DATE_FORMAT includes a timestamp (HH24:MI:SS).


$ rman target / catalog user/name@catalog

RMAN>list backupset summary;

1. The info about the db and the backups in the catalog database, connected as the rman/backup user with the db incarnation set to the db in question (put this output in the file list_backup.txt):

RMAN>LIST INCARNATION;

RMAN>LIST BACKUP OF DATABASE SUMMARY;

RMAN>LIST BACKUP;

RMAN>LIST BACKUP BY FILE;


2. The info about archive logs in the backups:

RMAN>LIST ARCHIVELOG ALL;

RMAN>LIST COPY OF DATABASE ARCHIVELOG ALL;

RMAN>LIST BACKUP OF ARCHIVELOG ALL;

RMAN>LIST BACKUP OF ARCHIVELOG FROM SEQUENCE 688321 UNTIL 688363
THREAD 1;

3. The info from the RMAN views in the catalog:

SQL> select * from RC_RESYNC;

SQL>select db_key, dbinc_key, dbid, name from RC_DATABASE;

SQL>select * from RC_BACKUP_REDOLOG;

SQL>select * from RC_ARCHIVED_LOG;


4. The info from the target db:


SQL>select RECID, NAME, SEQUENCE#, ARCHIVED, DELETED, STATUS from v$archived_log;


5. Block corruptions

Database:

SQL> select * from V$database_block_corruption;

Catalog:

SQL>select * from rc_database_block_corruption;

6. Checking the datafiles status

SQL>select name, open_mode from v$database;

SQL>select * from v$recover_file;

SQL>select name, status from v$datafile;
Manual Archive Backup:
Step 1:
RMAN> connect target /
connected to target database: xxxxxx (DBID=256880360)
Step 2:
RMAN> connect catalog rman/rm2n10g@(DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=TCP)(HOST=xxxxxx.xxxxx.co.uk)(PORT=1523)))(CONNECT_DATA=(SERVICE_NAME=xxxxx.xxxxx.co.uk)))
connected to recovery catalog database
Step 3:
run {allocate channel t1 type 'sbt_tape' parms 'ENV=(TDPO_OPTFILE=/u01/app/oracle/admin/tsm_rman/xxxxxx_tdpo.opt)';

backup archivelog all delete input;
}
Change Archive log destination:
ALTER SYSTEM ARCHIVE LOG START TO 'new_location';

Ex:-

Step 1 : alter system archive log start to '/ora_arch_tmp/xxxxx/archive';

Step 2 : Alter system switch logfile;
-----------------------------------------------------
ALTER SYSTEM SET log_archive_dest='location=/data/ORCL/oracle/CRL1/temp_pxc1' SCOPE=spfile;

ALTER SYSTEM SET log_archive_dest='location=/opt/ORCL/apps/best1/temparc' SCOPE=memory;

SQL> select name from v$database;

NAME
---------
orcl

SQL> archive log list
Database log mode Archive Mode
Automatic archival Enabled
Archive destination /data/ORCL/oracle/CRL1
Oldest online log sequence 496885
Next log sequence to archive 496889
Current log sequence 496889
Add space in RAC DB TABLESPACES:

alter tablespace CSB_DATA add datafile '+DATADG' size 1024M autoextend on next 128m maxsize 5120M;

Steps for VIP Relocation:

{p21406thiru243}/home/oracle$ exit

{p21406thiru243}/root$ cd

{p21406thiru243}/root$ . oraenv

ORACLE_SID = [root] ? crs

{p21406thiru243}/root$ crs_stat | grep -i vip

NAME=ora.p21406thiru243.vip

NAME=ora.p21506thiru244.vip

NAME=ora.p21606thiru251.vip

NAME=ora.p21706thiru252.vip

{p21406thiru243}/root$ crs_relocate ora.p21406thiru243.vip -c p21406thiru243

Attempting to stop `ora.p21406thiru243.vip` on member `p21506thiru244`

Stop of `ora.p21406thiru243.vip` on member `p21506thiru244` succeeded.

Attempting to start `ora.p21406thiru243.vip` on member `p21406thiru243`

Start of `ora.p21406thiru243.vip` on member `p21406thiru243` succeeded.
Converting to archive log mode in RAC

SQL> alter system set cluster_database=false scope=spfile sid='prod1';

srvctl stop database -d

SQL> startup mount

SQL> alter database archivelog;

SQL> alter system set cluster_database=true scope=spfile sid='prod1';

SQL> shutdown

srvctl start database -d prod
Process for clearing down archivelogs via Rman

The only way archive logs should be deleted is through rman, as follows.

Ensure that a backup isn't running at present

ps -ef | grep bko  (or 'grep rman')

rman target / catalog <connect string>  (the connect string can be found in /var/opt/oracle/oratab_options)

crosscheck archivelog all;

delete archivelog all backed up 2 times to sbt;

If this does not clear enough space, the 2 can be reduced to 1.

If more space needs to be reclaimed, then the following can be used.

delete archivelog until time 'sysdate - 3';

Again, the 3 can be reduced to clear more space. Note that using this method is a last resort, as it deletes archive logs which have not been backed up to tape.
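The two deletion commands can be parameterised so the retention values are easy to adjust during an incident. A sketch that only builds and prints the RMAN commands; the variable names are illustrative:

```shell
#!/bin/sh
# Sketch: build the RMAN archive-log deletion commands from tunable values.
backed_up=2   # reduce to 1 if not enough space is cleared
days=3        # last resort: reduce to clear more space
delete_backed_up="delete archivelog all backed up ${backed_up} times to sbt;"
delete_until="delete archivelog until time 'sysdate - ${days}';"
echo "$delete_backed_up"
echo "$delete_until"
```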
crsctl check crs
ocrcheck
OMS Commands:
OMS:/u01/app/oracle/product/10.2.0/oms10g:N

. oraenv

OMS
{p06702swop03}/u01/app/oracle/product/10.2.0/oms10g/opmn/bin$ pwd

/u01/app/oracle/product/10.2.0/oms10g/opmn/bin

{p06702swop03}/u01/app/oracle/product/10.2.0/oms10g/opmn/bin$ ls -ltr *ctl*

-rwx------ 1 oracle oinstall 23138 Sep 29 2005 opmnctl.tmp

-rwx------ 1 oracle oinstall 23219 Mar 16 2007 opmnctl

{p06702swop03}/u01/app/oracle/product/10.2.0/oms10g/opmn/bin$ opmnctl status
Processes in Instance: EnterpriseManager0.p06702swop03
-------------------+--------------------+---------+---------
ias-component process-type pid status

-------------------+--------------------+---------+---------

DSA DSA N/A Down

HTTP_Server HTTP_Server 393242 Alive

LogLoader logloaderd N/A Down

dcm-daemon dcm-daemon N/A Down

OC4J home 614448 Alive

OC4J OC4J_EMPROV 192700 Alive

OC4J OC4J_EM 1609870 Alive

WebCache WebCache 258180 Alive

WebCache WebCacheAdmin 254082 Alive

opmnctl status

opmnctl stopall

opmnctl startall
HACMP cluster commands:
smitty hacmp
/usr/es/sbin/cluster/clstat -a
Static Parameters

AUDIT_FILE_DEST = 'directory'  Directory in which auditing files are to reside
AUDIT_SYS_OPERATIONS = {TRUE | FALSE}
AUDIT_TRAIL = {NONE | FALSE | DB | TRUE | OS}  Enable system auditing (9i)
SQL> set echo on

SQL> set pagesize 1000

SQL> set linesize 150

SQL> alter session set nls_date_format='mm/dd/yy hh24:mi:ss';
set heading off;
set echo off;
Set pages 999;
set long 90000;

spool ddl_list.sql

select dbms_metadata.get_ddl('TABLE','DEPT','SCOTT') from dual;
select dbms_metadata.get_ddl('INDEX','DEPT_IDX','SCOTT') from dual;

spool off;
Procedure to relocate services in RAC nodes:

srvctl relocate service -d RTTAPP4P -s WAS_RTTAPP4P_DCF_01 -i RTTAPP4P2 -t RTTAPP4P1

Registering a database in the RMAN catalog:
1. Change SID to the database you want to register
. oraenv
(enter the ORACLE_SID of the database when prompted)

2. Connect to RMAN catalog database
rman target / catalog username/password@MYDB

3. Register database
RMAN> register database;
****************************************************
p20614prwxxx:orapbw> . oraenv
ORACLE_SID = [xxx] ?
p20614prw545:orapbw> rman
Recovery Manager: Release 10.2.0.4.0 - Production on Mon Mar 14 11:18:36 2011
Copyright (c) 1982, 2007, Oracle. All rights reserved.
RMAN> connect target /
connected to target database: xxx (DBID=1836386560)
RMAN> connect catalog rman/rm2n10g@<connect string / database name>
connected to recovery catalog database
RMAN> register database;
database registered in recovery catalog
starting full resync of recovery catalog
full resync complete

SQL> select * from V$RECOVERY_FILE_DEST;

SQL> select * from V$FLASH_RECOVERY_AREA_USAGE;

FILE_TYPE PERCENT_SPACE_USED PERCENT_SPACE_RECLAIMABLE NUMBER_OF_FILES
------------ ------------------ ------------------------- ---------------
CONTROLFILE 0 0 0
ONLINELOG 0 0 0
ARCHIVELOG 44.92 0 179
BACKUPPIECE 0 0 0
IMAGECOPY 0 0 0
FLASHBACKLOG 0 0 0

6 rows selected.

ALTER SYSTEM set DB_RECOVERY_FILE_DEST_SIZE = 15G;
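The PERCENT_SPACE_USED figure above is just SPACE_USED/SPACE_LIMIT from v$recovery_file_dest. A sketch of the arithmetic with made-up byte counts (the values are not from a real system):

```shell
#!/bin/sh
# Sketch: percentage of the flash recovery area in use.
# SPACE_LIMIT/SPACE_USED would come from v$recovery_file_dest; sample bytes here.
space_limit=13631488000
space_used=6123456789
pct=$(awk -v l="$space_limit" -v u="$space_used" 'BEGIN {printf "%.1f", u / l * 100}')
echo "FRA ${pct}% used"
```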


ENQUEUE LOCK:

set linesize 200
col USERNAME for a20
col MODULE for a30
col SID for 99999999

SELECT s.SID, USERNAME AS "User", PROGRAM, MODULE, ACTION, LOGON_TIME "Logon", l.*
FROM V$SESSION s, V$ENQUEUE_LOCK l
WHERE l.SID = s.SID
AND l.TYPE = 'CF'
AND l.ID1 = 0
AND l.ID2 = 2;
1. Finding Volume group

{p0104crmp05}/home/sdtsldba$ lsvg
rootvg
sh_crmp05_vg2
hb_crmp05_vg1
hb_crmp06_vg1
sh_tmspdb_203
altinst_rootvg

2. Finding the file system in which volume group

{p0104crmp05}/home/sdtsldba$ lsvg -l sh_tmspdb_203

3. Size of the file system

{p01102sbap03}/home/oracle$ lslv ssp_jx_data17
LOGICAL VOLUME: ssp_jx_data17 VOLUME GROUP: orascanlive_206
LV IDENTIFIER: 0031f1df00004c00000001122e2daed5.25 PERMISSION: read/write
VG STATE: active/complete LV STATE: opened/syncd
TYPE: raw WRITE VERIFY: off
MAX LPs: 512 PP SIZE: 128 megabyte(s)
COPIES: 2 SCHED POLICY: parallel
LPs: 112 PPs: 224
STALE PPs: 0 BB POLICY: relocatable
INTER-POLICY: minimum RELOCATABLE: yes
INTRA-POLICY: middle UPPER BOUND: 2
MOUNT POINT: N/A LABEL: None
MIRROR WRITE CONSISTENCY: off
EACH LP COPY ON A SEPARATE PV?: yes (superstrict)
Serialize IO? NO
Size: SSP IDX Actual 112x128= 14336M
1. systemstate


$ sqlplus '/ as sysdba'
oradebug setmypid
oradebug unlimit
oradebug dump systemstate 266
-- wait 90 seconds
oradebug dump systemstate 266
-- wait 90 seconds
oradebug dump systemstate 266
quit
This needs doing on all nodes.

2. hang analyse in a new session

sqlplus / as sysdba
oradebug setmypid
oradebug unlimit
oradebug tracefile_name
oradebug -g all hanganalyze 3
-- wait a minute or two here
oradebug -g all hanganalyze 3
-- wait a minute or two here
oradebug -g all hanganalyze 3
exit

2. Execute the following command every 90 seconds for 3 times


  • oradebug setmypid;

  • oradebug unlimit;

  • oradebug hanganalyze 3;
3. Collect Systemstate dumps running the following script every 90 seconds for 3 times


  • oradebug setmypid;

  • oradebug unlimit;

  • oradebug tracefile_name

  • oradebug dump systemstate 266;
To monitor RMAN job progress:

SELECT SID, SERIAL#, CONTEXT, SOFAR, TOTALWORK,
ROUND (SOFAR/TOTALWORK*100, 2) "%_COMPLETE"
FROM V$SESSION_LONGOPS
WHERE OPNAME LIKE 'RMAN%'
AND OPNAME NOT LIKE '%aggregate%'
AND TOTALWORK != 0
AND SOFAR <> TOTALWORK;
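The %_COMPLETE expression is plain arithmetic on SOFAR and TOTALWORK. A sketch with sample values (the numbers are invented):

```shell
#!/bin/sh
# Sketch: the ROUND(SOFAR/TOTALWORK*100, 2) calculation from the query above.
sofar=5120
totalwork=20480
pct=$(awk -v s="$sofar" -v t="$totalwork" 'BEGIN {printf "%.2f", s / t * 100}')
echo "${pct}% complete"
```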
Starts the Management Agent

emctl start agent

Stops the Management Agent

emctl stop agent

If the Management Agent is running, this command displays status information about the Management Agent, including the Agent Home, the process ID, and the time and date of the last successful upload to the Management Repository.


emctl status agent
There are certain checks that can be quickly performed to check the health of CRS

crsctl check crs

CSS appears healthy
CRS appears healthy
EVM appears healthy

QUERY FOR CPU USAGE

select  a.target_name as HOST, to_char(a.rollup_timestamp,' dd-Mon-yy::hh24:mi') as "DAY-TIME", sum(c.cpu_count) as ...