
troubleshooting root.sh problem for oracle 10g/11g

 
While running root.sh at the end of the Clusterware portion of a RAC installation,
you can tail the $GRID_HOME/cfgtoollogs/crsconfig/rootcrs_<hostname>.log file.
This file records in detail the cluster configuration steps performed while root.sh runs, and if an error is hit it also provides fairly detailed information about the failure.
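For example, the log can be followed from a second terminal while root.sh runs (the hostname below is a placeholder; substitute your own node name):
 # follow the Clusterware configuration log in real time
 tail -f $GRID_HOME/cfgtoollogs/crsconfig/rootcrs_racnode1.log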
 
troubleshooting root.sh problem
------ for 10g and 11.1 ------
1. Verify that the public and private node names can be pinged from every node.
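For example (the node names below are placeholders for your own public and private interconnect names):
 ping -c 2 racnode1        # public node name
 ping -c 2 racnode1-priv   # private interconnect name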
2. Verify that the OCR/Voting files are readable and writable by both the oracle and root users, and check the permissions on the underlying devices:
    dd if=/dev/raw/raw1 of=/dev/null   # verify the device is readable
Pre Install:
 OCR    - root:oinstall   - 640
 Voting - oracle:oinstall - 660
Post Install:
 OCR    - root:oinstall   - 640
 Voting - oracle:oinstall - 644
In RHAS 4.0, the permission settings should be added to /etc/rc.d/rc.local so that they are restored after a reboot. See Note 293819.1 for more information.
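A minimal sketch of applying the pre-install settings above; the raw device names are placeholders, so substitute your actual OCR and voting devices:
 # hypothetical raw devices: raw1 = OCR, raw2 = voting disk
 chown root:oinstall /dev/raw/raw1
 chmod 640 /dev/raw/raw1
 chown oracle:oinstall /dev/raw/raw2
 chmod 660 /dev/raw/raw2     # change to 644 after the install completes
 # on RHAS 4.0, append the same commands to /etc/rc.d/rc.local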
3. Before running root.sh, make sure the OCR and voting devices are clean (for a first-time install).
Example: zero out the disk headers
 dd if=/dev/zero of=/dev/traindata_dg/ocrV1064_100m.dbf bs=8192  count=12800
 dd if=/dev/zero of=/dev/traindata_dg/V1064_vote_01_20m.dbf bs=8192  count=2560
 
4. Verify that the Oracle user has permissions on /var/tmp (specifically /var/tmp/.oracle)
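A quick way to check this (a sketch; the .oracle subdirectory may not exist yet on a freshly prepared node):
 ls -ld /var/tmp /var/tmp/.oracle
 su - oracle -c 'touch /var/tmp/.oracle/permtest && rm /var/tmp/.oracle/permtest'   # should succeed silently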
5. Is PAM being used? Look for pam_unix messages in the messages file. The PAM configuration might need to be altered to allow root.sh to complete. (pam_unix is the traditional password authentication module.)
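For example, on Linux the system messages file can be checked for pam_unix entries logged around the time root.sh ran:
 grep pam_unix /var/log/messages | tail -20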
6.Verify that the correct vendor clusterware version is being used  (if vendor clusterware is being used).  If on Sun, make sure you are  using the latest UDLM.
If on Sun, make sure the udlm has the keyword "reentrant".  Example:
 > more /var/sadm/pkg/ORCLudlm/pkginfo | grep VERSION
 VERSION=Dev Release 10/29/03, 64bit 3.3.4.7 reentrant
7. Verify that crs, css, or evm is not already running (ps -ef | grep d.bin)
 
------debug root.sh------
1. crsctl stop crs  (as the root user)
2. Backup the entire Oracle Clusterware home.
3. Execute <CRS_HOME>/install/rootdelete.sh on all nodes
4. Execute <CRS_HOME>/install/rootdeinstall.sh on the installing node
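For example, using the same 10.2 CRS home that appears later in this note (the path is only illustrative):
 # as root, on every node
 /u01/app/oracle/product/crs102/install/rootdelete.sh
 # as root, on the installing node only
 /u01/app/oracle/product/crs102/install/rootdeinstall.sh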
5. The following commands should return nothing: 
* ps -e | grep -i 'ocs[s]d' 
* ps -e | grep -i 'cr[s]d.bin' 
* ps -e | grep -i 'ev[m]d.bin' 
If any of these processes are still running, kill them or reboot the node.
6. Remove all files from /tmp/.oracle and /var/tmp/.oracle
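One way to do this, assuming the directories exist (run as root and double-check the paths before removing anything):
 rm -rf /tmp/.oracle/* /var/tmp/.oracle/*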
7. Edit the root.sh and add 'sh -x' before the two commands executed by it, e.g.
#!/bin/sh
sh -x /u01/app/oracle/product/crs102/install/rootinstall
sh -x /u01/app/oracle/product/crs102/install/rootconfig
8. collect the output via, e.g.
script /tmp/rootsh-node1.log 
./root.sh 
exit
9. Please send the rootsh-node1.log to Oracle Support for analyzing.
- In some cases, the relevant error messages can be found directly in the rootsh-<node_name>.log file.
 
---------------------------------------
Diagnosing Oracle 11g root.sh issues
---------------------------
At the end of a grid infrastructure installation, the user is prompted to run the "root.sh" script.  This script configures and starts the Oracle Clusterware stack.  A root.sh script can error out and/or fail under one of the following conditions:
· Problem with the network configuration.
· Problem with the storage location for the OCR and/or voting files.  
· Permission problem with /var/tmp (specifically /var/tmp/.oracle).
· Problem with the vendor clusterware (if used).
· Some other configuration issue.
· An Oracle bug.
Most configuration issues should be detectable by running the Cluster Verification Utility with the following syntax (input the nodelist):
cd <GRID_HOME>/bin
./cluvfy stage -pre crsinst -n <nodelist> -r 11gR2 -verbose
Additional options can be used for a more thorough check:
USAGE:
cluvfy stage -pre crsinst -n <node_list> [-r {10gR1|10gR2|11gR1|11gR2}]
[-c <ocr_location_list>] [-q <voting_disk_list>]
[-osdba <osdba_group>]
[-orainv <orainventory_group>]
[-asm -asmgrp <asmadmin_group>]
[-asm -asmdev <asm_device_list>]
[-fixup [-fixupdir <fixup_dir>]] [-verbose]
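For example, a more thorough run might look like this (the node names, ASM devices, and fixup directory below are placeholders):
 ./cluvfy stage -pre crsinst -n racnode1,racnode2 -r 11gR2 \
     -asm -asmdev /dev/sdb1,/dev/sdc1 \
     -fixup -fixupdir /tmp/cvu_fixup -verbose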
If the Cluster Verification Utility is unable to find a configuration problem and your root.sh still fails, you may need the assistance of Oracle Support to troubleshoot further and/or see the "Advanced Root.sh Troubleshooting" section:
Advanced Root.sh Troubleshooting
The root.sh is simply a parent script that calls the following scripts:
<GRID_HOME>/install/utl/rootmacro.sh       # small - validates home and user
<GRID_HOME>/install/utl/rootinstall.sh     # small - creates some local files
<GRID_HOME>/network/install/sqlnet/setowner.sh   # small - opens up /tmp permissions
<GRID_HOME>/rdbms/install/rootadd_rdbms.sh  # small - misc file/permission checks
<GRID_HOME>/rdbms/install/rootadd_filemap.sh  # small - misc file/permission checks
<GRID_HOME>/crs/install/rootcrs.pl  # MAIN CLUSTERWARE CONFIG SCRIPT
If your root.sh is failing in one of the first 5 scripts, it should be an easy fix since those scripts are small and easy to troubleshoot. However, most problems are likely to happen in the rootcrs.pl script, which is the main clusterware configuration script. This script logs useful trace data to <GRID_HOME>/cfgtoollogs/crsconfig/rootcrs_<nodename>.log. However, you should first check the clusterware alert log under <GRID_HOME>/log/<nodename> for any obvious problems or errors.
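For example, on the failing node both logs can be inspected like this (the node name is a placeholder):
 tail -100 $GRID_HOME/log/racnode1/alertracnode1.log
 grep -i -E 'error|fail' $GRID_HOME/cfgtoollogs/crsconfig/rootcrs_racnode1.log | tail -50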