Friday, September 20, 2013

Upgrade from RAC 11.2.0.1 to 11.2.0.3 (Part II OS preparation, Grid Infrastructure and Database software installation)

In Part I I've installed the Linux OS and prepared the shared filesystem (iSCSI configuration & OCFS2).
In this part I'll prepare the Linux OS for the RAC installation, install Grid Infrastructure 11.2.0.3 and install the Database software.

Packages Requirements: (install the same packages version or later)
==================
OEL 5 Required packages for 11.2.0.2 and later versions:
--------------------------------------------------------------------
rpm -qa | grep binutils-2.17.50.0.6
rpm -qa | grep compat-libstdc++-33-3.2.3
rpm -qa | grep elfutils-libelf-0.1
rpm -qa | grep elfutils-libelf-devel-0.1
rpm -qa | grep gcc-4.1.2
rpm -qa | grep gcc-c++-4.1.2
rpm -qa | grep glibc-2.5
rpm -qa | grep glibc-common-2.5
rpm -qa | grep glibc-devel-2.5
rpm -qa | grep glibc-headers-2.5
rpm -qa | grep ksh-2
rpm -qa | grep libaio-0.3.106
rpm -qa | grep libaio-devel-0.3.106
rpm -qa | grep libgcc-4.1.2
rpm -qa | grep libstdc++-4.1.2
rpm -qa | grep libstdc++-devel-4.1.2
rpm -qa | grep make-3.81
rpm -qa | grep sysstat-7.0.2
rpm -qa | grep unixODBC-2.2.11       #=> (32-bit) or later
rpm -qa | grep unixODBC-devel-2.2.11 #=> (64-bit) or later
rpm -qa | grep unixODBC-2.2.11       #=> (64-bit) or later
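Tip: plain "rpm -qa" output doesn't show the package architecture, so the 32-bit vs 64-bit checks above are ambiguous. A sketch using rpm's query-format option to include the architecture in the listing:
rpm -qa --qf '%{NAME}-%{VERSION}-%{RELEASE} (%{ARCH})\n' | grep unixODBC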

In case you have missing packages try installing them from the Linux installation DVD:
e.g.
cd /media/OL5.9\ x86_64\ dvd\ 20130429/Server/
rpm -ivh numactl-devel-0.9.8-12.0.1.el5_6.i386.rpm

The easiest way to download and install the 11gR2 required packages and apply the OS settings is to install the oracle-rdbms-server-11gR2-preinstall package:
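e.g. (assuming the server is registered with ULN or the Oracle public yum repository is configured):
# yum install oracle-rdbms-server-11gR2-preinstall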

Make sure to install Oracle on a NON Tainted Kernel:
-------------------------------------------------------------
What does a tainted kernel mean:
 -A module outside the stock kernel has changed the kernel (e.g. a proprietary module).
 -A module has been force loaded with insmod -f.
 -Whether the Oracle installation succeeds, and whether Oracle supports that database and that Linux system, will depend on the module that tainted the kernel.
Oracle Support may not support your system (Linux, database) if a module in the kernel has tainted it.
How to check whether the kernel is tainted:
# cat /proc/sys/kernel/tainted
1
If the output is non-zero, the kernel is tainted; contact Oracle Support and ask for their help in deciding whether to proceed with the Oracle installation.
If the output is 0, the kernel is not tainted and you're good to go to install the Oracle software.
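Note: /proc/sys/kernel/tainted is actually a bit mask (the value 1 means a proprietary, non-GPL module was loaded; other bits flag forced loads and similar events). A quick sketch to dig further:
# cat /proc/sys/kernel/tainted
# lsmod     => list the loaded modules to spot the third-party one responsible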

Network Requirements:
=================
> Each node must have at least two NICs.
> Recommended to use NIC bonding for public NIC, use HAIP for private NIC (11.2.0.2 onwards).
> Recommended to use redundant switches along with NIC bonding.
> Public & private interface names must be identical on all nodes (e.g. eth0 is the public NIC on all nodes).
> Crossover cables between private RAC NICs are NOT supported (a gigabit switch is the minimum requirement). Crossover cables limit the expansion of RAC to two nodes, perform badly due to excess packet collisions, and cause unstable negotiation between the NICs.
> Public NICs and VIPs / SCAN VIPs must be on the same subnet. Private NICs must be on a different subnet.
> For private interconnect use non-routable addresses:
   [From 10.0.0.0    to  10.255.255.255 or
    From 172.16.0.0  to  172.31.255.255 or
    From 192.168.0.0 to  192.168.255.255]
> Default GATEWAY must be on the same Public | VIPs | SCAN VIPs subnet.
> If you will use the SCAN VIP, the SCAN name is recommended to resolve via DNS to a minimum of 3 IP addresses.
> /etc/hosts or DNS must include PUBLIC & VIP IPs with the host names.
> SCAN IPs should not be in /etc/hosts. Those not willing to use SCAN can still put the SCAN name there, just to let the Grid Infrastructure installation succeed.
> NIC names must NOT include DOT "."
> Every node in the cluster must be able to connect to every private NIC in each node.
> Host names for nodes must NOT have underscores (_).
> Linux Firewall (iptables) must be disabled, at least on the private network. If you plan to enable the firewall, I recommend keeping it disabled until you finish installing all the Oracle products, to make installation problems easy to troubleshoot; once you finish you can enable the firewall, then feel free to blame it if something doesn't work :-).
> Oracle recommends disabling network zeroconf, as it can cause node eviction:
  # route -n  => If you find a 169.254.0.0 line, zeroconf is enabled on your OS (the default); the next step is to disable it by doing the following:
  # vi /etc/sysconfig/network
  #Add this line:
  NOZEROCONF=yes
Restart the network:
  # service network restart
> Recommended to use JUMBO frames for interconnect: [Note: 341788.1]
  Warning: although it's available in most network devices, it's not supported by some NICs (especially Intel NICs) & switches. JUMBO frames must also be enabled on the interconnect switch (doing a test is mandatory).
  # suppose that eth3 is your interconnect NIC:
  # vi /etc/sysconfig/network-scripts/ifcfg-eth3
  #Add the following parameter:
  MTU=9000
  # ifdown eth3; ifup eth3
  # ifconfig -a eth3  => you will see MTU=9000 (the default MTU is 1500)
  Testing JUMBO frames using the traceroute command (during the test we shouldn't see "Message too long" in the output):
  =>From Node1:
  # traceroute -F node2-priv 8970
    traceroute to n2-priv (192.168.110.2), 30 hops max, 9000 byte packets
    1  node2-priv (192.168.110.2)  0.269 ms  0.238 ms  0.226 ms
  =>This test was OK
  =>In case you get the message "Message too long", try reducing the MTU until the message stops appearing.
  Testing JUMBO frames using ping: (with MTU=9000, test with 8970 bytes, not more)
  =>From Node1:
  # ping -c 2 -M do -s 8970 node2-priv
    1480 bytes from node2-priv (192.168.110.2): icmp_seq=0 ttl=64 time=0.245 ms
  =>This test was OK.
  =>In case you get the message "Frag needed and DF set (mtu = 9000)", reduce the MTU till you get the previous output.
> Stop the avahi-daemon: recommended by Oracle, as it causes node eviction and can prevent a node from re-joining the cluster [Note: 1501093.1]
  # service avahi-daemon stop
  # chkconfig avahi-daemon off

Create new Grid & Oracle home:
========================
mkdir -p /u01/grid/11.2.0.3/grid
mkdir -p /u01/oracle/11.2.0.3/db
chown -R oracle:dba /u01
chown oracle:oinstall /u01
chmod 700 /u01
chmod 750 /u01/oracle/11.2.0.3/db

Note: Oracle user, DBA and OINSTALL groups are created during Oracle Enterprise Linux installation.
Note: I'll install Grid & Oracle with the oracle user; I'll not create a new user to be the grid installation owner.

Adding environment variables to Oracle profile:
-----------------------------------------------------
I use a lot of command aliases inside the oracle user profile to speed up my administration work; I think they may be helpful for you too. Some aliases refer to helpful shell scripts, like one that checks locking sessions on the DB and more; I'll share them with you in future posts.

# su - oracle
# vi .bash_profile  


# .bash_profile
# Get the aliases and functions
if [ -f ~/.bashrc ]; then
 . ~/.bashrc
fi
if [ -t 0 ]; then
   stty intr ^C
fi

umask 022
# User specific environment and startup programs
unset USERNAME
ORACLE_SID=pefms1
export ORACLE_SID
ORACLE_BASE=/u01/oracle
export ORACLE_BASE
ORACLE_HOME=/u01/oracle/11.2.0.3/db; export ORACLE_HOME
GRID_HOME=/u01/grid/11.2.0.3/grid
export GRID_HOME
LD_LIBRARY_PATH=$ORACLE_HOME/lib; export LD_LIBRARY_PATH
TNS_ADMIN=$ORACLE_HOME/network/admin
export TNS_ADMIN
CLASSPATH=$ORACLE_HOME/JRE:$ORACLE_HOME/jlib:$ORACLE_HOME/rdbms/jlib
CLASSPATH=$CLASSPATH:$ORACLE_HOME/network/jlib; export CLASSPATH
TMP=/tmp; export TMP
TMPDIR=$TMP; export TMPDIR
PATH=$PATH:$HOME/bin:$ORACLE_HOME/bin:$GRID_HOME/bin:/usr/ccs/bin:/usr/bin/X11/:/usr/local/bin:$ORACLE_HOME/OPatch
export PATH
export ORACLE_UNQNAME=pefms
export ORACLE_HOSTNAME=ora1123-node1
alias profile='cd;. ./.bash_profile;cd -'
alias viprofile='cd; vi .bash_profile'
alias catprofile='cd; cat .bash_profile'
alias tnsping='$ORACLE_HOME/bin/./tnsping'
alias pefms='export ORACLE_SID=pefms; echo $ORACLE_SID'
alias sql="sqlplus '/ as sysdba'"
alias alert="tail -100f $ORACLE_HOME/diagnostics/pefms/diag/rdbms/pefms/pefms1/trace/alert_pefms1.log"
alias vialert="vi $ORACLE_HOME/diagnostics/pefms/diag/rdbms/pefms/pefms1/trace/alert_pefms1.log"
alias lis="vi $ORACLE_HOME/network/admin/listener.ora"
alias tns="vi $ORACLE_HOME/network/admin/tnsnames.ora"
alias sqlnet="vi $ORACLE_HOME/network/admin/sqlnet.ora"
alias sqlnetlog='vi $ORACLE_HOME/log/diag/clients/user_oracle/host_2245657081_76/trace/sqlnet.log'
alias network=" cd $ORACLE_HOME/network/admin;ls -rtlh;pwd"
alias arc="cd /ora_archive1/pefms/; ls -rtlh|tail -50;pwd"
alias p="ps -ef|grep pmon|grep -v grep"
alias oh="cd $ORACLE_HOME;ls;pwd"
alias dbs="cd $ORACLE_HOME/dbs;ls -rtlh;pwd"
alias pfile="vi $ORACLE_HOME/dbs/initpefms1.ora"
alias catpfile="cat $ORACLE_HOME/dbs/initpefms1.ora"
alias spfile="cd /fiber_ocfs_pefms_data_1/oracle/pefms; cat spfilepefms1.ora"
alias bdump='cd $ORACLE_HOME/diagnostics/pefms/diag/rdbms/pefms/pefms1/trace;ls -lrt|tail -10;pwd'
alias udump='cd $ORACLE_HOME/diagnostics/pefms/diag/rdbms/pefms/pefms1/trace;ls -lrt;pwd';
alias cdump='cd $ORACLE_HOME/diagnostics/pefms/diag/rdbms/pefms/pefms1/cdump;ls -lrt;pwd'
alias rman='cd $ORACLE_HOME/bin; ./rman target /'
alias listenerlog='tail -100f $ORACLE_BASE/diag/tnslsnr/ora1123-node1/listener/trace/listener.log'
alias vilistenerlog='vi $ORACLE_BASE/diag/tnslsnr/ora1123-node1/listener/trace/listener.log'
alias listenerpefms1log='tail -100f $ORACLE_HOME/log/diag/tnslsnr/ora1123-node1/listener_pefms1/trace/listener_pefms1.log '
alias listenerpefms2log='tail -100f $ORACLE_HOME/log/diag/tnslsnr/ora1123-node2/listener_pefms2/trace/listener_pefms2.log'
alias listenertail='tail -100f $ORACLE_BASE/diag/tnslsnr/ora1123-node1/listener/trace/listener.log'
alias cron='crontab -e'
alias crol='crontab -l'
alias df='df -h'
alias ll='ls -rtlh'
alias lla='ls -rtlha'
alias l='ls'
alias patrol='sh /home/oracle/patrol.sh'
alias datafiles='sh /home/oracle/db_size.sh'
alias locks='sh /home/oracle/locks.sh'
alias objects='sh /home/oracle/object_size.sh'
alias jobs='sh /home/oracle/jobs.sh'
alias crs='$GRID_HOME/bin/crsstat'
alias crss='crs|grep -v asm|grep -v acfs|grep -v gsd|grep -v oc4j|grep -v ora.cvu'
alias raclog='tail -100f $GRID_HOME/log/ora1123-node1/alertora1123-node1.log'
alias viraclog='vi $GRID_HOME/log/ora1123-node1/alertora1123-node1.log'
alias datafile='sh /home/oracle/db_size.sh'
alias invalid='sh /home/oracle/Invalid_objects.sh'
alias d='date'
alias dc='d;ssh n2 date'
alias aud='cd $ORACLE_HOME/rdbms/audit;ls -rtl|tail -200'
alias lastdb='/home/oracle/lastdb.sh'
alias sessions='/home/oracle/sessions.sh'
alias spid='sh /home/oracle/spid.sh'
alias spidd='sh /home/oracle/spid_full_details.sh'
alias session='/home/oracle/session.sh'
alias killsession='/home/oracle/kill_session.sh'
alias unlock='/home/oracle/unlock_user.sh'
alias sqlid='/home/oracle/sqlid.sh'
alias parm='/home/oracle/parm.sh'
alias grid='cd /u01/grid/11.2.0.3/grid; ls; pwd'
alias lsn='ps -ef|grep lsn|grep -v grep'

When adding the variables to the Oracle profile on the other node, change the node name from ora1123-node1 to ora1123-node2.

Configure SYSTEM parameters:
========================

All parameters should be same or greater on the OS:
----------------------------------------------------
# /sbin/sysctl -a | grep sem           #=> semaphore parameters (250 32000 100 142).
# /sbin/sysctl -a | grep shm           #=> shmmax, shmall, shmmni (536870912, 2097152, 4096).
# /sbin/sysctl -a | grep file-max     #=> (6815744).
# /sbin/sysctl -a | grep ip_local_port_range  #=> Minimum: 9000, Maximum: 65500
# /sbin/sysctl -a | grep rmem_default  #=> (262144).
# /sbin/sysctl -a | grep rmem_max      #=> (4194304).
# /sbin/sysctl -a | grep wmem_default #=> (262144).
# /sbin/sysctl -a | grep wmem_max     #=> (1048576).
# /sbin/sysctl -a | grep aio-max-nr    #=> (Minimum: 1048576) limits concurrent requests to avoid I/O Failures.

Note:
If the current value of any parameter is higher than the value listed above, then do not change the value of that parameter.
If you will change any parameter on /etc/sysctl.conf then issue the command: sysctl -p
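For reference, a minimal /etc/sysctl.conf sketch holding the minimum values from the checklist above (keep your existing values if they are already higher):
kernel.sem = 250 32000 100 142
kernel.shmmax = 536870912
kernel.shmall = 2097152
kernel.shmmni = 4096
fs.file-max = 6815744
fs.aio-max-nr = 1048576
net.ipv4.ip_local_port_range = 9000 65500
net.core.rmem_default = 262144
net.core.rmem_max = 4194304
net.core.wmem_default = 262144
net.core.wmem_max = 1048576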

Check limits.conf values:
vi /etc/security/limits.conf  

oracle   soft   nofile    131072
oracle   hard   nofile    131072
oracle   soft   nproc    131072
oracle   hard   nproc    131072
oracle   soft   core    unlimited
oracle   hard   core    unlimited
oracle   soft   memlock    50000000
oracle   hard   memlock    50000000
# Adjust MAX stack size for 11.2.0.3 => Original was 8192:
oracle   soft   stack    10240 

After updating limits.conf file, oracle user should logoff & logon to let the new adjustments take effect.
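A quick sketch to verify the new limits from a fresh oracle session:
# ulimit -n   => 131072 (open files)
# ulimit -u   => 131072 (max user processes)
# ulimit -s   => 10240  (stack size, in KB)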

Ensure mounting /usr in READ-WRITE mode:
------------------------------------------------
# mount -o remount,rw /usr

>For security reasons sys admins prefer to mount /usr in READ ONLY mode, but during the Oracle installation /usr must be in RW mode.

Restart the internet services daemon (xinetd):
----------------------------------------------
# service xinetd restart

Edit the /etc/securetty file and append it with the relevant service name:
------------------------------------------------------------------------
ftp
rlogin
rsh
rexec
telnet

Create ".rhosts" file:
This file provides user equivalence between the servers; it should be created under the Oracle user's home:
su - oracle
cd
vi .rhosts
# Add the following lines
ora1123-node1 oracle
ora1123-node2 oracle
ora1123-node1-priv oracle
ora1123-node2-priv oracle
ora1123-node1-vip oracle
ora1123-node2-vip oracle
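Some rsh implementations are picky about the permissions on .rhosts, so it's safer to restrict it:
chmod 600 /home/oracle/.rhosts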

Create hosts.equiv file:
vi /etc/hosts.equiv
#add these lines:
ora1123-node1 oracle
ora1123-node2 oracle
ora1123-node1-priv oracle
ora1123-node2-priv oracle
ora1123-node1-vip  oracle
ora1123-node2-vip  oracle

chmod 600 /etc/hosts.equiv
chown root.root /etc/hosts.equiv

Configure Host equivalence between Nodes:
-----------------------------------------------
on Both Nodes:
----------------
mkdir -p /home/oracle/.ssh
cd /home/oracle/.ssh
ssh-keygen -t rsa
ssh-keygen -t dsa

cat id_rsa.pub > authorized_keys
cat id_dsa.pub >> authorized_keys

On Node1:
cd /home/oracle/.ssh
scp authorized_keys oracle@ora1123-node2:/home/oracle/.ssh/authorized_keys_nod1

on Node2:
cd /home/oracle/.ssh
mv authorized_keys_nod1 authorized_keys

cat id_rsa.pub >> authorized_keys
cat id_dsa.pub >> authorized_keys

Copy the authorized_keys file to Node1:
scp authorized_keys oracle@ora1123-node1:/home/oracle/.ssh/


From Node1: Answer each question with "yes"
ssh ora1123-node1 date
ssh ora1123-node2 date
ssh n1 date
ssh n2 date
ssh ora1123-node1-priv date
ssh ora1123-node2-priv date

From Node2: Answer each question with "yes"
ssh ora1123-node1 date
ssh ora1123-node2 date
ssh n1 date
ssh n2 date
ssh ora1123-node1-priv date
ssh ora1123-node2-priv date
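Tip: once every host key has been accepted, a small loop (using the host names above) saves typing when repeating the test after any change:
for h in ora1123-node1 ora1123-node2 n1 n2 ora1123-node1-priv ora1123-node2-priv; do ssh $h date; done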

Enable rsh on both Nodes:
------------------------------
First verify that rsh & rsh-server packages are installed

rpm -qa|grep rsh

rsh-server-0.17-40.el5
rsh-0.17-40.el5

If the packages are not installed, install them:
you can find rsh package in CD1 under "Server" directory
you can find rsh-server package in CD3 under "Server" directory

Add rsh to PAM:
------------------
vi /etc/pam.d/rsh:
#Add the following line
auth sufficient pam_rhosts_auth.so no_hosts_equiv


Enable rsh in xinetd:
--------------------
vi /etc/xinetd.d/rsh
#Modify this line:
disable=no

-Test rsh connectivity between the cluster nodes:
From Node1: rsh n2 date
From Node2: rsh n1 date

Enable rlogin:
---------------
vi /etc/xinetd.d/rlogin
#add this line:
disable=no

Configure Hangcheck-timer:
------------------------------
If a hang occurs on a node, the module will reboot it to avoid database corruption.

*To Load the hangcheck-timer module for 2.6 kernel:

# insmod /lib/modules/`uname -r`/kernel/drivers/char/hangcheck-timer.ko  hangcheck_tick=1 hangcheck_margin=10 hangcheck_reboot=1

->hangcheck_tick: Defines how often, in seconds, the hangcheck-timer checks the node for hangs. The default is 60; Oracle recommends 1 second.
->hangcheck_margin: Defines how long, in seconds, the timer waits for a response from the kernel. The default is 180; Oracle recommends 10.
->hangcheck_reboot: 1 = reboot when a hang occurs, 0 = do not reboot when a hang occurs.

*To confirm that the hangcheck module is loaded, enter the following command:
# lsmod | grep hang
# output will be like below
hangcheck_timer         2428  0 

*Add the module load to the startup sequence by editing this file:

vi /etc/rc.d/rc.local
#add this line
insmod /lib/modules/`uname -r`/kernel/drivers/char/hangcheck-timer.ko  hangcheck_tick=1 hangcheck_margin=10 hangcheck_reboot=1

You have to put the real value of your kernel version in place of `uname -r`.
e.g.
insmod /lib/modules/2.6.32-300.32.2/kernel/drivers/char/hangcheck-timer.ko  hangcheck_tick=1 hangcheck_margin=10 hangcheck_reboot=1
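An alternative sketch that achieves the same persistence on OEL 5: set the module options in /etc/modprobe.conf and load the module by name.
# vi /etc/modprobe.conf
options hangcheck-timer hangcheck_tick=1 hangcheck_margin=10 hangcheck_reboot=1
# then in /etc/rc.d/rc.local simply add:
modprobe hangcheck-timer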

Prepare for using Cluster Time Synchronization Service - (CTSS)
----------------------------------------------------------
Oracle Grid Infrastructure 11gR2 provides a new service called Cluster Time Synchronization Service (CTSS) that can synchronize the time between cluster nodes automatically, without any manual intervention. If you want CTSS to handle this job for you, de-configure and de-install the Network Time Protocol (NTP). During the installation, when Oracle finds that NTP is not active, it will automatically activate CTSS to handle time synchronization between the RAC nodes; no further steps are required from you during the GI installation.

Disable NTP service:
# service ntpd stop
# chkconfig ntpd off
# mv /etc/ntp.conf /etc/ntp.conf.original
# rm /var/run/ntpd.pid

Disable SELINUX:
--------------
Note: Starting with 11gR2, SELinux is supported, but I'll continue disabling it. Disabling SELinux is easier than configuring it :-) it's a nightmare :-)

vi /etc/selinux/config

SELINUX=disabled
SELINUXTYPE=targeted
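Note: SELINUX=disabled takes effect only after a reboot; to switch to permissive mode immediately in the meantime:
# getenforce
Enforcing
# setenforce 0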

#################
Extra Configurations:
#################

Configure HugePages: [361468.1]
================
What is HugePages:
--------------------
HugePages is a feature that allows memory to be managed with larger pages as an alternative to the small 4KB page size.
HugePages is crucial for faster Oracle database performance on Linux if you have a large RAM and SGA > 8G.
HugePages are not only for 32-bit systems; they also improve memory performance on 64-bit kernels.

HugePages Pros:
------------------
-Doesn't allow memory to be swapped.
-Less Overhead for Memory Operations.
-Less Memory Usage.

Huge Pages Cons:
--------------------
-You must set MEMORY_TARGET and MEMORY_MAX_TARGET = 0, as the Automatic Memory Management (AMM) feature is incompatible with HugePages; otherwise you will get:
ORA-00845: MEMORY_TARGET not supported on this system

Implementation:

1-Make sure that MEMORY_TARGET and MEMORY_MAX_TARGET = 0 on All instances.
2-Make sure that all instances on the server are up.
3- Set these parameters equal or greater than SGA size: (values are in KB)

# vi /etc/security/limits.conf 
oracle   soft   memlock    20971520
oracle   hard   memlock    20971520

Here I'll set the SGA to 18G, so I set memlock to 20G in the limits.conf file.

Re-login to oracle user and check the value:
# ulimit -l

4- Create this script:

# vi /root/hugepages_settings.sh

#!/bin/bash
#
# hugepages_settings.sh
#
# Linux bash script to compute values for the
# recommended HugePages/HugeTLB configuration
#
# Note: This script does calculation for all shared memory
# segments available when the script is run, no matter it
# is an Oracle RDBMS shared memory segment or not.
#
# This script is provided by Doc ID 401749.1 from My Oracle Support 
# http://support.oracle.com
# Welcome text
echo "
This script is provided by Doc ID 401749.1 from My Oracle Support 
(http://support.oracle.com) where it is intended to compute values for 
the recommended HugePages/HugeTLB configuration for the current shared 
memory segments. Before proceeding with the execution please note following:
 * For ASM instance, it needs to configure ASMM instead of AMM.
 * The 'pga_aggregate_target' is outside the SGA and 
   you should accommodate this while calculating SGA size.
 * In case you change the DB SGA size, 
   as the new SGA will not fit in the previous HugePages configuration, 
   you had better disable the whole HugePages, 
   start the DB with the new SGA size and run the script again.
And make sure that:
 * Oracle Database instance(s) are up and running
 * Oracle Database 11g Automatic Memory Management (AMM) is not setup 
   (See Doc ID 749851.1)
 * The shared memory segments can be listed by command:
     # ipcs -m
Press Enter to proceed..."
read
# Check for the kernel version
KERN=`uname -r | awk -F. '{ printf("%d.%d\n",$1,$2); }'`
# Find out the HugePage size
HPG_SZ=`grep Hugepagesize /proc/meminfo | awk '{print $2}'`
if [ -z "$HPG_SZ" ];then
    echo "The hugepages may not be supported in the system where the script is being executed."
    exit 1
fi
# Initialize the counter
NUM_PG=0
# Cumulative number of pages required to handle the running shared memory segments
for SEG_BYTES in `ipcs -m | cut -c44-300 | awk '{print $1}' | grep "[0-9][0-9]*"`
do
    MIN_PG=`echo "$SEG_BYTES/($HPG_SZ*1024)" | bc -q`
    if [ $MIN_PG -gt 0 ]; then
        NUM_PG=`echo "$NUM_PG+$MIN_PG+1" | bc -q`
    fi
done
RES_BYTES=`echo "$NUM_PG * $HPG_SZ * 1024" | bc -q`
# An SGA less than 100MB does not make sense
# Bail out if that is the case
if [ $RES_BYTES -lt 100000000 ]; then
    echo "***********"
    echo "** ERROR **"
    echo "***********"
    echo "Sorry! There are not enough total of shared memory segments allocated for 
HugePages configuration. HugePages can only be used for shared memory segments 
that you can list by command:
    # ipcs -m
of a size that can match an Oracle Database SGA. Please make sure that:
 * Oracle Database instance is up and running 
 * Oracle Database 11g Automatic Memory Management (AMM) is not configured"
    exit 1
fi
# Finish with results
case $KERN in
    '2.4') HUGETLB_POOL=`echo "$NUM_PG*$HPG_SZ/1024" | bc -q`;
           echo "Recommended setting: vm.hugetlb_pool = $HUGETLB_POOL" ;;
    '2.6') echo "Recommended setting: vm.nr_hugepages = $NUM_PG" ;;
     *) echo "Unrecognized kernel version $KERN. Exiting." ;;
esac

5-Run script hugepages_settings.sh to help you get the right value for vm.nr_hugepages parameter:
# chmod 700 /root/hugepages_settings.sh
# sh /root/hugepages_settings.sh


6-Edit the file /etc/sysctl.conf and set the vm.nr_hugepages parameter as per the script output value:
# cat /etc/sysctl.conf|grep vm.nr_hugepages
# vi /etc/sysctl.conf
vm.nr_hugepages = 9220

7-Reboot the server.

8-Check and Validate the Configuration:
# grep HugePages /proc/meminfo
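The output will look something like the following (the numbers below are illustrative; HugePages_Total should match the vm.nr_hugepages value you set, and once the instances are up HugePages_Free should drop below HugePages_Total):
HugePages_Total:  9220
HugePages_Free:    512
HugePages_Rsvd:    128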

Note: After any of the following changes, re-run the hugepages_settings.sh script and put the new value in the vm.nr_hugepages parameter:
      -Amount of RAM installed for the Linux OS changed.
      -New database instance(s) introduced.
      -SGA size / configuration changed for one or more database instances.


Increase vm.min_free_kbytes system parameter: [Doc ID 811306.1]
================================
In case you enabled regular HugePages on your system (the thing we did above), it's recommended to increase the system parameter vm.min_free_kbytes from 51200 to 524288. This will cause the system to start reclaiming memory at an earlier time than it would have before, therefore it can help to decrease LowMem pressure, hangs and node evictions.

# sysctl -a |grep min_free_kbytes
vm.min_free_kbytes = 51200

# vi /etc/sysctl.conf
vm.min_free_kbytes = 524288

# sysctl -p 

# sysctl -a |grep min_free_kbytes
vm.min_free_kbytes = 524288


Disable Transparent HugePages: [Doc ID 1557478.1]
=====================
Transparent HugePages are different from the regular HugePages we configured above; Transparent HugePages are set up dynamically at run time.
Transparent HugePages are known to cause unexpected node reboots and performance problems with both RAC and single-node systems; Oracle strongly recommends disabling them.
Note: For the UEK2 kernel, starting with 2.6.39-400.116.0, Transparent HugePages have been removed from the kernel.

Check if Transparent HugePages Enabled:
--------------------------------------
# cat /sys/kernel/mm/transparent_hugepage/enabled
[always] never

Disable Transparent HugePages:
-----------------------------
Add "transparent_hugepage=never" to boot kernel:

# vi /boot/grub/grub.conf
kernel /vmlinuz-2.6.39-300.26.1.el5uek ro root=LABEL=/ transparent_hugepage=never 
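Alternatively (a sketch; takes effect immediately but does not survive a reboot):
# echo never > /sys/kernel/mm/transparent_hugepage/enabled
# echo never > /sys/kernel/mm/transparent_hugepage/defrag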


Configure VNC on Node1:
===================
VNC will help us log in to the Linux machine with a GUI session; from this GUI session we can run Oracle Installer to install Grid Infrastructure and the Database software, eliminating the need to go to the server room and do the installation on the server itself.

Make sure that VNCserver package is already installed:
# rpm -qa | grep vnc-server
vnc-server-4.1.2-14.el5_6.6

Modify the VNC config file:
# vi /etc/sysconfig/vncservers
Add these lines at the bottom:
VNCSERVERS="2:root"
VNCSERVERARGS[2]="-geometry 800x600 -nolisten tcp -nohttpd -localhost"

Set a password for VNC:
# vncpasswd 
Password: 
Verify:

Run a VNC session just to generate the default config files:
# vncserver :1

Configure VNC to start an Xsession when connecting:
# vi ~/.vnc/xstartup
#Uncomment these two lines:
 unset SESSION_MANAGER
 exec /etc/X11/xinit/xinitrc

Now start a VNC session on the machine:
# vncserver :1

Now you can log in from any machine (e.g. your Windows PC) using VNC Viewer to access the remote server on port 5900 or 5901; make sure these ports are not blocked by the firewall.
VNC Viewer can be downloaded from this link:

Download Oracle 11.2.0.3 installation media:
================================

Note [ID 753736.1] has all the Patch Set + PSU reference numbers.

11.2.0.3 (for Linux x86_64) is patch# 10404530; we need only the first 3 of the 7 zip files
 (1&2 for database, 3 for grid, 4 for client, 5 for gateways, 6 examples CD, 7 for deinstall).

I'll extract the first 3 zip files which have Grid and Database binaries under /u02/stage

###########################
Grid Infrastructure installation:
###########################

Setup Cluvfy:
===========
Cluvfy is a tool that checks the fulfillment of RAC and database installation prerequisites.

cd /u02/stage/grid/rpm
rpm -ivh cvuqdisk-1.0.9-1.rpm

Check the fulfillment of Grid Infrastructure setup prerequisites: (using the Cluvfy tool)
------------------------------------------------------------
cd /u02/stage/grid
./runcluvfy.sh stage -pre crsinst -n ora1123-node1,ora1123-node2  -verbose
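Tip: runcluvfy.sh can also generate fixup scripts for many failed OS checks (to be run by root as instructed in the output); a sketch:
./runcluvfy.sh stage -pre crsinst -n ora1123-node1,ora1123-node2 -fixup -verbose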

Grid installation:
============
On Node1:
Start a VNC session on the server to be able to open a GUI session with the server and run Oracle Installer:
# vncserver :1

Login to the server from your PC using VNCviewer, then from the GUI session execute the following:
# xhost +
# su - oracle
# cd /u02/stage/grid
# chmod +x runInstaller
# ./runInstaller

During the installation:
================
Click "skip software updates": 
 >Install and configure Grid Infrastructure for a cluster.

 >Advanced Installation

 >Grid Plug and Play:
   Cluster Name:  cluster
    SCAN Name: cluster-scan
    SCAN Port: 1523

 >Cluster Node Information:
   Add:
   ora1123-node2
   ora1123-node2-vip

 >Network Interface Usage:
   eth0 Public
   eth3 Private
   eth1 Do Not Use
   eth2 Do Not Use

Note: Starting with Oracle 11.2.0.2, you are no longer required to use the NIC bonding technique to configure interconnect redundancy. You can now define at most four interfaces for the redundant interconnect (private network) during the installation phase.

 >Storage Option: Shared File System

 >OCR Storage: Normal Redundancy
   /ora_ocr1/ocr1.dbf
   /ora_ocr2/ocr2.dbf
   /ora_ocr3/ocr3.dbf

Note: Oracle strongly recommends setting the number of voting disks to an odd number (3, 5, and so on), because the cluster must be able to access more than half of the voting disks at any time.


 >Voting Storage:Normal Redundancy
   /ora_voting1/voting1.dbf
   /ora_voting2/voting2.dbf
   /ora_voting3/voting3.dbf

 >Do not use IPMI

Oracle Base:
   /u01/oracle/
RAC Installation path:  /u01/grid/11.2.0.3/grid
OraInventory path:      /u01/oraInventory

At the End of the installation run:
---------------------------------
Run orainstRoot.sh on Node1, then run it on Node2:
# /u01/oraInventory/orainstRoot.sh

Run root.sh on Node1; once it finishes, run it on Node2:
# /u01/grid/11.2.0.3/grid/root.sh 

Just hit ENTER when get this message:
Enter the full pathname of the local bin directory: [/usr/local/bin]: 

Note: root.sh may take from 5 to 15 minutes to complete.

Once root.sh finish, go back to the Execute Configuration Scripts window and press "OK".

I've uploaded the screenshots to this link:

 >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
 In case you are doing an in-place upgrade of an older release and installing the GI into a different home, the following should be done within a downtime window:
 At the End of installation by root user run:
 -------------------------------------------
 Note: In case of doing an in-place upgrade, Oracle recommends that you leave the Oracle RAC instances running from the old GRID_HOME. 
 Execute this script:
 # /u01/grid/11.2.0.3/grid/rootupgrade.sh 
   =>Node by node (don't run it in parallel).
   =>rootupgrade.sh will restart the cluster resources on the node it runs on.
   =>Once you finish with rootupgrade.sh, click OK on the OUI window to finish the installation.
 >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>

The outputs of executed commands:
---------------------------
#/u01/oraInventory/orainstRoot.sh

Changing permissions of /u01/oraInventory.
Adding read,write permissions for group.
Removing read,write,execute permissions for world.

Changing groupname of /u01/oraInventory to oinstall.
The execution of the script is complete.

#/u01/grid/11.2.0.3/grid/root.sh 
Node1 outputs:
-------------
[root@ora1123-node1 /u01]#/u01/grid/11.2.0.3/grid/root.sh 
Performing root user operation for Oracle 11g 

The following environment variables are set as:
    ORACLE_OWNER= oracle
    ORACLE_HOME=  /u01/grid/11.2.0.3/grid

Enter the full pathname of the local bin directory: [/usr/local/bin]: 
   Copying dbhome to /usr/local/bin ...
   Copying oraenv to /usr/local/bin ...
   Copying coraenv to /usr/local/bin ...


Creating /etc/oratab file...
Entries will be added to the /etc/oratab file as needed by
Database Configuration Assistant when a database is created
Finished running generic part of root script.
Now product-specific root actions will be performed.
Using configuration parameter file: /u01/grid/11.2.0.3/grid/crs/install/crsconfig_params
Creating trace directory
User ignored Prerequisites during installation
OLR initialization - successful
  root wallet
  root wallet cert
  root cert export
  peer wallet
  profile reader wallet
  pa wallet
  peer wallet keys
  pa wallet keys
  peer cert request
  pa cert request
  peer cert
  pa cert
  peer root cert TP
  profile reader root cert TP
  pa root cert TP
  peer pa cert TP
  pa peer cert TP
  profile reader pa cert TP
  profile reader peer cert TP
  peer user cert
  pa user cert
Adding Clusterware entries to inittab
CRS-2672: Attempting to start 'ora.mdnsd' on 'ora1123-node1'
CRS-2676: Start of 'ora.mdnsd' on 'ora1123-node1' succeeded
CRS-2672: Attempting to start 'ora.gpnpd' on 'ora1123-node1'
CRS-2676: Start of 'ora.gpnpd' on 'ora1123-node1' succeeded
CRS-2672: Attempting to start 'ora.cssdmonitor' on 'ora1123-node1'
CRS-2672: Attempting to start 'ora.gipcd' on 'ora1123-node1'
CRS-2676: Start of 'ora.cssdmonitor' on 'ora1123-node1' succeeded
CRS-2676: Start of 'ora.gipcd' on 'ora1123-node1' succeeded
CRS-2672: Attempting to start 'ora.cssd' on 'ora1123-node1'
CRS-2672: Attempting to start 'ora.diskmon' on 'ora1123-node1'
CRS-2676: Start of 'ora.diskmon' on 'ora1123-node1' succeeded
CRS-2676: Start of 'ora.cssd' on 'ora1123-node1' succeeded
clscfg: -install mode specified
Successfully accumulated necessary OCR keys.
Creating OCR keys for user 'root', privgrp 'root'..
Operation successful.
Now formatting voting disk: /ora_voting1/voting1.dbf.
Now formatting voting disk: /ora_voting2/voting2.dbf.
Now formatting voting disk: /ora_voting3/voting3.dbf.
CRS-4603: Successful addition of voting disk /ora_voting1/voting1.dbf.
CRS-4603: Successful addition of voting disk /ora_voting2/voting2.dbf.
CRS-4603: Successful addition of voting disk /ora_voting3/voting3.dbf.
##  STATE    File Universal Id                File Name Disk group
--  -----    -----------------                --------- ---------
 1. ONLINE   205267b4e4334fc9bf21154f92cd30fa (/ora_voting1/voting1.dbf) []
 2. ONLINE   83217239b9c84fe9bfbd6c5e76a9dcc1 (/ora_voting2/voting2.dbf) []
 3. ONLINE   41a59373d30b4f6cbf6f41c50dc48dbd (/ora_voting3/voting3.dbf) []
Located 3 voting disk(s).
Configure Oracle Grid Infrastructure for a Cluster ... succeeded


Node2 outputs:
-------------
[root@ora1123-node2 /u01/grid/11.2.0.3/grid]#/u01/grid/11.2.0.3/grid/root.sh 
Performing root user operation for Oracle 11g 

The following environment variables are set as:
    ORACLE_OWNER= oracle
    ORACLE_HOME=  /u01/grid/11.2.0.3/grid

Enter the full pathname of the local bin directory: [/usr/local/bin]: 
   Copying dbhome to /usr/local/bin ...
   Copying oraenv to /usr/local/bin ...
   Copying coraenv to /usr/local/bin ...


Creating /etc/oratab file...
Entries will be added to the /etc/oratab file as needed by
Database Configuration Assistant when a database is created
Finished running generic part of root script.
Now product-specific root actions will be performed.
Using configuration parameter file: /u01/grid/11.2.0.3/grid/crs/install/crsconfig_params
Creating trace directory
User ignored Prerequisites during installation
OLR initialization - successful
Adding Clusterware entries to inittab
CRS-4402: The CSS daemon was started in exclusive mode but found an active CSS daemon on node ora1123-node1, number 1, and is terminating
An active cluster was found during exclusive startup, restarting to join the cluster
Configure Oracle Grid Infrastructure for a Cluster ... succeeded

> The last page will show an error message saying that the Oracle Clusterware verification utility failed; just ignore it, our installation is indeed successful.

Test the installation:
================
-> Check the logs under: /u01/oraInventory/logs

By oracle:
cluvfy stage -post crsinst -n ora1123-node1,ora1123-node2 -verbose
crsctl check cluster -all
# crsctl check crs
Cluster Synchronization Services appears healthy
Cluster Ready Services appears healthy
Event Manager appears healthy

# olsnodes -n
# ocrcheck
# crsctl query crs softwareversion
# crsctl query crs activeversion
# crs_stat -t -v

Confirm clusterware time synchronization service is running (CTSS):
--------------------------------------------------------------------
# crsctl check ctss
CRS-4701: The Cluster Time Synchronization Service is in Active mode.
CRS-4702: Offset (in msec): 0

Create a crsstat script to show a nicer output format for the crs_stat command:
-----------------------------------------------------------------------
cd /u01/grid/11.2.0.3/grid/bin
vi crsstat

#--------------------------- Begin Shell Script ----------------------------
#!/bin/bash
##
#Sample 10g CRS resource status query script
##
#Description:
# - Returns formatted version of crs_stat -t, in tabular
# format, with the complete rsc names and filtering keywords
# - The argument, $RSC_KEY, is optional and if passed to the script, will
# limit the output to HA resources whose names match $RSC_KEY.
# Requirements:
# - $ORA_CRS_HOME should be set in your environment
RSC_KEY=$1
QSTAT=-u
AWK=/usr/bin/awk # change this path if awk is located elsewhere
# Table header:
echo ""
$AWK \
'BEGIN {printf "%-75s %-10s %-18s\n", "HA Resource", "Target", "State";
printf "%-75s %-10s %-18s\n", "-----------", "------", "-----";}'
# Table body:
/u01/grid/11.2.0.3/grid/bin/crs_stat $QSTAT | $AWK \
'BEGIN { FS="="; state = 0; }
$1~/NAME/ && $2~/'$RSC_KEY'/ {appname = $2; state=1};
state == 0 {next;}
$1~/TARGET/ && state == 1 {apptarget = $2; state=2;}
$1~/STATE/ && state == 2 {appstate = $2; state=3;}
state == 3 {printf "%-75s %-10s %-18s\n", appname, apptarget, appstate; state=0;}'
#--------------------------- End Shell Script ------------------------------

chmod 700 /u01/grid/11.2.0.3/grid/bin/crsstat
scp crsstat root@node2:/u01/grid/11.2.0.3/grid/bin

Now you can use the "crs" alias that has been included in the oracle profile to execute crs_stat -t in a nice format.

Change OCR backup location:
=========================
# ocrconfig -showbackup
# ocrconfig -backuploc /u01/grid/11.2.0.3/grid/cdata/cluster11g


Modify RAC configurations:
#######################

=Configure CSS misscount:
 ====================
 The CSS misscount parameter represents the maximum time, in seconds, that a network heartbeat can be missed before the problematic node is kicked out of the cluster.

Check current configurations for css misscount :
# crsctl get css misscount

It's recommended to back up the OCR disks before running the following command.
Configure css misscount (from one node only):
# crsctl set css misscount 60


#################################
Install Oracle Database Software 11.2.0.3:  
#################################

Note: It's recommended to backup oraInventory directory before starting this stage.
Note: Ensure that the clusterware services are running on both nodes.

Run cluvfy to check database installation prerequisites:
========
# cluvfy stage -pre dbinst -n ora1123-node1,ora1123-node2 -verbose
-->Ignore cluster scan errors.

Execute runInstaller:
===============
Connect to the server using VNC Viewer to open a GUI session that enables you to run Oracle Installer.
Note: Oracle Installer can also run in command line mode using the -silent and -responseFile attributes; you should prepare a response file that will hold all the installation selections.

# xhost +
# su - oracle
# cd /u02/stage/database
# ./runInstaller

During the installation:
==================
Select "skip software updates"
Select "Install database Software only"
Select "Oracle Real Application Clusters database installation" -> Select both nodes (selected by default).
Select "Enterprise" -> Selected options like (Partitioning, Data Mining, Real Application Testing)
 =>From a security perspective it's recommended to install only the options you need.
 =>From a licensing perspective there is no problem with installing an option you are not using, as Oracle charges only for the options being used.
Select "dba" group for OSDBA; leave OSOPER blank (I never had a need to log in to the database with the SYSOPER privilege).
Ignore SCAN warning in the prerequisite check page
ORACLE_BASE: /u01/oracle
ORACLE_HOME (Software Location): /u01/oracle/11.2.0.3/db

At the end of the installation: as the root user, execute /u01/oracle/11.2.0.3/db/root.sh on Node1 first, then execute it on Node2:
# /u01/oracle/11.2.0.3/db/root.sh

Go back to the Oracle Installer:
click OK.
click Close.

I've uploaded Oracle software installation snapshots to this link:

Post Steps:
########
Installation verification:
=================
# cluvfy stage -post crsinst -n ora1123-node1,ora1123-node2 -verbose
  =>All passed except the SCAN check, which I'm not using in my setup.

Do some backing up:
==============
Query Voting disks:
------------------
crsctl query css votedisk

Backing up voting disks manually is no longer required; the dd command is not supported in 11gR2 for backing up voting disks. Voting disks are backed up automatically into the OCR as part of any configuration change, and voting disk data is automatically restored to any added voting disks.

Backup the OCR: (clusterware must be up and running)
------------------
# ocrconfig -export /u01/grid/11.2.0.3/grid/cdata/cluster11g/ocr_after_DB_installation.dmp
# ocrconfig -manualbackup
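To list the manual backups just taken (a quick check):
# ocrconfig -showbackup manual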

Backup oraInventory directory:
---------------------------------
# cp -r /u01/oraInventory /u01/oraInventory_After_DBINSTALL

Backup root.sh:
-----------------
# cp /u01/grid/11.2.0.3/grid/root.sh /u01/grid/11.2.0.3/grid/root.sh._after_installation
# cp /u01/oracle/11.2.0.3/db/root.sh /u01/oracle/11.2.0.3/db/root.sh_after_installation

Backup ORACLE_HOME: 
---------------------------
# tar cvpf /u01/oracle/11.2.0.3/db_After_DB_install.tar /u01/oracle/11.2.0.3/db

Backup GRID_HOME: 
----------------------
# tar cvpf /u01/grid/11.2.0.3/grid_after_DB_install.tar /u01/grid/11.2.0.3/grid

Note: The GI Home can be backed up online while the clusterware services are up and running.

Backup the following files:
--------------------------
# cp /usr/local/bin/oraenv  /usr/local/bin/oraenv.11.2.0.3
# cp /usr/local/bin/dbhome  /usr/local/bin/dbhome.11.2.0.3
# cp /usr/local/bin/coraenv /usr/local/bin/coraenv.11.2.0.3

-Restart the RAC servers more than once and ensure that the RAC processes start up automatically.

July SPU Patch Apply:
##################
-Since October 2012 Oracle renamed the CPU (Critical Patch Update) to SPU (Security Patch Update); both are the same, it's just a renaming.
-SPU patches are cumulative; once you apply the latest patch there is no need to apply the older ones.
-To avoid making a big change to my environment, and to minimize the downtime when applying security patches, I prefer applying the SPU (formerly CPU) over the PSU patches (which contain the SPU patch plus common bug fixes that affect a large number of customers).
The OPatch utility version must be 11.2.0.3.0 or later (OPatch is the tool used to apply SPU patches):
  >> To download the latest OPatch utility: go to Metalink and search for Patch# 6880880.
   >Backup the original OPatch directory under ORACLE_HOME, then just unzip the patch file under ORACLE_HOME.
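To confirm the OPatch version after unzipping:
# $ORACLE_HOME/OPatch/opatch version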

> $PATH must refer to /usr/ccs/bin
  # export PATH=$PATH:/usr/ccs/bin

> Unzip the Patch:
  # cd $ORACLE_HOME
  # unzip p16742095_112030_Linux-x86-64.zip

Patch Installation:
=============
Remember we still don't have any running database for the time being.
Shutdown Nodeapps or crs:
# srvctl stop nodeapps -n ora1123-node1

Patch Installation:
# cd $ORACLE_HOME/16742095
# opatch napply -skip_subset -skip_duplicate -local
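After OPatch finishes, a quick sketch to confirm the patch is registered in the local inventory:
# opatch lsinventory | grep 16742095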

Go to Node2 and do the same steps for installing the latest SPU patch...


NEXT:

Part III  Create a standby database under the new 11.2.0.3 environment being refreshed from the 11.2.0.1 primary DB.

