Abstract: NODE DOWN IN CAA CLUSTER DUE TO CONFIGRM MEMORY LEAK

Fix available in fixpack: Fix not currently available in a Service Pack, see below for information on an Interim Fix.

APAR status
Closed as fixed if next.
Error description
***************************************************************
* USERS AFFECTED:
* Systems running rsct.core.rmc 3.1.5.0 through 3.1.5.8,
* as well as rsct.core.rmc 3.2.0.0 through 3.2.0.4.
* This includes AIX 6.1 TL9 and AIX 7.1 TL3, and VIOS 2.2.3.
* Other AIX levels can be affected if RSCT has been updated
* independently of AIX.
***************************************************************
* PROBLEM DESCRIPTION:
* Starting in rsct.core.rmc 3.1.5.0, a slow memory leak in
* IBM.ConfigRM under CAA can lead to a cluster service
* shutdown, which causes a node failure in both PowerHA v7
* (halt) and VIOS SSP (system panic).
* The leak occurs as long as CAA is active, regardless of
* what PowerHA or SSP is doing, and only on the node
* operating as the ConfigRM Group leader.  The GL node
* can be identified in "lssrc -ls IBM.ConfigRM".
* A reboot is guaranteed to reset the situation.  Time to
* failure after a new boot is estimated to be between 4 and 8
* months.  No precise deadline is known, because none of the
* failure records from the field retained the time of the
* last reboot.
***************************************************************
* RECOMMENDATION:
* The fix for RSCT 3.1.5 is available via RSCT APAR IV66606.
* An interim fix for RSCT 3.1.5 is available from either:
* ftp://aix.software.ibm.com/aix/ifixes/iv66606/
* https://aix.software.ibm.com/aix/ifixes/iv66606/
*
* The fix for RSCT 3.2.0 is available via RSCT APAR IV69760.
* The fix for RSCT 3.2.0 will also ship with:
* AIX 6.1 TL9 SP5, AIX 7.1 TL3 SP5, and VIOS 2.2.3.5.
* An interim fix for RSCT 3.2.0 is available from either:
* ftp://aix.software.ibm.com/aix/ifixes/iv69760/
* https://aix.software.ibm.com/aix/ifixes/iv69760/
***************************************************************
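The affected ranges listed under USERS AFFECTED can be checked mechanically. The following is a minimal sketch, not an IBM-supplied tool: it assumes the installed level of rsct.core.rmc has already been obtained as a four-part string (for example from "lslpp -L rsct.core.rmc"), and the names AFFECTED_RANGES, parse_level and is_affected are hypothetical helpers introduced only for illustration. It looks at the fileset level alone and cannot tell whether an interim fix has been installed.

```python
# Affected rsct.core.rmc ranges, copied from the USERS AFFECTED text.
# Each entry is (lowest affected level, highest affected level), inclusive.
AFFECTED_RANGES = [
    ((3, 1, 5, 0), (3, 1, 5, 8)),
    ((3, 2, 0, 0), (3, 2, 0, 4)),
]


def parse_level(level):
    """Convert a fileset level such as '3.1.5.4' into a tuple of four ints."""
    parts = tuple(int(p) for p in level.strip().split("."))
    if len(parts) != 4:
        raise ValueError("expected a four-part level, got %r" % level)
    return parts


def is_affected(level):
    """Return True if the level falls inside one of the affected ranges."""
    v = parse_level(level)
    # Tuples compare element by element, so 3.1.5.10 sorts after 3.1.5.8.
    return any(low <= v <= high for low, high in AFFECTED_RANGES)


if __name__ == "__main__":
    # Hypothetical sample levels, not taken from any real system.
    for sample in ("3.1.5.4", "3.2.0.5"):
        print(sample, "affected" if is_affected(sample) else "not affected")
```

Comparing integer tuples rather than raw strings avoids misordering levels whose last field has two digits.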
Local fix

Problem summary
***************************************************************
* USERS AFFECTED:
* Systems running rsct.core.rmc 3.1.5.0 through 3.1.5.8,
* as well as rsct.core.rmc 3.2.0.0 through 3.2.0.4.
* This includes AIX 6.1 TL9 and AIX 7.1 TL3, and VIOS 2.2.3.
* Other AIX levels can be affected if RSCT has been updated
* independently of AIX.
***************************************************************
* PROBLEM DESCRIPTION:
* Starting in rsct.core.rmc 3.1.5.0, a slow memory leak in
* IBM.ConfigRM under CAA can lead to a cluster service
* shutdown, which causes a node failure in both PowerHA v7
* (halt) and VIOS SSP (system panic).
* The leak occurs as long as CAA is active, regardless of
* what PowerHA or SSP is doing, and only on the node
* operating as the ConfigRM Group leader.  The GL node
* can be identified in "lssrc -ls IBM.ConfigRM".
* A reboot is guaranteed to reset the situation.  Time to
* failure after a new boot is estimated to be between 4 and 8
* months.  No precise deadline is known, because none of the
* failure records from the field retained the time of the
* last reboot.
***************************************************************
* RECOMMENDATION:
* The fix for RSCT 3.1.5 is available via RSCT APAR IV66606.
* An interim fix for RSCT 3.1.5 is available from either:
* ftp://aix.software.ibm.com/aix/ifixes/iv66606/
* https://aix.software.ibm.com/aix/ifixes/iv66606/
*
* The fix for RSCT 3.2.0 is available via RSCT APAR IV69760.
* The fix for RSCT 3.2.0 will also ship with:
* AIX 6.1 TL9 SP5, AIX 7.1 TL3 SP5, and VIOS 2.2.3.5.
* An interim fix for RSCT 3.2.0 is available from either:
* ftp://aix.software.ibm.com/aix/ifixes/iv69760/
* https://aix.software.ibm.com/aix/ifixes/iv69760/
***************************************************************
Problem conclusion

Temporary fix
*********
* HIPER *
*********
Comments
This APAR is being closed FIN. This means that a solution to
this APAR is expected to be delivered from IBM in a release
(if any) to be available within the next 24 months.
APAR information
APAR number IV71219
Reported component name AIX V7.1
Reported component ID 5765H4000
Reported release 710
Status CLOSED FIN
PE NoPE
HIPER YesHIPER
Submitted date 2015-03-19
Closed date 2015-03-19
Last modified date 2015-03-19
APAR is sysrouted FROM one or more of the following:
 IV71217
 
APAR is sysrouted TO one or more of the following:

Applicable component levels
R710 PSY    UP

 
